Volume 46 Issue 1
Jan.  2025
ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei. A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation[J]. Applied Mathematics and Mechanics, 2025, 46(1): 40-54. doi: 10.21656/1000-0887.450167

A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

doi: 10.21656/1000-0887.450167
  • Received Date: 2024-06-05
  • Revised Date: 2024-07-10
  • Publish Date: 2025-01-01
  • Owing to their low power consumption and high efficiency, GPUs, TPUs and NPUs built around single- and half-precision units have become the dominant computing hardware for artificial intelligence. However, they cannot be applied directly to differential equations that require high floating-point accuracy, nor can they simply replace double-precision units. To combine the advantages of single and double precision, a mixed-precision scheme that balances efficiency and accuracy was proposed for large sparse linear systems, and a sparse GMRES-IR (GMRES-based iterative refinement) algorithm for large sparse matrices was developed. First, the distribution of matrix data in fluid dynamics simulation problems was analyzed. Then, double precision was used for pre-processing and single precision for the iterative computation, so that the most time-consuming part of the algorithm runs in single precision, which improves computational efficiency. The accuracy and efficiency of the proposed method were validated by solving 33 linear systems from open-source datasets. The results show that, on a single CPU core and under the same accuracy requirement, the proposed mixed-precision algorithm achieves a speedup of up to 2.5 times, and the gain is more pronounced for large-scale matrices.
  • (Contributed by KOU Jiaqing, M.AMM Youth Editorial Board & ZHANG Weiwei, M.AMM Editorial Board)
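
The mixed-precision idea summarized in the abstract can be illustrated with a minimal sketch of GMRES-based iterative refinement. This is a generic textbook-style version written for illustration, not the authors' implementation: the function names `gmres_single` and `gmres_ir`, the tolerances, the restart length and the small test matrix are all assumptions. The residual and the solution update are kept in double precision, while the Arnoldi process that dominates the cost runs in single precision.

```python
import numpy as np


def gmres_single(A32, r32, restart=30, rtol=1e-4):
    """One GMRES(restart) cycle in float32 for A32 @ d = r32, starting from d = 0.

    Hypothetical helper for illustration; a production code would update the
    least-squares problem with Givens rotations instead of calling lstsq.
    """
    n = r32.shape[0]
    beta = np.linalg.norm(r32)
    if beta == 0.0:
        return np.zeros(n, dtype=np.float32)
    Q = np.zeros((n, restart + 1), dtype=np.float32)        # Arnoldi basis
    H = np.zeros((restart + 1, restart), dtype=np.float32)  # Hessenberg matrix
    Q[:, 0] = r32 / beta
    for j in range(restart):
        w = A32 @ Q[:, j]                    # single-precision mat-vec
        for i in range(j + 1):               # modified Gram-Schmidt
            H[i, j] = np.dot(Q[:, i], w)
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: min || beta * e1 - H y ||
        e1 = np.zeros(j + 2, dtype=np.float32)
        e1[0] = beta
        Hj = H[: j + 2, : j + 1]
        y = np.linalg.lstsq(Hj, e1, rcond=None)[0]
        res = np.linalg.norm(e1 - Hj @ y)
        if res <= rtol * beta or H[j + 1, j] == 0.0:
            break                            # converged or lucky breakdown
        Q[:, j + 1] = w / H[j + 1, j]
    return Q[:, : j + 1] @ y


def gmres_ir(A, b, tol=1e-10, max_outer=50, restart=30):
    """Iterative refinement: float64 residual/update, float32 inner GMRES."""
    A32 = A.astype(np.float32)               # low-precision copy of the matrix
    x = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for outer in range(max_outer + 1):
        r = b - A @ x                        # residual in double precision
        rnorm = np.linalg.norm(r)
        relres = rnorm / bnorm
        if relres <= tol or outer == max_outer:
            break
        # Normalize the residual so the float32 right-hand side is well scaled.
        d = gmres_single(A32, (r / rnorm).astype(np.float32), restart)
        x = x + rnorm * d.astype(np.float64)  # correction applied in double
    return x, relres, outer


# Assumed test problem: a small nonsymmetric, diagonally dominant tridiagonal matrix.
n = 200
A = 4.0 * np.eye(n) - np.eye(n, k=1) - 0.5 * np.eye(n, k=-1)
x_true = np.ones(n)
b = A @ x_true
x, relres, outer = gmres_ir(A, b)
print(f"outer iterations: {outer}, relative residual: {relres:.2e}")
```

The sketch uses a dense NumPy array only to stay self-contained; a `scipy.sparse` matrix could be passed instead, since it also supports `@` and `.astype`. Each outer step can only reduce the residual by roughly the accuracy of the single-precision inner solve, so a few refinement steps are needed to reach a double-precision tolerance.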
    Figures(10)  / Tables(6)
