A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

ZHENG Senwei; KOU Jiaqing; ZHANG Weiwei

doi:10.21656/1000-0887.450167

Volume 46 Issue 1

Jan. 2025

Citation: ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei. A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation[J]. Applied Mathematics and Mechanics, 2025, 46(1): 40-54. doi: 10.21656/1000-0887.450167

ZHENG Senwei^1, KOU Jiaqing^{1, 2, 3}, ZHANG Weiwei^{1, 2, 3}

1. School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, P.R.China
2. International Joint Institute of Artificial Intelligence on Fluid Mechanics, Northwestern Polytechnical University, Xi'an 710072, P.R.China
3. National Key Laboratory of Aircraft Configuration Design, Xi'an 710072, P.R.China

Received Date: 2024-06-05
Revised Date: 2024-07-10
Publish Date: 2025-01-01

Abstract

Owing to their low power consumption and high efficiency, GPUs, TPUs and NPUs built around single- and half-precision computing units have become the dominant hardware for artificial intelligence. However, they cannot be applied directly to differential equations that require high floating-point accuracy, nor can they directly replace double-precision units. To combine the advantages of single and double precision, a mixed-precision solution scheme that balances efficiency and accuracy is proposed for large sparse linear systems, and a sparse GMRES-IR (GMRES-based iterative refinement) algorithm for large sparse matrices is developed. First, the distribution of matrix data in fluid dynamics simulation problems is analyzed. Then, double precision is used for pre-processing and single precision for the iterative solve, so that single-precision arithmetic covers the most time-consuming part of the algorithm and improves computational efficiency. Solutions of 33 linear systems from open-source datasets validate the accuracy and efficiency of the proposed method. The results show that, on a single CPU core and under the same accuracy requirements, the proposed mixed-precision algorithm achieves a speedup of up to 2.5 times, and that the gain is more pronounced for large-scale matrices.
Keywords:
- mixed-precision
- computational fluid dynamics
- linear equations
- GMRES
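
The general idea behind GMRES-based iterative refinement can be illustrated with a small sketch. It is not the authors' implementation: the function names `gmres_fp32` and `gmres_ir`, the tolerances, the residual scaling step and the dense test matrix are all assumptions chosen for illustration, and a dense NumPy array stands in for sparse storage to keep the example short. The outer loop computes residuals and updates the solution in double precision, while the inner GMRES correction solve, which is the expensive part, works on single-precision data.

```python
import numpy as np


def gmres_fp32(A32, b32, m=30, tol=1e-4):
    """GMRES with at most m steps and zero initial guess.

    Vectors, the Krylov basis and matrix-vector products are float32.
    """
    n = b32.shape[0]
    beta = np.linalg.norm(b32)
    if beta == 0.0:
        return np.zeros(n, dtype=np.float32)
    V = np.zeros((n, m + 1), dtype=np.float32)  # orthonormal Krylov basis
    H = np.zeros((m + 1, m), dtype=np.float32)  # upper Hessenberg matrix
    V[:, 0] = b32 / beta
    y = np.zeros(0, dtype=np.float32)
    k = 0
    for j in range(m):
        w = A32 @ V[:, j]
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        k = j + 1
        e1 = np.zeros(k + 1, dtype=np.float32)
        e1[0] = beta
        # Small (k+1)-by-k least-squares problem: min || beta*e1 - H y ||
        y = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)[0].astype(np.float32)
        res = np.linalg.norm(e1 - H[:k + 1, :k] @ y)
        if res <= tol * beta or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :k] @ y


def gmres_ir(A, b, tol=1e-10, max_outer=50, m=30):
    """Iterative refinement: float64 residuals, float32 GMRES corrections."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    A32 = A64.astype(np.float32)                # one-off single-precision copy
    x = np.zeros_like(b64)
    bnorm = np.linalg.norm(b64)
    for it in range(max_outer):
        r = b64 - A64 @ x                       # residual in double precision
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        scale = np.linalg.norm(r, ord=np.inf)   # keep r well inside float32 range
        d = gmres_fp32(A32, (r / scale).astype(np.float32), m=m)
        x = x + scale * d.astype(np.float64)    # update in double precision
    return x, max_outer


# Hypothetical test problem: a nonsymmetric, diagonally dominant tridiagonal matrix.
n = 100
A = 4.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
rng = np.random.default_rng(0)
b = rng.standard_normal(n)

x, outer_steps = gmres_ir(A, b)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"outer steps: {outer_steps}, relative residual: {rel_res:.2e}")
```

In this sketch each inner solve only needs to reduce the residual to single-precision quality, and the double-precision outer loop recovers the remaining accuracy over a few refinement steps. For an actual large sparse system, the dense products would be replaced by sparse matrix-vector products and a preconditioner would normally be added.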

(Contributed by KOU Jiaqing, Member of the AMM Youth Editorial Board, and ZHANG Weiwei, Member of the AMM Editorial Board)
