Citation: ZHENG Senwei, KOU Jiaqing, ZHANG Weiwei. A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation[J]. Applied Mathematics and Mechanics, 2025, 46(1): 40-54. doi: 10.21656/1000-0887.450167
[1] JIMÉNEZ J. Computing high-Reynolds-number turbulence: will simulations ever replace experiments?[J]. Journal of Turbulence, 2003, 4. doi: 10.1088/1468-5248/4/1/022
[2] CHOQUETTE J, GANDHI W, GIROUX O, et al. NVIDIA A100 tensor core GPU: performance and innovation[J]. IEEE Micro, 2021, 41 (2): 29-35. doi: 10.1109/MM.2021.3061394
[3] RAVIKUMAR A, SRIRAMAN H. A novel mixed precision distributed TPU GAN for accelerated learning curve[J]. Computer Systems Science and Engineering, 2023, 46 (1): 563-578. doi: 10.32604/csse.2023.034710
[4] NOVITSKIY I M, KUTATELADZE A G. DU8ML: machine learning-augmented density functional theory nuclear magnetic resonance computations for high-throughput in silico solution structure validation and revision of complex alkaloids[J]. Journal of Organic Chemistry, 2022, 87 (7): 4818-4828. doi: 10.1021/acs.joc.2c00169
[5] HAIDAR A, TOMOV S, DONGARRA J, et al. Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers[C]//SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas, TX, USA: IEEE, 2018: 603-613.
[6] DU S, BHATTACHARYA C B, SEN S. Maximizing business returns to corporate social responsibility (CSR): the role of CSR communication[J]. International Journal of Management Reviews, 2010, 12 (1): 8-19. doi: 10.1111/j.1468-2370.2009.00276.x
[7] DENG L, LI G, HAN S, et al. Model compression and hardware acceleration for neural networks: a comprehensive survey[J]. Proceedings of the IEEE, 2020, 108 (4): 485-532. doi: 10.1109/JPROC.2020.2976475
[8] BAI Y, WANG Y X, LIBERTY E. ProxQuant: quantized neural networks via proximal operators[J/OL]. 2018[2024-07-10].
[9] BUTTARI A, DONGARRA J, KURZAK J, et al. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy[J]. ACM Transactions on Mathematical Software, 2008, 34 (4): 1-22. http://www.xueshufan.com/publication/2111593426
[10] CHEN Yi, LIU Bosheng, XU Yongqi, et al. FPGA accelerator design for hybrid precision frequency domain convolutional neural network[J]. Computer Engineering, 2023, 49 (12): 1-9. (in Chinese) doi: 10.3778/j.issn.1002-8331.2210-0108
[11] AMESTOY P R, DUFF I S, L'EXCELLENT J Y. Multifrontal parallel distributed symmetric and unsymmetric solvers[J]. Computer Methods in Applied Mechanics and Engineering, 2000, 184 (2/3/4): 501-520. http://pdfs.semanticscholar.org/2c70/86e4e8d476154d20b271898db23f6bb8a9a3.pdf
[12] LI X S, DEMMEL J W. SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems[J]. ACM Transactions on Mathematical Software, 2003, 29 (2): 110-140. doi: 10.1145/779359.779361
[13] HOGG J D, SCOTT J A. A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems[J]. ACM Transactions on Mathematical Software, 2010, 37 (2): 1-24. http://pdfs.semanticscholar.org/e001/343705203a8126a2a01310585458971a7158.pdf
[14] CARSON E, HIGHAM N J. A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems[J]. SIAM Journal on Scientific Computing, 2017, 39 (6): A2834-A2856. doi: 10.1137/17M1122918
[15] HIGHAM N J, PRANESH S. Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems[J]. SIAM Journal on Scientific Computing, 2021, 43 (1): A258-A277. doi: 10.1137/19M1298263
[16] LOE J A, GLUSA C A, YAMAZAKI I, et al. A study of mixed precision strategies for GMRES on GPUs[J/OL]. 2021[2024-07-10].
[17] AMESTOY P, BUTTARI A, HIGHAM N J, et al. Five-precision GMRES-based iterative refinement[J]. SIAM Journal on Matrix Analysis and Applications, 2024, 45 (1): 529-552. doi: 10.1137/23M1549079
[18] HAIDAR A, BAYRAKTAR H, TOMOV S, et al. Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems[J]. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2020, 476 (2243): 20200110. doi: 10.1098/rspa.2020.0110
[19] ZOUNON M, HIGHAM N J, LUCAS C, et al. Performance impact of precision reduction in sparse linear systems solvers[J]. PeerJ Computer Science, 2022, 8: e778. doi: 10.7717/peerj-cs.778
[20] GRATTON S, SIMON E, TITLEY-PELOQUIN D, et al. Exploiting variable precision in GMRES[EB/OL]. 2019[2024-07-10].
[21] GIRAUD L, HAIDAR A, WATSON L T. Mixed-precision preconditioners in parallel domain decomposition solvers[M]//Lecture Notes in Computational Science and Engineering. Berlin: Springer, 2008: 357-364.
[22] GÖBEL F, GRÜTZMACHER T, RIBIZEL T, et al. Mixed precision incomplete and factorized sparse approximate inverse preconditioning on GPUs[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021: 550-564.
[23] CHEN Hua, SHI Yuerong. Study on restarted PGMRES parallel algorithm with GPU[J]. Computer Engineering and Applications, 2014, 50 (7): 35-40. (in Chinese) doi: 10.3778/j.issn.1002-8331.1308-0008
[24] FENG Xuanyan, YAN Zhenguo, ZHU Huajun, et al. Study on the convergence criterion of linear iteration in inexact Newton methods[J]. Acta Aerodynamica Sinica, 2023, 41 (12): 28-36. (in Chinese) doi: 10.7638/kqdlxxb-2023.0001
[25] GONG Yiming, LIU Zhanhe, LIU Yilang, et al. Efficient GMRES algorithm in time spectral method[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38 (7): 120894. (in Chinese)
[26] WU Kang, LÜ Yibin, SHI Yunlong, et al. The GMRES(m) method for numerical conformal mapping of bounded multi-connected domains[J]. Applied Mathematics and Mechanics, 2022, 43 (9): 1026-1033. (in Chinese) doi: 10.21656/1000-0887.420305
[27] XIAO Wenke, CHEN Xingding. A modified multi-splitting iterative method with the restarted GMRES to solve the PageRank problem[J]. Applied Mathematics and Mechanics, 2022, 43 (3): 330-340. (in Chinese) doi: 10.21656/1000-0887.420210