面向流体力学仿真的大型稀疏矩阵混合精度GMRES加速算法

郑森炜; 寇家庆; 张伟伟

doi:10.21656/1000-0887.450167

面向流体力学仿真的大型稀疏矩阵混合精度GMRES加速算法

doi: 10.21656/1000-0887.450167

郑森炜^1,,
寇家庆^{1, 2, 3,},
张伟伟^{1, 2, 3, ,}

1.
西北工业大学航空学院，西安 710072
2.
西北工业大学流体力学智能化国际联合研究所，西安 710072
3.
飞行器基础布局全国重点实验室，西安 710072

(我刊青年编委寇家庆、编委张伟伟来稿)

详细信息

作者简介:
郑森炜(2001—)，男，硕士生(E-mail: senweiz@mail.nwpu.edu.cn)

寇家庆(1993—)，男，教授，博士生导师(E-mail: jqkou@nwpu.edu.cn)

通讯作者:
张伟伟(1979—)，男，教授，博士生导师(通讯作者. E-mail: aeroelastic@nwpu.edu.cn)

中图分类号: O35
计量
- 文章访问数: 691
- HTML全文浏览量: 434
- PDF下载量: 72
- 被引次数: 0
出版历程
- 收稿日期: 2024-06-05
- 修回日期: 2024-07-10
- 刊出日期: 2025-01-01

A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

ZHENG Senwei^1
,,
KOU Jiaqing^{1, 2, 3
,},
ZHANG Weiwei^{1, 2, 3
, ,}

1.
School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, P.R.China
2.
International Joint Institute of Artificial Intelligence on Fluid Mechanics, Northwestern Polytechnical University, Xi'an 710072, P.R.China
3.
National Key Laboratory of Aircraft Configuration Design, Xi'an 710072, P.R.China

(Contributed by KOU Jiaqing, M.AMM Youth Editorial Board & ZHANG Weiwei, M.AMM Editorial Board)

摘要

摘要: 由于计算能耗低、效率高，以单精度/半精度计算单元为主的GPU/TPU/NPU等算力已成为人工智能计算的主要模式，但无法直接应用于浮点精度需求高的微分方程求解，不能直接替代双精度算力. 通过结合单/双精度各自的优势，提出了兼顾效率和精度的大型稀疏线性方程组的混合精度求解格式. 发展了面向稀疏大型矩阵的GMRES细化迭代算法(sparse GMRES-IR). 首先分析了流体力学仿真问题中的矩阵数据分布特点，通过双精度做预处理，单精度细化迭代，使单精度计算应用于算法主要耗时部分，发挥了计算效率优势. 通过求解开源数据集提供的33个线性方程组验证了所提出方法的精度和效率. 结果表明，在单核CPU上，相同精度要求下，提出的单双混合精度算法可以实现最高2.5倍的加速效果，且在大规模矩阵下效果更突出.
- 混合精度 /
- 计算流体力学 /
- 线性方程组 /
- GMRES
Abstract: Due to low computational power consumption and high efficiency, GPUs/TPUs/NPUs with single/half-precision computing units make the main computing mode for artificial intelligence, but they can't be directly applied to solve differential equations requiring high floating-point accuracy, nor can they directly replace double-precision units. With the combined advantages of single and double precisions, a mixed-precision solution scheme balancing efficiency and accuracy, was proposed for large sparse linear equations. The sparse GMRES-IR algorithm for large sparse matrices was developed. Firstly, the characteristics of matrix data distributions in fluid dynamics simulation problems were analyzed. With double precision for pre-processing and single precision for detailed iteration, the single precision calculation was applied to the main time-consuming part of the algorithm, to enhance computational efficiency. Solutions of 33 linear equation systems from open-source datasets validate the accuracy and efficiency of the proposed method. The results show that, on a single-core CPU, under the same accuracy requirements, the proposed mixed-precision algorithm can achieve an acceleration effect of up to 2.5 times, and the effect is more prominent for large-scale matrices.
- mixed-precision /
- computational fluid dynamics /
- linear equations /
- GMRES
edited-by

edited-by
注释:

1) (我刊青年编委寇家庆、编委张伟伟来稿)

HTML全文

图 1 Sparse GMRES-IR流程图

Figure 1. The sparse GMRES-IR program flowchart

下载: 全尺寸图片幻灯片

图 2 稀疏矩阵非零元素分布

注为了解释图中的颜色，读者可以参考本文的电子网页版本，后同.

Figure 2. Distribution of non-zero elements in sparse matrices

下载: 全尺寸图片幻灯片

图 3 混合精度算法和双精度算法RMSE比较

Figure 3. Comparison of RMSE between mixed-precision and double-precision algorithms

下载: 全尺寸图片幻灯片

图 4 混合精度方法的加速效果

Figure 4. Acceleration ratios of mixed-precision algorithms

下载: 全尺寸图片幻灯片

图 5 算例求解时间

Figure 5. Computation time costs for test cases

下载: 全尺寸图片幻灯片

图 6 求解时间与加速比关系散点图

Figure 6. The scatter plot of the computation time vs. the acceleration ratio

下载: 全尺寸图片幻灯片

图 7 矩阵维度同求解时间关系图

Figure 7. The relationship between the matrix dimension and the solution time

下载: 全尺寸图片幻灯片

图 8 Pearson相关系数分析结果

Figure 8. Pearson correlation coefficient analysis results

下载: 全尺寸图片幻灯片

图 9 条件数、矩阵维度对加速比影响

Figure 9. The influences of the condition number and the matrix dimension on the acceleration ratio

下载: 全尺寸图片幻灯片

图 10 混合精度及双精度求解器的误差与条件数关系图

Figure 10. The error vs. the condition number for mixed-precision and double-precision solvers

下载: 全尺寸图片幻灯片

表 1 IEEE754双精度和单精度的有效位数和取值范围

Table 1. IEEE754 FP64/FP32 significant digit and value ranges

floating-point type	significant digit	maximum positive value	minimum positive normal
FP64	16	1.80E+308	4.94E-324
FP32	7	3.40E+38	1.40E-45

下载: 导出CSV

算法1 经典GMRES-IR算法
input: matrix A_n×n; vector b
output: solution vector of linear equations Ax = b
1. LU factors of A : A ≈ M = L_fU_f	lower precision
2. solve Mx₀= b, store x₀ as working precision	lower precision
3. store L_f, U_f as working precision L_w, U_w
4. for i=0, …, i_max－1 do
5. r_i=b － Ax_i	higher precision
6. solve AM^－1 Md_i+1= r_i by GMRES	higher precision
7. if ‖ r_i‖₂= ‖ b － Ax_i‖₂ < ε_tol then
8. x_i+1= x_i+ d_i+1	higher precision
9. else
10. break
11. end if
12. end for.

下载: 导出CSV

算法2 IJK格式的ILU分解
input: sparse matrix A_n×n；nonzero element position P
output: ILU factorization L
1. for i=2, …, n do
2. for k=1, …, i－1
3. if (i, k) in P
4. a_ik=a_ik/a_kk
5. for j=k+1, …, n
6. if (i, k) in P
7. a_ij=a_ij－a_ika_kj
8. end if
9. end for
10. end if
11. end for.

下载: 导出CSV

表 2 测试算例中使用的矩阵参数信息

Table 2. The distribution of all parameters of the matrices used in the test case

	matrix dimension	non-zero elements count	condition number	sparsity ratio/%	absolute value range
maximum	109 460	2 374 949	5.63E12	1.979 938	3.57E10
minimum	1 080	14 133	885.246 1	0.004 111	-1.74E-10

下载: 导出CSV

表 3 部分算例在预处理花费时间同总求解时间比较及精度比较

Table 3. Comparison of preprocessing time to total computation time and accuracy for selected test cases

name	ILU time /s	double precision time /s	mixed precision time /s	double precision RMSE	mixed precision RMSE
Poisson3Db	0.695	55.039	40.5	1.91E-11	4.59E-12
Venkat01	0.189	34.801	13.726	9.60E-14	3.54E-14
Raefsky1	0.092	2.526	1.757	6.91E-12	4.62E-12
Poisson3Da	0.09	7.855	4.426	1.85E-12	8.76E-13
Goodwin_013	0.013	1.133	0.772	3.20E-11	2.92E-13
shallow_water1	0.012	40.603	19.853	3.51E-13	6.14E-13
shallow_water2	0.012	40.087	19.937	2.30E-12	1.27E-13

下载: 导出CSV

表 4 未收敛算例不同迭代参数下求解精度和加速比结果

Table 4. Solutions precisions and speedup ratios of non-convergent numerical examples with different iteration parameters

case	solver precision	maximum outer iteration	maximum inner iteration	precision RMSE	iteration	CPU time/s
case 1	double	2(restart count)	300	8.18E-12	433	1.688
case 2	mixed	10	100	3.51E-4	1 000	2.203
case 3	mixed	10	200	8.49E-12	1 554	5.484
case 4	mixed	20	100	9.36E-12	1 600	3.859
case 5	mixed	40	50	9.289	2 000	3.248
case 6	mixed	4	500	2.415E-5	1 988	14.812 5

下载: 导出CSV

参考文献(27)

[1]	JIMÉNEZ J. Computing high-Reynolds-number turbulence: will simulations ever replace experiments?[J]. Journal of Turbulence, 2003, 4. DOI: 10.1088/1468-5248/4/1/022.
[2]	CHOQUETTE J, GANDHI W, GIROUX O, et al. NVIDIA A100 tensor core GPU: performance and innovation[J]. IEEE Micro, 2021, 41 (2): 29-35. doi: 10.1109/MM.2021.3061394
[3]	RAVIKUMAR A, SRIRAMAN H. A novel mixed precision distributed TPU GAN for accelerated learning curve[J]. Computer Systems Science and Engineering, 2023, 46 (1): 563-578. doi: 10.32604/csse.2023.034710
[4]	NOVITSKIY I M, KUTATELADZE A G. DU8ML: machine learning-augmented density functional theory nuclear magnetic resonance computations for high-throughput in silico solution structure validation and revision of complex alkaloids[J]. Journal of Organic Chemistry, 2022, 87 (7): 4818-4828. doi: 10.1021/acs.joc.2c00169
[5]	HAIDAR A, TOMOV S, DONGARRA J, et al. Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers[C]//SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. Dallas, TX, USA: IEEE, 2018: 603-613.
[6]	DU S, BHATTACHARYA C B, SEN S. Maximizing business returns to corporate social responsibility (CSR): the role of CSR communication[J]. International Journal of Management Reviews, 2010, 12 (1): 8-19. doi: 10.1111/j.1468-2370.2009.00276.x
[7]	DENG L, LI G, HAN S, et al. Model compression and hardware acceleration for neural networks: a comprehensive survey[J]. Proceedings of the IEEE, 2020, 108 (4): 485-532. doi: 10.1109/JPROC.2020.2976475
[8]	BAI Y, WANG Y X, LIBERTY E. ProxQuant: quantized neural networksvia proximal operators[J/OL]. 2018[2024-07-10]. https://arxiv.org/abs/1810.00861v3.
[9]	BUTTARI A, DONGARRA J, KURZAK J, et al. Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy[J]. ACM Transactions on Mathematical Software, 2008, 34 (4): 1-22. http://www.xueshufan.com/publication/2111593426
[10]	陈逸, 刘博生, 徐永祺, 等. 混合精度频域卷积神经网络FPGA加速器设计[J]. 计算机工程, 2023, 49 (12): 1-9. doi: 10.3778/j.issn.1002-8331.2210-0108 CHEN Yi, LIU Bosheng, XU Yongqi, et al. FPGA accelerator design for hybrid precision frequency domain convolutional neural network[J]. Computer Engineering, 2023, 49 (12): 1-9. (in Chinese) doi: 10.3778/j.issn.1002-8331.2210-0108
[11]	AMESTOY P R, DUFF I S, L'EXCELLENT J Y. Multifrontal parallel distributed symmetric and unsymmetric solvers[J]. Computer Methods in Applied Mechanics and Engineering, 2000, 184 (2/3/4): 501-520. http://pdfs.semanticscholar.org/2c70/86e4e8d476154d20b271898db23f6bb8a9a3.pdf
[12]	LI X S, DEMMEL J W. SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems[J]. ACM Transactions on Mathematical Software, 2003, 29 (2): 110-140. doi: 10.1145/779359.779361
[13]	HOGG J D, SCOTT J A. A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems[J]. ACM Transactions on Mathematical Software, 2010, 37 (2): 1-24. http://pdfs.semanticscholar.org/e001/343705203a8126a2a01310585458971a7158.pdf
[14]	CARSON E, HIGHAM N J. A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems[J]. SIAM Journal on Scientific Computing, 2017, 39 (6): A2834-A2856. doi: 10.1137/17M1122918
[15]	HIGHAM N J, PRANESH S. Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems[J]. SIAM Journal on Scientific Computing, 2021, 43 (1): A258-A277. doi: 10.1137/19M1298263
[16]	LOE J A, GLUSA C A, YAMAZAKI I, et al. A study of mixed precision strategies for GMRES on GPUs[J/OL]. 2021[2024-07-10]. https://arxiv.org/abs/2109.01232v1.
[17]	AMESTOY P, BUTTARI A, HIGHAM N J, et al. Five-precision GMRES-based iterative refinement[J]. SIAM Journal on Matrix Analysis and Applications, 2024, 45 (1): 529-552. doi: 10.1137/23M1549079
[18]	HAIDAR A, BAYRAKTAR H, TOMOV S, et al. Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems[J]. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2020, 476 (2243): 20200110. doi: 10.1098/rspa.2020.0110
[19]	ZOUNON M, HIGHAM N J, LUCAS C, et al. Performance impact of precision reduction in sparse linear systems solvers[J]. PeerJ Computer Science, 2022, 8 : e778. doi: 10.7717/peerj-cs.778
[20]	GRATTON S, SIMON E, TITLEY-PELOQUIN D, et al. Exploiting variable precision in GMRES[EB/OL]. 2019[2024-07-10]. https://arxiv.org/abs/1907.10550v2.
[21]	GIRAUD L, HAIDAR A, WATSON L T. Mixed-precision preconditioners in parallel domain decomposition solvers[M]//Lecture Notes in Computational Science and Engineering. Berlin: Springer, 2008: 357-364.
[22]	GÖBEL F, GRVTZMACHER T, RIBIZEL T, et al. Mixed precision incomplete and factorized sparse approximate inverse preconditioning on GPUs[M]//Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021: 550-564.
[23]	陈华, 史悦戎. 基于GPU的重启PGMRES并行算法研究[J]. 计算机工程与应用, 2014, 50 (7): 35-40. doi: 10.3778/j.issn.1002-8331.1308-0008 CHEN Hua, SHI Yuerong. Study on restarted PGMRES parallel algorithm with GPU[J]. Computer Engineering and Applications, 2014, 50 (7): 35-40. (in Chinese) doi: 10.3778/j.issn.1002-8331.1308-0008
[24]	冯选燕, 燕振国, 朱华君, 等. 非精确Newton方法中线性迭代收敛判据研究[J]. 空气动力学学报, 2023, 41 (12): 28-36. doi: 10.7638/kqdlxxb-2023.0001 FENG Xuanyan, YAN Zhenguo, ZHU Huajun, et al. Study on the convergence criterion of linear iteration in inexact Newton methods[J]. Acta Aerodynamica Sinica, 2023, 41 (12): 28-36. (in Chinese) doi: 10.7638/kqdlxxb-2023.0001
[25]	贡伊明, 刘战合, 刘溢浪, 等. 时间谱方法中的高效GMRES算法[J]. 航空学报, 2017, 38 (7): 120894. GONG Yiming, LIU Zhanhe, LIU Yilang, et al. Efficient GMRES algorithm in time spectral method[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38 (7): 120894. (in Chinese)
[26]	伍康, 吕毅斌, 石允龙, 等. 有界多连通区域数值保角变换的GMRES(m)法[J]. 应用数学和力学, 2022, 43 (9): 1026-1033. doi: 10.21656/1000-0887.420305 WU Kang, LÜ Yibin, SHI Yunlong, et al. The GMRES(m) method for numerical conformal mapping of bounded multi-connected domains[J]. Applied Mathematics and Mechanics, 2022, 43 (9): 1026-1033. (in Chinese) doi: 10.21656/1000-0887.420305
[27]	肖文可, 陈星玎. 求解PageRank问题的重启GMRES修正的多分裂迭代法[J]. 应用数学和力学, 2022, 43 (3): 330-340. doi: 10.21656/1000-0887.420210 XIAO Wenke, CHEN Xingding. A modified multi-splitting iterative method with the restarted GMRES to solve the PageRank problem[J]. Applied Mathematics and Mechanics, 2022, 43 (3): 330-340. (in Chinese) doi: 10.21656/1000-0887.420210

施引文献

资源附件(0)

访问统计

图(10) / 表(6)

计量

文章访问数: 691
HTML全文浏览量: 434
PDF下载量: 72
被引次数: 0

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

面向流体力学仿真的大型稀疏矩阵混合精度GMRES加速算法

doi: 10.21656/1000-0887.450167

作者简介:
郑森炜(2001—)，男，硕士生(E-mail: senweiz@mail.nwpu.edu.cn)

寇家庆(1993—)，男，教授，博士生导师(E-mail: jqkou@nwpu.edu.cn)

通讯作者:
张伟伟(1979—)，男，教授，博士生导师(通讯作者. E-mail: aeroelastic@nwpu.edu.cn)

计量

A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

计量

目录

留言板

面向流体力学仿真的大型稀疏矩阵混合精度GMRES加速算法

doi: 10.21656/1000-0887.450167

作者简介: 郑森炜(2001—)，男，硕士生(E-mail: senweiz@mail.nwpu.edu.cn) 寇家庆(1993—)，男，教授，博士生导师(E-mail: jqkou@nwpu.edu.cn)

通讯作者: 张伟伟(1979—)，男，教授，博士生导师(通讯作者. E-mail: aeroelastic@nwpu.edu.cn)

计量

出版历程

A Mixed-Precision GMRES Acceleration Algorithm for Large Sparse Matrices in Fluid Dynamics Simulation

计量

出版历程

目录

作者简介:
郑森炜(2001—)，男，硕士生(E-mail: senweiz@mail.nwpu.edu.cn)

寇家庆(1993—)，男，教授，博士生导师(E-mail: jqkou@nwpu.edu.cn)

通讯作者:
张伟伟(1979—)，男，教授，博士生导师(通讯作者. E-mail: aeroelastic@nwpu.edu.cn)