Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs
Abstract
Many scientific applications require the solution of large, sparse linear systems of equations by Krylov subspace methods, and the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of efficiently implementing these methods on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers in which the solves with the local blocks are carried out by applying sparse approximate inverses. Approximate inverses are preferred to sparse matrix factorizations because, on GPUs, the matrix-vector product exposes much more parallelism than the solution of large triangular systems. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a groundwater modelling test case provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.
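To make the role of the approximate inverse concrete, the following is a minimal sketch of a smoother sweep of the form x ← x + M(b − Ax), where M is a sparse approximation of the inverse of A, so that each sweep costs two sparse matrix-vector products and no triangular solve. It is written in plain Python for a single local block and is not the MLD2P4 or PSBLAS implementation. The function names `csr_matvec` and `smoother_sweeps`, the small tridiagonal test matrix, and the choice of M (a two-term truncated Neumann series around the diagonal) are assumptions made only for illustration; the abstract does not specify which approximate inverses are used.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a matrix stored in CSR form.

    Every row is computed independently of the others, which is the
    property that makes this kernel map well onto many GPU threads.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def smoother_sweeps(A, M, b, x, sweeps):
    """Apply x <- x + M (b - A x) `sweeps` times.

    A and M are (indptr, indices, data) CSR triples; M plays the role of
    a sparse approximate inverse of A (hypothetical choice, see below).
    """
    for _ in range(sweeps):
        Ax = csr_matvec(*A, x)
        r = [bi - ai for bi, ai in zip(b, Ax)]   # residual
        z = csr_matvec(*M, r)                    # approximate solve via matvec
        x = [xi + zi for xi, zi in zip(x, z)]
    return x


# Illustrative 4x4 block: A = tridiag(-1, 2, -1).
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
A = (indptr, indices, [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])

# Assumed approximate inverse: M = D^-1 + D^-1 (D - A) D^-1, which here is
# tridiagonal with 0.5 on the diagonal and 0.25 off the diagonal.
M = (indptr, indices, [0.5, 0.25, 0.25, 0.5, 0.25, 0.25, 0.5, 0.25, 0.25, 0.5])

b = [1.0, 0.0, 0.0, 1.0]          # equals A @ [1, 1, 1, 1]
x = smoother_sweeps(A, M, b, [0.0] * 4, sweeps=50)
```

In a multigrid cycle only a few sweeps would normally be applied per level; the 50 sweeps here just make it easy to check that the iteration approaches the exact solution of this toy system.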


