World Scientific

Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

    Many scientific applications require the solution of large, sparse linear systems of equations by Krylov subspace methods; in this setting, the choice of an effective preconditioner can be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, raises the problem of implementing these methods efficiently on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers that use sparse approximate inverses in the solve phase associated with the local blocks. Approximate inverses are chosen over sparse matrix factorizations because, on GPUs, the matrix-vector product exposes far more parallelism than the solution of large triangular systems. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a groundwater modelling test case provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.
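The idea of a block-Jacobi smoother whose local solves are replaced by matrix-vector products can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the MLD2P4/PSBLAS implementation: the function names, the dense 1D Poisson test matrix, the damping factor `omega`, and the use of a truncated Neumann series as the approximate inverse are all choices made here for brevity and are not taken from the paper, which relies on dedicated sparse approximate inverse algorithms.

```python
import numpy as np


def laplacian_1d(n):
    """Dense 1D Poisson matrix (tridiagonal 2, -1); toy test data only."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def neumann_approx_inverse(block, order=1):
    """Approximate block^{-1} with a truncated Neumann series (a stand-in
    for a real sparse approximate inverse algorithm).

    With D = diag(block) and N = I - D^{-1} block, the exact inverse is
    (I + N + N^2 + ...) D^{-1}; the series is cut after `order` powers of N.
    """
    d_inv = 1.0 / np.diag(block)
    identity = np.eye(block.shape[0])
    n_mat = identity - d_inv[:, None] * block   # N = I - D^{-1} block
    series = identity.copy()
    term = identity.copy()
    for _ in range(order):
        term = term @ n_mat
        series += term
    return series * d_inv[None, :]              # series @ D^{-1}


def build_block_inverses(a, block_size, order=1):
    """Setup phase: one explicit approximate inverse per diagonal block."""
    n = a.shape[0]
    return [
        neumann_approx_inverse(a[s:s + block_size, s:s + block_size], order)
        for s in range(0, n, block_size)
    ]


def block_jacobi_smooth(a, b, x0, block_inverses, block_size, sweeps=1, omega=0.8):
    """Application phase: x <- x + omega * G (b - A x), G = blockdiag(G_i).

    Every sweep consists of matrix-vector products only; no triangular
    solve is performed. `omega` is a damping factor assumed for this sketch.
    """
    x = np.array(x0, dtype=float)               # work on a copy of x0
    for _ in range(sweeps):
        r = b - a @ x                           # residual from the old iterate
        for i, g in enumerate(block_inverses):
            s = slice(i * block_size, (i + 1) * block_size)
            x[s] += omega * (g @ r[s])          # local "solve" is a matvec
    return x


# Small demonstration on a 16 x 16 problem split into four blocks.
n, block_size = 16, 4
a = laplacian_1d(n)
b = np.ones(n)
inverses = build_block_inverses(a, block_size)
x = block_jacobi_smooth(a, b, np.zeros(n), inverses, block_size, sweeps=10)
print("residual norm before:", np.linalg.norm(b))
print("residual norm after: ", np.linalg.norm(b - a @ x))
```

In a distributed setting each block would correspond to the rows owned by one process (and its GPU), so the inner loop over blocks runs concurrently and the only communication is the halo exchange needed to form the residual.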
