Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution
Abstract
Field programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set than central processing units (CPUs) or even graphics processing units (GPUs). This adds extra development steps and requires special knowledge, which hinders widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating FPGA implementations of multigrid-based numerical solvers from the same code base that is also used to generate CPU code with a hybrid MPI and OpenMP parallelization. Our approach yields a hardware design that can compute up to 11 V-cycles per second on a mid-range FPGA for an input grid size of 4096 × 4096, with the coarsest grid solved using the conjugate gradient (CG) method, beating vectorized, multi-threaded execution on an Intel Xeon processor.


