World Scientific
Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution

    Field-programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in high-performance computing (HPC). However, they use a completely different programming paradigm and tool set than central processing units (CPUs) or even graphics processing units (GPUs). The extra development steps and specialized knowledge this requires hinder their widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach to generating multigrid-based numerical solver implementations for FPGAs from the same code base that is also used to generate CPU code with hybrid MPI and OpenMP parallelization. Our approach yields a hardware design that can compute up to 11 V-cycles per second for an input grid size of 4096×4096, with the coarsest-grid problem solved by the conjugate gradient (CG) method, on a mid-range FPGA, beating vectorized, multi-threaded execution on an Intel Xeon processor.
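
To make the algorithmic structure named in the abstract concrete, the following is a minimal sketch of a multigrid V-cycle whose coarsest-grid problem is solved with CG. It is not the generated FPGA or CPU code and not ExaSlang. It assumes a 1D Poisson model problem with zero Dirichlet boundaries, weighted Jacobi smoothing, full-weighting restriction, and linear interpolation, none of which is specified in the abstract. All function names and parameters (`v_cycle`, `coarsest`, `sweeps`, `omega`, and so on) are hypothetical and chosen for illustration only.

```python
import math


def apply_poisson(u, h):
    """Apply the 1D stencil [-1, 2, -1] / h^2 (zero Dirichlet boundaries)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residual(u, f, h):
    return [fi - ai for fi, ai in zip(f, apply_poisson(u, h))]


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi: u <- u + omega * D^{-1} r with D = 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(e):
    """Linear interpolation from m coarse points to 2m+1 fine points."""
    m = len(e)
    fine = [0.0] * (2 * m + 1)
    for j in range(m):
        fine[2 * j + 1] = e[j]  # coarse points coincide with odd fine points
    for j in range(m + 1):
        left = e[j - 1] if j > 0 else 0.0
        right = e[j] if j < m else 0.0
        fine[2 * j] = 0.5 * (left + right)
    return fine


def conjugate_gradient(f, h, tol=1e-10, max_iter=200):
    """Unpreconditioned CG for A x = f, starting from x = 0."""
    x = [0.0] * len(f)
    r = list(f)
    p = list(r)
    rr = dot(r, r)
    rr0 = rr
    if rr0 == 0.0:
        return x
    for _ in range(max_iter):
        if rr <= tol * tol * rr0:  # relative residual small enough
            break
        ap = apply_poisson(p, h)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def v_cycle(u, f, h, coarsest=3):
    """One V-cycle; grids with at most `coarsest` unknowns are solved by CG."""
    if len(u) <= coarsest:
        return conjugate_gradient(f, h)
    u = smooth(u, f, h)                           # pre-smoothing
    r_coarse = restrict(residual(u, f, h))        # restrict the residual
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h, coarsest)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse))]  # coarse correction
    return smooth(u, f, h)                        # post-smoothing


def solve(n=63, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]
    u = [0.0] * n
    norms = [math.sqrt(dot(f, f))]
    for _ in range(cycles):
        u = v_cycle(u, f, h)
        r = residual(u, f, h)
        norms.append(math.sqrt(dot(r, r)))
    err = max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))
    return u, norms, err
```

Calling `solve()` returns the approximate solution, the residual norm after each cycle, and the maximum error against the analytic solution. The number of unknowns `n` is assumed to be of the form 2^k - 1 so that every level coarsens cleanly. Using an iterative method such as CG on the coarsest grid avoids a direct factorization, which is presumably attractive when the whole solver is mapped to a hardware pipeline, although the abstract does not state the authors' reasoning.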
