Automatic Data Layout Transformations in the ExaStencils Code Generator
Abstract
Performance optimizations should focus not only on the computations of an application, but also on the internal data layout. A well-known problem is whether a struct of arrays or an array of structs results in a higher performance for a particular application. Even though the switch from the one to the other is fairly simple to implement, testing both transformations can become laborious and error-prone. Additionally, there are more complex data layout transformations, such as a color splitting for multi-color kernels in the domain of stencil codes, that are manually difficult. As a remedy, we propose new flexible layout transformation statements for our domain-specific language ExaSlang that support arbitrary affine transformations. Since our code generator applies them automatically to the generated code, these statements enable the simple adaptation of the data layout without the need for any other modifications of the application code. This constitutes a big advance in the ease of testing and evaluating different memory layout schemes in order to identify the best.
References
- 1. , Multi-Grid Methods and Applications (Springer-Verlag, 1985). Google Scholar
- 2. Multigrid (Academic Press, 2000). Google Scholar
- 3. , Performance engineering to achieve real-time high dynamic range imaging, J. Real-Time Image Processing 11(1) (2016) 127–139. Google Scholar
- 4. , Cache-aware multigrid methods for solving Poisson’s equation in two dimensions, Computing 64(4) (2000) 381–399. Google Scholar
- 5. , ExaStencils: Advanced stencil-code engineering in Euro-Par 2014: Parallel Processing Workshops, Part II, eds. L. Lopes, Volume 8806 of
Lecture Notes in Computer Science (Springer, 2014), pp. 553–564. Google Scholar - 6. , ExaSlang: A domain-specific language for highly scalable multigrid solvers, in Proc. 4th Int. Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC) (ACM, November 2014), pp. 42–51. Google Scholar
- 7. , Systems of partial differential equations in ExaSlang, in Software for Exascale Computing — SPPEXA 2013–2015, eds. H. J. BungartzP. NeumannW. E. Nagel, Volume 113 of
Lecture Notes in Computational Science and Engineering (Springer, 2016), pp. 47–67. Google Scholar - 8. , isl: An integer set library for the polyhedral model, in Mathematical Software (ICMS 2010), eds. K. FukudaJ. van der HoevenM. JoswigM. Takayama, Volume 6327,
LNCS (Springer, 2010), pp. 299–302. Google Scholar - 9. ,
Polyhedron model , in Encyclopedia of Parallel Computing, eds. D. Padua (Springer, September 2011), pp. 1581–1592. Google Scholar - 10. S. Kronawitter and C. Lengauer, Optimizations applied by the ExaStencils code generator. Technical Report MIP-1502, Faculty of Computer Science and Mathematics, University of Passau, January 2015, 10 pages. Google Scholar
- 11. , Redundancy elimination in the ExaStencils code generator, in Algorithms and Architectures for Parallel Processing (ICA3PP), Collocated Workshops, eds. J. Carretero, Volume 10049 of
Lecture Notes in Computer Science (Springer, 2016), pp. 159–173. First Int. Workshop on Data Locality in Modern Computing Systems (DLMCS). Google Scholar - 12. , Hybrid hexagonal/classical tiling for GPUs, in Proc. 12th Int. Symp. on Code Generation and Optimization (CGO) (ACM, 2014). Google Scholar
- 13. , Determining optical flow, Artificial Intelligence 17(1–3) (1981) 185–203. ISI, Google Scholar
- 14. , Variational optical flow computation in real time, IEEE Trans. on Image Processing (TIP), 14(5) (2005) 608–615. Google Scholar
- 15. , 3D optical flow computation using a parallel variational multigrid scheme with application to cardiac C-arm CT motion, Image and Vision Computing 25(9) (2007) 1482–1494. Google Scholar
- 16. , Nonsingular data transformations: Definition, validity, and applications, Int. J. Parallel Programming, 27(3) (June 1999) 131–159. Google Scholar
- 17. P. Clauss and B. Meister, Automatic memory layout transformations to optimize spatial locality in parameterized loop nests, SIGARCH Computer Architecture News 28:11–19, March 2000. Google Scholar
- 18. , Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, J. Parallel and Distributed Computing, 74(12) (2014) 3202–3216. Google Scholar
- 19. , YASK — Yet another stencil kernel: A framework for HPC stencil code-generation and tuning, in Proc. 6th Int. Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC) (IEEE Press, 2016), pp. 30–39. Google Scholar
- 20. , A constraint network-based approach to memory layout optimization, in Proc. Conf. on Design, Automation and Test in Europe (DATE), Vol. 2 (IEEE Computer Society, 2005), pp. 1156–1161. Google Scholar
- 21. S. A. Leung, Array Restructuring for Cache Locality. PhD thesis, University of Washington, Department of Computer Science & Engineering, 1996. Google Scholar
- 22. , Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions, J. Parallel and Distributed Computing, 72(3) (2012) 338–352. Google Scholar
- 23. , Minimizing strides in loops with affine array references, in Proc. Compilers for Parallel Computers (CPC) (
June 2001 ), 12 pp. Google Scholar - 24. , Precise data locality optimization of nested loops, J. Supercomputing, 21 (January 2002) 37–76. ISI, Google Scholar
- 25. , Data locality and parallelism optimization using a constraint-based approach, J. Parallel and Distributed Computing, 71(2) (2011) 280–287. Google Scholar


