World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

Automatic Data Layout Transformations in the ExaStencils Code Generator

    Performance optimizations should focus not only on the computations of an application, but also on the internal data layout. A well-known problem is whether a struct of arrays or an array of structs results in a higher performance for a particular application. Even though the switch from the one to the other is fairly simple to implement, testing both transformations can become laborious and error-prone. Additionally, there are more complex data layout transformations, such as a color splitting for multi-color kernels in the domain of stencil codes, that are manually difficult. As a remedy, we propose new flexible layout transformation statements for our domain-specific language ExaSlang that support arbitrary affine transformations. Since our code generator applies them automatically to the generated code, these statements enable the simple adaptation of the data layout without the need for any other modifications of the application code. This constitutes a big advance in the ease of testing and evaluating different memory layout schemes in order to identify the best.

    References

    • 1. W. Hackbusch, Multi-Grid Methods and Applications (Springer-Verlag, 1985). Google Scholar
    • 2. U. Trottenberg, C. W. Osterlee and A. Schüller Multigrid (Academic Press, 2000). Google Scholar
    • 3. H. Köstler, M. Stürmer and T. Pohl, Performance engineering to achieve real-time high dynamic range imaging, J. Real-Time Image Processing 11(1) (2016) 127–139. Google Scholar
    • 4. M. Kowarschik, U. Rüde, C. Weiss and W. Karl, Cache-aware multigrid methods for solving Poisson’s equation in two dimensions, Computing 64(4) (2000) 381–399. Google Scholar
    • 5. C. Lengauer et al., ExaStencils: Advanced stencil-code engineering in Euro-Par 2014: Parallel Processing Workshops, Part II, eds. L. Lopeset al., Volume 8806 of Lecture Notes in Computer Science (Springer, 2014), pp. 553–564. Google Scholar
    • 6. C. Schmitt, S. Kuckuk, F. Hannig, H. Köstler and J. Teich, ExaSlang: A domain-specific language for highly scalable multigrid solvers, in Proc. 4th Int. Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC) (ACM, November 2014), pp. 42–51. Google Scholar
    • 7. C. Schmitt, S. Kuckuk, F. Hannig, J. Teich, H. Köstler, U. Rüde and C. Lengauer, Systems of partial differential equations in ExaSlang, in Software for Exascale Computing — SPPEXA 2013–2015, eds. H. J. BungartzP. NeumannW. E. Nagel, Volume 113 of Lecture Notes in Computational Science and Engineering (Springer, 2016), pp. 47–67. Google Scholar
    • 8. S. Verdoolaege, isl: An integer set library for the polyhedral model, in Mathematical Software (ICMS 2010), eds. K. FukudaJ. van der HoevenM. JoswigM. Takayama, Volume 6327, LNCS (Springer, 2010), pp. 299–302. Google Scholar
    • 9. P. Feautrier and C. Lengauer, Polyhedron model, in Encyclopedia of Parallel Computing, eds. D. Paduaet al. (Springer, September 2011), pp. 1581–1592. Google Scholar
    • 10. S. Kronawitter and C. Lengauer, Optimizations applied by the ExaStencils code generator. Technical Report MIP-1502, Faculty of Computer Science and Mathematics, University of Passau, January 2015, 10 pages. Google Scholar
    • 11. S. Kronawitter, S. Kuckuk and C. Lengauer, Redundancy elimination in the ExaStencils code generator, in Algorithms and Architectures for Parallel Processing (ICA3PP), Collocated Workshops, eds. J. Carreteroet al., Volume 10049 of Lecture Notes in Computer Science (Springer, 2016), pp. 159–173. First Int. Workshop on Data Locality in Modern Computing Systems (DLMCS). Google Scholar
    • 12. T. Grosser, A. Cohen, J. Holewinski, P. Sadayappan and S. Verdoolaege, Hybrid hexagonal/classical tiling for GPUs, in Proc. 12th Int. Symp. on Code Generation and Optimization (CGO) (ACM, 2014). Google Scholar
    • 13. B. K. P. Horn and B. G. Schunck, Determining optical flow, Artificial Intelligence 17(1–3) (1981) 185–203. ISIGoogle Scholar
    • 14. A. Bruhn, J. Weickert, C. Feddern, T. Kohlberger and C. Schnorr, Variational optical flow computation in real time, IEEE Trans. on Image Processing (TIP), 14(5) (2005) 608–615. Google Scholar
    • 15. E. M. Kalmoun, H. Köstler and U. Rüde, 3D optical flow computation using a parallel variational multigrid scheme with application to cardiac C-arm CT motion, Image and Vision Computing 25(9) (2007) 1482–1494. Google Scholar
    • 16. M. F. P. O’Boyle and P. M. W. Knijnenburg, Nonsingular data transformations: Definition, validity, and applications, Int. J. Parallel Programming, 27(3) (June 1999) 131–159. Google Scholar
    • 17. P. Clauss and B. Meister, Automatic memory layout transformations to optimize spatial locality in parameterized loop nests, SIGARCH Computer Architecture News 28:11–19, March 2000. Google Scholar
    • 18. H. C. Edwards, C. R. Trott and D. Sunderland, Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, J. Parallel and Distributed Computing, 74(12) (2014) 3202–3216. Google Scholar
    • 19. C. Yount, J. Tobin, A. Breuer and A. Duran, YASK — Yet another stencil kernel: A framework for HPC stencil code-generation and tuning, in Proc. 6th Int. Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC) (IEEE Press, 2016), pp. 30–39. Google Scholar
    • 20. G. Chen, M. Kandemir and M. Karakoy, A constraint network-based approach to memory layout optimization, in Proc. Conf. on Design, Automation and Test in Europe (DATE), Vol. 2 (IEEE Computer Society, 2005), pp. 1156–1161. Google Scholar
    • 21. S. A. Leung, Array Restructuring for Cache Locality. PhD thesis, University of Washington, Department of Computer Science & Engineering, 1996. Google Scholar
    • 22. Q. Lu, X. Gao, S. Krishnamoorthy, G. Baumgartner, J. Ramanujam and P. Sadayappan, Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions, J. Parallel and Distributed Computing, 72(3) (2012) 338–352. Google Scholar
    • 23. P. Clauss, V. Loechner and B. Meister, Minimizing strides in loops with affine array references, in Proc. Compilers for Parallel Computers (CPC) (June 2001), 12 pp. Google Scholar
    • 24. V. Loechner, B. Meister and P. Clauss, Precise data locality optimization of nested loops, J. Supercomputing, 21 (January 2002) 37–76. ISIGoogle Scholar
    • 25. O. Ozturk, Data locality and parallelism optimization using a constraint-based approach, J. Parallel and Distributed Computing, 71(2) (2011) 280–287. Google Scholar