World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

LEVERAGING SHARED CACHES FOR PARALLEL TEMPORAL BLOCKING OF STENCIL CODES ON MULTICORE PROCESSORS AND CLUSTERS

    Bandwidth-starved multicore chips have become ubiquitous. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. Benchmark results are presented for three current x86-based microprocessors, showing clearly that our optimization works best on designs with high-speed shared caches and low memory bandwidth per core. We furthermore demonstrate that simple bandwidth-based performance models are inaccurate for this kind of algorithm and employ a more elaborate, synthetic modeling procedure. Finally we show that temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment, albeit with limited benefit at strong scaling.

    References

    • B.   Bergen , F.   Hülsemann and U.   Rüde , Is 1.7×1010 Unknowns the Largest Finite Element System that Can Be Solved Today? , Proceedings of the ACM/IEEE SC 2005 Conference (Supercomputing Conference '05 . Google Scholar
    • K. Datta et al. , Stencil Computation Optimization and Auto-tuning on State-of-the-art Multicore Architectures , Proc. SC2008 . Google Scholar
    • K. Dattaet al., SIAM Review 51(1), 129 (2009), DOI: 10.1137/070693199. Crossref, ISIGoogle Scholar
    • G.   Wellein et al. , Efficient temporal blocking for stencil computations by multicore-aware wavefront parallelization , Proc. COMPSAC 2009 . Google Scholar
    • J.   Treibig and G.   Hager , Introducing a Performance Model for Bandwidth Limited Loop Kernels , Workshop on Memory Issues on Multi- and Manycore Platforms, Proc. PPAM 2009 ,   http://arxiv.org/abs/0905.0792 . Google Scholar
    • D.   Wonnacott , Using Time Skewing to Eliminate Idle Time due to Memory Bandwidth and Network Limitations , Proc. IPDPS 2000 . Google Scholar
    • G.   Jin , J.   Mellor-Crummey and R.   Fowler , Increasing temporal locality with skewing and recursive blocking , Proc. SC2001 . Google Scholar
    • M. Kowarschik: Data Locality Optimizations for Iterative Numerical Algorithms and Cellular Automata on Hierarchical Memory Architectures. Dissertation, SCS Publishing House, Erlangen (2004) , http://www10.informatik.uni-erlangen.de/Publications/Dissertations/Diss_Kowarschik_2004.pdf . Google Scholar
    • M.   Frigo and V.   Strumpen , Cache oblivious stencil computations , Proc. ICS 2005 . Google Scholar
    • S.   Kamil et al. , Implicit and explicit optimizations for stencil computations , Proc. MSPC 2006 . Google Scholar
    • T. Zeiseret al., Progress in CFD 8(1–4), 179 (2008). Google Scholar
    • J. Treibig, G. Hager, G. Wellein: Multi-core architectures: Complexities of performance prediction and the impact of cache topology. Under review , http://arxiv.org/abs/0910.4865 . Google Scholar
    • C.   Ding and Y.   He , A ghost cell expansion method for reducing communications in solving PDE problems , Proc. SC2001 . Google Scholar
    • M. Wittmann: Potentials of temporal blocking for stencil-based computations on multi-core systems. Master's Thesis, University of Applied Sciences Nuremberg, March 2009 , http://www.hpc.rrze.uni-erlangen.de/Projekte/stencil.shtml . Google Scholar