
HIERARCHICAL MAPPING FOR HPC APPLICATIONS

    As high-performance computing systems scale up, mapping the tasks of a parallel application onto physical processors so that they communicate efficiently becomes a critical performance issue. Existing algorithms were usually designed to map applications with regular communication patterns, and their mapping criteria typically overlook the size of communicated messages, which is a primary factor in communication time. In addition, most of them have time complexities too high to process large-scale problems.

    In this paper, we present a hierarchical mapping algorithm (HMA) that is capable of mapping applications with irregular communication patterns. It first partitions tasks according to their run-time communication information: tasks that communicate with each other more frequently are regarded as more strongly connected, and based on this connectivity strength the tasks are grouped into supernodes using algorithms from spectral graph theory. The hierarchical partitioning reduces the complexity of the mapping algorithm and thereby achieves scalability. Finally, the run-time communication information is used again in a fine-tuning step to explore better mappings. Our experiments show that the mapping algorithm reduces the point-to-point communication time by up to 20% for PDGEMM, a ScaLAPACK matrix multiplication kernel, and by up to 7% for AMG2006, a tier 1 application of the Sequoia benchmark.
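    The spectral partitioning step described above can be illustrated with a minimal sketch: build a weighted graph from measured communication volumes, form its graph Laplacian, and split tasks by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue), so that heavily communicating tasks land in the same supernode. This is generic spectral bisection, not the authors' exact HMA; the `fiedler_partition` helper and the example weight matrix are illustrative assumptions.

    ```python
    import numpy as np

    def fiedler_partition(weights):
        """Split tasks into two groups by the sign of the Fiedler vector.

        weights: symmetric (n, n) array; weights[i, j] is the message volume
        exchanged between tasks i and j (0 if they never communicate).
        Returns two lists of task indices.
        """
        degree = np.diag(weights.sum(axis=1))
        laplacian = degree - weights
        # eigh returns eigenvalues in ascending order; column 1 is the
        # eigenvector of the second-smallest eigenvalue (Fiedler vector).
        _, eigvecs = np.linalg.eigh(laplacian)
        fiedler = eigvecs[:, 1]
        part_a = [i for i, v in enumerate(fiedler) if v >= 0]
        part_b = [i for i, v in enumerate(fiedler) if v < 0]
        return part_a, part_b

    # Two heavily communicating pairs (0,1) and (2,3) joined by one light
    # edge between tasks 0 and 2: the spectral cut severs the light edge.
    w = np.array([[0, 9, 1, 0],
                  [9, 0, 0, 0],
                  [1, 0, 0, 9],
                  [0, 0, 9, 0]], dtype=float)
    a, b = fiedler_partition(w)
    print(sorted([sorted(a), sorted(b)]))  # [[0, 1], [2, 3]]
    ```

    In the paper's scheme this bisection would be applied recursively, with the resulting supernodes mapped to processors before the run-time communication data is reused for fine tuning.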

    References

    • The AMG Benchmark, https://asc.llnl.gov/sequoia/benchmarks/#amg .
    • The Community Climate System Model (CCSM), http://www.cesm.ucar.edu/models/ccsm4.0/ .
    • The Sequoia Benchmark, https://asc.llnl.gov/sequoia/benchmarks/ .
    • Space-Filling Curves (Springer-Verlag, 1994).
    • Masood Ahmed and Shahid Bokhari, IEEE Trans. Parallel Distrib. Syst. 18, 1258 (2007).
    • Romas Aleliunas and Arnold L. Rosenberg, IEEE Trans. Comput. 31(9), 907 (1982).
    • Zhaojun Bai et al., Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide (SIAM, 1987).
    • Satish Balay, Kris Buschelman, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes, Barry F. Smith and Hong Zhang, PETSc Web page (2001).
    • G. Bhanot et al., IBM J. Res. Dev. 49(2), 489 (2005).
    • Abhinav Bhatelé, Eric Bohm and Laxmikant V. Kalé, A case study of communication optimizations on 3D mesh interconnects, Euro-Par '09: Proceedings of the 15th International Euro-Par Conference on Parallel Processing (Springer-Verlag, Berlin, Heidelberg, 2009), pp. 1015–1028.
    • Abhinav Bhatele, I-Hsin Chung and Laxmikant V. Kale, Automated mapping of structured communication graphs onto mesh interconnects (2010).
    • L. S. Blackford et al., ScaLAPACK Users' Guide (Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997).
    • Thomas H. Cormen et al., Introduction to Algorithms (McGraw-Hill, 1990).
    • John A. Ellis, IEEE Trans. Comput. 40(1), 46 (1991).
    • M. Fiedler, Czechoslovak Mathematical Journal 23(98) (1973).
    • S. L. Graham, M. Snir and C. A. Patterson (eds.), Getting Up To Speed: The Future of Supercomputing (National Academies Press, Washington, D.C., 2005).
    • J. Kleinberg and E. Tardos, Algorithm Design (Addison Wesley, 2005).
    • Rami G. Melhem and Ghil-Young Hwang, IEEE Trans. Comput. 39(12), 1446 (1990).
    • Sangman Moh et al., Mapping strategies for switch-based cluster systems of irregular topology, 8th IEEE International Conference on Parallel and Distributed Systems.
    • Jesper Larsson Träff, Implementing the MPI process topology mechanism, Supercomputing, pp. 1–14 (2002).
    • H. Wen et al., A productivity centered tools framework for application performance tuning, QEST '07: Proceedings of the Fourth International Conference on the Quantitative Evaluation of Systems (IEEE Computer Society, Washington, DC, USA, 2007), pp. 273–274.
    • Z. Wu and R. Leahy, IEEE Trans. Pattern Anal. Mach. Intell. 15(11), 1101 (1993).
    • Hao Yu, I-Hsin Chung and Jose Moreira, Topology mapping for Blue Gene/L supercomputer, SC '06: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (ACM, New York, NY, USA, 2006), p. 116.