World Scientific

Benefits of Topology Aware Mapping for Mesh Interconnects

    The fastest supercomputers today, such as Blue Gene/L, Blue Gene/P, Cray XT3 and XT4, are connected by a three-dimensional torus/mesh interconnect. Applications running on these machines can benefit from topology awareness when mapping tasks to processors at runtime. Co-locating communicating tasks on nearby processors shortens the distance messages travel and so reduces the communication traffic, which in turn lowers communication latency and contention on the network. This paper describes preliminary work using this technique, and the performance improvements that result from it, in the context of an n-dimensional k-point stencil program. It shows that even for simple benchmarks, topology-aware mapping can have a significant impact on performance. Automated topology-aware mapping by the runtime, using similar ideas, can relieve the application writer of this burden and result in better performance. Preliminary work towards achieving this for a molecular dynamics application, NAMD, is also presented. Results on up to 32,768 processors of IBM's Blue Gene/L, 4,096 processors of IBM's Blue Gene/P and 2,048 processors of Cray's XT3 support the ideas discussed in the paper.
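
    The effect of co-locating communicating tasks can be illustrated with a small model. The sketch below is not the benchmark from the paper; it is a minimal illustration that assumes a 3D 7-point stencil with periodic boundaries placed on a 3D torus of the same shape, and it compares the average hops per message for a neighbor-preserving mapping and a random one. All names (`torus_distance`, `stencil_neighbors`, `average_hops`) and the 8 x 8 x 8 size are hypothetical choices made for illustration.

```python
import itertools
import random


def torus_distance(a, b, dims):
    """Hop count between coordinates a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Yield the neighbors of a task in a 7-point stencil with periodic boundaries."""
    for axis in range(len(dims)):
        for step in (-1, 1):
            nbr = list(task)
            nbr[axis] = (nbr[axis] + step) % dims[axis]
            yield tuple(nbr)


def average_hops(mapping, dims):
    """Average hops per stencil message for a task -> processor mapping."""
    total = 0
    count = 0
    for task, proc in mapping.items():
        for nbr in stencil_neighbors(task, dims):
            total += torus_distance(proc, mapping[nbr], dims)
            count += 1
    return total / count


# Assumed problem size: 8 x 8 x 8 tasks on an 8 x 8 x 8 torus.
dims = (8, 8, 8)
tasks = list(itertools.product(*(range(d) for d in dims)))

# Topology-aware placement: task (x, y, z) runs on processor (x, y, z),
# so every stencil neighbor is one hop away.
identity_mapping = {t: t for t in tasks}

# Topology-oblivious placement: tasks are scattered at random (fixed seed).
rng = random.Random(0)
shuffled = tasks[:]
rng.shuffle(shuffled)
random_mapping = dict(zip(tasks, shuffled))

aware = average_hops(identity_mapping, dims)
oblivious = average_hops(random_mapping, dims)
print(f"topology-aware: {aware:.2f} hops/message")
print(f"random:         {oblivious:.2f} hops/message")
```

    In this model the neighbor-preserving mapping costs exactly one hop per message, while the random mapping costs several times more. Hop count is only a proxy for the latency and contention effects described above; it does not model link bandwidth or routing.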
