World Scientific

    In the early years of parallel computing research, significant theoretical studies were done on interconnect topologies and topology-aware mapping for parallel computers. With the deployment of virtual cut-through, wormhole routing and faster interconnects, message latencies fell and research in the area died down. This article shows that network topology has become important again with the emergence of very large supercomputers, typically connected as a 3D torus or mesh. It presents a quantitative study of the effect of contention on message latencies on torus and mesh networks.

    Several MPI benchmarks are used to evaluate the effect of the number of hops (links) traversed by messages on their latencies. The benchmarks demonstrate that when multiple messages compete for network resources, link occupancy or contention can increase message latencies by up to a factor of 8 on some architectures. Results are shown for three parallel machines: ANL's IBM Blue Gene/P (Surveyor), ORNL's Cray XT4 (Jaguar) and PSC's Cray XT3 (BigBen). The findings suggest that application developers should now consider interconnect topologies when mapping tasks to processors in order to obtain the best performance on large parallel machines.
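
    To make the notion of hops concrete, the following is a minimal sketch, not taken from the article, of how the number of links between two nodes can be counted on a 3D mesh and on a 3D torus, assuming messages follow a shortest path. The function names, the partition dimensions and the node coordinates are hypothetical and chosen only for illustration.

```python
def mesh_hops(a, b):
    """Links traversed on a shortest path between nodes a and b on a mesh
    (no wraparound links)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_hops(a, b, dims):
    """Links traversed on a shortest path between nodes a and b on a torus.

    In each dimension a message can travel in either direction around the
    ring, so the per-dimension distance is the shorter of the two ways.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


# Hypothetical 8 x 8 x 16 partition and node coordinates (illustrative only).
dims = (8, 8, 16)
src, dst = (0, 0, 0), (7, 4, 15)

print("mesh hops: ", mesh_hops(src, dst))          # 7 + 4 + 15 = 26
print("torus hops:", torus_hops(src, dst, dims))   # 1 + 4 + 1 = 6
```

    A mapping that places communicating tasks on nodes with a small hop count occupies fewer links per message, which is the quantity that the contention results above depend on.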

