QUANTIFYING NETWORK CONTENTION ON LARGE PARALLEL MACHINES
Abstract
In the early years of parallel computing research, significant theoretical studies were done on interconnect topologies and topology-aware mapping for parallel computers. With the deployment of virtual cut-through routing, wormhole routing and faster interconnects, message latencies fell and research in the area died down. This article shows that network topology has become important again with the emergence of very large supercomputers, typically connected as a 3D torus or mesh. It presents a quantitative study of the effect of contention on message latencies on torus and mesh networks.
Several MPI benchmarks are used to evaluate the effect of the number of hops (links) traversed by messages on their latencies. The benchmarks demonstrate that when multiple messages compete for network resources, link occupancy or contention can increase message latencies by up to a factor of eight on some architectures. Results are shown for three parallel machines: ANL's IBM Blue Gene/P (Surveyor), ORNL's Cray XT4 (Jaguar) and PSC's Cray XT3 (BigBen). The findings in this article suggest that application developers should now consider interconnect topologies when mapping tasks to processors in order to obtain the best performance on large parallel machines.
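To make the notion of hops concrete, the following is a minimal sketch of how the number of links between two nodes can be counted on a 3D torus versus a 3D mesh. It is not the benchmark code used in the article: the function names `torus_hops` and `mesh_hops`, the 8 x 8 x 8 dimensions and the node coordinates are all hypothetical, and the sketch assumes messages follow a shortest path.

```python
def torus_hops(a, b, dims):
    """Shortest-path hop count between nodes a and b on a torus.

    a, b: node coordinates, one entry per dimension.
    dims: torus size in each dimension (hypothetical values below).
    Assumes minimal routing, which real networks may not always follow.
    """
    hops = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        # Wraparound links let a message go the short way around the ring.
        hops += min(d, n - d)
    return hops


def mesh_hops(a, b):
    """Shortest-path hop count on a mesh (no wraparound links)."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    dims = (8, 8, 8)                  # illustrative partition shape
    src, dst = (0, 0, 0), (7, 4, 1)   # illustrative node coordinates
    print("torus hops:", torus_hops(src, dst, dims))
    print("mesh hops: ", mesh_hops(src, dst))
```

A message that crosses more links occupies more of the network, so under load a mapping that places communicating tasks fewer hops apart leaves fewer links for other messages to contend on.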


