LARGE-SCALE PERFORMANCE ANALYSIS OF SWEEP3D WITH THE SCALASCA TOOLSET
Abstract
Cray XT and IBM Blue Gene systems represent current alternative approaches to constructing leadership computer systems, both of which rely on applications being able to exploit very large configurations of processor cores. Associated analysis tools must scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive deterioration in the execution performance of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting time. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation. This imbalance amplified the waiting times inherent in the associated wavefront communication and seriously degraded overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation in a portable and straightforward-to-use toolset, Scalasca was able to demonstrate performance measurements and analyses with 294,912 processes.
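The interaction between imbalanced computation and wavefront waiting time can be illustrated with a small toy model. The sketch below is not Sweep3D code and does not reproduce Scalasca's analysis. It assumes a 2D process grid in which each process may start a block only after its two upstream neighbours have finished the same block, treats communication as free, and uses hypothetical names (`simulate_wavefront`, `work`, `blocks`) and made-up costs throughout.

```python
def simulate_wavefront(work, blocks):
    """Return (makespan, total_wait) for a pipelined 2D wavefront sweep.

    work[i][j] is an assumed compute cost per block on process (i, j).
    Process (i, j) starts block k only after finishing its own block k-1
    and after its upstream neighbours (i-1, j) and (i, j-1) finish block k.
    Communication cost is ignored; only blocking (waiting) is modelled.
    """
    rows, cols = len(work), len(work[0])
    # finish[i][j]: time at which (i, j) finished its most recent block.
    finish = [[0.0] * cols for _ in range(rows)]
    total_wait = 0.0
    for _ in range(blocks):
        new_finish = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                ready = finish[i][j]  # own previous block is done
                up_i = new_finish[i - 1][j] if i > 0 else 0.0
                up_j = new_finish[i][j - 1] if j > 0 else 0.0
                start = max(ready, up_i, up_j)
                total_wait += start - ready  # time spent blocked on upstream data
                new_finish[i][j] = start + work[i][j]
        finish = new_finish
    makespan = max(max(row) for row in finish)
    return makespan, total_wait


if __name__ == "__main__":
    balanced = [[1.0] * 4 for _ in range(4)]
    imbalanced = [row[:] for row in balanced]
    imbalanced[0][0] = 2.0  # one process does extra (conditional) work per block
    print("balanced:  ", simulate_wavefront(balanced, blocks=8))
    print("imbalanced:", simulate_wavefront(imbalanced, blocks=8))
```

In the balanced case the only waiting in this model is the initial pipeline fill. Giving a single upstream process extra work per block makes every downstream process wait again on each block, which is the kind of amplification the abstract describes, here in a deliberately simplified form.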


