LARGE-SCALE PERFORMANCE ANALYSIS OF SWEEP3D WITH THE SCALASCA TOOLSET

    Cray XT and IBM Blue Gene systems represent alternative current approaches to constructing leadership computer systems. Both rely on applications being able to exploit very large configurations of processor cores, and the associated analysis tools must scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive deterioration in the execution performance of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting time. Further instrumentation, measurement and analysis pinpointed a conditional section of highly imbalanced computation. This imbalance amplified the waiting times inherent in the associated wavefront communication, seriously degrading overall execution efficiency at very large scales. By employing effective data collation, management and graphical presentation in a portable and straightforward-to-use toolset, Scalasca was thereby able to demonstrate performance measurements and analyses with 294,912 processes.
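
    The way computational imbalance amplifies wavefront waiting time can be illustrated with a toy model. The sketch below is not Sweep3D code and does not reproduce any Scalasca measurement. It assumes a single sweep starting at one corner of a small 2D process grid, where each process may begin only after both its upstream neighbours have finished. The names `wavefront_wait` and `total_wait`, the 4 x 4 grid, and the work values are all hypothetical.

```python
def wavefront_wait(work):
    """Toy model of one wavefront sweep starting at grid corner (0, 0).

    work[i][j] is the compute time of process (i, j). A process starts
    once both upstream neighbours (north and west) have finished, so its
    waiting time equals its start time. Returns (finish, wait) grids.
    """
    rows, cols = len(work), len(work[0])
    finish = [[0.0] * cols for _ in range(rows)]
    wait = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            north = finish[i - 1][j] if i > 0 else 0.0
            west = finish[i][j - 1] if j > 0 else 0.0
            start = max(north, west)  # blocked until both inputs arrive
            wait[i][j] = start
            finish[i][j] = start + work[i][j]
    return finish, wait


def total_wait(work):
    """Sum of waiting time over all processes in the grid."""
    _, wait = wavefront_wait(work)
    return sum(sum(row) for row in wait)


# Hypothetical 4 x 4 grid with one time unit of work per process.
balanced = [[1.0] * 4 for _ in range(4)]

# Same grid, but the first process carries two extra units of work
# (an assumed stand-in for an imbalanced conditional code section).
imbalanced = [row[:] for row in balanced]
imbalanced[0][0] = 3.0

print(total_wait(balanced), total_wait(imbalanced))
```

    In this model the balanced grid accumulates 48 time units of waiting, which is the pipeline fill inherent in the wavefront. Adding two units of work to a single process raises the total to 78, because every one of the 15 downstream processes waits two units longer.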
