Fast Approximate Evaluation of Parallel Overhead from a Minimal Set of Measured Execution Times
Abstract
Porting key scientific algorithms to HPC architectures requires a thorough understanding of the subtle balance between performance gains and the overhead introduced. Here we continue the development of our recently proposed technique that uses plain execution times to predict the extent of parallel overhead. The focus here is on an analytic solution that uses exactly as many data points as there are unknowns, i.e. model parameters. A test set of nine applications frequently used in scientific computing is well described by the suggested model, including atypical cases that were not originally considered during development. However, which particular set of explicit data points will lead to an optimal prediction cannot be decided a priori.
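The core idea of the analytic approach can be sketched as follows: with a model containing k parameters and exactly k measured execution times, the parameters follow from a direct linear solve rather than a least-squares fit. The three-parameter model below (serial time s, parallelizable work p, linear overhead o) is a hypothetical stand-in for illustration only; it is not the model proposed in the paper.

```python
import numpy as np

# Hypothetical 3-parameter execution-time model (illustration only,
# not the model from the paper):
#   T(n) = s + p/n + o*n
# s: serial time, p: perfectly parallelizable work, o: linear overhead.

def solve_parameters(cores, times):
    """With exactly as many measurements as parameters, the model
    parameters follow from solving a 3x3 linear system analytically,
    no least-squares fitting required."""
    A = np.array([[1.0, 1.0 / n, float(n)] for n in cores])
    return np.linalg.solve(A, np.array(times, dtype=float))

def predict(params, n):
    s, p, o = params
    return s + p / n + o * n

# Synthetic check: generate times from known parameters, then recover them.
true_params = (2.0, 100.0, 0.05)
cores = [1, 8, 64]
times = [predict(true_params, n) for n in cores]
recovered = solve_parameters(cores, times)
print(recovered)  # ~ [2.0, 100.0, 0.05]
```

As the abstract notes, which subset of measured core counts (here 1, 8, 64) yields the best prediction at other core counts cannot be decided a priori; different choices condition the linear system differently.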


