World Scientific
Fast Approximate Evaluation of Parallel Overhead from a Minimal Set of Measured Execution Times

    Porting key scientific algorithms to HPC architectures requires a thorough understanding of the subtle balance between performance gained and overhead introduced. Here we continue the development of our recently proposed technique that uses plain execution times to predict the extent of parallel overhead. The focus here is on an analytic solution that takes into account exactly as many data points as there are unknowns, i.e. model parameters. A test set of 9 applications frequently used in scientific computing is well described by the suggested model, including atypical cases that were not originally considered during development. However, which particular set of explicit data points will lead to an optimal prediction cannot be decided a priori.
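    The core idea — determining all model parameters analytically from exactly as many measured execution times as there are unknowns — can be sketched as follows. The paper's actual overhead model is not reproduced here; as a hypothetical stand-in we use a model linear in its parameters, T(n) = a/n + b + c·log2(n), where a covers the parallelizable work, b the serial fraction, and c·log2(n) a growing parallel overhead. Three measured runtimes then yield a 3×3 linear system with a unique analytic solution:

    ```python
    import math

    def solve3(A, y):
        """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
        n = 3
        M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(M[r][col]))
            M[col], M[piv] = M[piv], M[col]
            for r in range(col + 1, n):
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
        return x

    def fit_model(measurements):
        """measurements: exactly 3 pairs (cores, runtime) -> parameters (a, b, c)
        of the hypothetical model T(n) = a/n + b + c*log2(n)."""
        A = [[1.0 / n, 1.0, math.log2(n)] for n, _ in measurements]
        y = [t for _, t in measurements]
        return solve3(A, y)

    def predict(params, n):
        a, b, c = params
        return a / n + b + c * math.log2(n)

    # Synthetic runtimes generated from a=100, b=5, c=0.5
    data = [(1, 105.0), (4, 31.0), (16, 13.25)]
    a, b, c = fit_model(data)
    print(round(a, 6), round(b, 6), round(c, 6))   # recovers 100.0 5.0 0.5
    print(round(predict((a, b, c), 64), 4))        # extrapolated runtime at 64 cores
    ```

    Because the system is exactly determined rather than overdetermined, each choice of three measurement points yields its own parameter set — which illustrates why, as noted above, the optimal choice of data points cannot be made a priori.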
