World Scientific
Elastic Parallel Systems for High Performance Cloud Computing: State-of-the-Art and Future Directions

    With on-demand access to compute resources, pay-per-use pricing, and elasticity, the cloud has evolved into an attractive execution environment for High Performance Computing (HPC). Elasticity, which is often referred to as the most beneficial cloud-specific property, has been heavily used in the context of interactive (multi-tier) applications, whereas elasticity-related research in the HPC domain is still in its infancy. Existing parallel computing theory and the traditional metrics used to analytically evaluate parallel systems do not comprehensively consider elasticity, i.e., the ability to control the number of processing units at runtime. To address these issues, we introduce a conceptual framework to understand elasticity in the context of parallel systems, define the term elastic parallel system, and discuss novel metrics for both elasticity control at runtime and the ex-post performance evaluation of elastic parallel systems. Based on the conceptual framework, we provide an in-depth analysis of existing research in the field to describe the state of the art and compile our findings into an agenda for future research on elastic parallel systems.
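
    The gap in traditional metrics can be illustrated with a minimal sketch, which is not taken from the article. Classical speedup and efficiency are defined for a fixed number of processing units p, so they have no obvious value when p changes during a run. The time-weighted mean used below is only one hypothetical way to summarize a varying allocation and is not one of the metrics the article proposes. All function names and numbers are assumptions made for illustration.

```python
def speedup(t_seq, t_par):
    """Classical speedup: sequential time divided by parallel time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, p):
    """Classical efficiency: speedup divided by a FIXED unit count p."""
    return speedup(t_seq, t_par) / p


def mean_units(phases):
    """Time-weighted mean number of processing units.

    phases: list of (duration, units) pairs describing an elastic run.
    Hypothetical helper, assumed here only for illustration.
    """
    total_time = sum(duration for duration, _ in phases)
    return sum(duration * units for duration, units in phases) / total_time


# Assumed elastic run: the allocation scales out, then back in.
phases = [(10.0, 2), (20.0, 8), (10.0, 4)]
t_par = sum(duration for duration, _ in phases)  # total parallel runtime
t_seq = 160.0                                    # assumed sequential runtime

peak_p = max(units for _, units in phases)
avg_p = mean_units(phases)

# The same run yields different "efficiency" values depending on which
# unit count is plugged into the classical formula.
eff_peak = efficiency(t_seq, t_par, peak_p)
eff_avg = efficiency(t_seq, t_par, avg_p)
print(f"speedup={speedup(t_seq, t_par):.2f} "
      f"eff(peak p)={eff_peak:.3f} eff(mean p)={eff_avg:.3f}")
```

    The two efficiency values differ for the same run, which shows why a metric defined over a single fixed p is ambiguous for an elastic parallel system.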
