OBSERVING THE IMPACT OF MULTIPLE METRICS AND RUNTIME ADAPTATIONS ON BSP PROCESS RESCHEDULING
Abstract
Process rescheduling is an useful mechanism to offer runtime load balancing, mainly in dynamic and heterogeneous environments. In this context, we developed a model called MigBSP which controls the process migration on BSP (Bulk Synchronous Parallel) applications. A BSP application is divided in one or more supersteps, each one containing both computation and communication phases followed by a barrier synchronization. Since the barrier waits for the slowest process, MigBSP's final objective is to adjust the processes location in order to reduce the supersteps' times. Its novel ideas are twofold. The former is represented by the combination of three metrics - Memory, Computation and Communication - in order to measure the Potential of Migration of each BSP process. The second idea consists in offering efficient adaptations that work on the rescheduling frequency. Both ideas turn MigBSP a viable model for getting performance on BSP applications. Meanwhile, it provides a low overhead on application execution when migrations do not take place. This paper presents MigBSP's algorithms, the parallel machine organization, some experimental results and related work.
References
Gagan Aggarwal , Rajeev Motwani and An Zhu , The load rebalancing problem, SPAA '2003: Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures (ACM Press, New York, NY, USA, 2003) pp. 258–265. Google ScholarMilind A. Bhandarkar , Robert Brunner and Laxmikant V. Kale , Run-time support for adaptive load balancing, IPDPS '2000: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing (Springer-Verlag, London, UK, 2000) pp. 1152–1159. Google ScholarOlaf Bonorden , Load balancing in the bulk-synchronous-parallel setting using process migrations, 21th International Parallel and Distributed Processing Symposium (IPDPS 2007) (IEEE, 2007) pp. 1–9. Google ScholarOlaf Bonorden , Joachim Gehweiler and Friedhelm Meyer auf der Heide , Load balancing strategies in a web computing environment, Proceeedings of International Conference on Parallel Processing and Applied Mathematics (PPAM) (Poznan, Poland, 2005) pp. 839–846. Google Scholar- Journal of Grid Computing 7, 73 (2009), DOI: 10.1007/s10723-008-9107-y. Crossref, ISI, Google Scholar
Raphael Y. De Camargo , Fabio Kon and Alfredo Goldman , Portable checkpointing and communication for bsp applications on dynamic heterogeneous grid environments, SBAC-PAD '05: Proceedings of the 17th International Symposium on Computer Architecture on High Performance Computing (IEEE Computer Society, Washington, DC, USA, 2005) pp. 226–234. Google ScholarHenri Casanova , Arnaud Legrand and Martin Quinson , Simgrid: A generic framework for large-scale distributed experiments, Tenth International Conference on Computer Modeling and Simulation (uksim) (IEEE Computer Society, Los Alamitos, CA, USA, 2008) pp. 126–131. Google Scholar- IEEE Trans. Softw. Eng. 14(2), 141 (1988), DOI: 10.1109/32.4634. Crossref, ISI, Google Scholar
-
Rodrigo da Rosa Righi , Alexandre Carissimi and Philippe Olivier Alexandre Navaux , Load rebalancing in bsp applications using three metrics: Computation, communication and memory , 8th IEEE International Symposium on Cluster Computing and the Grid - Poster Section . Google Scholar - Rodrigo da Rosa Righi, Laércio Lima Pilla, Alexandre Silva Carissimi, Philippe O. A. Navaux, and Hans-Ulrich Heiss. Applying processes rescheduling over irregular bsp application. In Gabrielle Allen, Jarosław Nabrzyski, Edward Seidel, Geert Dick van Albada, Jack Dongarra, and Peter M. A. Sloot, editors, Computational Science – ICCS 2009, volume 5544 of LNCS, pages 213–223. Springer, 2009 . Google Scholar
Cong Du , Xian-He Sun and Ming Wu , Dynamic scheduling with process migration, CCGRID '2007: Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (IEEE Computer Society, Washington, DC, USA, 2007) pp. 92–99. Google Scholar- Job Scheduling Strategies for Parallel Processing 4942, 1 (2008), DOI: 10.1007/978-3-540-78699-3_1. Crossref, Google Scholar
- Ismael Galindo, Francisco Almeida, and José Manuel Badía-Contelles. Dynamic load balancing on dedicated heterogeneous systems. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, volume 5205 of Lecture Notes in Computer Science, pages 64–74. Springer, 2008 . Google Scholar
- Concurr. Comput. : Pract. Exper. 16(5), 449 (2004), DOI: 10.1002/cpe.824. Crossref, ISI, Google Scholar
- Inf. Sci. Inf. Comput. Sci. 84(1-2), 115 (1995), DOI: 10.1002/cpe.824. Google Scholar
Chao Huang , Performance evaluation of adaptive mpi, PPoPP '2006: Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming (ACM Press, New York, NY, USA, 2006) pp. 12–21. Google ScholarDerrick Kondo , Models and scheduling mechanisms for global computing applications, IPDPS '2002: Proceedings of the 16th International Symposium on Parallel and Distributed Processing (IEEE Computer Society, Washington, DC, USA, 2002) p. 79.2. Google Scholar- Jorunal of Intelligent Information Systems 15(3), 221 (2000), DOI: 10.1023/A:1008728429669. Crossref, ISI, Google Scholar
Weikai Miao and Weiqin Tong , Agent based servicebsp model with superstep service for grid computing, Sixth International Conference on Grid and Cooperative Computing (GCC 2007)00 (2007) pp. 255–260. Google Scholar- Concurr. Comput. : Pract. Exper 20(13), 1485 (2008). Crossref, ISI, Google Scholar
- Rafael Moreno-Vozmediano and Ana B. Alonso-Conde. Influence of grid economic factors on scheduling and migration. In High Performance Computing for Computational Science - VECPAR, volume 3402 of Lecture Notes in Computer Science, pages 274–287. Springer, 2005 . Google Scholar
Nicolas Schepke and Claudio Maillard , Performance improvement of the parallel lattice boltzmann method through blocked data distributions, 19th International Symposium on Computer Architecture and High Performance Computing (2007) pp. 71–78. Google ScholarAndrew Tanenbaum , Computer Networks, 4th edn. (Prentice Hall PTR, Upper Saddle River, New Jersey, 2003). Google Scholar- Concurr. Comput. : Pract. Exper 17(2-4), 235 (2005), DOI: 10.1002/cpe.927. Crossref, ISI, Google Scholar
- Commun. ACM 33(8), 103 (1990), DOI: 10.1145/79173.79181. Crossref, ISI, Google Scholar
- Parallel Processing Letters 13(3), 329 (2003), DOI: 10.1142/S0129626403001318. Link, ISI, Google Scholar


