Efficient Communication Induced Checkpointing Protocol for Broadcast Network-based Distributed Systems
Abstract
This paper proposes an enhanced Fully Informed Communication-Induced Checkpointing (FI-CIC) protocol for distributed systems built on a broadcast network. By exploiting the broadcast property of the network, the protocol greatly increases the likelihood of detecting Z-cycle-free patterns compared with the original FI-CIC protocol, without requiring any extra control messages. Experimental results show that the proposed protocol outperforms the original one in terms of the number of forced checkpoints per process.


