An Eventually Perfect Failure Detector for Networks of Arbitrary Topology Connected with ADD Channels Using Time-To-Live Values
Abstract
We present an implementation of an eventually perfect failure detector in an arbitrarily connected, partitionable network. We assume ADD channels: for each one there exist constants , , not known to the processes, such that for every consecutive messages sent in one direction, at least one is delivered within time . The best previous implementation used messages of bounded size, but exponential in , the number of nodes. The main contribution of this paper is a novel use of time-to-live values in the design of failure detectors, obtaining a flexible implementation that uses messages of size .
A preliminary version of this paper appeared in the 49th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2019).1
References
- 1. , An eventually perfect failure detector for networks of arbitrary topology connected with ADD channels using time-to-live values, in 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN),
Portland, OR, USA (IEEE, 2019), pp. 264–275. Crossref, Google Scholar - 2. , Unreliable failure detectors for reliable distributed systems, Journal of the ACM (JACM) 43(2) (1996) 225–267. Crossref, ISI, Google Scholar
- 3. , Stable leader election, in Proceedings of the 15th International Conference on Distributed Computing (DISC),
Lecture Notes in Computer Science Book Series , Vol. 2180 (Springer-Verlag, London, UK, 2001), pp. 108–122. Crossref, Google Scholar - 4. , On quiescent reliable communication, SIAM J. Comput. 29 (April 2000) 2040–2073. Crossref, ISI, Google Scholar
- 5. , The weakest failure detectors to boost obstruction-freedom, in Proceedings of the 20th International Conference on Distributed Computing (DISC),
Lecture Notes in Computer Science , Vol. 4167 (Springer-Verlag, Berlin, Heidelberg, 2006). pp. 399–412. Crossref, Google Scholar - 6. ,
Wait-free dining under eventual weak exclusion , in Distributed Computing and Networking (ICDCN), eds. S. RaoM. ChatterjeeP. JayantiC. S. R. MurthyS. K. Saha,Lecture Notes in Computer Science , Vol. 4904 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008), pp. 135–146. Crossref, Google Scholar - 7. , Dining philosophers with crash locality 1, in 24th International Conference on Distributed Computing Systems,
24–26 March 2004 ,Hachioji, Tokyo, Japan (2004), pp. 22–29. Crossref, Google Scholar - 8. , Eventually k-bounded wait-free distributed daemons, in The 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007),
25–28 June 2007 ,Edinburgh, UK (2007), pp. 645–655. Crossref, Google Scholar - 9. , A realistic look at failure detectors, in Proceedings International Conference on Dependable Systems and Networks (IEEE, 2002), pp. 345–353. Crossref, Google Scholar
- 10. , On the hardness of failure-sensitive agreement problems, Inf. Process. Lett. 79 (June 2001) 99–104. Crossref, ISI, Google Scholar
- 11. , On the implementation of unreliable failure detectors in partially synchronous systems, IEEE Trans. Computers 53(7) (2004) 815–828. Crossref, ISI, Google Scholar
- 12. , Consensus in the presence of partial synchrony, J. ACM 35 (April 1988) 288–323. Crossref, ISI, Google Scholar
- 13. , On the respective power of and to solve one-shot agreement problems, IEEE Trans. Parallel Distrib. Syst. 18(5) (2007) 589–597. Crossref, ISI, Google Scholar
- 14. , Using the heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks, Theoretical Computer Science 220(1) (1999) 3–30. Crossref, ISI, Google Scholar
- 15. , What model and what conditions to implement unreliable failure detectors in dynamic networks?, in Proceedings of the 3rd International Workshop on Theoretical Aspects of Dynamic Distributed Systems (ACM, 2011), pp. 13–17. Crossref, Google Scholar
- 16. , An efficient failure detector for sparsely connected networks, in Parallel and Distributed Computing and Networks (
2004 ), 369–374. Google Scholar - 17. , Eventually perfect failure detectors using add channels, in Proc. 5th Int. Symposium on Parallel and Distributed Processing and Applications (ISPA),
Lecture Notes in Computer Science , Vol. 4742 (Springer, Berlin, Heidelberg, 2007). pp. 483–496. Crossref, Google Scholar - 18. , Implementing with bounded messages on a network of ADD channels, Parallel Processing Letters 29(1) (2019) 1950002:1–1950002:12. Link, ISI, Google Scholar
- 19. , Computer Networking: A Top-Down Approach, International Edition (Pearson Higher Ed, 2013). Google Scholar
- 20. , Computer Networks, 5th edition (Pearson, 2011). Google Scholar
- 21. , Fault-Tolerant Message-Passing Distributed Systems — An Algorithmic Approach (Springer, 2018). Crossref, Google Scholar
- 22. , An adaptive failure detection protocol, in 8th Pacific Rim International Symposium on Dependable Computing (PRDC 2001),
17–19 December 2001 ,Seoul, Korea (2001), pp. 146–153. Crossref, Google Scholar - 23. , Implementing the weakest failure detector for solving the consensus problem, International Journal of Parallel, Emergent and Distributed Systems 28(6) (2013) 537–555. Crossref, Google Scholar
- 24. , On the implementation of communication-optimal failure detectors, in Proc. Latin-American Symposium on Dependable Computing (LADC), eds. A. BondavalliF. BrasileiroS. Rajsbaum,
Lecture Notes in Computer Science , Vol. 4746 (Springer, Berlin, Heidelberg, 2007), pp. 25–37. Crossref, Google Scholar - 25. , Crash-quiescent failure detection, in Proceedings of the 23rd International Conference on Distributed Computing (DISC),
Lecture Notes in Computer Science , Vol. 5805 (Springer-Verlag, Berlin, Heidelberg, 2009), pp. 326–340. Crossref, Google Scholar - 26. , Implementing unreliable failure detectors with unknown membership, Information Processing Letters 100(2) (2006) 60–63. Crossref, ISI, Google Scholar
- 27. , Distributed Algorithms (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996). Google Scholar
- 28. , Eventually strong failure detector with unknown membership, Comput. J. 55 (December 2012) 1507–1524. Crossref, ISI, Google Scholar


