World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

An Eventually Perfect Failure Detector for Networks of Arbitrary Topology Connected with ADD Channels Using Time-To-Live Values

    We present an implementation of an eventually perfect failure detector in an arbitrarily connected, partitionable network. We assume ADD channels: for each one there exist constants K, D, not known to the processes, such that for every K consecutive messages sent in one direction, at least one is delivered within time D. The best previous implementation used messages of bounded size, but exponential in n, the number of nodes. The main contribution of this paper is a novel use of time-to-live values in the design of failure detectors, obtaining a flexible implementation that uses messages of size O(nlogn).

    A preliminary version of this paper appeared in the 49th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2019).1

    References

    • 1. K. Vargas and S. Rajsbaum, An eventually perfect failure detector for networks of arbitrary topology connected with ADD channels using time-to-live values, in 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA (IEEE, 2019), pp. 264–275. CrossrefGoogle Scholar
    • 2. T. D. Chandra and S. Toueg, Unreliable failure detectors for reliable distributed systems, Journal of the ACM (JACM) 43(2) (1996) 225–267. Crossref, ISIGoogle Scholar
    • 3. M. K. Aguilera, C. Delporte-Gallet, H. Fauconnier and S. Toueg, Stable leader election, in Proceedings of the 15th International Conference on Distributed Computing (DISC), Lecture Notes in Computer Science Book Series, Vol. 2180 (Springer-Verlag, London, UK, 2001), pp. 108–122. CrossrefGoogle Scholar
    • 4. M. K. Aguilera, W. Chen and S. Toueg, On quiescent reliable communication, SIAM J. Comput. 29 (April 2000) 2040–2073. Crossref, ISIGoogle Scholar
    • 5. R. Guerraoui, M. Kapalka and P. Kouznetsov, The weakest failure detectors to boost obstruction-freedom, in Proceedings of the 20th International Conference on Distributed Computing (DISC), Lecture Notes in Computer Science, Vol. 4167 (Springer-Verlag, Berlin, Heidelberg, 2006). pp. 399–412. CrossrefGoogle Scholar
    • 6. S. M. Pike, Y. Song and S. Sastry, Wait-free dining under eventual weak exclusion, in Distributed Computing and Networking (ICDCN), eds. S. RaoM. ChatterjeeP. JayantiC. S. R. MurthyS. K. Saha, Lecture Notes in Computer Science, Vol. 4904 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008), pp. 135–146. CrossrefGoogle Scholar
    • 7. S. M. Pike and P. A. G. Sivilotti, Dining philosophers with crash locality 1, in 24th International Conference on Distributed Computing Systems, 24–26 March 2004, Hachioji, Tokyo, Japan (2004), pp. 22–29. CrossrefGoogle Scholar
    • 8. Y. Song and S. M. Pike, Eventually k-bounded wait-free distributed daemons, in The 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007), 25–28 June 2007, Edinburgh, UK (2007), pp. 645–655. CrossrefGoogle Scholar
    • 9. C. Delporte-Gallet, H. Fauconnier and R. Guerraoui, A realistic look at failure detectors, in Proceedings International Conference on Dependable Systems and Networks (IEEE, 2002), pp. 345–353. CrossrefGoogle Scholar
    • 10. R. Guerraoui, On the hardness of failure-sensitive agreement problems, Inf. Process. Lett. 79 (June 2001) 99–104. Crossref, ISIGoogle Scholar
    • 11. M. Larrea, A. Fernández and S. Arévalo, On the implementation of unreliable failure detectors in partially synchronous systems, IEEE Trans. Computers 53(7) (2004) 815–828. Crossref, ISIGoogle Scholar
    • 12. C. Dwork, N. Lynch and L. Stockmeyer, Consensus in the presence of partial synchrony, J. ACM 35 (April 1988) 288–323. Crossref, ISIGoogle Scholar
    • 13. R. Friedman, A. Mostéfaoui and M. Raynal, On the respective power of P and S to solve one-shot agreement problems, IEEE Trans. Parallel Distrib. Syst. 18(5) (2007) 589–597. Crossref, ISIGoogle Scholar
    • 14. M. K. Aguilera, W. Chen and S. Toueg, Using the heartbeat failure detector for quiescent reliable communication and consensus in partitionable networks, Theoretical Computer Science 220(1) (1999) 3–30. Crossref, ISIGoogle Scholar
    • 15. F. Greve, L. Arantes and P. Sens, What model and what conditions to implement unreliable failure detectors in dynamic networks?, in Proceedings of the 3rd International Workshop on Theoretical Aspects of Dynamic Distributed Systems (ACM, 2011), pp. 13–17. CrossrefGoogle Scholar
    • 16. M. Hutle, An efficient failure detector for sparsely connected networks, in Parallel and Distributed Computing and Networks (2004), 369–374. Google Scholar
    • 17. S. Sastry and S. M. Pike, Eventually perfect failure detectors using add channels, in Proc. 5th Int. Symposium on Parallel and Distributed Processing and Applications (ISPA), Lecture Notes in Computer Science, Vol. 4742 (Springer, Berlin, Heidelberg, 2007). pp. 483–496. CrossrefGoogle Scholar
    • 18. S. Kumar and J. L. Welch, Implementing P with bounded messages on a network of ADD channels, Parallel Processing Letters 29(1) (2019) 1950002:1–1950002:12. Link, ISIGoogle Scholar
    • 19. J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach, International Edition (Pearson Higher Ed, 2013). Google Scholar
    • 20. A. S. Tanenbaum and D. Wetherall, Computer Networks, 5th edition (Pearson, 2011). Google Scholar
    • 21. M. Raynal, Fault-Tolerant Message-Passing Distributed Systems — An Algorithmic Approach (Springer, 2018). CrossrefGoogle Scholar
    • 22. C. Fetzer, M. Raynal and F. Tronel, An adaptive failure detection protocol, in 8th Pacific Rim International Symposium on Dependable Computing (PRDC 2001), 17–19 December 2001, Seoul, Korea (2001), pp. 146–153. CrossrefGoogle Scholar
    • 23. M. Larrea, A. F. Anta and S. Arévalo, Implementing the weakest failure detector for solving the consensus problem, International Journal of Parallel, Emergent and Distributed Systems 28(6) (2013) 537–555. CrossrefGoogle Scholar
    • 24. M. Larrea, A. Lafuente, I. Soraluze, R. Cortiñas and J. Wieland, On the implementation of communication-optimal failure detectors, in Proc. Latin-American Symposium on Dependable Computing (LADC), eds. A. BondavalliF. BrasileiroS. Rajsbaum, Lecture Notes in Computer Science, Vol. 4746 (Springer, Berlin, Heidelberg, 2007), pp. 25–37. CrossrefGoogle Scholar
    • 25. S. Sastry, S. M. Pike and J. L. Welch, Crash-quiescent failure detection, in Proceedings of the 23rd International Conference on Distributed Computing (DISC), Lecture Notes in Computer Science, Vol. 5805 (Springer-Verlag, Berlin, Heidelberg, 2009), pp. 326–340. CrossrefGoogle Scholar
    • 26. E. Jiménez, S. Arévalo and A. Fernández, Implementing unreliable failure detectors with unknown membership, Information Processing Letters 100(2) (2006) 60–63. Crossref, ISIGoogle Scholar
    • 27. N. A. Lynch, Distributed Algorithms (Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996). Google Scholar
    • 28. F. Greve, P. Sens, L. Arantes and V. Simon, Eventually strong failure detector with unknown membership, Comput. J. 55 (December 2012) 1507–1524. Crossref, ISIGoogle Scholar