Graph Connectivity in Log Steps Using Label Propagation
Abstract
The fastest deterministic algorithms for connected components take logarithmic time and perform superlinear work on a Parallel Random Access Machine (PRAM). These algorithms maintain a spanning forest by merging and compressing trees, which requires pointer-chasing operations that increase memory access latency and are limited to shared-memory systems. Many of these PRAM algorithms are also very complicated to implement. Another popular method is “leader-contraction” where the challenge is to select a constant fraction of leaders that are adjacent to a constant fraction of non-leaders with high probability, but this can require adding more edges than were in the original graph. Instead we investigate label propagation because it is deterministic, easy to implement, and does not rely on pointer-chasing. Label propagation exchanges representative labels within a component using simple graph traversal, but it is inherently difficult to complete in a sublinear number of steps. We are able to overcome the problems with label propagation for graph connectivity.
We introduce a surprisingly simple framework for deterministic, undirected graph connectivity using label propagation that is easily adaptable to many computational models. It achieves logarithmic convergence independently of the number of processors and without increasing the edge count. We employ a novel method of propagating directed edges in alternating direction while performing minimum reduction on vertex labels. We present new algorithms in PRAM, Stream, and MapReduce. Given a simple, undirected graph with vertices, edges, our approach takes O(m) work each step, but we can only prove logarithmic convergence on a path graph. It was conjectured by Liu and Tarjan (2019) to take steps or possibly steps. Our experiments on a range of difficult graphs also suggest logarithmic convergence. We leave the proof of convergence as an open problem.
Communicated by Andrew Adamatzky
References
- 1. , A simpler parallel algorithm for graph connectivity, Journal of Algorithms 16(2) (1994) 190–217. Crossref, Web of Science, Google Scholar
- 2. , Approximate parallel scheduling. Part II: Applications to logarithmic-time optimal graph algorithms, Information and Computation 92 (1991) 1–47. Crossref, Web of Science, Google Scholar
- 3. , New connectivity and MSF algorithms for shuffle-exchange network and PRAM, IEEE Transactions on Computers C-36 (1987) 1258–1263. Crossref, Web of Science, Google Scholar
- 4. , An O(logn) parallel connectivity algorithm, Journal of Algorithms 3 (1982) 57–67. Crossref, Web of Science, Google Scholar
- 5. , Parallel implementation of Boruvka’s minimum spanning tree algorithm, in Proceedings of International Conference on Parallel Processing (ICPP ’96),
1996 , pp. 302–308. Google Scholar - 6. , Connected components algorithms for mesh-connected parallel computers, in Parallel Algorithms: 3rd DIMACS Implementation Challenge. Google Scholar
- 7. , Multi-core spanning forest algorithms using the disjoint-set data structure, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’12),
2012 , pp. 827–835. Google Scholar - 8. , A fast, parallel spanning tree algorithm for symmetric multiprocessors, in Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’04),
2004 . Google Scholar - 9. , Parallel graph connectivity in log diameter rounds, in Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science. Google Scholar
- 10. , Connected components in MapReduce and Beyond, in Proceedings of the ACM Symposium on Cloud Computing (SoCC’14),
2014 , p. 1–13. Google Scholar - 11. , Finding connected components in map-reduce in logarithmic rounds, in Proceedings of the 2013 IEEE International Conference On Data Engineering (ICDE’13),
2013 , pp. 50–61. Google Scholar - 12. , Fast connected components algorithms for the EREW PRAM, SIAM Journal of Computing 28(3) (1999) 1021–1034. Crossref, Web of Science, Google Scholar
- 13. , Sequential operations in digital picture processing, Journal of the ACM 13(4) (1966) 471–494. Crossref, Web of Science, Google Scholar
- 14. , Efficient component labeling of images of arbitrary dimension represented by linear bintrees, IEEE Transactions on Pattern Analysis and Machine Intelligence 10(4) (1988) 579–586. Crossref, Web of Science, Google Scholar
- 15. , Connected component labeling and adjacency graph construction, Machine Intelligence and Pattern Recognition 19 (1996) 1–30. Google Scholar
- 16. , Simple concurrent labeling algorithms for connected components, in Symposium on Simplicity of Algorithms (SOSA’19),
2019 . Google Scholar - 17. , Concurrent threads and optimal parallel minimum spanning trees algorithm, Journal of the ACM 48(2) (2001) 297–323. Crossref, Web of Science, Google Scholar
- 18. , On the streaming model augmented with a sorting primitive, in Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS’04),
2004 , pp. 540–549. Google Scholar - 19. D. K. Campbell, A survey of models of parallel computation, Tech. Rep., Unversity of York Department of Computer Science (March 1997). Google Scholar
- 20. , A survey of graph algorithms under extended streaming models of computation, in Fundamental Problems in Computing (Springer, 2009), pp. 455–476. Crossref, Google Scholar
- 21. , Graph stream algorithms: A survey, ACM SIGMOD Record 43(1) (2014) 9–20. Crossref, Web of Science, Google Scholar
- 22. , MapReduce parallel programming model: A state-of-the-art survey, International Journal of Parallel Programming 44(4) (2016) 832–866. Crossref, Web of Science, Google Scholar
- 23. , Parallelism in random access machines, in Proceedings of the ACM Symposium on Theory of Computing (STOC’78),
1978 , pp. 114–118. Google Scholar - 24. , Parallel Programming: Concepts and Practice, 1st edn. (Morgan Kaufmann, 2017). Google Scholar
- 25. , Selection and sorting with limited storage, Theoretical Computer Science 12(3) (1980) 315–323. Crossref, Web of Science, Google Scholar
- 26. M. R. Henzinger, P. Raghavan and S. Rajagopalan, Computing on data streams, Tech. Rep. 1998-011, DEC Systems Research Center (1998). Google Scholar
- 27. M. Ruhl, Efficient algorithms for new computational models, PhD thesis, Massachusetts Institute of Technology (Cambridge, MA, USA, 2003). Google Scholar
- 28. , A model of computation for MapReduce, in Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’10),
2010 , pp. 938–948. Google Scholar - 29. , Sorting, searching, and simulation in the MapReduce framework, in Proceedings of International Symposium on Algorithms and Computation (ISAAC’11),
2011 , pp. 374–383. Google Scholar - 30. , Space-round tradeoffs for MapReduce computations, in Proceedings of the 2012 ACM International Conference on Supercomputing (SC ’12),
2012 , pp. 235–244. Google Scholar - 31. , MapReduce: Simplified data processing on large clusters, in Proceedings of the 6th Conference on Symposium on Operating Systems Design and Implementation (OSDI ’04),
2004 , pp. 137–150. Google Scholar - 32. , Trading off space for passes in graph streaming problems, ACM Transactions on Algorithms 6(1) (2009) 6:1–6:17. https://doi.org/10.1145/1644015.1644021 Crossref, Web of Science, Google Scholar
- 33. , Communication steps for parallel query processing, in Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS’13),
2013 , pp. 1–12. Google Scholar - 34. , Communication steps for parallel query processing, Journal of the ACM 64(6) (2017) 1–58. Crossref, Web of Science, Google Scholar
- 35. , Near-optimal massively parallel graph connectivity, in 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS’19),
2019 , pp. 1615–1636. https://doi.org/10.1109/FOCS.2019.00095 Google Scholar - 36. , Connected components on a PRAM in log diameter time, in Symposium on Parallelism in Algorithms and Architectures (SPAA’20),
2020 . Google Scholar - 37. P. Burkhardt, Improved connectivity algorithm, Unpublished, October 19, 2019. Google Scholar
- 38. ,
Parallel algorithms for shared-memory machines , in Handbook of Theoretical Computer Science, ed. J. van Leeuwen (MIT Press, Cambridge, MA, USA, 1990), pp. 869–932. Crossref, Google Scholar - 39. P. Burkhardt, Easy graph algorithms under hard constraints, Tech. Rep., U.S. National Security Agency (May 2016), Keynote at the 2016 Seventh Annual Graph Exploitation Symposium (GraphEx). Google Scholar
- 40. J. Leskovec and A. Krevl, SNAP Datasets: Stanford large network dataset collection (June, 2014), http://snap.stanford.edu/data. Google Scholar
- 41. C. Demetrescu, A. Goldberg and D. Johnson, 9th DIMACS implementation challenge: Shortest paths (2019), http://www.diag.uniroma1.it//challenge9/. Google Scholar
- 42. , ConnectIt: A framework for static and incremental parallel graph connectivity algorithms, Proceedings of the VLDB 14(4) (2020) 653–667. Crossref, Web of Science, Google Scholar