Minimizing I/Os in Out-of-Core Task Tree Scheduling
Abstract
Scientific applications are usually described using directed acyclic graphs, where nodes represent tasks and edges represent dependencies between tasks. For some applications, this graph is a tree: each task produces a single result used solely by its parent. The temporary results of each task have to be stored between their production and their use.
We focus on the case when the data manipulated are very large. Then, during an execution, all data may not fit together in memory. In such a case, some data have to be temporarily written to disk and evicted from memory. These data are later read from disk when they are needed for computation.
These Input/Output operations are very expensive; hence, our goal is to minimize their total volume. The order in which the tasks are processed considerably influences the amount of such Input/Output operations. Finding the schedule which minimizes this amount is an open problem that we revisit in this paper.
We first formalize and generalize known results, and prove that existing solutions can be arbitrarily worse than the optimal. We then present an Integer Linear Program to solve it optimally. Finally, we propose a novel heuristic algorithm. We demonstrate its good performance through simulations on both synthetic and realistic trees built from actual scientific applications.
Communicated by Yong Zhang
References
- 1. , Robust memory-aware mappings for parallel multifrontal factorizations, SIAM J. Scientific Comput. 38(3) (2016) C256–C279. Crossref, Web of Science, Google Scholar
- 2. E. Agullo, On the out-of-core factorization of large sparse matrices, PhD thesis, École normale supérieure de Lyon, France (2008). Google Scholar
- 3. , A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl. 23(1) (2001) 15–41. Crossref, Web of Science, Google Scholar
- 4. , Hybrid scheduling for the parallel solution of linear systems, Parallel Comput. 32(2) (2006) 136–156. Crossref, Web of Science, Google Scholar
- 5. W. G. Aulbur, Parallel implementations of quasiparticle calculations of semiconductors and insulators, PhD thesis, The Ohio State University (1996). Google Scholar
- 6. , A study of replacement algorithms for a virtual-storage computer, IBM Syst. J. 5(2) (1966) 78–101. Crossref, Google Scholar
- 7. , Single-machine scheduling with precedence constraints, Math. Operat. Res. 30(4) (2005) 1005–1021. Crossref, Web of Science, Google Scholar
- 8. , Direct Methods for Sparse Linear Systems,
Fundamentals of Algorithms , Vol. 2 (Society for Industrial and Applied Mathematics, 2006). Crossref, Google Scholar - 9. , Benchmarking optimization software with performance profiles, Math. Program. 91(2) (2002) 201–213. Crossref, Web of Science, Google Scholar
- 10. , Parallel scheduling of task trees with limited memory, TOPC 2(2) (2015) 13. Crossref, Google Scholar
- 11. , Cache-oblivious algorithms, Foundations of Computer Science, 1999. 40th Annual Symposium on,
IEEE (IEEE Computer Society, New York, NY, 1999), pp. 285–297. Crossref, Google Scholar - 12. , Electron correlation in semiconductors and insulators: Band gaps and quasiparticle energies, Phys. Rev. B 34(8) (1986) 5390. Crossref, Web of Science, Google Scholar
- 13. , On optimal tree traversals for sparse matrix factorization, Proc. 25th IEEE Int. Parallel and Distributed Processing Symposium (IPDPS’11) (IEEE Computer Society, Los Alamitos, CA, USA, 2011), pp. 556–567. Crossref, Google Scholar
- 14. , I/o complexity: The red-blue pebble game, Proc. 13th Ann. ACM Symp. Theory of Computing, STOC ’81 (Association for Computing Machinery, New York, NY, USA, 1981), p. 326–333. Crossref, Google Scholar
- 15. , Memory-optimal evaluation of expression trees involving large objects, Computer Languages, Systems & Structures 37(2) (2011) 63–75. Crossref, Google Scholar
- 16. , A simple proof of optimality for the MIN cache replacement policy, Inf. Process. Lett. 116(2) (2016) 168–170. Crossref, Web of Science, Google Scholar
- 17. ,
Achieving chemical accuracy with coupled-cluster theory , Quantum Mechanical Electronic Structure Calculations with Chemical Accuracy, ed. S. R. Langhoff (Springer, New York, NY, 1995), pp. 47–108. Crossref, Google Scholar - 18. , On the storage requirement in the out-of-core multifrontal method for sparse factorization, ACM Trans. Mathematical Software (TOMS) 12(3) (1986) 249–264. Crossref, Web of Science, Google Scholar
- 19. , An application of generalized tree pebbling to sparse matrix factorization, SIAM J. Algebraic Discrete Methods 8(3) (1987) 375–395. Crossref, Web of Science, Google Scholar
- 20. , The role of elimination trees in sparse factorization, SIAM J. Matrix Analysis and Applications 11(1) (1990) 134–172. Crossref, Web of Science, Google Scholar
- 21. , Generating random binary treesa survey, Inform. Sci. 115(1–4) (1999) 123–136. Crossref, Web of Science, Google Scholar
- 22. , Benchmark Studies on Small Molecules, Encyclopedia of Computational Chemistry, Vol. 1 (Wiley, New York, NY, 1998), pp. 115–128. Google Scholar
- 23. , On the hardness of red-blue pebble games, Proc. 32nd ACM Symp. Parallelism in Algorithms and Architectures (Association for Computing Machinery, 2020), pp. 419–429. Crossref, Google Scholar
- 24. ,
An out-of-core task-based middleware for data-intensive scientific computing , Handbook on Data Centers, eds. S. U. Khan and A. Y. Zomaya (Springer, New York, NY, 2015), pp. 647–667. Crossref, Google Scholar - 25. , The generation of optimal code for arithmetic expressions, J. ACM 17(4) (1970) 715–728. Crossref, Web of Science, Google Scholar
- 26. ,
A survey of out-of-core algorithms in numerical linear algebra , External Memory Algorithms, Proceedings of a DIMACS Workshop (American Mathematical Society, New Brunswick, New Jersey, USA, 1998), pp. 161–180. Google Scholar
Remember to check out the Most Cited Articles! |
---|
Check out these Handbooks in Computer Science |