A Framework for OpenCL Task Scheduling on Heterogeneous Multicores
Abstract
We present an intelligent scheduling framework that takes a set of OpenCL kernels as input and distributes the workload across multiple CPUs and GPUs in a heterogeneous multicore platform. The framework relies on a machine learning (ML)-based frontend that analyzes static program features of OpenCL kernels and predicts the ratio in which each kernel's workload should be partitioned between CPUs and GPUs. Through well-defined programming interfaces, the framework exposes this static analysis information together with system state information, such as the runtime availability of computing cores. These interfaces are intended to be used by a user-specified scheduling strategy. Given such a strategy, the framework generates device-specific binaries and dispatches them across the devices of the heterogeneous platform accordingly. We evaluate the framework extensively using OpenCL task mixes of varying sizes and computational characteristics. Along with the framework, we propose a set of novel partition-aware scheduling strategies for heterogeneous multicores. Our approach yields considerably shorter schedule makespans than current state-of-the-art ML-based methods for scheduling OpenCL workloads across heterogeneous multicores.
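The abstract does not spell out the proposed strategies. The following is therefore only a minimal Python sketch of what a partition-aware strategy built on such interfaces might look like. It assumes one CPU and one GPU, an ML-predicted GPU share per kernel, per-device execution-time estimates, and execution time that scales linearly with the share of work assigned to a device. The names `Kernel` and `partition_aware_schedule` and all numbers are hypothetical and are not the framework's API. For each kernel, the sketch greedily picks whichever of CPU-only, GPU-only, or split-by-predicted-ratio finishes earliest given the current device availability.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    gpu_fraction: float  # hypothetical ML-predicted share of the work for the GPU, in [0, 1]
    cpu_time: float      # hypothetical estimated time to run the whole kernel on the CPU
    gpu_time: float      # hypothetical estimated time to run the whole kernel on the GPU


def partition_aware_schedule(kernels, cpu_free=0.0, gpu_free=0.0):
    """Greedy earliest-finish-time scheduling of kernels on one CPU and one GPU.

    cpu_free / gpu_free model runtime device availability (the time at which
    each device becomes idle). Returns (plan, makespan), where plan is a list
    of (kernel name, placement, finish time) tuples.
    """
    plan = []
    for k in kernels:
        # Each option: (finish time, placement, new cpu_free, new gpu_free).
        options = [
            (cpu_free + k.cpu_time, "cpu", cpu_free + k.cpu_time, gpu_free),
            (gpu_free + k.gpu_time, "gpu", cpu_free, gpu_free + k.gpu_time),
        ]
        if 0.0 < k.gpu_fraction < 1.0:
            # Split by the predicted ratio; both parts start once both devices are idle.
            # Assumes execution time scales linearly with the share of work.
            start = max(cpu_free, gpu_free)
            cpu_end = start + (1.0 - k.gpu_fraction) * k.cpu_time
            gpu_end = start + k.gpu_fraction * k.gpu_time
            options.append((max(cpu_end, gpu_end), "split", cpu_end, gpu_end))
        # Pick the option that finishes earliest (ties go to the earlier option).
        finish, placement, cpu_free, gpu_free = min(options, key=lambda o: o[0])
        plan.append((k.name, placement, finish))
    return plan, max(cpu_free, gpu_free)


kernels = [
    Kernel("k1", 0.75, 6.0, 2.0),
    Kernel("k2", 0.0, 3.0, 6.0),
    Kernel("k3", 0.5, 4.0, 4.0),
]
plan, makespan = partition_aware_schedule(kernels)
print(plan, makespan)
# [('k1', 'split', 1.5), ('k2', 'cpu', 4.5), ('k3', 'gpu', 5.5)] 5.5
```

In this toy run, splitting pays off for `k1` because both devices are idle, but not for `k3`: waiting for the busy CPU would delay the GPU share. This is why such a strategy needs runtime availability information alongside the predicted partition ratio.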


