World Scientific
A Framework for OpenCL Task Scheduling on Heterogeneous Multicores

    We present an intelligent scheduling framework that takes as input a set of OpenCL kernels and distributes the workload across the CPUs and GPUs of a heterogeneous multicore platform. The framework relies on a machine learning (ML) based frontend that analyzes static program features of each OpenCL kernel and predicts the ratio in which its workload should be partitioned between CPUs and GPUs. Through well-defined programming interfaces, the framework exposes this static analysis information, together with system state information such as the runtime availability of computing cores, to a user-specified scheduling strategy. Given such a strategy, the framework generates device-specific binaries and dispatches them across the devices of the heterogeneous platform accordingly. We test the scheduling framework extensively using OpenCL task mixes of varying sizes and computational characteristics. Alongside the framework, we propose a set of novel partition-aware scheduling strategies for heterogeneous multicores. Our approach yields considerably shorter schedule makespans than current state-of-the-art ML-based methods for scheduling OpenCL workloads across heterogeneous multicores.
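    The core idea — predict a CPU/GPU partition ratio from static kernel features, then split the work accordingly — can be sketched as follows. This is a minimal illustration, not the paper's system: the feature set, the linear heuristic standing in for the trained ML model, and the function names are all illustrative assumptions.

```python
# Hedged sketch: partition-aware dispatch of an OpenCL-style 1-D NDRange
# between a CPU and a GPU, split by a predicted ratio. The feature names
# and the toy predictor below are placeholders, NOT the paper's model.

from dataclasses import dataclass


@dataclass
class KernelFeatures:
    # Static features one might extract from a kernel (e.g. via a
    # compiler pass); hypothetical, for illustration only.
    compute_ops: int      # arithmetic instruction count
    memory_ops: int       # global-memory access count
    has_divergence: bool  # kernel contains divergent branches


def predict_gpu_fraction(f: KernelFeatures) -> float:
    """Toy stand-in for an ML frontend: returns the fraction of work
    items to send to the GPU, in [0.0, 1.0]."""
    intensity = f.compute_ops / max(1, f.memory_ops)
    frac = min(1.0, intensity / 10.0)   # compute-heavy kernels favor the GPU
    if f.has_divergence:                # divergent kernels favor the CPU
        frac *= 0.5
    return frac


def partition_ndrange(global_size: int, local_size: int,
                      gpu_fraction: float) -> tuple[int, int]:
    """Split a 1-D NDRange into GPU and CPU sub-ranges, rounding the
    GPU share to a whole number of work-groups so each device receives
    complete work-groups."""
    groups = global_size // local_size
    gpu_groups = round(groups * gpu_fraction)
    gpu_items = gpu_groups * local_size
    return gpu_items, global_size - gpu_items
```

A scheduler built on such interfaces would call `predict_gpu_fraction` once per kernel at analysis time, then use `partition_ndrange` at dispatch time; rounding to whole work-groups keeps each device's sub-range a valid NDRange.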
