OmpSs: A PROPOSAL FOR PROGRAMMING HETEROGENEOUS MULTI-CORE ARCHITECTURES
Abstract
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs that can also incorporate OpenCL or CUDA kernels. We evaluate the proposal on different architectures (SMP, GPUs, and hybrid SMP/GPU environments), showing that the approach is broadly applicable. The evaluation uses six benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, Julia Set, PBPI, and FixedGrid. We compare these results with those of the same benchmarks written in OpenCL or OpenMP and executed on the same architectures. The results show that OmpSs greatly outperforms both environments. OmpSs offers a more flexible programming environment than traditional approaches for exploiting multiple accelerators, and, thanks to the simplicity of its annotations, it increases programmer productivity.