World Scientific
Special Issue on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems; Guest editors: Gaétan Hains and Frédéric Gava (LACL, Université Paris-Est, France), and Kevin Hammond (University of St. Andrews, UK)

ASYMPTOTIC PEAK UTILISATION IN HETEROGENEOUS PARALLEL CPU/GPU PIPELINES: A DECENTRALISED QUEUE MONITORING STRATEGY

    Widespread heterogeneous parallelism is unavoidable given the emergence of general-purpose computing on graphics processing units (GPGPU). The characteristics of a graphics processing unit (GPU), including significant memory transfer latency and complex performance behaviour, demand new approaches to ensuring that all available computational resources are efficiently utilised. This paper considers the simple case of a divisible workload based on widely used numerical linear algebra routines, and the challenges that prevent a naive SPMD application using the GPU as an accelerator from making efficient use of all available resources. We suggest a queue monitoring strategy aimed at balancing CPU/GPU utilisation for applications that fit the pipeline parallel architectural pattern on heterogeneous multicore/multi-node CPU and GPU systems. We propose a stochastic allocation technique that may serve as a foundation for heuristic approaches to balancing CPU/GPU workloads.
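The abstract does not give the details of the allocation technique, so the following is only an illustrative sketch of what a queue-monitoring stochastic allocator could look like, not the method of the paper. It assumes each device exposes its current queue length, and it sends each new task to a device with probability inversely proportional to that length plus one. The names `choose_device` and `simulate`, the weighting rule, and the service rates are all hypothetical.

```python
import random


def choose_device(queue_lengths, rng=random):
    """Pick a device at random, favouring shorter queues.

    Assumed weighting (not from the paper): weight = 1 / (queue length + 1).
    `queue_lengths` must be a non-empty dict of device name -> queued tasks.
    """
    weights = {dev: 1.0 / (n + 1) for dev, n in queue_lengths.items()}
    threshold = rng.random() * sum(weights.values())
    cumulative = 0.0
    for dev, weight in weights.items():
        cumulative += weight
        if threshold < cumulative:
            return dev
    return dev  # guard against floating-point rounding on the last device


def simulate(steps, arrivals_per_step, service_rate, seed=0):
    """Toy discrete-time model: tasks arrive, get allocated, queues drain.

    `service_rate` maps each device to the number of tasks it completes
    per step (hypothetical values chosen by the caller).
    Returns how many tasks each device was assigned in total.
    """
    rng = random.Random(seed)
    queues = {dev: 0 for dev in service_rate}
    assigned = {dev: 0 for dev in service_rate}
    for _ in range(steps):
        for _ in range(arrivals_per_step):
            dev = choose_device(queues, rng)
            queues[dev] += 1
            assigned[dev] += 1
        # Each device drains up to its service rate per step.
        for dev, rate in service_rate.items():
            queues[dev] = max(0, queues[dev] - rate)
    return assigned


if __name__ == "__main__":
    # Hypothetical rates: the GPU completes three tasks per step, the CPU one.
    print(simulate(steps=1000, arrivals_per_step=4,
                   service_rate={"cpu": 1, "gpu": 3}))
```

Because the allocator reacts only to observed queue lengths, it needs no performance model of either device: a slower device accumulates a longer queue and is chosen less often. A real pipeline would read queue lengths from the running stages rather than from a simulated counter.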

    This work has been partially funded by the European Commission FP7 through the project ParaPhrase: Parallel Patterns for Adaptive Heterogeneous Multicore Systems, under contract no.: 288570 (10/2011-9/2014).
