World Scientific
HYBRID-PARALLEL SPARSE MATRIX-VECTOR MULTIPLICATION WITH EXPLICIT COMMUNICATION OVERLAP ON CURRENT MULTICORE-BASED SYSTEMS

    We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First, the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI calls do not hide communication cost with standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover, we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing, even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.
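    The explicit-overlap idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a CSR matrix, a single process owning the vector entries lo..hi-1, and a hypothetical callback fetch_remote that stands in for the MPI halo exchange. A dedicated thread performs the exchange while the main thread computes the part of the product that needs only local vector entries; the remote contributions are added after the thread has finished. In CPython the threads do not compute in parallel, so the sketch shows the structure of the scheme only, not its performance.

```python
import threading

def spmv_csr(row_ptr, col_idx, values, x):
    """Reference CSR sparse matrix-vector product y = A x."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

def spmv_overlap(row_ptr, col_idx, values, x_local, lo, hi, fetch_remote):
    """y = A x for the rows of one process, with explicit overlap.

    x_local holds x[lo:hi]. fetch_remote(cols) is a hypothetical stand-in
    for the MPI exchange: it returns {column: value} for remote columns.
    """
    n_rows = len(row_ptr) - 1
    needed = sorted({c for c in col_idx if not lo <= c < hi})
    halo = {}

    def communicate():
        # Runs in the dedicated communication thread.
        halo.update(fetch_remote(needed))

    comm = threading.Thread(target=communicate)
    comm.start()

    # Local part: uses only vector entries this process already owns.
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            c = col_idx[k]
            if lo <= c < hi:
                y[i] += values[k] * x_local[c - lo]

    comm.join()  # remote entries must have arrived before they are used

    # Nonlocal part: adds the contributions of the received entries.
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            c = col_idx[k]
            if not lo <= c < hi:
                y[i] += values[k] * halo[c]
    return y

# Two rows of a 4-column matrix; this process owns x[0:2].
x_global = [1.0, 2.0, 3.0, 4.0]
row_ptr = [0, 3, 5]
col_idx = [0, 1, 3, 1, 2]
values = [2.0, -1.0, 0.5, 3.0, 1.0]
fetch = lambda cols: {c: x_global[c] for c in cols}
print(spmv_overlap(row_ptr, col_idx, values, x_global[0:2], 0, 2, fetch))
# prints [2.0, 9.0]
```

    Splitting the product into a local and a nonlocal sweep means the result vector is written twice, which is the price paid for being able to start computing before the exchange completes.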
