World Scientific

DATA ACCESS CHARACTERISTICS AND OPTIMIZATIONS FOR SUN ULTRASPARC T2 AND T2+ SYSTEMS

    Processor and system architectures that feature multiple memory controllers and/or ccNUMA characteristics are prone to bottlenecks and erratic performance on scientific codes. Although cache thrashing, aliasing conflicts, and ccNUMA locality and contention problems are well known for many types of systems, they take on peculiar forms on the new Sun UltraSPARC T2 and T2+ processors, which we use here as prototypical multi-core designs. We analyze performance patterns in low-level and application benchmarks, with emphasis on comparing the performance features of the T2 and its successor, the T2+. Furthermore, we show ways to circumvent bottlenecks by careful data layout, placement, and padding.
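
    The padding idea can be illustrated with a minimal C sketch. It is not the authors' code, and the interleave model is an assumption made for illustration: `NUM_CONTROLLERS`, `INTERLEAVE_BYTES`, `controller_of`, and `alloc_staggered` are hypothetical names and values, chosen so that consecutive 128-byte chunks rotate over four memory controllers. Under that assumption, arrays whose start addresses share the same offset within a 512-byte period would all begin on the same controller; shifting each array by one chunk spreads concurrent streams across controllers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed (hypothetical) interleave model, not taken from the abstract:
   consecutive INTERLEAVE_BYTES chunks rotate over NUM_CONTROLLERS. */
enum { NUM_CONTROLLERS = 4, INTERLEAVE_BYTES = 128 };
#define PERIOD_BYTES ((size_t)NUM_CONTROLLERS * INTERLEAVE_BYTES)

/* Controller index an address maps to under the assumed model. */
static unsigned controller_of(const void *p) {
    return (unsigned)(((uintptr_t)p / INTERLEAVE_BYTES) % NUM_CONTROLLERS);
}

/* Carve `count` arrays of `n` doubles out of one aligned block.
   Array k starts (k mod NUM_CONTROLLERS) * INTERLEAVE_BYTES past a
   PERIOD_BYTES boundary, so neighbouring arrays start on different
   controllers. Returns the block to pass to free(), or NULL. */
static void *alloc_staggered(size_t n, size_t count, double **arrays) {
    size_t bytes = n * sizeof(double);
    /* Round each slot up to whole periods and add one period of slack
       so the shifted array still fits inside its slot. */
    size_t slot = ((bytes + PERIOD_BYTES - 1) / PERIOD_BYTES + 1) * PERIOD_BYTES;
    void *block = aligned_alloc(PERIOD_BYTES, slot * count);
    if (block == NULL) return NULL;
    for (size_t k = 0; k < count; ++k) {
        size_t shift = (k % NUM_CONTROLLERS) * INTERLEAVE_BYTES;
        arrays[k] = (double *)((char *)block + k * slot + shift);
    }
    return block;
}
```

    With three arrays, as in a vector-triad or STREAM-like kernel, `alloc_staggered(n, 3, a)` yields start addresses that map to controllers 0, 1, and 2 under the assumed model. Because the shifts are smaller than a memory page, the low address bits are the same in virtual and physical addresses, so the offsets survive address translation. The real interleave granularity and controller count must be taken from the processor documentation or determined by measurement.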
