DATA ACCESS CHARACTERISTICS AND OPTIMIZATIONS FOR SUN ULTRASPARC T2 AND T2+ SYSTEMS
Abstract
Processor and system architectures that feature multiple memory controllers and/or ccNUMA characteristics are prone to bottlenecks and erratic performance on scientific codes. Although cache thrashing, aliasing conflicts, and ccNUMA locality and contention problems are well known on many types of systems, they take on peculiar forms on the new Sun UltraSPARC T2 and T2+ processors, which we use here as prototypical multi-core designs. We analyze performance patterns in low-level and application benchmarks, with emphasis on comparing the performance features of the T2 and its successor, the T2+. Furthermore, we show ways to circumvent bottlenecks by careful data layout, placement, and padding.


