A PERFORMANCE EVALUATION OF THE NEHALEM QUAD-CORE PROCESSOR FOR SCIENTIFIC COMPUTING
Abstract
In this work we present an initial performance evaluation of Nehalem, Intel's latest, second-generation quad-core processor, and compare it with the first-generation quad-core processors from AMD and Intel, Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture, using QuickPath Interconnect to connect processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing clusters. Our analysis of the intra-processor and intra-node scalability of microbenchmarks and a range of large-scale scientific applications indicates that quad-core processors can deliver up to a 4x performance improvement over a single core, depending on the workload. However, scalability can be lower across a full node. We show that Nehalem outperforms Barcelona on memory-intensive codes by a factor of two when an 8-core Nehalem node is compared with a 16-core Barcelona node. Further optimizations are possible with Nehalem, including the use of Simultaneous Multithreading, which improves the performance of some applications by up to 50%.


