World Scientific
A PERFORMANCE EVALUATION OF THE NEHALEM QUAD-CORE PROCESSOR FOR SCIENTIFIC COMPUTING

    In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and compare it with the first-generation AMD and Intel quad-core processors Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture, using the QuickPath Interconnect (QPI) to link the processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing clusters. Our analysis of intra-processor and intra-node scalability, using microbenchmarks and a range of large-scale scientific applications, indicates that quad-core processors can deliver up to a 4x performance improvement over a single core, depending on the workload. However, scalability across a full node can be lower. We show that, on memory-intensive codes, an 8-core Nehalem node outperforms a 16-core Barcelona node by a factor of two. Further optimizations are possible on Nehalem, including Simultaneous Multithreading (SMT), which improves the performance of some applications by up to 50%.
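The scaling claims above (up to 4x over a single core on one quad-core processor, lower scalability across a full node, and a node-level factor of two between nodes with different core counts) all rest on a few simple ratios. The following is a minimal Python sketch of those ratios under stated assumptions: the `Run` type, the helper names `speedup`, `parallel_efficiency` and `per_core_ratio`, and every timing in it are hypothetical illustrations, not measurements or code from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One timed execution of a workload: core count and wall-clock seconds."""
    cores: int
    seconds: float


def speedup(baseline: Run, run: Run) -> float:
    """Strong-scaling speedup: baseline time divided by the time of `run`."""
    return baseline.seconds / run.seconds


def parallel_efficiency(baseline: Run, run: Run) -> float:
    """Speedup divided by the increase in core count (1.0 = perfectly linear)."""
    return speedup(baseline, run) / (run.cores / baseline.cores)


def per_core_ratio(node_a: Run, node_b: Run) -> float:
    """Per-core advantage of node A over node B on the same workload.

    The node-level ratio (time_b / time_a) is scaled by the core-count ratio
    (cores_b / cores_a), i.e. throughput per core is compared.
    """
    node_ratio = node_b.seconds / node_a.seconds
    return node_ratio * (node_b.cores / node_a.cores)


if __name__ == "__main__":
    # Hypothetical timings, chosen only to make the arithmetic easy to follow.
    single = Run(cores=1, seconds=100.0)
    quad = Run(cores=4, seconds=25.0)    # one fully used quad-core socket
    node = Run(cores=16, seconds=10.0)   # a full 16-core node
    print(speedup(single, quad), parallel_efficiency(single, quad))  # 4.0 1.0
    print(speedup(single, node), parallel_efficiency(single, node))  # 10.0 0.625

    # Hypothetical 8-core node that finishes twice as fast as a 16-core node.
    fast_node = Run(cores=8, seconds=50.0)
    slow_node = Run(cores=16, seconds=100.0)
    print(per_core_ratio(fast_node, slow_node))  # 4.0
```

Under this throughput-per-core definition, a node-level factor of two achieved with half as many cores corresponds to a fourfold per-core difference. Real comparisons also depend on clock rate, memory bandwidth per socket, and whether SMT hardware threads are counted as cores, none of which this sketch models.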
