World Scientific

HIGH LATENCY AND CONTENTION ON SHARED L2-CACHE FOR MANY-CORE ARCHITECTURES

    Several studies point out the benefits of a shared L2 cache, but a thorough understanding of chip multiprocessor (CMP) bottlenecks requires considering other properties of shared caches as well. This paper evaluates and explains shared-cache bottlenecks, which matter increasingly as many-core processors become common. Our simulations with 32 cores show low performance when the L2 cache is shared between 2 or 4 cores. In both cases, increased L2 cache latency and contention are the main causes of the longer execution time.
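
    To make the latency-and-contention argument concrete, the following is a minimal sketch of a textbook average-memory-access-time model, not the simulation methodology of the paper. The function name, its parameters, and every number below are hypothetical and chosen only for illustration; in particular, the queueing delay stands in for contention on a shared L2 slice.

```python
def average_access_cycles(l1_hit_cycles, l1_miss_rate,
                          l2_latency_cycles, l2_queue_cycles,
                          l2_miss_rate, memory_cycles):
    """Average cycles per memory access for a two-level cache hierarchy.

    l2_queue_cycles is an assumed extra wait caused by several cores
    competing for the same L2 slice (contention).
    """
    # Cost of an access that misses in L1 and goes to L2.
    l2_cost = l2_latency_cycles + l2_queue_cycles + l2_miss_rate * memory_cycles
    # Every access pays the L1 hit time; L1 misses also pay the L2 cost.
    return l1_hit_cycles + l1_miss_rate * l2_cost


# Hypothetical private L2: short latency, no queueing.
private_l2 = average_access_cycles(2, 0.1, 10, 0, 0.2, 200)

# Hypothetical shared L2: longer latency plus queueing delay,
# with the miss rate deliberately held constant.
shared_l2 = average_access_cycles(2, 0.1, 14, 6, 0.2, 200)

print(f"private L2: {private_l2:.2f} cycles, shared L2: {shared_l2:.2f} cycles")
```

    In this toy model the shared configuration is slower only because its miss rate is held equal to the private one. A shared cache that lowers the miss rate enough could offset the added latency and queueing, which is the trade-off the abstract refers to.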
