HIGH LATENCY AND CONTENTION ON SHARED L2-CACHE FOR MANY-CORE ARCHITECTURES
Abstract
Several studies point out the benefits of a shared L2 cache, but other properties of shared caches must also be considered to reach a thorough understanding of chip multiprocessor (CMP) bottlenecks. This paper evaluates and explains shared-cache bottlenecks, which are becoming increasingly important with the rise of many-core processors. Our simulations with 32 cores show low performance when the L2 cache memory is shared between 2 or 4 cores. In both cases, increased L2 cache latency and contention are the main causes of the longer execution time.


