Exploiting Long-Term Temporal Cache Access Patterns for LRU Insertion Prioritization
Abstract
In a CPU cache utilizing least recently used (LRU) replacement, cache sets manage a buffer which orders all cache lines in the set from LRU to most recently used (MRU). When a cache line is brought into cache, it is placed at the MRU and the LRU line is evicted. When re-accessed, a line is promoted to the MRU position. LRU replacement provides a simple heuristic to predict the optimal cache line to evict. However, LRU utilizes only simple, short-term access patterns. In this paper, we propose a method that uses a buffer called the history queue to record longer-term access-eviction patterns than the LRU buffer can capture. Using this information, we make a simple modification to LRU insertion policy such that recently-recalled blocks have priority over others. As lines are evicted, their addresses are recorded in a FIFO history queue. Incoming lines that have recently been evicted and now recalled (those in the history queue at recall time) remain in the MRU for an extended period of time as non-recalled lines entering the cache thereafter are placed below the MRU. We show that the proposed LRU insertion prioritization increases performance in single-threaded and multi-threaded workloads in simulations with simple adjustments to baseline LRU.
Communicated by Andrew Adamatzky
References
- 1. LLC WikiChip. Skylake client microarchitectures (2020), https://en.wikichip.org/wiki/intel/microarchitectures/skylake˙(client). Google Scholar
- 2. , Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ACM SIGARCH Computer Architecture News 18(2SI) (1990) 364–373. Crossref, Google Scholar
- 3. , Configurable cache tuning with a victim cache, in 2015 10th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC) (IEEE, 2015), pp. 1–6. Crossref, Google Scholar
- 4. , Using dead blocks as a virtual victim cache, in Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques (ACM, 2010) 489–500. Crossref, Google Scholar
- 5. , Reusing trace buffers as victim caches, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26(9) (2018) 1699–1712. Crossref, ISI, Google Scholar
- 6. , Insertion policy selection using decision tree analysis, in 2010 IEEE International Conference on Computer Design (IEEE, 2010), pp. 106–111. Crossref, Google Scholar
- 7. , Insertion and promotion for tree-based pseudolru last-level caches, in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (2013), pp. 284–296. Crossref, Google Scholar
- 8. , Adaptive insertion policies for high performance caching, ACM SIGARCH Computer Architecture News 35(2) (2007) 381–391. Crossref, Google Scholar
- 9. , Insertion policy selection using decision tree analysis, in 2010 IEEE International Conference on Computer Design (IEEE, 2010), pp. 106–111. Crossref, Google Scholar
- 10. , Dueling clock: Adaptive cache replacement policy based on the clock algorithm, in 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010) (IEEE, 2010), pp. 920–925. Crossref, Google Scholar
- 11. , Adaptive insertion policies for managing shared caches, in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (2008), pp. 208–219. Crossref, Google Scholar
- 12. , Cache replacement based on reuse-distance prediction, in 2007 25th International Conference on Computer Design (IEEE, 2007), pp. 245–250. Crossref, Google Scholar
- 13. , Reuse distance as a metric for cache behavior, in Proceedings of the IASTED Conference on Parallel and Distributed Computing and Systems, Volume 14 (Citeseer, 2001), pp. 350–360. Google Scholar
- 14. , Ship: Signature-based hit predictor for high performance caching, in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (2011), 430–441. Crossref, Google Scholar
- 15. , High performance cache replacement using re-reference interval prediction (RRIP), ACM SIGARCH Computer Architecture News 38(3) (2010) 60–71. Crossref, Google Scholar
- 16. , Cache replacement policy based on expected hit count, IEEE Computer Architecture Letters 17(1) (2017) 64–67. Crossref, ISI, Google Scholar
- 17. Seyed Armin Vakil Ghahani, Sara Mahdizadeh Shahri, Mohammad Bakhshalipour, Pejman Lotfi-Kamran and Hamid Sarbazi-Azad, Making Belady-inspired replacement policies more effective using expected hit count, arXiv preprint arXiv:1808.05024 (2018). Google Scholar
- 18. Joseph Sharkey, Dmitry Ponomarev and Kanad Ghose, M-SIM: A flexible, multithreaded architectural simulation environment, Technical Report, Department of Computer Science, State University of New York at Binghamton (2005). Google Scholar
- 19. , SPEC CPU2006 benchmark descriptions, ACM SIGARCH Computer Architecture News 34(4) (2006) 1–17. Crossref, Google Scholar


