Round Robin Thread Selection Optimization in Multithreaded Processors
Abstract
We propose a variation of round-robin thread ordering in a multithreaded pipeline to increase system throughput and the fairness of resource distribution. We show that round robin with a typical arbitrary thread ordering uses shared resources inefficiently and subsequently starves threads. To address this while keeping a simple round-robin approach, we optimally and dynamically re-sort the round-robin order at periodic intervals at run time. With 4-threaded workloads, sorting the thread order at run time improves throughput by over 9% and harmonic throughput by over 3%. We experiment with multiple stages of the pipeline and show consistent results across several experiments using the SPEC CPU 2006 benchmarks. Furthermore, because the technique is still a simple round robin, the performance gain requires little implementation overhead.
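The idea of a round robin whose order is re-sorted periodically can be illustrated with a minimal Python sketch. The abstract does not state the sorting criterion or the re-sort interval, so everything named here is an assumption made for illustration: the class `SortedRoundRobin`, the `key` function (modeled as a hypothetical per-thread resource-occupancy count, lowest first), and the `period` parameter. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SortedRoundRobin:
    """Round-robin thread selector that re-sorts its order every `period` picks.

    All names are hypothetical; the sort key is an assumed stand-in for
    whatever runtime metric an actual design would use.
    """
    order: List[int]             # thread IDs in the current round-robin order
    key: Callable[[int], float]  # assumed metric: lower value is served earlier
    period: int                  # number of picks between re-sorts
    _pos: int = 0                # index of the next thread to pick
    _cycle: int = 0              # number of picks made so far

    def next_thread(self) -> int:
        # Re-sort at period boundaries (but not before the very first pick),
        # then restart the rotation from the head of the new order.
        if self._cycle > 0 and self._cycle % self.period == 0:
            self.order.sort(key=self.key)
            self._pos = 0
        tid = self.order[self._pos]
        self._pos = (self._pos + 1) % len(self.order)
        self._cycle += 1
        return tid


# Example use with made-up occupancy numbers for four threads.
occupancy = {0: 30, 1: 5, 2: 20, 3: 10}
selector = SortedRoundRobin(order=[0, 1, 2, 3],
                            key=lambda t: occupancy[t],
                            period=4)
picks = [selector.next_thread() for _ in range(8)]
print(picks)
```

In this sketch, choosing `period` as a multiple of the thread count means every thread still receives the same number of slots between re-sorts, so the selection remains a plain round robin and only the order of service changes.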


