Round Robin Thread Selection Optimization in Multithreaded Processors

    We propose a variation of round-robin ordering in a multithreaded pipeline to increase system throughput and the fairness of resource distribution. We show that round robin with a typical arbitrary thread ordering uses shared resources inefficiently and leads to thread starvation. To address this while retaining a simple round-robin approach, we dynamically and optimally re-sort the round-robin order periodically at run time. With 4-threaded workloads, sorting the thread order at run time improves throughput by over 9% and harmonic throughput by over 3%. We apply the technique at multiple pipeline stages and obtain consistent results across several experiments using the SPEC CPU 2006 benchmarks. Because the technique remains a simple round robin, the performance gain requires little implementation overhead.
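
The idea of a round robin whose order is periodically re-sorted can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the sort criterion, the re-sort interval, or the pipeline stage, so the class name `SortedRoundRobin`, the `resort_interval` parameter, and the per-thread `occupancy` metric used as the sort key are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List


class SortedRoundRobin:
    """Round-robin thread selector whose visiting order is re-sorted periodically.

    Hypothetical sketch: the sort key and the interval are assumptions,
    not details taken from the paper.
    """

    def __init__(self, num_threads: int, resort_interval: int,
                 sort_key: Callable[[int], float]) -> None:
        self.order: List[int] = list(range(num_threads))  # arbitrary initial order
        self.resort_interval = resort_interval            # cycles between re-sorts
        self.sort_key = sort_key                          # assumed per-thread metric
        self.position = 0
        self.cycle = 0

    def next_thread(self) -> int:
        # At each interval boundary (but not at cycle 0), re-sort the order
        # and restart the rotation from the front of the new order.
        if self.cycle > 0 and self.cycle % self.resort_interval == 0:
            self.order.sort(key=self.sort_key)
            self.position = 0
        thread = self.order[self.position]
        self.position = (self.position + 1) % len(self.order)
        self.cycle += 1
        return thread


# Hypothetical per-thread metric, e.g. shared-resource entries currently held.
occupancy = {0: 12, 1: 3, 2: 30, 3: 7}
selector = SortedRoundRobin(num_threads=4, resort_interval=8,
                            sort_key=lambda t: occupancy[t])
picks = [selector.next_thread() for _ in range(12)]
print(picks)  # [0, 1, 2, 3, 0, 1, 2, 3, 1, 3, 0, 2]
```

In this sketch every thread still receives one turn per rotation, so the selector stays a plain round robin; only the visiting order changes at each interval boundary. Choosing an interval that is a multiple of the thread count avoids cutting a rotation short when the order is reset.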
