USING HARDWARE MULTITHREADING TO OVERCOME BROADCAST/REDUCTION LATENCY IN AN ASSOCIATIVE SIMD PROCESSOR

    The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of threads.
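    The latency-hiding argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: the function name `simulate_utilization`, the single-issue assumption, the fixed `latency`, and the round-robin selection among ready threads are all assumptions made for illustration. It treats every instruction as a broadcast/reduction whose result returns a fixed number of cycles after issue, and it only issues from a thread that would not stall.

```python
def simulate_utilization(num_threads, latency, cycles=10_000):
    """Return the fraction of cycles in which an instruction issues.

    Toy model (assumption, not the paper's simulator): every instruction is a
    broadcast/reduction whose result returns `latency` cycles after issue, and
    a thread cannot issue its next instruction until that result is back.
    """
    ready_at = [0] * num_threads  # first cycle at which each thread may issue
    issued = 0
    next_thread = 0               # round-robin starting point

    for cycle in range(cycles):
        # Scan the threads in round-robin order and skip any that would stall.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the cycle is lost to latency.

    return issued / cycles


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, simulate_utilization(n, latency=8))
```

    In this model utilization grows with the number of threads until there are enough ready threads to cover the latency, and stays at full utilization beyond that point. Real instruction mixes, pipeline dependencies, and synchronization operations would complicate the picture, which is where the choice of scheduling policy matters.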
