USING HARDWARE MULTITHREADING TO OVERCOME BROADCAST/REDUCTION LATENCY IN AN ASSOCIATIVE SIMD PROCESSOR
Abstract
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading improves performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the thread scheduling policy used by the hardware is critical in determining the utilization actually achieved. We consider three thread scheduling policies and show that a scheduler that avoids issuing threads that would stall on pipeline dependencies or thread synchronization operations maintains system utilization independent of the number of threads.
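The effect of the scheduling policy can be illustrated with a toy model. The sketch below is not the simulator or the processor described in the paper; it assumes a single issue slot per cycle, threads whose instructions each depend on the previous one, and a fixed reduction latency. The function name `simulate`, the policy names `round_robin` and `skip_stalled`, and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate(num_threads, policy, cycles=1000,
             reduction_latency=3, reduction_every=2):
    """Return the fraction of cycles in which an instruction was issued.

    Toy assumptions (not taken from the paper):
      * one instruction may issue per cycle;
      * each instruction of a thread depends on that thread's previous one;
      * a reduction result is available after `reduction_latency` cycles,
        any other result after 1 cycle;
      * every `reduction_every`-th instruction of a thread is a reduction,
        staggered by thread id so threads do not run in lock-step.
    """
    ready_at = [0] * num_threads      # first cycle each thread may issue again
    issued_count = [0] * num_threads  # instructions issued so far per thread
    next_thread = 0                   # rotation pointer
    issued = 0

    for cycle in range(cycles):
        if policy == "round_robin":
            # Strict rotation: commit to the next thread even if it will stall.
            candidates = [next_thread]
        elif policy == "skip_stalled":
            # Rotation that skips threads whose operands are not ready yet.
            candidates = [(next_thread + i) % num_threads
                          for i in range(num_threads)]
        else:
            raise ValueError(f"unknown policy: {policy}")

        chosen = next((t for t in candidates if ready_at[t] <= cycle), None)
        if chosen is None:
            continue  # pipeline bubble: nothing issued this cycle

        is_reduction = (issued_count[chosen] + chosen) % reduction_every == 0
        ready_at[chosen] = cycle + (reduction_latency if is_reduction else 1)
        issued_count[chosen] += 1
        next_thread = (chosen + 1) % num_threads
        issued += 1

    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4):
        for policy in ("round_robin", "skip_stalled"):
            print(threads, policy, simulate(threads, policy))
```

In this toy configuration a single thread keeps the issue slot busy only about half the time under either policy. With two threads, the strict rotation still loses roughly one slot in five, because it waits on a thread whose reduction has not completed, while the stall-avoiding rotation fills every slot. The model says nothing about absolute performance of real hardware; it only shows why the choice of which thread to issue next matters once latencies differ between instructions.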


