CacheFlow: Cache Optimizations for Data Driven Multithreading
Abstract
Data-Driven Multithreading is a non-blocking multithreading model of execution that provides effective latency tolerance by allowing the computation processor to do useful work while a long-latency event is in progress. With the Data-Driven Multithreading model, a thread is scheduled for execution only if all of its inputs have been produced and placed in the processor's local memory. Data-driven sequencing leads to irregular memory access patterns that could negatively affect cache performance. Nevertheless, it enables the implementation of short-term optimal cache management policies. This paper presents the implementation of CacheFlow, an optimized cache management policy that eliminates the side effects of the loss of locality caused by data-driven sequencing and further reduces cache misses. CacheFlow employs thread-based prefetching to preload the data blocks of threads deemed executable. Simulation results for nine scientific applications on a 32-node Data-Driven Multithreaded machine show an average speedup improvement from 19.8 to 22.6. Two techniques to further improve the performance of CacheFlow, conflict avoidance and thread reordering, are proposed and tested. Simulation experiments have shown a speedup improvement of 24% and 32%, respectively. The average speedup for all applications on a 32-node machine with both optimizations is 26.1.
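The scheduling rule and the thread-based prefetching described above can be illustrated with a toy model. The following is a minimal sketch, not the CacheFlow implementation: the names `Thread`, `make_graph`, and `run` are hypothetical, the cache is modeled as an unbounded set of block numbers, and latency is not modeled at all. It only shows that a thread becomes executable when its count of outstanding inputs reaches zero, and that its data blocks can be preloaded at that moment.

```python
from collections import deque


class Thread:
    """A data-driven thread (hypothetical structure for illustration)."""

    def __init__(self, name, n_inputs, consumers, blocks):
        self.name = name
        self.ready_count = n_inputs  # inputs not yet produced
        self.consumers = consumers   # names of threads fed by this one
        self.blocks = blocks         # data blocks this thread touches


def make_graph():
    # Hypothetical four-thread graph: A feeds B and C, which both feed D.
    return {
        "A": Thread("A", 0, ["B", "C"], [0, 1]),
        "B": Thread("B", 1, ["D"], [2]),
        "C": Thread("C", 1, ["D"], [3]),
        "D": Thread("D", 2, [], [4, 5]),
    }


def run(threads, prefetch):
    """Execute threads in data-driven order and count cache misses."""
    cache = set()  # assumption: unbounded cache, so no evictions
    misses = 0
    order = []
    ready = deque(t for t in threads.values() if t.ready_count == 0)
    if prefetch:
        for t in ready:
            cache.update(t.blocks)
    while ready:
        t = ready.popleft()
        order.append(t.name)
        for block in t.blocks:
            if block not in cache:
                misses += 1
                cache.add(block)
        # Completing a thread produces one input for each consumer.
        for name in t.consumers:
            consumer = threads[name]
            consumer.ready_count -= 1
            if consumer.ready_count == 0:
                ready.append(consumer)
                if prefetch:
                    # Preload the blocks of a thread deemed executable.
                    cache.update(consumer.blocks)
    return order, misses
```

Calling `run(make_graph(), prefetch=False)` and `run(make_graph(), prefetch=True)` on this toy graph gives the same execution order but a different miss count, because in the second case every block is already resident when its thread starts. A real design must also deal with finite cache capacity and conflicts, which this sketch leaves out.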


