World Scientific

Rapid Prototyping of the Data-Driven Chip-Multiprocessor (D2-CMP) using FPGAs

    This paper presents an FPGA implementation of a prototype of the Data-Driven Chip-Multiprocessor (D2-CMP). In particular, we study the FPGA implementation of the Thread Synchronization Unit (TSU), a hardware unit that enables thread execution under a dataflow-like scheduling policy on a chip multiprocessor. Threads are scheduled based on data availability, i.e., a thread is scheduled for execution only if its input data is available. This execution model is called the non-blocking Data-Driven Multithreading (DDM) model. The DDM model has been evaluated using an execution-driven simulator. To validate the simulation results, a 2-node DDM chip multiprocessor has been implemented on a Xilinx Virtex-II Pro FPGA, which has two hardwired PowerPC processors. Measurements on the hardware prototype show that the TSU can be implemented with a moderate hardware budget: the 2-node multiprocessor uses less than half of the reconfigurable hardware available on the Xilinx Virtex-II Pro FPGA (45% of the slices), which corresponds to an ASIC-equivalent gate count of 1.9 million gates. The measurements also show that the delays incurred by the operation of the TSU can be tolerated.
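
    The scheduling rule described in the abstract (a thread runs only once its input data is available) can be illustrated with a small software sketch. This is not the paper's TSU design: the class `ToyTSU`, its methods, and the ready-count bookkeeping are hypothetical names and assumptions chosen for illustration, and the real unit is implemented in hardware on the FPGA.

```python
from collections import deque


class ToyTSU:
    """Hypothetical software model of data-driven thread scheduling.

    Each thread has a ready count: the number of inputs it is still
    waiting for. A thread becomes schedulable only when that count
    reaches zero, so a running thread never blocks waiting for data.
    """

    def __init__(self):
        self.ready_count = {}  # thread id -> number of inputs still missing
        self.consumers = {}    # thread id -> ids of threads that use its output
        self.body = {}         # thread id -> callable executed when the thread fires
        self.ready = deque()   # threads whose inputs are all available

    def add_thread(self, tid, body, num_inputs, consumers=()):
        self.ready_count[tid] = num_inputs
        self.consumers[tid] = list(consumers)
        self.body[tid] = body
        if num_inputs == 0:
            self.ready.append(tid)

    def run(self):
        """Fire ready threads until none remain; return the firing order."""
        order = []
        while self.ready:
            tid = self.ready.popleft()
            self.body[tid]()  # runs to completion
            order.append(tid)
            # Completing a thread makes one input available to each consumer.
            for consumer in self.consumers[tid]:
                self.ready_count[consumer] -= 1
                if self.ready_count[consumer] == 0:
                    self.ready.append(consumer)
        return order


# Usage: "sum" is registered first but cannot fire until "a" and "b" finish.
values = {}
tsu = ToyTSU()
tsu.add_thread("sum", lambda: values.update(sum=values["a"] + values["b"]), 2)
tsu.add_thread("a", lambda: values.update(a=2), 0, consumers=["sum"])
tsu.add_thread("b", lambda: values.update(b=3), 0, consumers=["sum"])
order = tsu.run()
print(order, values["sum"])
```

    In this sketch the order of execution is determined by data dependencies rather than by the order in which the threads were registered, which is the property the DDM model relies on.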
