SYSTEMC IMPLEMENTATION AND PERFORMANCE EVALUATION OF A DECOUPLED GENERAL-PURPOSE MATRIX PROCESSOR
Abstract
Technological advances in IC manufacturing make it possible to integrate more and more functionality into a single chip. Modern processors have nearly one billion transistors on a single chip. With the increasing complexity of today's systems, designs have to be modeled at a high level of abstraction before being partitioned into hardware and software components for final implementation. This paper explains in detail the implementation and performance evaluation of a matrix processor called Mat-Core using SystemC, a system-level modeling language. Mat-Core is a research processor that aims to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. On these lanes, however, Mat-Core can execute matrix-scalar, matrix-vector, and matrix-matrix instructions in addition to vector-scalar and vector-vector instructions. To control the execution of vector/matrix instructions on the matrix core, this paper extends the well-known scoreboard technique. The performance of Mat-Core is then evaluated on vector and matrix kernels. The evaluated configuration is a four-lane Mat-Core with 4 × 4 matrix registers (16 elements each), a queue size of 10, a start-up time of 6 clock cycles, and a memory latency of 10 clock cycles. Our results show that it achieves about 0.94, 1.3, 2.3, 1.6, 2.3, and 5.5 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
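The decoupling idea described above (an address generation side that runs ahead and feeds a data computation side through a bounded queue) can be illustrated with a minimal sketch in plain C++. This is not the paper's SystemC model: the names `decoupled_saxpy`, `Load`, and `DecoupledResult` are hypothetical, and the sketch assumes a single lane, one load issued per cycle, one queue entry per operand pair, and a queue slot that is reserved when the load is issued.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical illustration only; not the Mat-Core SystemC implementation.
struct Load {
    std::size_t ready_cycle;  // cycle at which the data arrives from memory
    double x;
    double y;
};

struct DecoupledResult {
    std::vector<double> y;  // SAXPY results, a * x[i] + y[i]
    std::size_t cycles;     // simulated clock cycles until the last result
};

// Cycle-by-cycle model of a decoupled access/execute pair computing SAXPY.
// Assumptions: one load issued per cycle, one result computed per cycle,
// and a queue slot is occupied from load issue until the data is consumed.
DecoupledResult decoupled_saxpy(double a,
                                const std::vector<double>& x,
                                const std::vector<double>& y,
                                std::size_t queue_capacity,
                                std::size_t memory_latency) {
    assert(x.size() == y.size());
    assert(queue_capacity > 0);

    std::deque<Load> data_queue;
    DecoupledResult result{{}, 0};
    result.y.reserve(x.size());

    std::size_t next_load = 0;
    std::size_t cycle = 0;
    while (result.y.size() < x.size()) {
        // Data computation side: consume the head entry once it has arrived.
        if (!data_queue.empty() && data_queue.front().ready_cycle <= cycle) {
            const Load head = data_queue.front();
            data_queue.pop_front();
            result.y.push_back(a * head.x + head.y);
        }
        // Address generation side: run ahead while the queue has room.
        if (next_load < x.size() && data_queue.size() < queue_capacity) {
            data_queue.push_back({cycle + memory_latency,
                                  x[next_load], y[next_load]});
            ++next_load;
        }
        ++cycle;
    }
    result.cycles = cycle;
    return result;
}
```

In this toy model, a queue deep enough to cover the memory latency lets the loads overlap, so the latency is paid roughly once for the whole vector. With a queue of one entry, each element waits for the full latency before the next load can be issued.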


