PERFORMANCE EVALUATION OF BLAS ON THE TRIDENT PROCESSOR
Abstract
Different subtasks of an application usually have different computational, memory, and I/O requirements, which translate into different demands on computer capabilities. Thus, a more appropriate approach for achieving both high performance and a simple programming model is to design a processor with a multi-level instruction set architecture (ISA). This leads to high performance with a minimal executable code size.
Since the fundamental data structures of a wide variety of existing applications are scalars, vectors, and matrices, our Trident processor has a three-level ISA that operates on zero-, one-, and two-dimensional arrays of data. These levels allow a large amount of fine-grain data parallelism to be expressed to the processor directly, instead of being extracted dynamically by complicated hardware logic or statically by compilers. This reduces design complexity and provides a high-level programming interface to the hardware.
In this paper, the performance of the Trident processor is evaluated on the BLAS (Basic Linear Algebra Subprograms), which represent the kernel operations of many data-parallel applications. We show that the Trident processor proportionally reduces the number of clock cycles per floating-point operation as the number of execution datapaths increases.
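For readers unfamiliar with the BLAS, the three levels of kernel operations referred to above can be sketched as follows. This is an illustrative reference implementation in plain Python, not the paper's benchmark code; the function names follow the standard BLAS routine names (AXPY, GEMV, GEMM), and the mapping to the three Trident ISA levels (scalar/vector/matrix data) is our own gloss on the abstract.

```python
# One representative kernel from each BLAS level, written naively.
# These are the operation classes that the three Trident ISA levels
# (zero-, one-, and two-dimensional data) are designed to express:
#   Level 1 (vector-vector): y := alpha*x + y   (AXPY)
#   Level 2 (matrix-vector): y := A*x + y       (GEMV)
#   Level 3 (matrix-matrix): C := A*B + C       (GEMM)

def axpy(alpha, x, y):
    """Level-1 BLAS: elementwise y := alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def gemv(A, x, y):
    """Level-2 BLAS: y := A*x + y for a dense matrix A (list of rows)."""
    return [sum(a * b for a, b in zip(row, x)) + yi
            for row, yi in zip(A, y)]

def gemm(A, B, C):
    """Level-3 BLAS: C := A*B + C for dense matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) + C[i][j]
             for j in range(len(B[0]))]
            for i in range(len(A))]
```

The levels differ in their ratio of floating-point operations to memory traffic (roughly O(n) flops on O(n) data for Level 1, O(n^2) on O(n^2) for Level 2, and O(n^3) on O(n^2) for Level 3), which is why they stress a processor's datapaths and memory system differently.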