World Scientific

PERFORMANCE EVALUATION OF BLAS ON THE TRIDENT PROCESSOR

    Different subtasks of an application usually have different computational, memory, and I/O requirements, and therefore place different demands on the computer's capabilities. Thus, a more appropriate approach to achieving both high performance and a simple programming model is to design a processor with a multi-level instruction set architecture (ISA), which leads to high performance and minimal executable code size.

    Since scalars, vectors, and matrices are the fundamental data structures of a wide variety of existing applications, our research Trident processor has a three-level ISA that operates on zero-, one-, and two-dimensional arrays of data. These levels express a large amount of fine-grain data parallelism directly to the processor, instead of having it extracted dynamically by complicated logic or statically by compilers. This reduces design complexity and provides a high-level programming interface to the hardware.
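    To make the code-size argument concrete, the following Python sketch counts the arithmetic instructions needed for an n×n matrix product C = AB when the same computation is expressed with 0-D (scalar), 1-D (vector), and 2-D (matrix) instructions. The instruction semantics (a scalar multiply-add, a row-wise vector `axpy`, and a single matrix multiply-add) and all function names are hypothetical illustrations, not the actual Trident ISA; loop control, loads, and stores are not counted.

```python
# Hypothetical instruction-count model: the three "instruction levels" below are
# illustrative stand-ins, not the real Trident ISA. Only arithmetic instructions
# are counted; loop control, loads, and stores are ignored.
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def zeros(n: int) -> Matrix:
    return [[0.0] * n for _ in range(n)]


def matmul_scalar(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """0-D level: one scalar multiply-add per element update (n**3 issues)."""
    n = len(a)
    c = zeros(n)
    issued = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                issued += 1
    return c, issued


def vector_axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Hypothetical 1-D instruction: y := alpha * x + y over a whole row."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def matmul_vector(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """1-D level: row i of C accumulates a[i][k] * (row k of B), n**2 issues."""
    n = len(a)
    c = zeros(n)
    issued = 0
    for i in range(n):
        for k in range(n):
            c[i] = vector_axpy(a[i][k], b[k], c[i])
            issued += 1
    return c, issued


def matrix_multiply_add(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Hypothetical 2-D instruction: C := A * B + C as a single issue."""
    n = len(a)
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def matmul_matrix(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """2-D level: the whole product is expressed as one instruction."""
    return matrix_multiply_add(a, b, zeros(len(a))), 1


if __name__ == "__main__":
    n = 4
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i * j + 1) for j in range(n)] for i in range(n)]
    results = [f(a, b) for f in (matmul_scalar, matmul_vector, matmul_matrix)]
    assert results[0][0] == results[1][0] == results[2][0]  # same product
    print([count for _, count in results])  # [64, 16, 1]
```

    All three formulations compute the same product, but the number of issued arithmetic instructions scales as n³, n², and 1, respectively, which is the sense in which higher-dimensional instructions both expose more data parallelism per instruction and shrink the executable code.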

    In this paper, the performance of the Trident processor is evaluated on the Basic Linear Algebra Subprograms (BLAS), which represent the kernel operations of many data-parallel applications. We show that increasing the number of execution datapaths in the Trident processor proportionally reduces the number of clock cycles per floating-point operation.
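    The shape of this result can be illustrated with an idealized model. The sketch below assumes that each of P execution datapaths completes one multiply-add (two floating-point operations) per clock cycle, with no memory stalls, start-up latency, or load/store overhead; these are simplifying assumptions, not the paper's simulation parameters, and the output does not reproduce its results. The multiply-add counts n, n², and n³ are the standard leading-order operation counts for the Level 1, 2, and 3 BLAS kernels `axpy`, `gemv`, and `gemm` on length-n vectors and n×n matrices.

```python
# Idealized throughput model (assumption): each of P datapaths retires one
# multiply-add (2 FLOPs) per cycle; memory stalls, start-up latency, and
# loads/stores are ignored. Not the paper's simulator or measured data.
from fractions import Fraction
from typing import Callable, Dict

# Multiply-add counts for length-n vectors and n x n matrices.
KERNEL_MADDS: Dict[str, Callable[[int], int]] = {
    "axpy (Level 1)": lambda n: n,
    "gemv (Level 2)": lambda n: n * n,
    "gemm (Level 3)": lambda n: n ** 3,
}


def cycles_per_flop(madds: int, datapaths: int) -> Fraction:
    """Cycles per floating-point operation for `madds` multiply-adds on P datapaths."""
    if madds <= 0 or datapaths <= 0:
        raise ValueError("madds and datapaths must be positive")
    cycles = (madds + datapaths - 1) // datapaths  # integer ceil(madds / P)
    return Fraction(cycles, 2 * madds)


if __name__ == "__main__":
    n = 64
    for name, madds_of in KERNEL_MADDS.items():
        row = [str(cycles_per_flop(madds_of(n), p)) for p in (1, 2, 4, 8)]
        print(name, row)  # each row: ['1/2', '1/4', '1/8', '1/16']
```

    Under these assumptions, m multiply-adds take ⌈m/P⌉ cycles, so cycles per FLOP equals ⌈m/P⌉/(2m), which is exactly 1/(2P) whenever P divides m: doubling the number of datapaths halves the cycles per floating-point operation.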
