World Scientific

Chapter 10: Compact Code Generation and Throughput Optimization for Coarse-Grained Reconfigurable Arrays

    We consider the problem of compact code generation for coarse-grained reconfigurable architectures consisting of an array of tightly interconnected processing elements, each having multiple functional units. Switching from one context to another is realized on a per-cycle basis by employing concepts similar to those in programmable processors: each processing element possesses an instruction memory, an instruction decoder, and a program counter, and can thus be seen as a simplified VLIW processor. Such architectures support both loop-level parallelism, by having several processing elements arranged in an array, and instruction-level parallelism, by their VLIW nature. Their instruction memory is often limited in size to reduce area and power consumption. Hence, generating compact code while achieving high throughput for such a processor array is challenging. Furthermore, software pipelining allows such an array to execute more than one iteration at a time, which makes code generation and compaction even more demanding. In this realm, we present a novel approach for the generation of compact and problem-size-independent code for such VLIW processor arrays on the basis of control flow graph notations. For selected benchmarks, we compare our proposed approach for code generation with the Trimaran compilation infrastructure. As a result, our approach may reduce the code size by up to 64% and may achieve up to 8.4× higher throughput.
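To make the link between software pipelining and code size more concrete, the following is a minimal, generic sketch and not the chapter's method. It assumes a loop body split into `num_stages` pipeline stages, a new iteration starting every cycle (initiation interval of 1), and at least as many iterations as stages. The function names `pipelined_schedule` and `distinct_instruction_words` are hypothetical and chosen only for this illustration.

```python
def pipelined_schedule(num_stages, num_iterations):
    """Return, for each cycle, the tuple of loop-body stages active in it.

    Assumption (illustrative only): iteration i starts at cycle i and
    executes stage s at cycle i + s, i.e. initiation interval = 1.
    """
    schedule = []
    for cycle in range(num_iterations + num_stages - 1):
        active = tuple(
            stage
            for stage in range(num_stages)
            if 0 <= cycle - stage < num_iterations
        )
        schedule.append(active)
    return schedule


def distinct_instruction_words(schedule):
    """Count distinct stage combinations, a rough proxy for code size."""
    return len(set(schedule))


if __name__ == "__main__":
    for n in (5, 100):
        sched = pipelined_schedule(num_stages=3, num_iterations=n)
        print(n, len(sched), distinct_instruction_words(sched))
```

Under these assumptions, the number of executed cycles grows with the iteration count, while the number of distinct stage combinations (prologue, one repeated kernel word, epilogue) does not. That is the sense in which a code generator can aim for code whose size is independent of the problem size, provided the kernel is stored once and repeated under loop control.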