Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
An FPGA-based VLIW processor with custom hardware execution
Jones A., Hoare R., Kusic D., Fazekas J., Foster J.  Field-programmable gate arrays (Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-programmable Gate Arrays, Monterey, California, Feb 20-22, 2005)107-117.2005.Type:Proceedings
Date Reviewed: May 23 2006

This is a very exciting piece of research in the general area of configurable, extensible processors and the software/hardware interface. The authors propose a hybrid architecture, consisting of a parameterized very long instruction word (VLIW) core augmented with custom hardware execution units, as a very potent programmable execution engine. In addition, they have developed the software infrastructure to allow for automatic optimization of C-based applications.

In the introductory section, the authors identify large-capacity field-programmable gate arrays (FPGAs) with substantial computer/memory resources as becoming commonplace. They correctly point out that the efficient mapping of applications on such devices is not a trivial exercise anymore, with a typical use being software kernels allocated on the FPGA fabric, and the irregular (control) part of the application running on an embedded processor. This segregation has indeed been identified by the major FPGA vendors, which utilize embedded processors on their devices to accommodate both regular and irregular codes.

The authors provide a good discussion of past and present behavioral synthesis solutions, and correctly identify such solutions as appropriate for combinational code, not for control-dominated applications. In addition, they provide a very good overview of the literature, both from academia and from industry, on configurable (static) and reconfigurable (dynamic) systems for software acceleration.

To address large, irregular code pieces in a semi-automatic manner, the authors propose a parametric platform to efficiently exploit all parallelism. The platform is a four-wide VLIW-based processor that is binary-compatible with the Altera NIOS II instruction set architecture (ISA). In addition, it supports extending that ISA with custom hardware resources to achieve superlinear speedups. The software infrastructure is based on the well-known Trimaran VLIW research.

The authors use an interesting technique to extract computational kernels (hardware functions), which are implemented directly as hardware blocks. These blocks make use of the abundant MAC units in typical high-performance FPGA devices, such as the Altera Stratix family.

The authors discuss their hardware architecture, which is based on a four-wide VLIW with an eight-register, four-word (8R/4W) 32x32-bit register file, shared among the VLIW processing elements (PEs) and the custom hardware units. They also correctly identify the register file as the performance-limiting resource in an FPGA implementation, and provide substantial microarchitecture performance data.

In the remaining sections, the authors discuss zero-overhead hardware/software switching, the hardware functions, and the software tool chain. They performed design, validation, and FPGA implementation, and achieved 167 megahertz (MHz) on an Altera Stratix, which is an impressive clock speed for a programmable device. Finally, they report on application speedups for both their standalone VLIW engine and their four-wide VLIW, augmented with hardware functions. Results range from nine percent to 230 times for kernel acceleration, which is indeed impressive.

Overall, this is a thorough account of the proposed field of research; the authors did their best to disclose as much information as possible in the context of a conference paper. I was very much impressed with the technical ability of all those involved. This is a solid paper on embedded central processing unit (CPU) architecture.

Reviewer:  Vassilios Chouliaras Review #: CR132817 (0704-0358)
Bookmark and Share
  Reviewer Selected
 
 
RISC/ CISC, VLIW Architectures (C.1.1 ... )
 
 
Control Design Styles (B.1.1 )
 
Would you recommend this review?
yes
no
Other reviews under "RISC/CISC, VLIW Architectures": Date
A cost-effective design for MPEG-2 audio decoder with embedded RISC core
Tsai T., Wu R., Chen L. Journal of VLSI Signal Processing Systems 29(3): 255-265, 2001. Type: Article
Apr 21 2003
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures
López D., Llosa J., Valero M., Ayguadé E. IEEE Transactions on Computers 50(10): 1033-1051, 2001. Type: Article
Dec 16 2002
Design of a multimedia processor based on metrics computation
Amor N., Le Moullec Y., Diguet J., Philippe J., Abid M. Advances in Engineering Software 36(7): 448-458, 2005. Type: Article
Jan 5 2006
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy