ComputingReviews.com

An FPGA-based VLIW processor with custom hardware execution
Jones A., Hoare R., Kusic D., Fazekas J., Foster J. Field-programmable gate arrays (Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-programmable Gate Arrays, Monterey, California, Feb 20-22, 2005), 107-117, 2005. Type: Proceedings

Date Reviewed: 05/23/06

This is a very exciting piece of research in the general area of configurable, extensible processors and the software/hardware interface. The authors propose a hybrid architecture, consisting of a parameterized very long instruction word (VLIW) core augmented with custom hardware execution units, as a very potent programmable execution engine. In addition, they have developed the software infrastructure to allow for automatic optimization of C-based applications.

In the introductory section, the authors observe that large-capacity field-programmable gate arrays (FPGAs) with substantial compute/memory resources are becoming commonplace. They correctly point out that efficiently mapping applications onto such devices is no longer a trivial exercise; in a typical use, software kernels are allocated to the FPGA fabric, while the irregular (control) part of the application runs on an embedded processor. This segregation has indeed been recognized by the major FPGA vendors, which provide embedded processors on their devices to accommodate both regular and irregular codes.

The authors provide a good discussion of past and present behavioral synthesis solutions, and correctly identify such solutions as appropriate for combinational code, not for control-dominated applications. In addition, they provide a very good overview of the literature, both from academia and from industry, on configurable (static) and reconfigurable (dynamic) systems for software acceleration.

To address large, irregular code pieces in a semi-automatic manner, the authors propose a parametric platform to efficiently exploit the available parallelism. The platform is a four-wide VLIW-based processor that is binary-compatible with the Altera Nios II instruction set architecture (ISA). In addition, it supports extending that ISA with custom hardware resources to achieve superlinear speedups. The software infrastructure is based on the well-known Trimaran VLIW research compiler.

The authors use an interesting technique to extract computational kernels (hardware functions), which are implemented directly as hardware blocks. These blocks make use of the abundant MAC units in typical high-performance FPGA devices, such as the Altera Stratix family.
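As a rough illustration of the kind of kernel that maps well onto MAC units, consider a direct-form FIR filter, whose inner loop performs one multiply-accumulate per tap. This is a generic sketch, not an example taken from the paper; the function name fir_filter and the sample data are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter over fully overlapping windows.

    Hypothetical example kernel: the inner loop is a chain of
    multiply-accumulate (MAC) operations, the pattern that a
    hardware MAC block can execute directly.
    """
    taps = len(coeffs)
    out = []
    for n in range(taps - 1, len(samples)):
        acc = 0
        for k in range(taps):
            acc += coeffs[k] * samples[n - k]  # one MAC per tap
        out.append(acc)
    return out


print(fir_filter([1, 2, 3, 4], [1, 1]))  # prints [3, 5, 7]
```

A loop of this shape is regular and compute-bound, which is what makes it a plausible candidate for a dedicated hardware block, whereas the surrounding control code would stay on the processor.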

The authors discuss their hardware architecture, which is based on a four-wide VLIW with an eight-read-port, four-write-port (8R/4W) 32x32-bit register file, shared among the VLIW processing elements (PEs) and the custom hardware units. They also correctly identify the register file as the performance-limiting resource in an FPGA implementation, and provide substantial microarchitecture performance data.
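The 8R/4W figure is consistent with simple port arithmetic for a shared register file. The sketch below assumes that each issue slot reads two source operands and writes one result per cycle; that per-slot assumption and the helper name register_file_ports are illustrative and are not taken from the paper.

```python
def register_file_ports(issue_width, reads_per_op=2, writes_per_op=1):
    """Return (read_ports, write_ports) for a shared register file.

    Assumption (not from the paper): every issue slot may read
    reads_per_op operands and write writes_per_op results in the
    same cycle, so the port counts scale linearly with issue width.
    """
    if issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    return issue_width * reads_per_op, issue_width * writes_per_op


read_ports, write_ports = register_file_ports(4)
print(f"{read_ports}R/{write_ports}W")  # prints 8R/4W
```

Because the port count grows with issue width, a multiported register file becomes expensive on an FPGA fabric, which fits the authors' observation that it limits performance.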

In the remaining sections, the authors discuss zero-overhead hardware/software switching, the hardware functions, and the software tool chain. They performed design, validation, and FPGA implementation, and achieved 167 megahertz (MHz) on an Altera Stratix, which is an impressive clock speed for a programmable device. Finally, they report application speedups for both their standalone VLIW engine and their four-wide VLIW augmented with hardware functions. Results for kernel acceleration range from a nine percent improvement to a 230-fold speedup, which is indeed impressive.

Overall, this is a thorough account of the proposed research; the authors did their best to disclose as much information as possible in the context of a conference paper. I was very much impressed with the technical ability of all those involved. This is a solid paper on embedded central processing unit (CPU) architecture.

Reviewer: Vassilios Chouliaras

Review #: CR132817 (0704-0358)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™