Vector processors had their heyday in the 1980s, before classical supercomputers were mostly replaced by multiprocessors. Today, however, vector processors are experiencing a renaissance: their efficient exploitation of data-level parallelism has the potential for superior energy efficiency, which is the top priority in the pursuit of exascale performance. Toward this goal, this paper introduces the Vitruvius+ engine, a vector coprocessor that “implements the RISC-V vector extension (RVV) 0.7.1 and can be easily connected to a scalar [RISC-V] core using the Open Vector Interface standard.”
The core of the paper describes four notable features of Vitruvius+:
- The “out-of-order chaining” of memory-to-arithmetic instructions allows for overlapping the arrival of groups of vector elements in the vector register file with their further processing in the pipelined functional units;
- “Fast moves” replace the execution of vector-vector move operations by renaming, such that multiple logical registers can be associated with the same physical register;
- “Switched ring reconfiguration” optimizes data shifts between the eight vector processor “lanes” by reverting their ring connections; and
- The execution of “vector reduction” instructions is enhanced by utilizing the eight lanes for tree-structured parallel reductions.
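To make the last feature more concrete, the following Python sketch models a tree-structured reduction across eight lanes. It is only an illustration of the general technique, not the Vitruvius+ microarchitecture: the interleaved mapping of elements to lanes (element i goes to lane i mod 8) and the function names `lane_partials`, `tree_reduce`, and `vector_reduce` are assumptions made here for the example.

```python
from operator import add

NUM_LANES = 8  # lane count mentioned in the review


def lane_partials(vector, op=add):
    """Fold the elements assigned to each lane into one partial result.

    Assumption: element i is held by lane i % NUM_LANES.
    A lane that holds no elements yields None.
    """
    partials = []
    for lane in range(NUM_LANES):
        acc = None
        for x in vector[lane::NUM_LANES]:
            acc = x if acc is None else op(acc, x)
        partials.append(acc)
    return partials


def tree_reduce(values, op=add):
    """Combine values pairwise, level by level.

    Returns (result, levels), where levels is the number of tree levels used.
    """
    level = [v for v in values if v is not None]
    if not level:
        raise ValueError("cannot reduce an empty vector")
    levels = 0
    while len(level) > 1:
        # Pair up neighbours; an unpaired last value is carried to the next level.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


def vector_reduce(vector, op=add):
    """Per-lane partial reduction followed by a cross-lane tree."""
    return tree_reduce(lane_partials(vector, op), op)


total, levels = vector_reduce(list(range(1, 17)))
print(total, levels)
```

In this model, the eight per-lane partial results are combined in three tree levels rather than seven sequential steps, which is the general benefit of a tree-structured reduction.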
The Vitruvius+ design is experimentally evaluated in great detail by logic gate synthesis, demonstrating a higher peak efficiency than other vector processors.
The paper is very well written and systematically structured, which enables the reader to process the material step-by-step, understand the rationale of the design decisions, and get an idea of the potential of the architecture. In its next generation, Vitruvius+ will support the latest version, RVV 1.0, for which the main challenges and their solutions are sketched.