This paper begins by defining a systolic system and presenting some of the standard formats in which n-by-n matrices are written. A discussion follows of the best known systolic arrays for multiplying two such matrices, noting the number of elementary processors each requires.
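For readers unfamiliar with the idea, the following is a minimal sketch of a generic textbook systolic array for multiplying two n-by-n matrices. It is not taken from the paper and is not claimed to be one of the arrays the author discusses. It assumes an n-by-n mesh in which cell (i, j) accumulates one entry of the product, rows of the first matrix flow rightward, and columns of the second flow downward, each delayed by its row or column index. The function name `systolic_matmul` and the register names are hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate a generic n-by-n output-stationary systolic mesh (illustrative only)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]      # cell (i, j) accumulates C[i][j]
    a_reg = [[0] * n for _ in range(n)]  # A value currently held in each cell
    b_reg = [[0] * n for _ in range(n)]  # B value currently held in each cell

    # The last useful product reaches cell (n-1, n-1) at step 3n - 3.
    for t in range(3 * n - 2):
        # A values move one cell to the right; row i is fed with a delay of i steps.
        for i in range(n):
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0

        # B values move one cell down; column j is fed with a delay of j steps.
        for j in range(n):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0

        # Every cell performs one multiply-accumulate per step.
        for i in range(n):
            for j in range(n):
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In this sketch the mesh uses n² cells and finishes in 3n - 2 steps. A product of three matrices could be formed naively by running such a mesh twice in sequence; that is only a baseline for comparison, not the scheme the paper proposes.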
The author then examines the application of these arrays to the problem of multiplying three n-by-n matrices. Some difficulties emerge, even if two states per elementary processor are allowed.
Finally, the author describes a strategy that merges the work of two systolic arrays into a single systolic system. The multiplication is then performed using n² elementary processors, and the scheme possesses optimal efficiency.
This work shows a clever merging of two otherwise unattractive arrays into a system with optimal efficiency. The paper is a nice contribution to VLSI.