Cheng attempts to characterize the performance of 24 simple vectorizable FORTRAN loops on two vector supercomputers. The data were collected scientifically, and the paper uses them to compare the two machines. The paper is intended for several audiences: vector computer programmers who want to achieve better performance, compiler writers who want to optimize simple kernel loops, and computer architects who want to improve the performance of next-generation systems. It suffers from many flaws, however, which make it hard to read and perhaps redundant for many readers.
The paper is filled with large tables of speeds, which are difficult to assimilate, and with text that walks through the tables line by line. A few graphs would have gone a long way; I found it difficult to stay interested through the whole paper. Also, the author gives no justification for choosing these particular kernel loops. The optimization of inner-loop kernels is easy to analyze, but the real performance benefit of vector computers comes from more deeply nested loop structures.
For those with little knowledge of the IBM 3090 VF or Cray X-MP vector processors, this paper will serve as an introduction. It gives little insight into how to get performance out of a system, however, especially since some of the performance quirks stem from characteristics of the software, such as the compilers, rather than of the vector hardware.