As hardware continues to evolve, developers look for ways to keep their software performing well on current platforms. The principal issue right now is that memory bandwidth has not kept pace with other advances in hardware, which constrains the performance of simple algorithms optimized solely for central processing unit (CPU) efficiency. Thus, it is important to examine memory usage as well.
There are various tricks that let a program use less memory, but this paper takes a more reliable approach, coupling a deep review of the theory of matrix transformations with a systematic search for efficiencies. The result is a detailed, well-reasoned overview of three well-known reduction algorithms (upper Hessenberg, tridiagonal, and bidiagonal) and a discussion of how to optimize their memory usage. The analysis is thorough, spanning theory and practice, with carefully analyzed, detailed benchmarks. While such papers can often be dull, the writing style makes the results very readable.
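For readers unfamiliar with these reductions, the flavor of one of them can be conveyed in a few lines. The sketch below is a minimal, textbook Householder tridiagonalization of a symmetric matrix in plain Python; it is purely illustrative and is not the paper's (non-public) implementation, which targets memory efficiency rather than clarity:

```python
import math

def householder_tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form via Householder
    similarity transforms A <- H A H (a textbook sketch, not the
    reviewed paper's code); eigenvalues are preserved."""
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    for k in range(n - 2):
        # Column segment below the subdiagonal that we want to zero out.
        x = [A[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue  # column already reduced
        # Choose the sign of alpha that avoids cancellation in v[0].
        alpha = -math.copysign(norm_x, x[0]) if x[0] != 0 else -norm_x
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(vi * vi for vi in v)
        if vnorm2 == 0.0:
            continue
        w = [0.0] * (k + 1) + v   # full-length Householder vector
        beta = 2.0 / vnorm2       # H = I - beta * w w^T
        # Symmetric rank-two update: A <- A - q w^T - w q^T
        p = [beta * sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        K = 0.5 * beta * sum(w[i] * p[i] for i in range(n))
        q = [p[i] - K * w[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                A[i][j] -= q[i] * w[j] + w[i] * q[j]
    return A
```

Even this naive version hints at the memory story the paper tells: each step sweeps the entire trailing submatrix, so the arithmetic is cheap relative to the data traffic it generates.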
Only two aspects of the paper disappoint: the source code is not publicly available, and, although the algorithms do represent an abstract family, there is no higher-level analysis of the code as a family. In other words, while this is a splendid paper on numerical algorithms and their performance, the use of “families” in the title primes the reader to also expect some software engineering content, which is unfortunately not included.
Despite these concerns, the paper is well worth reading by experts in numerical software who are interested in memory issues. One could easily use it as a template for how to write such papers. It is well enough written that nonexperts would also find it readable; however, they might not get much out of it, as it really is all about the details.