Previous work, for example Cierniak and Li [1], established that jointly permuting the dimensions of arrays and the loops of loop nests can achieve significant performance gains over either kind of transformation alone, or even the two consecutively. In this case, compilation units may also be small enough to render practical a brute force search for the optimal transformation.
One contribution of this paper is to provide additional evidence in support of these findings. The authors present experimental results from a dozen programs (or kernels) on one and eight processors of an Origin 2000 server. Importantly, they show their transformations’ benefits when used in combination with the control transformations performed by the production compiler. Viewed as a source of additional experimental confirmation for prior claims, the paper can be judged a success; it should go a long way towards dispelling any doubts about the fruitfulness of combining control and data transformations.
Unfortunately, the authors see the paper’s major contribution elsewhere, in the technique they have developed to search for the optimal transformation. This technique embeds the set of arrays into an arbitrary sequence. This arbitrary sequence is then used to produce an integer linear programming (ILP) formulation of the problem that has far more variables and constraints than would otherwise have been needed, by considering overlapping pairs of consecutive arrays rather than individual arrays. If this has any purpose, other than to artificially inflate the number of variables and constraints the ILP solver is faced with, the authors do not make it clear.