A technique for scheduling instructions using software pipelining and “kernel scheduling” is described. This technique takes advantage of the instruction-level parallelism inherent in many loops, but writers of optimizing compilers confront several conflicting goals:
- a low “initiation interval” is desired so as to increase throughput through the pipe;
- the “stage-count” must be kept small; and
- register pressure must be reduced lest there is no feasible schedule.
Swing modulo scheduling (SMS) is the term the authors give for their heuristic approach for computing such schedules while trying to satisfy all three goals. This is done without the use of backtracking through the solution space of schedules.
What makes this paper exciting is its extensive discussion of industrial experience using SMS in an optimizing compiler. The target architecture is a quad-issue VLIW-based processor, and the constraints of real hardware (for example, special-purpose registers, distribution of functional units) made for important modifications, refinements, and improvements to the algorithm. The results are very encouraging; for example, a straight recompile of an MPEG2 video decoder yielded an 11 percent increase in speed.
Unless the reader is familiar with optimizing compilers and dependence graphs, they would do well to have by their side either Allen and Kennedy [1] or Muchnick [2]. For those who are interested in code generation and the application of software pipelining algorithms in practice, section 6 (“SMS in a Production Compiler”) would be enough reason to read the whole paper.