Central processing unit (CPU) architectures with out-of-order instruction scheduling use, among other data, static instruction-timing information to create good schedules. The work reported in this paper augments out-of-order instruction scheduling with dynamic timing information derived during execution. Simulations show that the new architecture strikes a favorable balance between the performance lost to more constrained schedules and the simpler, more power-efficient scheduling mechanisms those schedules permit.
Dynamic instruction-timing information is captured for load instructions within loop iterations; other instructions are scheduled with static timing information. Instruction issue is done via a priority queue ordered by expected ready time. The simplicity of the mechanisms implementing these functions provides the benefit; for example, complex resource-scheduling schemes can be replaced by assigning queues to resources. The cost comes from, among other sources, inaccuracies in applying intra-iteration timing information across iterations; experiments with Standard Performance Evaluation Corporation (SPEC) benchmarks show that timing information has roughly 90 percent consistency between successive iterations.
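The priority-queue issue mechanism can be sketched as follows. This is a minimal illustration of ordering a ready queue by expected ready time, not the paper's implementation; the instruction encodings and cycle values are invented for the example.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PendingInstr:
    ready_time: int               # predicted cycle at which operands are ready
    seq: int                      # program order; breaks ties deterministically
    op: str = field(compare=False) # the instruction itself (not compared)

def issue_order(window):
    """Pop instructions from the issue window, earliest expected ready time first."""
    heap = list(window)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).op)
    return order

# Hypothetical issue window: a load with a long expected latency sinks
# behind instructions predicted to be ready sooner.
window = [
    PendingInstr(5, 0, "load r1, [r2]"),
    PendingInstr(1, 1, "add r3, r4, r5"),
    PendingInstr(3, 2, "mul r6, r1, r3"),
]
print(issue_order(window))  # add issues first, then mul, then the load
```

A heap keyed on predicted ready time is what makes the scheme cheap: selecting the next instruction is a pop rather than a broad wakeup-and-select scan, which is the source of the power and area savings the paper reports.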
Simulations show the results of incorporating dynamic timing information in out-of-order instruction scheduling. Compared to baseline out-of-order schedules, schedules using dynamic timings show roughly a 13 percent drop in performance. On the other side of the tradeoff, power consumption drops by roughly 20 percent and chip area by roughly 15 percent. More detailed simulations pick apart the various components on both sides of the tradeoff.
The writing is clear, and the details are laid out effectively. The numbers, however, are a little difficult to trace through the explanations, although the general points come through. The bibliography is fine, although, curiously, there are no references to scheduling algorithms for drum memory.