Performance evaluation of computer systems often relies on the offline analysis of traces of actual program execution. A trace of even a few seconds can record millions of events, so the size of trace files can get out of hand quickly.
Johnson and Ha describe techniques for compressing trace files that reduce their size by a factor of 20 to 50 compared with existing formats. Besides saving space, the smaller files can be significantly faster to read, so the performance evaluation process as a whole takes less time.
The proposed method, called PDATS, takes advantage of locality of reference and repetition in the trace files by storing differences and repeat counts. This alone provides good compression, but passing the PDATS output through a Ziv-Lempel compressor does significantly better. A further extension, called PDI, achieves even more compression by using a separate code for the 256 most common instructions.
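The idea of storing differences and repeat counts, then applying a general-purpose compressor, can be illustrated with a minimal sketch. This is not the PDATS file format, whose record layout is not given here. The function names, the fixed 12-byte record, and the synthetic loop trace are all assumptions made for illustration, and Python's zlib (DEFLATE, an LZ77-based method) merely stands in for the Ziv-Lempel stage.

```python
import struct
import zlib


def delta_rle_encode(addresses):
    """Turn an address list into (delta, repeat_count) records.

    Hypothetical helper: each address is replaced by its difference from
    the previous one, and runs of equal differences are collapsed.
    """
    records = []
    prev = 0
    for addr in addresses:
        delta = addr - prev
        prev = addr
        if records and records[-1][0] == delta:
            records[-1][1] += 1          # extend the current run
        else:
            records.append([delta, 1])   # start a new run
    return [(d, n) for d, n in records]


def delta_rle_decode(records):
    """Invert delta_rle_encode, recovering the original addresses."""
    addresses = []
    prev = 0
    for delta, count in records:
        for _ in range(count):
            prev += delta
            addresses.append(prev)
    return addresses


def pack(records):
    """Serialize records as signed 64-bit delta + unsigned 32-bit count
    (an assumed layout chosen only for this sketch)."""
    return b"".join(struct.pack("<qI", d, n) for d, n in records)


# Synthetic trace (an assumption): an 8-instruction loop at 0x1000,
# 4 bytes per instruction, executed 200 times.
trace = [0x1000 + 4 * (i % 8) for i in range(1600)]

records = delta_rle_encode(trace)
assert delta_rle_decode(records) == trace   # the encoding is lossless

raw = b"".join(struct.pack("<q", a) for a in trace)  # one 8-byte word per address
packed = pack(records)                               # differences + repeat counts
squeezed = zlib.compress(packed)                     # second, Ziv-Lempel-style stage

print(len(raw), len(packed), len(squeezed))
```

On this trace the first stage shrinks the data because sequential fetches collapse into a single record per loop pass. The second stage helps because the records themselves repeat from one pass to the next, which is the kind of redundancy a Ziv-Lempel compressor removes.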
The paper includes only a very brief comparison with the Mache system, which predates PDATS by about ten years. Given the similarities between the two systems, a more careful comparison would have been helpful.
In general, the paper is clearly written. However, the tables would benefit from clearer labeling, the method of measuring compression is not stated clearly, and the abbreviation “pdt” appears without explanation. There is also some confusion about which Ziv-Lempel method is meant (for example, the LZ method seems to be the gzip program).
The paper describes and carefully evaluates a method that, while relatively simple, is highly practical, and is likely to be of significant value to the performance evaluation community.