The authors present the Parallel Program Debugger (PPD), a system for debugging parallel programs that run on shared-memory multiprocessors. Their approach uses compile-time and debugging-time analysis to reduce the amount of trace data that must be recorded at execution time. PPD operates in three phases.
At compile time, interprocedural and data-flow analyses are used to decompose the program into a series of blocks and to determine the sets of variables that might be read and written in each block.
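To make the idea of per-block access sets concrete, here is a minimal sketch in Python. It is not PPD's analysis: it works on a single block of Python source using the standard `ast` module, it is not interprocedural, and the function name `access_sets` is an assumption made for illustration. Because it walks every branch of the block, it is conservative in the same sense as the paper's sets: it reports what might be read or written, not what actually is.

```python
import ast

def access_sets(source):
    """Return (reads, writes): names a block of code might read or write.

    Hypothetical helper for illustration only. Every branch is visited,
    so the result over-approximates what any single execution touches.
    Called functions (for example print) show up as reads of their name.
    """
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                reads.add(node.id)
            else:  # Store or Del context
                writes.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # "x += 1" reads x as well as writing it
            reads.add(node.target.id)
    return reads, writes
```

For the block `if flag: total = a + b` followed by `else: total = c`, the sketch reports `a`, `b`, `c`, and `flag` as possibly read and `total` as possibly written, even though a given run reads only one branch's operands.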
During program execution, a coarse-grained trace log is written. At entry to each block, PPD records the values of variables in the read-accessed set; at exit, it records the values of variables in the write-accessed set. In addition, at predetermined times it records the values of shared variables that might be read. The authors’ hand annotation of sample programs suggests an execution-time overhead of about 15 percent to generate the trace logs.
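The entry and exit logging described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not PPD's implementation: the names `Block`, `TraceLog`, and `replay` are hypothetical, program state is modeled as a plain dictionary, and the periodic logging of shared variables is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Block:
    name: str
    read_set: tuple                  # variables the block might read
    write_set: tuple                 # variables the block might write
    body: Callable[[dict], None]     # the block's code, mutating the state

@dataclass
class TraceLog:
    records: list = field(default_factory=list)

    def run(self, block, state):
        # Entry record: values of the read-accessed variables
        entry = {v: state[v] for v in block.read_set}
        self.records.append(("entry", block.name, entry))
        block.body(state)
        # Exit record: values of the write-accessed variables
        exit_values = {v: state[v] for v in block.write_set}
        self.records.append(("exit", block.name, exit_values))

def replay(block, entry_values):
    """Re-execute one block in isolation from its logged entry values."""
    state = dict(entry_values)
    block.body(state)
    return state
```

The point of the design is that nothing inside the block is logged. As long as the block is deterministic given its read-accessed values, the entry record is enough to re-execute it later and recover any intermediate detail on demand, which is what keeps the execution-time log coarse.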
The debugging phase uses the trace logs, along with re-execution of partial blocks as necessary, to build the dynamic program dependence graph (showing actual rather than potential dependencies) at any desired level of detail. The programmer may view this graph as an aid to understanding the program’s behavior; it also helps in detecting race conditions and in analyzing deadlocks for the given execution.
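The distinction between actual and potential dependencies can be illustrated with a small sketch. It assumes a fine-grained list of events, each naming the variables it read and wrote, such as re-executing a block might produce. The function name `dynamic_flow_dependences` and the event format are assumptions for illustration; the sketch covers only data (flow) dependences and ignores the control and synchronization edges that a full dynamic dependence graph would also carry.

```python
def dynamic_flow_dependences(events):
    """Build flow-dependence edges from an executed event sequence.

    events: iterable of (event_id, reads, writes) in execution order.
    Returns a set of (writer_id, reader_id, variable) edges, one for
    each read, pointing back to the event that last wrote the variable.
    """
    last_writer = {}   # variable -> id of the most recent writing event
    edges = set()
    for event_id, reads, writes in events:
        for var in reads:
            if var in last_writer:
                edges.add((last_writer[var], event_id, var))
        for var in writes:
            last_writer[var] = event_id
    return edges
```

If event `s4` overwrites a variable first written by `s1`, a later reader depends only on `s4` in this execution. A static analysis that could not prove `s4` always runs would have to report both writers as potential sources.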
PPD had not been fully implemented when this paper was written, so no actual experience with the system was available. Even so, the paper provides a good description of one approach to the debugging of parallel programs. It deserves careful study by those in the field.