This paper describes QuickRec, a field-programmable gate array (FPGA) implementation of hardware-assisted record replay (RnR) for an x86 processor. RnR records the execution of a multithreaded application, capturing all sources of nondeterminism, including input, nondeterministic instructions such as those that read the processor timestamp counter, and the interleaving of racing accesses to memory. The recorded execution can then be replayed, giving tools such as debuggers and race detectors the complete information they need to reason about it.
There has been significant prior work in this area. The key contribution of QuickRec is a fully working FPGA prototype of an earlier design called Capo, which was originally evaluated on a simulator. The resulting full system (Capo3) consists of a modified Linux kernel supporting record replay, an FPGA prototype of four Intel Pentium cores connected to memory, and modifications to the cores to support record replay. The primary components of the record replay system are Bloom filters that record the addresses of reads and writes to the level 1 cache, and in-memory logs of input events such as data supplied by the operating system. On events that require a total order to be enforced, such as an interleaving access from a different core, the Bloom filters and input logs are written out to a totally ordered log as “chunks.”
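The chunking mechanism described above can be illustrated with a small software sketch. It is not QuickRec's hardware design: the class and function names, the filter size, the hash function, and the log entry layout (core ID plus the number of accesses in the chunk) are all assumptions made for illustration. Each core keeps Bloom filters of the addresses it has read and written; when an access from another core may conflict with them, the current chunk is closed and appended to a single totally ordered log.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter over integer addresses (sizes are illustrative)."""

    def __init__(self, bits=256, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.field = 0  # bit vector stored in a Python int

    def _positions(self, addr):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits

    def add(self, addr):
        for p in self._positions(addr):
            self.field |= 1 << p

    def may_contain(self, addr):
        # May return a false positive, never a false negative.
        return all((self.field >> p) & 1 for p in self._positions(addr))

    def clear(self):
        self.field = 0


class CoreRecorder:
    """Hypothetical per-core recorder: read/write sets plus a chunk counter."""

    def __init__(self, core_id, log):
        self.core_id = core_id
        self.log = log  # shared, totally ordered chunk log
        self.reads = BloomFilter()
        self.writes = BloomFilter()
        self.count = 0  # accesses in the current chunk

    def access(self, addr, is_write):
        (self.writes if is_write else self.reads).add(addr)
        self.count += 1

    def snoop(self, addr, remote_is_write):
        # A remote access conflicts with a local write, and a remote
        # write conflicts with a local read.
        conflict = self.writes.may_contain(addr) or (
            remote_is_write and self.reads.may_contain(addr)
        )
        if conflict:
            self.end_chunk()

    def end_chunk(self):
        if self.count:
            # Assumed entry layout: (core id, accesses in chunk).
            self.log.append((self.core_id, self.count))
        self.reads.clear()
        self.writes.clear()
        self.count = 0


def record(trace, num_cores):
    """Replay a list of (core, addr, is_write) accesses through the recorders."""
    log = []
    cores = [CoreRecorder(i, log) for i in range(num_cores)]
    for core, addr, is_write in trace:
        for other in cores:
            if other.core_id != core:
                other.snoop(addr, is_write)
        cores[core].access(addr, is_write)
    for c in cores:  # flush chunks still open at the end of recording
        c.end_chunk()
    return log
```

For the trace `[(0, 0x100, True), (0, 0x104, False), (1, 0x100, False), (1, 0x200, True)]` on two cores, core 1's read of `0x100` conflicts with core 0's earlier write, so core 0's chunk is closed first and `record` returns `[(0, 2), (1, 2)]`. Because Bloom filters can report false positives, a scheme like this may close chunks earlier than strictly necessary, which costs log space but not correctness.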
Because this work resulted in a real system, the paper offers a number of interesting practical insights. First, the overhead of record and replay is only 13 percent on average, suggesting that this feature is mature enough and useful enough to merit inclusion in future processors. This is backed up by the fact that memory bandwidth requirements for record and replay are as low as 0.3 percent in the emulated system. The authors also provide a number of practical suggestions for operating system support for RnR, including a careful exposition of how to instrument routines that copy data back to user space and how to handle page faults in those routines by means of extra hardware support, thereby connecting the dots between RnR hardware and operating system support for RnR.
This paper is a good read for researchers interested in the practical aspects of record and replay. However, it does assume knowledge of prior art in the area. In particular, a careful reading of the original Capo system paper [1] would greatly enhance the potential for learning from this paper.