Necessity, and even opportunity, lead to invention. Until there were MIMD computers with independent compute processors (CPs) and I/O processors (IOPs), there was no need to optimize file I/O (input/output) across CPs and IOPs. Such machines have existed for a decade or more, and during that time, various I/O optimization schemes have been put forward. Kotz presents simulation data comparing his proposed improvement, Disk-Directed I/O (DDIO), with a standard Simple Parallel File System (SPFS) and with an older improvement, two-phase I/O (2PIO). These simulations show that DDIO outperforms 2PIO, and 2PIO outperforms SPFS, over a wide range of CP, IOP, and disk counts and for multiple files and disks, although the rankings are inverted in a few cases.
The problem to be solved is threefold: how to partition and access large matrices across multiple disks, how programs should communicate requests for these data, and how the work should be divided among the CPs and the IOPs. For SPFS, essentially all the logic is in the CPs: each one explicitly requests the part of the data it needs, and a simple IOP retrieves it. For 2PIO, a second phase is added: in the first (SPFS) phase, large chunks of data are retrieved by each IOP, and in the second phase, these chunks are sorted and sent to the appropriate CP. In DDIO, most of the logic is moved to the IOPs, the assumption and hope being that, because semantic information is not lost and less data handling is needed, more parallelism will be achieved.
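The difference between the first two schemes can be illustrated with a toy model. The sketch below is not taken from Kotz's paper or simulator: the cyclic record-to-CP distribution, the function names, and the request counting are all assumptions made for illustration. It shows only that both schemes deliver the same records to each CP, while the two-phase version issues far fewer, larger requests in its first phase.

```python
def owner(record, n_cps):
    """Hypothetical cyclic distribution: record i belongs to CP i % n_cps."""
    return record % n_cps


def spfs_read(n_records, n_cps):
    """SPFS-style toy: each CP asks for its own records one at a time."""
    memory = {cp: [] for cp in range(n_cps)}
    requests = 0
    for cp in range(n_cps):
        for rec in range(cp, n_records, n_cps):
            memory[cp].append(rec)   # one small request per record
            requests += 1
    return memory, requests


def two_phase_read(n_records, n_cps):
    """2PIO-style toy: contiguous chunk reads, then redistribution."""
    chunk = -(-n_records // n_cps)   # ceiling division
    staged = []
    requests = 0
    # Phase 1: one large contiguous request per chunk.
    for cp in range(n_cps):
        lo = cp * chunk
        hi = min(lo + chunk, n_records)
        if lo < hi:
            staged.append(list(range(lo, hi)))
            requests += 1
    # Phase 2: send every staged record to the CP that owns it.
    memory = {cp: [] for cp in range(n_cps)}
    for chunk_records in staged:
        for rec in chunk_records:
            memory[owner(rec, n_cps)].append(rec)
    return memory, requests


if __name__ == "__main__":
    spfs_mem, spfs_reqs = spfs_read(16, 4)
    tp_mem, tp_reqs = two_phase_read(16, 4)
    print(spfs_mem == tp_mem, spfs_reqs, tp_reqs)
```

In this toy setting the cost of the second phase (the redistribution traffic) is not modeled at all, so the request counts should not be read as a performance comparison.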
The simulation data support this expectation. Kotz performed simulations varying block size, degree of contiguity, number of CPs, and number of IOPs. The main results were obtained using 16 CPs and 16 IOPs; for this case, record sizes of 8 bytes and 8 KB were simulated for a 10 MB file, using access patterns ranging from contiguous to checkerboard. Results for reads and writes were similar. The simulations include two variants of DDIO, one with and one without a presort in the IOP of the block requests by physical location. The results make it clear that most of the improvement of DDIO is due to the presort; without it, DDIO, 2PIO, and SPFS hardly differ in many cases.
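Why a presort by physical location can matter so much is easy to see in a minimal sketch. The model below is an assumption made for illustration, not the disk model used in the simulations: it treats block numbers as positions on a line and uses total head travel as a crude stand-in for seek cost. The block numbers and the function name are hypothetical.

```python
def head_travel(blocks, start=0):
    """Total distance the head moves when servicing blocks in the given order.

    Assumption: seek cost is proportional to the difference in block number.
    """
    travel = 0
    pos = start
    for block in blocks:
        travel += abs(block - pos)
        pos = block
    return travel


# Hypothetical block requests in the order they happen to arrive at one IOP.
arrival_order = [90, 10, 70, 30, 50]
# The same requests after a presort by physical location.
presorted = sorted(arrival_order)

if __name__ == "__main__":
    print(head_travel(arrival_order))  # 290
    print(head_travel(presorted))      # 90
```

Under this simplified model the presorted order services the same five blocks with less than a third of the head movement, which is at least consistent with the observation that the presort accounts for most of the reported gain.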
The paper is clear and the results are convincing. It is an updated version of a paper from a November 1994 conference proceedings and reflects the state of the field as of that time. The work has antecedents in a few other systems that allow a collective or strided interface. Regrettably, while details of the simulator are given, including the processor used, actual elapsed times are not reported. Such a report would have clarified the claim that not all combinations could be simulated due to a lack of time. In what amounts to an afterthought, Kotz proposes that even better results could be obtained by moving filtering and other logic into the IOPs. This is undoubtedly so, but it blurs the distinction between IOP and CP so much that the analysis may as well be based on a large number of undifferentiated processors. The source code is available on the Web at http://www.cs.dartmouth.edu/research/starfish.