A problem in multiprocessor multidata (MPMD) and multiprocessor single data (MPSD) machines is the reassembly of data blocks into a coherent whole. Steenkiste provides a solution to this problem: a sorting algorithm and I/O architecture for an Intel hypercube that allow him to use the High-Performance Parallel Interface (HIPPI) to connect with a Cray at another site. He describes the solution thoroughly and reports the results of a series of tests. The work may not apply to other MPSD machines, and many readers may consider it a variation on a theme.
There is a natural tension between hardware and software in an I/O solution. Software based on sorting algorithms seems less expensive (until the embedded NP-complete problem is encountered), while hardware tends to be specific and costly to change. It is possible, however, to build a piece of hardware (a ring or series of rings) that accomplishes the data reassembly in a general manner. Hardware can be flexible if designed well, and it can avoid the NP-complete problems.
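To make the software side of this tradeoff concrete, the following minimal sketch shows reassembly by sorting in its simplest form. It is an illustration only, not the algorithm from the paper under review: the `Block` type, its `offset` field, and the `reassemble` function are hypothetical names, and the sketch assumes each block carries the byte offset at which it belongs in the full data set.

```python
from typing import Iterable, NamedTuple


class Block(NamedTuple):
    """A data block as it might arrive from one node (hypothetical layout)."""
    offset: int      # assumed: byte offset of this block in the full data set
    payload: bytes


def reassemble(blocks: Iterable[Block]) -> bytes:
    """Sort blocks by offset and concatenate them into one contiguous buffer."""
    out = bytearray()
    for block in sorted(blocks, key=lambda b: b.offset):
        # Each block must start exactly where the previous one ended.
        if block.offset != len(out):
            raise ValueError(f"gap or overlap at offset {block.offset}")
        out += block.payload
    return bytes(out)


if __name__ == "__main__":
    # Blocks arrive out of order, as they might from independent nodes.
    arrived = [Block(6, b"world"), Block(0, b"hello ")]
    print(reassemble(arrived))
```

A sort of this kind is cheap when every block is tagged and fits in memory. The difficulties noted above arise in more general settings that this sketch does not attempt to model.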