Fine-grained dataflow computing makes demands that technologies, particularly hardware technologies, have struggled to meet. However, techniques like coarsening the computational grain and substituting software for hardware, as described in this paper, can help realize the promises of dataflow computing. The reported results show that these techniques deliver respectable and responsive performance with effective and efficient utilization.
Data-driven multithreading coarsens the computational grain: schedulers and a distributed shared memory (DSM) layer running over multinode, multicore high-performance computing (HPC) systems emulate a dataflow architecture in software. Programs are explicit sets of threads, with specialized thread types supporting distributed recursive computations. The thread-to-core mapping is static, but scheduling is dynamic: a thread runs only after other threads have computed all of its required inputs. The DSM abstraction provides inter-thread communication, supplemented by message passing for selected special cases and lower-level concerns.
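The scheduling discipline at the heart of this model, in which a thread becomes runnable only once every one of its inputs has been produced, can be sketched roughly as follows. This is an illustrative simplification, not the paper's actual runtime or API: each "thread" carries a pending-input count that its producers decrement, and a ready queue holds threads whose counts have reached zero.

```python
# Hypothetical sketch of data-driven thread scheduling (illustrative only):
# a thread is enqueued for execution only when all of its producers have fired.
from collections import deque

class DThread:
    def __init__(self, name, fn, n_inputs):
        self.name = name
        self.fn = fn
        self.pending = n_inputs   # inputs not yet produced by other threads
        self.consumers = []       # threads that consume this thread's output

def run(threads):
    """Execute a static thread graph in data-driven order."""
    ready = deque(t for t in threads if t.pending == 0)
    order = []
    while ready:
        t = ready.popleft()
        t.fn()                    # run the thread's computation
        order.append(t.name)
        for c in t.consumers:     # signal consumers; enqueue when satisfied
            c.pending -= 1
            if c.pending == 0:
                ready.append(c)
    return order

# Example: c depends on both a and b, so c can only run after both finish.
a = DThread("a", lambda: None, 0)
b = DThread("b", lambda: None, 0)
c = DThread("c", lambda: None, 2)
a.consumers.append(c)
b.consumers.append(c)
```

In the paper's setting the mapping of threads to cores is fixed in advance; only the firing order is decided at run time, as the loop above suggests.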
Evaluation was comprehensive across HPC architectures, system capabilities (a custom network interface versus the message passing interface), algorithms, and competing HPC systems. The results for most combinations range from respectable to impressive, showing good scalability even at small problem sizes and high core counts. Parallelism extraction is more explicit than in hardware-supported dataflow computing, but consistent with other thread-based HPC paradigms.
The system described is a continuation of prior work. For most people, reading this paper will be like joining an ongoing conversation. Prior knowledge of distributed and high-performance computing is necessary but only somewhat helpful: mapping the myriad details presented onto general knowledge requires ample and judicious assumptions, which slows down reading. The writing is solid, and the bibliography is extensive but selective.