In procedural parallel programming with multithreading, synchronizing the sequences of parallel task execution is essential for fast job execution; the data-driven method introduces a different pattern. By bypassing synchronization overhead and its latencies, a task is scheduled for execution as soon as its input data are available, an idea that is the origin of data-driven multithreading (DDM).
The paper’s main aim is to extend single-node DDM to distributed multi-node concurrent processing for high-performance computing (HPC). Data is forwarded to remote nodes for execution, with no need for synchronization to perform “scheduling operations, computations, and network functionalities.”
As the basis of the proposal, the authors introduce the FREDDO framework, developed with C++ application programming interfaces (APIs) for implementing DDM. Its components are presented for both single-node and distributed multi-node scenarios. In the single-node case, a dependency graph describes the dynamic workflow. In the distributed case, the key components are interconnected multicore nodes, each equipped with a network manager. A runtime system handles communication, data management, dependency graph maintenance, and enforcement of the scheduling and execution policies.
The emphasis then shifts to FREDDO’s memory model, which uses distributed shared memory (DSM) and global address space (GAS) mechanisms to present a single distributed address space to every node. Scheduling mechanisms and distribution patterns for thread instances are then discussed, along with techniques for terminating distributed execution and network management policies. The architectural discussion concludes with network traffic reduction and recursion support.
The detailed discussion continues with a programming example, which presents a DDM dependency graph, its coding, and sample FREDDO code for the considered scenario. The experimental evaluation covers two different hardware platforms and three application categories (low, medium, and heavy inter-node communication), measuring execution time, speedup, and network traffic.
The paper is a valuable contribution to the transition from single-node DDM to distributed multi-node processing for HPC. However, the programming example section is somewhat out of step with the flow of the discussion. An analytical comparison of the proposed architecture with procedural programming platforms such as the message passing interface (MPI), placed before the experimental evaluation, would better demonstrate their contrast.