This paper explores the optimization of process execution for high-performance computing (HPC) applications. Such applications execute many smaller processes in parallel on large computing clusters. The challenge addressed in the paper is to schedule the processes on the available hardware in a way that minimizes runtime. Runtime performance depends both on the hardware on which a process runs and on the location of that process's input data. The scheduling algorithm's choices include where to run each process and whether to move a running process from one computation node to another. The approach considers not only the memory and central processing unit (CPU) requirements of each process, but also its input/output (I/O) footprint, an aspect of process execution to which previous scheduling algorithms were oblivious.
The authors correctly note that modern clusters are not homogeneous: as hardware ages, the operations team selectively upgrades machines, networks, and disks. A heterogeneous platform introduces further complications. On the one hand, the scheduler should favor the highest-performance components; on the other hand, idle resources should be utilized to improve overall throughput.
The algorithms developed herein are not particularly complicated: they define thresholds that determine the placement or migration of a process. The paper continues with simulations of clusters and applications, which strongly suggest that this algorithm significantly outperforms I/O-oblivious alternatives.
This paper is appropriate for those interested in the impact of I/O on large-scale parallel applications. The specific results form a good starting point for new system designs and for distributed computational platforms such as MapReduce, Hadoop, and Condor.