Computing Reviews
A constraint programming scheduler for heterogeneous high-performance computing machines
Bridi T., Bartolini A., Lombardi M., Milano M., Benini L. IEEE Transactions on Parallel and Distributed Systems 27(10): 2781-2794, 2016. Type: Article
Date Reviewed: Mar 23 2017

If you have managed high-performance computing (HPC) facilities, you will be aware of the difficulties that can arise in job scheduling. Nineteen nodes in your large-memory queue become available; you have one user who wants 20 nodes for an hour to run a large-memory message-passing interface (MPI) job and several users who want five large-memory nodes for 30 minutes. How do you decide what should happen?

The authors point out that the average supercomputer reaches full depreciation in three to five years, so its utilization has to be aggressively managed to produce an acceptable return on investment. Schedulers like Torque and Slurm can be configured using simple priority-rule-based algorithms, but better results can be obtained using constraint programming (CP) paradigms. Such paradigms have been considered too computationally expensive for general use; however, the authors note that HPC jobs in general exhibit a longer duration and lower arrival rate than jobs in enterprise servers and data centers.

They therefore propose an efficient CP approach that minimizes the time jobs spend in the queue, weighted by the expected average waiting time. Their CP solver has been tested by embedding it as a plug-in for the PBS Professional scheduler on the 64-node CINECA Eurora machine; it was able to significantly reduce job waiting times while maintaining the same average machine utilization.

A CP solver scheduler cycle is triggered by changes in system state such as new job submission, job deletion, or node awakening. If a solution is not found within a predefined time, the solver is re-executed with an increased time limit. If there is still no solution, the number of queued jobs considered for scheduling is halved. Thresholds are empirically defined such that the average overhead for 1600 jobs on the Eurora machine is about six seconds.
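
The fallback logic described above can be sketched in a few lines of Python. This is only an illustration of the retry-then-halve idea, not the authors' implementation: the names scheduler_cycle, toy_solver, base_limit_s, and extended_limit_s are hypothetical, the time limits are placeholder values, and the choice to repeat the two attempts after each halving is an assumption, since the review does not say what happens next.

```python
from typing import Callable, List, Optional, Sequence

Job = str
Schedule = List[Job]
# A solver takes the jobs to place and a time limit in seconds, and returns
# a schedule, or None if it found nothing within the limit (hypothetical API).
Solver = Callable[[Sequence[Job], float], Optional[Schedule]]


def scheduler_cycle(
    queued_jobs: Sequence[Job],
    solve: Solver,
    base_limit_s: float = 1.0,      # placeholder threshold, not from the paper
    extended_limit_s: float = 2.0,  # placeholder threshold, not from the paper
) -> Optional[Schedule]:
    """Run one scheduling cycle with the retry-then-halve fallback."""
    considered = list(queued_jobs)
    while considered:
        # First attempt with the predefined limit, then one retry with an
        # increased limit.
        for limit in (base_limit_s, extended_limit_s):
            schedule = solve(considered, limit)
            if schedule is not None:
                return schedule
        # Still no solution: halve the number of queued jobs considered.
        # Repeating the attempts on the smaller set is an assumption.
        considered = considered[: len(considered) // 2]
    return None


def toy_solver(jobs: Sequence[Job], limit_s: float) -> Optional[Schedule]:
    """Stand-in solver: pretends it can place one job per 0.5 s of budget."""
    capacity = int(limit_s / 0.5)
    return list(jobs) if len(jobs) <= capacity else None


if __name__ == "__main__":
    queue = [f"job{i}" for i in range(10)]
    print(scheduler_cycle(queue, toy_solver))
```

A real deployment would trigger scheduler_cycle on the state changes listed above (job submission, job deletion, node awakening) and would call an actual CP solver in place of toy_solver.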

The authors observe that the CP scheduler can be further developed to reduce its execution overhead; one might hope that a significant improvement in scalability could thereby be realized. This paper will fascinate those who manage or use HPC facilities.

Reviewer:  G. K. Jenkins Review #: CR145137 (1706-0384)
Scheduling (D.4.1 ... )
 
 
Super (Very Large) Computers (C.5.1 ... )
 
