The core idea of this paper is to use abstraction and refinement to solve large-scale job scheduling problems. Given a large scheduling problem with constraints on the required solution, the idea is to first abstract a problem that one can solve reasonably quickly. The next step is to refine the abstraction to a potential solution for the original problem. If the solution fails to satisfy the required constraints, we then use a refined version of the abstraction. An essential aspect of the abstraction algorithm is that it is designed to produce “pessimistic” abstractions that guarantee that the solution to the abstract problem can be refined to a solution for the original problem.
To illustrate their technique, the authors consider a job scheduling problem that involves scheduling jobs at different interconnected data centers. They present a general abstraction framework, and two schedulers that are instances of this framework: FISCH and BLIND. FISCH abstracts the collection of jobs that define a topological abstraction, and identifies two jobs if they both have the same minimal length path from the start job. In FISCH, we can further abstract jobs by identifying jobs with similar lengths: by varying the ratio that determines this similarity, we can coarsen or refine the abstraction. The other instance, BLIND, abstracts the graph of the data centers. A subtree of data centers is abstracted as a single node, where the minimum capacity of the component nodes determines its capacity.
The authors implemented schedulers based on these ideas. Their results show “a significant speed-up compared to traditional static scheduling heuristics, at a reasonable cost in the quality of the produced schedules.”