ComputingReviews.com

Online execution time prediction for computationally intensive applications with periodic progress updates
Chtepen M., Claeys F., Dhoedt B., De Turck F., Fostier J., Demeester P., Vanrolleghem P. The Journal of Supercomputing 62(2): 768-786, 2012. Type: Article

Date Reviewed: 03/01/13

Grid computing is a recent technology that enables large-scale scientific computations. Efficient use of the vast number of distributed resources found in such infrastructures remains an important research issue. Schedulers typically handle this problem: they must find a proper balance of computation, storage, and network resources to assign to the tasks that make up parallel jobs. This paper builds on an adaptive workflow scheduler introduced in a 2009 paper by some of the same authors [1], and adds an extension that improves resource usage for grid computations by up to 15 percent.

Instead of using a simple extrapolation technique to estimate the eventual total execution time of a task from successively sampled runtimes, the authors propose using a small set of user-provided evolution models for these completion times. Experimental data indeed show that, over the execution span of a task, estimates of the total time are often too high, too low, or randomly varying before they settle to the exact value. Using past estimation samples, one can decide to which category a run's estimation behavior belongs and consequently provide a more accurate expected total time earlier in the process, which improves scheduling as a whole. This approach has been successfully tested on the dynamic scheduling in distributed environments (DSiDE) grid simulator, using simulated tasks whose characteristics were derived from actual Tornado runs on the UGent grid. (Tornado is a modeling and virtual experimentation platform for environmental analysis.)
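To make the idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that each progress update reports the elapsed time and the fraction of work completed, derives a naive extrapolated total time from each update, and then labels the run by the trend of those estimates. The function names, the three labels, and the 0.8 threshold are illustrative assumptions, and the paper's user-provided evolution models are not reproduced here.

```python
from typing import List, Tuple


def extrapolate_total_times(samples: List[Tuple[float, float]]) -> List[float]:
    """Naive extrapolation: total time = elapsed / progress.

    samples holds (elapsed_seconds, progress_fraction) pairs, with
    0 < progress_fraction <= 1. Updates with no progress are skipped.
    """
    return [elapsed / progress for elapsed, progress in samples if progress > 0]


def classify_estimates(estimates: List[float], threshold: float = 0.8) -> str:
    """Label a run by how its successive total-time estimates evolve.

    Hypothetical rule: if at least `threshold` of the steps between
    consecutive estimates go down, early estimates were too high
    ("overestimated"); if they go up, early estimates were too low
    ("underestimated"); otherwise the run is "fluctuating".
    """
    if len(estimates) < 2:
        return "unknown"
    steps = [later - earlier for earlier, later in zip(estimates, estimates[1:])]
    falling = sum(1 for step in steps if step < 0)
    rising = sum(1 for step in steps if step > 0)
    if falling >= threshold * len(steps):
        return "overestimated"
    if rising >= threshold * len(steps):
        return "underestimated"
    return "fluctuating"


if __name__ == "__main__":
    # Made-up progress updates: (elapsed seconds, fraction completed).
    updates = [(10.0, 0.05), (20.0, 0.125), (30.0, 0.25), (40.0, 0.4)]
    print(classify_estimates(extrapolate_total_times(updates)))
```

Once a run has been assigned to a category, a scheduler could correct later estimates with the model attached to that category instead of trusting the raw extrapolation.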

Even though this paper suffers from some typographical errors and a somewhat convoluted and imprecise presentation, I recommend it as an interesting read for people working on grid infrastructure issues.

[1] Chtepen, M.; Claeys, F.; Dhoedt, B.; De Turck, F.; Demeester, P.; Vanrolleghem, P. Adaptive task checkpointing and replication: toward efficient fault-tolerant grids. IEEE Transactions on Parallel and Distributed Systems 20, 2 (2009), 180–190.

Reviewer: P. Jouvelot

Review #: CR140978 (1306-0528)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™