Grid networks are gaining relevance as the scale of processors and networks expands. This paper analyzes data-intensive task scheduling and data migration as they arise in the data consolidation (DC) problem. In a complex grid, DC must select which datasets are required to perform a task, decide how and from where that data will be gathered, and manage the resulting time delays and synchronization concerns. DC is a continuous problem rather than a one-time problem, and it involves complex task scheduling, data management, and routing issues.
This paper describes and analyzes various cost, performance, and scheduling algorithms that account for DC delays, dataset sizes, and processing timelines. The solutions use a central processor that “is responsible for the task scheduling and data management.” The authors include an extensive performance evaluation based on a simulation environment consisting of a host of geographically dispersed nodes. They then vary the cost algorithms and provide comparison graphs. Their results show that DC schemes that consider both computation and communication requirements fare better across the grid than those that consider only one of those factors.
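To make the contrast between single-factor and combined schemes concrete, the following is a minimal sketch and not the authors' algorithms. It assumes a central scheduler that knows each site's queue delay and speed, that datasets are fetched in parallel from the fastest replica holder, and that a task starts once both the site and the data are ready. All names (`Site`, `consolidation_time`, `pick_site`) and all numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    queue_delay_s: float  # seconds until the site can start a new task
    speed: float          # work units processed per second


def consolidation_time(dst, datasets, bandwidth):
    """Seconds until every dataset is at dst, assuming parallel transfers."""
    slowest = 0.0
    for size_mb, holders in datasets.values():
        # Fetch each dataset from whichever replica holder delivers it soonest.
        best = min(
            0.0 if src == dst else size_mb / bandwidth[(src, dst)]
            for src in holders
        )
        slowest = max(slowest, best)
    return slowest


def completion_time(site, work, datasets, bandwidth):
    """Estimated finish time: wait for both the queue and the data, then run."""
    ready = max(site.queue_delay_s, consolidation_time(site.name, datasets, bandwidth))
    return ready + work / site.speed


def pick_site(sites, work, datasets, bandwidth, policy):
    """Choose a consolidation site under one of three illustrative policies."""
    keys = {
        "computation": lambda s: s.queue_delay_s + work / s.speed,
        "communication": lambda s: consolidation_time(s.name, datasets, bandwidth),
        "joint": lambda s: completion_time(s, work, datasets, bandwidth),
    }
    return min(sites, key=keys[policy])


# Made-up scenario: both datasets live at site B, which has a long queue.
sites = [Site("A", 0.0, 10.0), Site("B", 50.0, 10.0), Site("C", 5.0, 10.0)]
datasets = {"d1": (1000.0, ["B"]), "d2": (100.0, ["B"])}  # (size in MB, holders)
bandwidth = {("B", "A"): 10.0, ("B", "C"): 50.0}          # MB per second
work = 100.0

for policy in ("computation", "communication", "joint"):
    chosen = pick_site(sites, work, datasets, bandwidth, policy)
    finish = completion_time(chosen, work, datasets, bandwidth)
    print(f"{policy}: site {chosen.name}, estimated finish {finish:.0f} s")
```

With these invented numbers, the computation-only policy picks the idle site with the slow link, the communication-only policy picks the site that already holds the data but has a long queue, and the joint policy picks a third site that finishes earlier than either. This only illustrates the kind of trade-off the review describes; it says nothing about the magnitudes reported in the paper.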
Future research should look at relaxing some of the assumptions of the study. For example, how can DC be applied to a dynamic network topology, such as a mobile network, and how can it be implemented with a distributed, noncentral scheduler? In all, this paper provides a good overview of the issues facing grid networks as they are applied to scheduling and running intensive computations that use data retrieved from across the wide-area grid.