Scientific datasets are huge compared to even the most complex business ones, and their analysis methods are correspondingly complex. These methods, called scientific workflows, consist of a large number of dependent jobs with complex precedence constraints between them, and describe the overall process needed to reach the required scientific objectives. Such computationally intensive methods take advantage of the many different types of cloud infrastructure available today. However, since cloud infrastructures are expensive and difficult to configure, tools to simulate them before committing to any given solution would be heartily appreciated.
The good news is that these tools exist and are called workflow management systems (WMSs); they bring scientific workflows to life by optimizing the allocation of individual jobs to single cloud infrastructure components. The bad news? The available WMSs treat clouds as black boxes and do not allow for the fine-grained evaluation of the behavior of single infrastructure components, above all energy consumption, which is a key worry in today’s energy-conscious world. The authors respond to this challenge by developing their own WMS, called DISSECT-CF-WMS and based on the DISSECT-CF framework; in their view, it overcomes the known limitations of existing WMSs. This paper describes DISSECT-CF-WMS extensively.
The paper starts with detailed descriptions of scientific datasets; workflows; workflow scheduling problems and the most widely used algorithms to solve them; and the concept of a WMS, along with the most widespread software products available. Then comes a detailed description of DISSECT-CF, the framework, and DISSECT-CF-WMS, the software product. In the last part, the authors evaluate the performance of DISSECT-CF-WMS, compared to other products, in areas such as power consumption, resource usage, data center configuration, background load, resource allocation, throughput, latency, and reliability.
Of course, DISSECT-CF-WMS, like any other WMS, cannot perform actual analyses on scientific datasets; it can, however, help researchers and professionals alike to set up an optimized cloud infrastructure before committing to any real-world solution. A final bonus of this paper is that both the code and the datasets on which it was tested are directly downloadable from GitHub. This lets readers tinker with DISSECT-CF-WMS, get an idea of its qualities, and maybe even use it for real-world purposes.