Big data analytics (BDA) encompasses a set of activities oriented toward the transformation of enormous amounts of raw data into meaningful insights, the discovery of hidden patterns, and the prediction of the likelihood of future events. A BDA ecosystem is a collection of software solutions that support the different phases and activities of this process. Many different approaches, techniques, and tools have been proposed for each of these phases, making it difficult for data engineers and organizations to understand their advantages and disadvantages, the way in which they address the main BDA challenges, their similarities and differences, and their impact on the ecosystem as a whole.
The goal of this survey is to help shed some light on this complex scenario. First, the problem is subdivided into six main pillars or capabilities that the BDA ecosystem should address (storage, processing, orchestration, assistance, user interface, and deployment method). Then, a taxonomy is proposed to organize the different approaches for each pillar.
Following the schema provided by the taxonomy, the paper gives a thorough overview of the existing approaches, describing both how each one addresses BDA goals and challenges and which main software solutions implement it.
Finally, the paper presents the criteria for selecting components when assembling an ecosystem (the organization’s objectives, data characteristics, and available skills) and recommends appropriate components for different use cases.
This is a very interesting survey that provides detailed guidance for BDA practitioners and high-level abstraction tools for organizations and decision makers.