This is an enlightening treatise on data science. There is no hype here, just a thought-provoking piece that articulates fundamental concepts and their implications. The natural audience is the IT or business professional (or manager) who wants a clearer understanding of modern data science.
Drawing primarily on examples from the healthcare industry, this article explains succinctly why “big data” really is different: it changes well-established approaches to creating knowledge. The author begins by defining “data science” as the “generalizable extraction of knowledge from data,” emphasizing that much of today’s data is unstructured and that traditional database models are mostly unsuitable for it. After this introduction, he develops the core thesis of the article with a discussion of prediction and machine learning.
The conventional approach to creating knowledge is to build a theory in the human mind based upon previously established theories and then to verify the new theory by collecting and analyzing appropriate data. The author points out that big data turns this on its head: machine learning algorithms can build good models for predicting outcomes with little understanding of the key underlying relationships and with no theoretical framework to support those models. Furthermore, since these models are based on the data and are essentially computer-generated, they can be made to evolve in conjunction with the processes that create their data. When the situation changes, there is no need to rebuild a theory before new models can be built. All of this, of course, portends fully automated decision making on a grand scale.
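To make the idea of a model that evolves with its data a little more concrete, here is a minimal sketch. It is not taken from the article. The class name OnlineLinearModel, the learning rate, and the synthetic data are all assumptions chosen for illustration. The model is a one-feature linear predictor that is corrected after every observation by a small gradient step, so it holds no theory of the process that generates the data and simply follows whatever relationship the data currently exhibit.

```python
class OnlineLinearModel:
    """Hypothetical one-feature linear predictor, updated one observation at a time."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        # One stochastic gradient step on the squared prediction error.
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def feed(model, slope, intercept, passes=2000):
    """Stream synthetic observations from the process y = slope * x + intercept."""
    inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
    for _ in range(passes):
        for x in inputs:
            model.update(x, slope * x + intercept)


model = OnlineLinearModel()

# The data-generating process starts out as y = 2x + 1.
feed(model, slope=2.0, intercept=1.0)
before_shift = model.predict(1.0)  # close to 3.0

# The process then changes to y = -x + 3. Nothing is rebuilt by hand;
# the same update rule keeps running on the new observations.
feed(model, slope=-1.0, intercept=3.0)
after_shift = model.predict(1.0)  # close to 2.0

print(before_shift, after_shift)
```

The second call to feed is the point of the sketch: the model adapts to the changed process without anyone revising a theory first. A production system would add noise handling, validation, and monitoring, none of which is shown here.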
This is an important article for those who wish to understand the rationale for data science and its potential. The focus is not so much on analytics, per se, as it is on machine-based prediction and machine-based decision making. This informative article lays the conceptual groundwork for these insights and explains how and why machine learning is the true driving force behind the future of data science.