Computing Reviews
Big data and social science : a practical guide to methods and tools
Foster I., Ghani R., Jarmin R., Kreuter F., Lane J., Chapman & Hall/CRC, Boca Raton, FL, 2016. 376 pp. Type: Book (978-1-498751-40-7)
Date Reviewed: May 9 2017

The increasing availability of data permeates a broad range of applications. It gives the scientific community unprecedented possibilities to carry out research and seek answers to questions that would have been difficult to study a few years ago. One caveat of working with such large volumes of data, however, is that researchers must learn how to gather, prepare, handle, and analyze the information appropriately.

Foster and colleagues intend to lead the social science community into the promising waters of big data by presenting a step-by-step description of a research project. The authors and their team carried out a study on behalf of the President’s Science Advisor aimed at determining the impact of investments in science. The study serves as an arguably complete description of methods and tools relevant to big data analysis.

Their approach to big data is a four-step model, presented in the first part of the book. The first step is concerned with gathering data from different sources of relevant information on the Internet, and with the assumptions that should be made about the completeness and reliability of that data. Some of the code presented at this stage does not run as printed; it must first be modified to cope with missing arguments in the data or with changes to the application programming interfaces (APIs) used. A seasoned programmer may spot such problems, but a novice will find them difficult to deal with, and they may even keep the novice reader from following the examples presented.
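To make the point about missing arguments concrete, here is a minimal sketch, not taken from the book, of the kind of defensive handling a novice would have to add. The record layout and the field names (`title`, `year`, `authors`) are hypothetical and stand in for whatever a real API returns.

```python
from typing import Any


def extract_record(raw: dict[str, Any]) -> dict[str, Any]:
    """Pull out the fields we need, substituting defaults for missing ones."""
    return {
        "title": raw.get("title", ""),
        "year": raw.get("year"),           # None when the field is absent
        "authors": raw.get("authors") or [],
    }


# Hypothetical API responses; the second one lacks two fields.
responses = [
    {"title": "Paper A", "year": 2014, "authors": ["Lee"]},
    {"title": "Paper B"},
]
records = [extract_record(r) for r in responses]
```

Indexing with `raw["year"]` would raise a `KeyError` on the second response; `dict.get` lets the script continue and leaves the gap visible as `None`.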

The second step deals with data originating from different sources and introduces record linkage as a powerful process for filling in missing data or simply broadening the scope of the dataset in question. Two techniques are discussed, though only superficially: probabilistic linking and machine learning. The section closes with some discussion of privacy before moving on to the next stage.
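For readers unfamiliar with probabilistic linking, the following is a minimal sketch of the classic Fellegi-Sunter scoring idea. The review does not say which formulation the book uses, so this is an illustration only; the field names and the m- and u-probabilities are invented for the example.

```python
import math

# Hypothetical probabilities per field (assumed values, not estimated from data):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "city": (0.80, 0.10),
}


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


rec_1 = {"surname": "garcia", "birth_year": 1980, "city": "madrid"}
rec_2 = {"surname": "garcia", "birth_year": 1980, "city": "madrid"}
rec_3 = {"surname": "smith", "birth_year": 1975, "city": "leeds"}

same_score = match_weight(rec_1, rec_2)  # all fields agree
diff_score = match_weight(rec_1, rec_3)  # all fields disagree
```

In practice a pair is declared a link when its score exceeds an upper threshold, a non-link below a lower threshold, and sent to manual review in between; the thresholds are a design choice not shown here.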

The third step relates to data storage. The authors strongly encourage the use of a database management system to store and handle large amounts of diverse data that will likely evolve and grow over time. They compare storage alternatives according to the kind of data to be stored, whether it is structured or unstructured, its size, and how it will be manipulated once it is in a database.

The fourth step describes how to perform computations on the data, focusing on parallel and distributed computing. The MapReduce model and some variations of it are briefly discussed. However, the discussion lacks a hands-on explanation of how this model can be applied to the bulk of the data associated with the case presented in the book.
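As a rough idea of what such a hands-on explanation might start from, here is a single-process sketch of the MapReduce pattern applied to word counting. It is not from the book and does not use a distributed framework; the function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data big science", "data and social science"]
mapped = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster the map calls run on different machines and the shuffle moves data over the network, but the division of labor between the three phases is the same.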

The second part of the book explores the many methods that can be used in the analysis of data. Again, only overviews are presented of classification methods, unsupervised and supervised machine learning algorithms, and text and network analysis. Some data visualization topics are also discussed.

Finally, the third part of the book discusses some caveats in the use of big data to study social problems. Some of the issues discussed concern the predictive and explanatory power of such data. Other issues concern matters of confidentiality and privacy. The authors show how errors accumulate: beginning with incomplete, selective, or simply erroneous data, each step in the research process, from data gathering and preparation through analysis and interpretation, can add errors that erode the explanatory or predictive power of the data and render the results of the analysis inaccurate. Ethical and legal issues are also discussed in this part of the book.

In summary, the text provides an overview of the different methods for incorporating big data in the social sciences. The reader needs a strong technical background to thoroughly understand the concepts, the programming, and the algorithms presented throughout the book. To follow some of the examples, software needs to be downloaded, installed, and configured; however, the software tools are mentioned only in passing in the last chapter. The information provided is scarce and, in my view, difficult to follow for a reader who is a beginner in programming or new to Python.

Reviewer:  Carla Sánchez Aguilar Review #: CR145257 (1707-0437)
Social And Behavioral Sciences (J.4)
Data Mining (H.2.8 ...)
Content Analysis And Indexing (H.3.1)
Database Applications (H.2.8)