Computing Reviews
Beginning data science in R : data analysis, visualization, and modelling for the data scientist
Mailund T., Apress, New York, NY, 2017. 352 pp. Type: Book (978-1-484226-70-4)
Date Reviewed: Mar 23 2018

Data science has existed for 40 years, but it is receiving increasing attention now that large amounts of structured and unstructured data (big data) are collected every day in industry, the healthcare sector, the environment, and so on. Every day, we produce data through our behavior, for example, by using our credit cards at the store, by being recorded by a camera while visiting a public garden, and by driving. Modern information and communications technology (ICT) systems are able to collect, store, and process the data we produce. Big data comprises large amounts of structured and unstructured raw data produced by various sources and collected by modern ICT-based systems, including the Internet of Things (IoT) and systems in which different groups and classes of networked objects, such as sensors and actuators, collect and exchange data. The collected data contain information that must be analyzed to extract knowledge. This activity is called data science.

Extracting knowledge from a huge amount of collected raw data is arduous and requires following data analytics methodologies. However, ICT systems can be used to perform such knowledge extraction. Various data analytics tools are available, from commercial products like SAS, SPSS, and Statistica to free products like R. SAS and SPSS are the most widely used tools for data analytics, but the use of R for extracting knowledge from data is increasing.

A brief comparison of these tools shows that R has both advantages and limitations with regard to memory management, programming effort, and learning. Learning R is harder than learning SAS, for example. However, R is open source and offers several functions that are not available in SAS.

This book shows how to do data science using R for data manipulation, extraction, and analysis. Overall, the book is well written and easy to read, and it guides the reader through the use of R. The author judiciously presents many use cases and examples, with exercises at the end of each chapter.

The book contains 14 chapters, introducing R and covering R programming, testing, and optimization.

As the title states, the author treats the reader as a beginner in R, albeit one with a strong background in data science. Therefore, at the beginning of the book, the author guides the reader through the installation process and the basic notions of R. I appreciate how the book is structured: the initial chapters are less complex, and the complexity increases as the reader progresses through the material. The end-of-chapter exercises help readers test their knowledge; this self-evaluation is a good way to make sure the reader really understands the material before moving on. Unfortunately, the solutions to the exercises are not included in the book.

Given that SAS, SPSS, and other data science tools are extremely expensive, data scientists as well as researchers with a solid background in programming who are working on projects with small budgets can benefit from this book. The structure and the writing style are major strengths; researchers will quickly learn how to perform data analytics in R.

This book is an excellent teaching and learning tool for people who would like to quickly and easily learn R programming and data analysis. Therefore, it can be recommended for data science students and lecturers, as well as for researchers.

I recommend this book to anyone who plans to learn R. Previous object-oriented programming experience and a mathematical background are the minimum prerequisites.


Reviewer: Thierry Edoh
Review #: CR145928 (1806-0281)
Database Management (H.2)
Statistical Computing (G.3 ...)
Mathematical Software (G.4)
Other reviews under "Database Management": Date
Progressive skyline computation in database systems
Papadias D., Tao Y., Fu G., Seeger B. ACM Transactions on Database Systems 30(1): 41-82, 2005. Type: Article
Jan 24 2006
 Raghu Ramakrishnan speaks out on deductive databases, what lies beyond scalability, how he burned through $20M briskly, why we should reach out to policymakers, and more
Winslett M. ACM SIGMOD Record 35(2): 77-85, 2006. Type: Article
Nov 23 2006
Beginning PHP 5 and MySQL 5: from novice to professional (2nd ed.)
Gilmore W., APress, LP, Berkeley, CA, 2005. 952 pp. Type: Book (9781590595527)
Nov 30 2006
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®