One of the disciplines that has benefited greatly from increasing computational power is machine learning. Greater computing power, the increased availability of data (or the ease with which we can now gather it), and the open-source movement, to which R belongs, have been paramount in the rapid and widespread development of machine learning applications.
Four things are necessary to start experimenting with machine learning:
- (1) a thorough understanding of machine learning techniques,
- (2) knowledge of how these techniques are implemented in the statistical program we are using,
- (3) a prepared dataset to which we can apply the algorithms, and
- (4) an understanding of what it is we are trying to achieve, so that we know when there is room for improvement.
The book first provides an overview of the technical and philosophical concepts and trends behind machine learning, how they relate to other disciplines in computer science, and what can be achieved by using these techniques. Then, the author covers some fundamental statistical concepts and their implementation in R. Once we have been reminded of basic statistics, we can proceed to explore and experiment in R. We learn R's fundamental data abstractions and syntax through several examples, so that we understand later how data structures should be handled, formatted, and adapted for each technique.
The book covers some of the most popular techniques in machine learning, grouped into the following categories: unsupervised learning, supervised learning, and meta-learning. For each technique presented, we get background information, such as mathematical and statistical definitions, the strengths and weaknesses of the technique, and a description of the steps comprising the algorithm.
The author illustrates each technique with real data examples; he shows us how to format the data so that the chosen R implementation can make the most of it. There is commonly more than one R implementation available for a given technique, but the author often works with the one that, in his opinion, is the easiest to understand.
Once he applies the technique to the data, the model's performance is evaluated, with suggestions about what makes the technique and the data perform well; improvements are then applied, and the model is evaluated again.
The book serves both as a tutorial and a reference for those starting in the field. I would recommend reading it and following the examples on a computer.