Deep learning is currently the most popular (and maybe hyped) discipline within artificial intelligence (AI). It is a key component of current speech recognition systems and it has allowed computers to reach human-level performance in many tasks that were beyond their reach just a decade ago, such as object recognition in computer vision. It has also been successfully applied in other areas, ranging from machine translation (Google Translate changed its architecture to use neural networks in November 2016 [1]) to self-driving cars (NVIDIA trained a fully autonomous end-to-end self-driving system using convolutional neural networks [2]).

Even though media reports on these milestones might make you think that deep learning is truly novel, as a matter of fact, the term “deep learning” basically refers to the same set of techniques that were already used in the 1980s, when they were called artificial neural networks or neurocomputing. Their history traces back even further, to the very beginnings of AI, since the first neural models were proposed by Warren McCulloch and Walter Pitts in 1943. Alan Turing suggested, in a talk at the London Mathematical Society in 1947 and also in a later lesser-known 1948 technical report [3], that neural networks could be trained to perform any task. A decade later, Frank Rosenblatt popularized, with his perceptron, the first training algorithm for neural networks, hailed by the *New York Times* as “the embryo of an electronic computer ... able to walk, talk, see, write, reproduce itself and be conscious of its existence” [4]. As you can see, that is not too different from what you can read in the news 60 years later. The second wave of neural networks arrived with the backpropagation learning algorithm [5], still used to train current neural networks. The ebb and flow of neural network research is now at its third peak with deep learning techniques, after the troughs of the so-called AI winter (1970s) and the golden age of alternative machine learning techniques, such as support vector machines (1990s-2000s).

Ian Goodfellow, Yoshua Bengio, and Aaron Courville’s monograph is a thorough survey of the state of the art in deep learning. Freely available online, at http://www.deeplearningbook.org/, with just minor differences in some figures with respect to its print edition, their book is an excellent resource for researchers who want to delve deeper into the art and science of deep learning.

A short introductory chapter sets the stage by focusing on representation learning and providing an eagle’s-eye perspective on the evolution of neural networks. Whereas the first commercially successful AI systems used hand-coded expert knowledge, machine learning techniques build models directly from data. Learning those models from data, however, often comes only after a costly process of manual feature engineering. In contrast, deep learning promises the automation of feature extraction from raw data (also known as representation learning).

The first 150 pages of the book cover some applied mathematics and machine learning fundamentals, from a clever yet convoluted description of principal component analysis (PCA) to the introduction of Kullback-Leibler divergence, a measure of how much one probability distribution differs from another, later used to interpret maximum likelihood estimators. Special emphasis is given to gradient-based optimization, the key strategy behind neural network training, covering both first- and second-order methods.
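For readers who have not met the Kullback-Leibler divergence before, a minimal sketch for discrete distributions may help. This is not code from the book: the function name `kl_divergence` and the two toy distributions are assumptions made for illustration, and the sketch assumes that `q` is nonzero wherever `p` is.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two made-up distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, p))  # a distribution does not diverge from itself
print(kl_divergence(p, q))  # positive
print(kl_divergence(q, p))  # also positive, but a different value
```

The last two values differ, which is why the divergence is not a true distance: it is not symmetric in its arguments.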

The historical evolution of neural networks has been enabled by Moore’s law, with increasing network size and training datasets. From those datasets, different machine learning algorithms can be viewed as particular instances of a really simple recipe: a dataset, a cost function, an optimization procedure, and a resulting model. Whereas traditional machine learning techniques require *O*(*k*) examples to distinguish *O*(*k*) regions in space, when using deep learning, *O*(2^*k*) regions can be defined with *O*(*k*) examples. Since probability distributions over images, text, and sounds that occur in the natural world are highly concentrated (in contrast to the static noise in old analog TV sets), the so-called manifold hypothesis favors the use of deep learning to extract hierarchies of features from raw training data.
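The four-ingredient recipe can be made concrete with a deliberately tiny example. The sketch below is an illustration under assumed choices, not the authors' code: the dataset is made up (points on the line y = 2x + 1), the cost function is mean squared error, the optimization procedure is plain gradient descent, and the model is a line with parameters `w` and `b`. The names `cost`, `gradient`, and `fit`, the step count, and the learning rate are all hypothetical.

```python
# Dataset: points on the line y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def cost(w, b):
    """Cost function: mean squared error of the model w*x + b on the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, b):
    """Partial derivatives of the cost with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def fit(steps=2000, lr=0.05):
    """Optimization procedure: gradient descent starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= lr * dw
        b -= lr * db
    return w, b

# Resulting model: the fitted parameters.
w, b = fit()
print(w, b, cost(w, b))
```

Swapping any one ingredient (a different cost, a different optimizer, a deeper model) yields a different algorithm while the overall structure stays the same.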

The second part of the book, encompassing more than 300 pages, describes deep networks as they are currently used in practice. Apart from the aforementioned decades-old backpropagation algorithm, the authors provide an excellent survey of regularization techniques (a varied collection of methods, often heuristic, used to reduce the test error in neural networks) and optimization algorithms, such as stochastic gradient descent, Nesterov momentum, conjugate gradients, or L-BFGS. They also discuss specialized network topologies for dealing with images (convolutional networks) and sequences (recurrent networks). Finally, they provide some advice on using neural networks and a shallow review of some of their many different applications.

The third part of the book, covering another 300 pages, will probably be of interest to researchers rather than practitioners. Here, the authors analyze the state of the art of deep learning research, including more speculative ideas that still have to prove their worth in practice. With a strong focus on autoencoders and generative models, they address the computational challenges of probabilistic models, discussing Markov chain Monte Carlo (MCMC) methods, stochastic maximum likelihood, contrastive divergence, sampling techniques, and approximate inference, some of the mathematical tools that deep learning models resort to. Myriads of Boltzmann machine variants and differentiable generator networks (for example, variational autoencoders and generative adversarial networks) are briefly described, as well as the subtleties behind the evaluation of generative models.

Overall, the book provides an excellent survey of the deep learning research field. Even though it might not be truly accessible to the novice and it might be somewhat demanding for professional data scientists, it is priceless for those interested in starting their career in a highly dynamic field with wonderful research opportunities.
