Computing Reviews
Deep learning for computer architects
Reagen B., Adolf R., Whatmough P., Wei G., Brooks D., Morgan & Claypool Publishers, San Rafael, CA, 2017. 124 pp. Type: Book (978-1-627057-28-8)
Date Reviewed: Jan 31 2018

I was looking for a book on deep learning that emphasizes efficiency rather than achieving task accuracy at whatever cost. Although this book was not meant for that purpose, it changed my perception completely. It is an invaluable source of material, with original and insightful plots of a kind uncommon at machine learning conferences, with the exception of the Conference on Neural Information Processing Systems (NIPS), where hardware optimization is playing an increasingly important role.

This book is meant to introduce computer architects to the world of deep learning, not the other way round. Still, it can help readers understand the efficiency and hardware issues involved in executing deep learning algorithms. And, after reading this book, a machine learning researcher would surely think twice before claiming that one algorithm is better than another just by looking at a measure of error while disregarding the algorithms’ computational costs.

The first two chapters are a short and gentle introduction to machine learning and deep learning. It is so well written that I would even recommend it as standalone material for learning why and how deep learning works. It gives a glimpse of how much of the past, and especially the future, progress in deep learning has come (and will come) from hardware and software optimizations. Some summaries, such as the one about the types of learning, are refreshing in their simplicity. While Figure 1.3 appears too soon in the book, it is incredibly illuminating: it shows the prediction error of more than 20 neural network architectures when run on central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), or application-specific integrated circuits (ASICs), together with their power consumption (in watts).

Chapter 3 starts with an account of deep learning architectures (which would have been better suited to chapter 2), and then introduces the Fathom workloads: eight benchmark-architecture pairs that the authors use to analyze several issues related to computational resources and efficiency. My only objection is that (at the bottom of page 29) they exclude the preprocessing and postprocessing of data, which may affect some workloads more than others. Nevertheless, the really interesting bits come next. The profiling of these eight workloads per operation type is really clarifying (although, as the authors admit, unsurprising). A little more surprising is the clustering they do, where one benchmark related to image recognition (autoenc) is grouped with text/speech processing because it does not use a convolutional network. As a side note, some concepts (such as the operation types) are not introduced because they are assumed to be well known to computer architects.

Chapter 4 gets to the core question: how much can we optimize with respect to the behavior of the network? Using not the easiest terminology, the authors distinguish between “safe” optimizations (where the computation is exactly equal to that of the nonoptimized variant) and “unsafe” ones (otherwise). For the unsafe case, they further distinguish between “approximate computation” (where there is some significant effect on accuracy) and the iso-training noise (ITN) paradigm (where the change in accuracy stays within the standard deviation caused by random weight initialization). It is important to note that any implementation of a neural network always calculates things up to some precision: in theory, neural networks are analog, rather than digital, beasts. The chapter goes from safe to unsafe optimizations, mostly focusing on what can be done under the ITN paradigm. Still, “approximate computation” is a very interesting field where good tradeoffs could be found.
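
To make the ITN criterion concrete, here is a minimal sketch (my own illustration, not the book’s code) in which an unsafe optimization, quantizing the weights of a toy linear classifier to 8-bit fixed point, is accepted only if the resulting accuracy change stays within the spread produced by random initialization alone. The data, model, and quantization scheme are all assumptions made for illustration.

    # Illustrative sketch of the ITN criterion (assumptions throughout; not the authors' code).
    import numpy as np

    rng = np.random.default_rng(0)

    def accuracy(weights, x, labels):
        # Toy linear classifier: predict the sign of x @ weights.
        preds = np.sign(x @ weights)
        return np.mean(preds == labels)

    def quantize(weights, bits=8):
        # Uniform quantization to a signed fixed-point grid (an "unsafe" optimization).
        scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
        return np.round(weights / scale) * scale

    # Toy data; perturbed weight vectors stand in for models retrained from
    # different random initializations.
    x = rng.normal(size=(1000, 16))
    true_w = rng.normal(size=16)
    labels = np.sign(x @ true_w)
    baseline_accs = [accuracy(true_w + 0.05 * rng.normal(size=16), x, labels)
                     for _ in range(10)]

    itn = np.std(baseline_accs)                    # spread due to "training noise"
    acc_fp = accuracy(true_w, x, labels)           # full-precision reference
    acc_q = accuracy(quantize(true_w), x, labels)  # quantized ("unsafe") variant

    print(f"accuracy drop: {acc_fp - acc_q:.4f}, ITN band: {itn:.4f}")
    print("within ITN" if abs(acc_fp - acc_q) <= itn else "approximate computation")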

Chapter 5 looks somewhat misplaced, because it is a survey that could have been useful at an earlier point (to help readers understand Figure 1.3); part of it could also have been moved to an appendix or extended (to broaden the audience). Still, it goes from optimizations based on simpler data representations (data types), to compressing the data and using fewer weights (sparsity), and ultimately to circuit design. Cases such as BinaryConnect (where each weight is represented by a single bit) show that, for the purpose of optimization, one can come up with radical changes that might force a complete overhaul of neural network theory. Also, preprocessing, ignored in earlier chapters, is important here, especially because the volume of data is key for optimization.
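
As an illustration of the BinaryConnect idea mentioned above (again my own sketch, not the authors’ code), the following toy training loop keeps real-valued “shadow” weights for the updates but uses only their signs in the forward pass, so each weight needs just one bit at inference time. The model, data, and hyperparameters are assumptions.

    # Illustrative BinaryConnect-style training of a toy logistic classifier
    # (assumptions throughout; not the book's code).
    import numpy as np

    rng = np.random.default_rng(0)
    W = 0.01 * rng.normal(size=(16, 1))   # real-valued "shadow" weights
    lr = 0.1

    x = rng.normal(size=(256, 16))
    y = (x.sum(axis=1, keepdims=True) > 0).astype(float)  # toy labels

    for _ in range(100):
        Wb = np.sign(W)                    # binarize to {-1, +1} for the forward pass
        p = 1.0 / (1.0 + np.exp(-(x @ Wb)))  # sigmoid predictions
        grad = x.T @ (p - y) / len(x)      # gradient w.r.t. the binary weights...
        W -= lr * grad                     # ...applied to the real-valued weights
        W = np.clip(W, -1.0, 1.0)          # keep shadow weights bounded

    # At inference only the 1-bit signs are needed.
    p = 1.0 / (1.0 + np.exp(-(x @ np.sign(W))))
    print("toy training accuracy with 1-bit weights:", np.mean((p > 0.5) == (y > 0.5)))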

The concluding chapter is too short and does not give any clues about the future. For instance, the authors say that “Moore’s law begins to fade” on page 79; I would have liked to see more discussion of this in the conclusions.

Overall, the term “performance” (understood as predictive quality in machine learning, and as speed and lower energy consumption here) will never be the same for me. For researchers in general, these very efficiently used 106 pages are valuable for the sake of replicability, as well as being a very comprehensive introduction and survey. I cannot properly evaluate how useful this book is for computer architects, but judging from how much one can take from it without being one, it may well be a gem.

Reviewer: Jose Hernandez-Orallo
Review #: CR145820 (1805-0214)
Learning (I.2.6)
Applications and Expert Systems (I.2.1)
General (C.1.0)