Morgan & Claypool’s “Synthesis Lectures on Human Language Technologies” series offers grounded theory together with practical, application-oriented approaches. *Bayesian analysis in natural language processing* appears in this series.

Chapter 1 gives an overview of the basic probability theory relevant to the topic. Parametric and generative models are presented, as they play significant roles in the later chapters. Bayesian and frequentist philosophies are discussed briefly. Although the book restricts itself to a well-defined realm of discrete and continuous probability theory, readers need roughly two semesters of probability theory to follow the details presented; a comprehensive appendix reviews the basic concepts.

Chapter 2 discusses the relationship between Bayesian statistics and natural language processing (NLP). Recent applications of Bayesian statistics in NLP draw on the most modern approaches in statistics and machine learning. The overview is based on the most recent literature and is very useful to anybody who wants to study and apply the described approaches.

Chapter 3’s analysis identifies the basic principle of Bayesian statistics: combine prior (that is, previously held) beliefs with currently available data to derive a posterior (that is, subsequent) distribution, from which inferences are drawn. The chapter introduces the notion of conjugate priors, which have become an important conceptual tool for balancing the expressiveness of prior distributions against the computational tractability of inference. The reader comes to understand the frequent use of the Dirichlet distribution in NLP: the Dirichlet distribution is the conjugate prior of the categorical distribution, and the categorical distribution fits the discrete structures that occur in NLP, so the pair is applied almost as a rule of thumb.
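The Dirichlet–categorical conjugacy the chapter highlights can be sketched in a few lines; the toy vocabulary and function names below are my own illustration, not the book’s code. The posterior of a Dirichlet prior after categorical observations is again a Dirichlet, obtained simply by adding observed counts to the prior pseudo-counts:

```python
# Minimal sketch of Dirichlet-categorical conjugacy (illustrative, not from the book).
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Update Dirichlet pseudo-counts `alpha` (outcome -> float) with
    observed categorical draws; the posterior is again a Dirichlet."""
    counts = Counter(observations)
    return {k: alpha[k] + counts.get(k, 0) for k in alpha}

def posterior_mean(alpha):
    """Posterior mean of the categorical parameters under Dirichlet(alpha)."""
    total = sum(alpha.values())
    return {k: v / total for k, v in alpha.items()}

# Symmetric Dirichlet(1) prior over a toy three-word vocabulary.
prior = {"the": 1.0, "cat": 1.0, "sat": 1.0}
post = dirichlet_posterior(prior, ["the", "the", "cat"])
# post == {"the": 3.0, "cat": 2.0, "sat": 1.0}; posterior mean of "the" is 3/6.
```

This closed-form update is exactly why the pairing is computationally tractable: no integration is ever needed.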

Chapter 4 covers Bayesian estimation methods in NLP. The motivation for Bayesian point estimation is to obtain a lightweight model with a fixed set of parameters.
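A concrete instance of such a point estimate (my own sketch, assuming a categorical model with a symmetric Dirichlet prior; not code from the book) is the MAP estimate, which reduces to additive smoothing of relative frequencies:

```python
# Illustrative sketch: the MAP estimate of a categorical distribution under a
# symmetric Dirichlet(alpha) prior is additive smoothing of the counts.
from collections import Counter

def map_categorical(observations, vocab, alpha=2.0):
    """MAP estimate: (count + alpha - 1) / (N + K * (alpha - 1))."""
    counts = Counter(observations)
    n, k = len(observations), len(vocab)
    denom = n + k * (alpha - 1.0)
    return {w: (counts.get(w, 0) + alpha - 1.0) / denom for w in vocab}

probs = map_categorical(["the", "the", "cat"], vocab=["the", "cat", "sat"])
# With alpha = 2 this is add-one (Laplace) smoothing:
# probs["the"] == 3/6 and probs["sat"] == 1/6.
```

The result is a single fixed parameter vector, which is what makes such estimates lightweight compared to carrying a full posterior around.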

Chapter 5 is about sampling methods, the most important tools being Monte Carlo methods, Markov chains, and their combination. Samples are drawn from the posterior distribution; Markov chain Monte Carlo (MCMC) methods need the posterior only up to a normalization constant. The chapter gives interested readers a good introduction to sampling methods and strongly advises starting from the basic principles when defining one’s own approaches.
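To show why the normalization constant never has to be computed, here is a minimal random-walk Metropolis sampler; the target density and all names are my own toy example, not the book’s material. The constant cancels in the acceptance ratio:

```python
# Illustrative sketch of random-walk Metropolis sampling (not from the book).
import random

def unnorm_posterior(theta):
    # Unnormalized Beta(3, 2) density: proportional to theta^2 * (1 - theta).
    return theta**2 * (1 - theta) if 0.0 < theta < 1.0 else 0.0

def metropolis(n_samples, step=0.2, seed=0):
    rng = random.Random(seed)
    theta = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.uniform(-step, step)
        # Accept with probability min(1, ratio); the unknown normalization
        # constant appears in numerator and denominator, so it cancels.
        ratio = unnorm_posterior(proposal) / unnorm_posterior(theta)
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(20000)
# The sample mean approaches the Beta(3, 2) mean of 3/5.
```

In realistic NLP models the chain lives in a much higher-dimensional space, which is exactly why the chapter’s advice to master the basic principles first is well taken.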

Chapter 6 describes variational inference methods. Here, “variational” refers, in the mathematical sense, to searching for the maximum or minimum of functionals. In NLP and Bayesian settings, the expectation–maximization (EM) algorithm is typically used.
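The flavor of EM can be conveyed with the classic two-coin mixture (my own assumed toy setup, not an example from the book): each trial’s head count comes from one of two coins with unknown biases, and the coin identity is the latent variable. The E-step computes responsibilities; the M-step re-estimates the biases from expected counts:

```python
# Illustrative EM sketch for a two-coin mixture (not the book's code).
def em_two_coins(trials, n_flips, theta=(0.6, 0.4), n_iter=50):
    theta_a, theta_b = theta
    for _ in range(n_iter):
        # E-step: responsibility of coin A for each trial's head count.
        heads_a = flips_a = heads_b = flips_b = 0.0
        for heads in trials:
            like_a = theta_a**heads * (1 - theta_a)**(n_flips - heads)
            like_b = theta_b**heads * (1 - theta_b)**(n_flips - heads)
            resp_a = like_a / (like_a + like_b)
            heads_a += resp_a * heads
            flips_a += resp_a * n_flips
            heads_b += (1 - resp_a) * heads
            flips_b += (1 - resp_a) * n_flips
        # M-step: re-estimate each coin's bias from the expected counts.
        theta_a = heads_a / flips_a
        theta_b = heads_b / flips_b
    return theta_a, theta_b

# Trials of 10 flips each: three head-heavy trials, three near-fair ones.
heads_per_trial = [9, 8, 9, 4, 5, 4]
theta_a, theta_b = em_two_coins(heads_per_trial, n_flips=10)
```

Each iteration maximizes a lower bound on the log-likelihood, which is the functional-optimization view the chapter’s “variational” terminology refers to.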

Chapter 7 analyzes non-parametric priors; up to this point the book discusses parametric models. The chapter presents the Dirichlet process and some other approaches that can be used in Bayesian non-parametric models and priors.
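One standard way to make the Dirichlet process concrete is the stick-breaking construction (my own illustrative sketch, not the book’s code): the mixture weights are obtained by repeatedly breaking off a Beta-distributed fraction of the remaining stick, so the number of components is unbounded:

```python
# Illustrative sketch of the stick-breaking view of a Dirichlet process
# (truncated to a finite number of atoms; not from the book).
import random

def stick_breaking(alpha, n_atoms, seed=0):
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        frac = rng.betavariate(1.0, alpha)  # Beta(1, alpha) break point
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights

weights = stick_breaking(alpha=1.0, n_atoms=100)
# The truncated weights sum to just under 1; a small alpha concentrates
# mass on the first few atoms.
```

The concentration parameter `alpha` controls how quickly the stick is used up, i.e., how many clusters effectively receive mass.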

Chapter 8 covers probabilistic grammars, which are frequently used in NLP, treating probabilistic context-free grammars in the context of both parametric and non-parametric Bayesian analyses. These approaches fit NLP well: probabilistic grammars are generative models, just like Bayesian models.
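The generative character of a probabilistic context-free grammar is easy to see by sampling from one; the toy grammar below is my own invention, not taken from the book. Each nonterminal carries a categorical distribution over its rules, connecting back to the Dirichlet–categorical machinery of chapter 3:

```python
# Illustrative sketch: sampling a sentence from a toy PCFG (not from the book).
import random

# Rules: nonterminal -> list of (probability, right-hand side).
GRAMMAR = {
    "S":  [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["N"])],
    "VP": [(1.0, ["V", "NP"])],
    "N":  [(0.5, ["cat"]), (0.5, ["dog"])],
    "V":  [(1.0, ["chased"])],
}

def choose(rules, rng):
    """Draw a right-hand side from a nonterminal's categorical distribution."""
    r, acc = rng.random(), 0.0
    for prob, rhs in rules:
        acc += prob
        if r < acc:
            return rhs
    return rules[-1][1]  # guard against floating-point round-off

def sample(symbol, rng):
    """Recursively expand `symbol` until only terminals remain."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    rhs = choose(GRAMMAR[symbol], rng)
    return [word for child in rhs for word in sample(child, rng)]

sentence = " ".join(sample("S", random.Random(0)))
```

Placing Dirichlet (or Dirichlet-process) priors over these rule distributions is precisely what the chapter’s parametric and non-parametric Bayesian treatments do.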

Chapter 9 covers representation learning and neural networks, which became essential methods as they made automatically generated features and other representational structures possible where linear models relied on hand-crafted ones. The application of neural networks led to architecture engineering: designing networks that represent NLP artifacts such as words, sentences, paragraphs, and documents. Recurrent and convolutional neural networks are frequently employed in NLP, and there are several open research questions in which Bayesian learning can help.

The authors provide a broad and detailed overview of Bayesian techniques related to NLP. The presented methods and mathematical derivations require an understanding of probability theory. Mappings between abstract probability concepts and notions of NLP are given several times; the last chapters in particular contain several clues to the relationships between the mathematical concepts and NLP notions. Diagrams illustrate probability distributions in multi-dimensional spaces.

The book is also very useful for readers in research and development (R&D) who want to apply NLP in other contexts, for example ontology learning, database schema matching, and other tasks in which natural language may play an important role.