The primary objective in the area of phylogenetic trees is to construct a tree where leaf nodes represent extant species, internal nodes represent (possibly hypothesized) ancestors, and edges represent relationships between species. This book focuses on a related combinatorial object, ancestral recombination graphs (ARGs), which represent the phylogenetic data in terms of a graph or network rather than a tree. This approach attempts to model a wider range of biological events (for example, recombination and coalescent).
In this case, the broad objective is to construct an ARG that best fits the data at hand. However, constructing an ARG with a minimum number of recombination nodes is nondeterministic polynomial-time hard (NP-hard). Even determining the number of recombination nodes such a minimum ARG should have is NP-hard. Therefore, the focus is on heuristics for the graph, and bounds for the number of nodes. Moreover, this book is about the theory: ideas, models, and methods, rather than recipes for constructing ARGs.
Chapter 1 is an introduction to the topic. It outlines the central thesis of the book, namely that explicit genealogical networks can be computed that represent a derivation of extant species. It also covers the definitions from biology, as well as basic definitions from graph theory. The concepts are illustrated with a simplified pedigree of some of the British Royal Family. This example demonstrates in particular why a network, rather than a tree, is the right object to consider. The particular type of network focused on is the ARG that is a special kind of directed acyclic graph (DAG).
Chapter 2 explores trees. Although the overall focus of the book is on graphs rather than trees, the history of most of the results is in phylogenetic trees, thus a thorough examination of their properties is desirable. The chapter introduces the perfect phylogeny--essentially a rooted binary tree that matches the data--and covers a number of existence-style theorems about it, including conditions for binary sequences to be representable by a perfect phylogeny.
As a perfect phylogeny may be difficult or inadequate, chapter 3 explores a more realistic model, one that includes recombination. Recombination of two sequences is, in one form, the concatenation of parts of sequence one with parts of sequence two. Beginning from the recombination perspective, the author defines and studies ARGs, distinguishing between different kinds of phylogenetic networks and introducing properties of ARGs. Chapter 4, “Exploiting Recombination,” explores the biological concepts that relate to recombination.
Chapter 5 considers bounds. Since one of the primary goals in this area is the proper construction of ARGs, and since constructing a minimum ARG, or even calculating Rmin(M), are NP-hard problems, it is useful to consider bounds for some related values. Here the first algorithms appear, which are used to construct different types of bounds on Rmin(M).
Chapters 6 and 7 explore the structure of the ARG, focusing particularly on connected components and decompositions, and leading to more lower bounds on Rmin(M) and eventually to constructions of ARGs. Chapter 8 works toward the goal of building ARGs whose number of recombination nodes is close to Rmin(M), and introduces a special case of a min ARG called a galled tree.
Chapter 9 is devoted to heuristic ARG construction methods that produce reasonable results for this NP-hard problem. Chapter 10 returns to bounds for Rmin(M), introducing the history lower bound and the forest bound. The remaining chapters center around more specialized topics, including ARG-based halotyping and ARG-based association mapping.
The book is highly organized, easy to follow, and easy to read. It is self-contained both with respect to biology and combinatorics, and it would be suitable for a researcher in either discipline, or in computer science, who wants to learn the necessary background and theorems. The only drawback is that there are no problems given; however, there are numerous examples and references. This is a useful resource that brings together state-of-the-art research results in one place. Highly recommended.