Machine learning, possibly contrary to popular belief, is not just endless variations of neural networks. There is also a thriving subculture of probabilistic programming based on Bayesian principles. A large advantage of the latter approach is that most models provide some explanation of what was learned, whereas extracting such information from a neural network remains an area of (very) active research.
In probabilistic programming, conditioning (appropriately called "a main feature" here) plays a central role. It is therefore quite important to understand it well, and that is the domain of inquiry of the paper at hand. The authors choose to work over an imperative language with discrete choices (via nondeterminism). The imperative nature of their language brings in issues of nontermination, and one of the interesting discussion points is the semantic difference between nontermination and nondeterminism: some semantics do not differentiate between the two, which the authors argue is problematic. Instead, they give a relative semantics that computes the "probability of success and termination" over the "probability of termination." Once this design choice and the choice of Dijkstra-style weakest precondition semantics are made, much of the rest follows naturally. Of course, many details need to line up just right for everything to work, and the paper spells them all out.
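To make that ratio concrete, here is a toy sketch of my own (not the paper's formal calculus; the program, names, and enumeration strategy are all my invention): a two-coin program with an observation, where the conditional probability of a postcondition is computed as the mass of runs that pass the observation and satisfy the postcondition, divided by the mass of runs that pass the observation at all.

```python
from fractions import Fraction

def run_program():
    # Enumerate all executions of a tiny discrete program:
    #   x := coin(); y := coin(); observe(x = 1 or y = 1)
    # Each execution yields (probability, observation_ok, final_state).
    for x in (0, 1):
        for y in (0, 1):
            prob = Fraction(1, 4)       # two fair coin flips
            ok = (x == 1 or y == 1)     # the observe statement
            yield prob, ok, (x, y)

def conditional_prob(post):
    # The ratio from the review: probability of success and
    # (here, trivial) termination over probability of passing
    # the observation.
    num = sum(p for p, ok, s in run_program() if ok and post(s))
    den = sum(p for p, ok, s in run_program() if ok)
    return num / den

print(conditional_prob(lambda s: s == (1, 1)))  # → 1/3
```

In this terminating toy the denominator only filters out observation failures; the point of the paper's semantics is that the same ratio also accounts for the probability mass lost to nonterminating runs.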
While parts of the paper are quite technical, the authors take great pains to motivate and illustrate everything they do. They provide many well-chosen examples, and, where appropriate, figures give a more visual explanation. The examples are quite concrete and accompanied by fully explicit computations, which really helps explain each step of what could otherwise seem like arbitrary rule sets.
Perhaps the main disappointment I have is that Narayanan and Shan's work on disintegration in Hakaru [1] is not mentioned at all. This may be because that work is more concerned with the continuous case (from their perspective, the discrete case is too easy).
Anyone interested in the semantics of probabilistic programming languages should read this paper. It is a model of clarity. While the various choices taken here may not necessarily stand the test of time, they all make sense and are well justified.