Design and Implementation of Probabilistic Programming Language Anglican
1. Introduction
Probabilitic programming systems represent generative models as programs (given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution on X × Y) written in a language that provides syntax for the definition and conditioning of random variables. Typically, inference can be performed for any probabilistic program using inference techniques provided by the system back-end, such as Metropolis-Hastings, Hamiltonian Monte Carlo, Sequential Monte Carlo, and expectation progation.
There are several ways to build a programming language on top of or besides another language.
- The easiest way is to implement an interpreter – a program that reads a program in its entirety, and executes it by applying operational semantics of a certain kind to the language.
- Another way is to write a compiler – here the whole program is translated from the higher-level source language to a lower-level object language, which can be directly executed either by hardware or an interpreter which can be made simpler and more efficient than an interpreter for the source language.
- A third approach is where a new language is implemented inside another language of the same level of abstraction. Different languages provide different means for this; Lisp has a macro facility that allows one to extend the language – by writing macros, one adds new constructs to the existing language. There are several uses of macros – one is to extend the language syntax; another is to keep the existing syntax but alter the operational semantics (the way programs are executed and how their outputs are computed).
Anglican takes the third approach. A macro facility provided by Clojure is used to both extend Clojure with constructs that delimit probabilistic code, and alter the operational semantics (operational semantics ties any type of operation (arithmetic, assignment, etc.) to the computation involved) of Clojure expressions inside probabilistic code fragments.
Inference algorithms execute Anglican programs while trying to answer probabilistic queries queries on those programs. This execution is significantly different from the one described in the standard operational semantics. These algorithms typically run Anglican programs multiple times for a single inference task. The algorithms may make random choices that do not correspond to any statements in the program, and decide which parts of the program code are executed, and how often.
3. Design Outline
An Anglican program, or query, is compiled into a Clojure function. When inference is performed with a provided algorithm, this produces a sequence of samples.
(defquery model "model selection" data
(let [ dist (sample (categorical [[normal 0.5]
[gamma 0.5]]))
a (sample (gamma 1 1))
b (sample (gamma 1 1))
d (dist a b)]
(loop [observations data]
(when (not-empty observations)
(let [ o (first observations)]
(observe d o)
)
(recur (rest observations))
)
)
[d a b]
)
)
This query builds a model for the input data
, a sequence of data points.
- It defines a probability distribution on three variables,
d ∈ {normal, gamma}
for a distribution type, anda
andb
for positive parameters for the type. - Using the
sample
forms, the query first defines a prior distribution on these three variables. - Using the
observe
form, the query then adjusts the prior distribution based on observations indata
. - Samples from this conditioned distribution (i.e. the posterior distribution) can be obtained by running the query under one of Anglican’s inference algorithms.
A probabilistic program (or query) mostly runs deterministic code, aside from the special forms sample
and observe
at which normal deterministic execution is interrupted and Anglican programs must allow the inference algorithm to step in, recording information and affecting control flow. We refer to these points in the execution as checkpoints – handling of checkpoints can be implemented through coroutines, parallel execution, as well as explicit maintenance of program continuations. Anglican uses explicit maintenance of program continuations.
4. Probability of a Program Execution
Anglican programs define probability distributions over sequences of values, implicitly through program execution. A good way to understand this is to imagine the following interpreter of Anglican programs:
Starting from a fixed initial state, the interpreter runs the deterministic parts of a program according to the standard semantics, executes the sample
form by generating a random sample, and treats the observe
form by skipping.
More importantly, the interpreter keeps a log that records information about all the sample
and observe
forms encountered during execution.
- The information recorded for
sample
is a triple(F, x, α)
consisting of:- A primitive probability distribution
F
, for which we have the probability densitypF
. - A value
x
sampled from the distributionF
. - A label
α
that uniquely and systematically identifies the random choice made.
- A primitive probability distribution
- The information recorded for
observe
is a pair(G, y)
consisting of:- A primitive probability distribution
G
, just likeF
. - An observed value
y
. Thus the log is a sequence of triples(F, x, α)
and pairs(G, y)
.
- A primitive probability distribution
An important property of logs is that they are determined by their projection to triples: when two logs project to the same sequence of triples, they must be the same. This is because the triples in a log contain all the information about random choices made during execution and all the non-sample forms in Anglican programs are deterministic.
We define a trace x to be a sequence of triples (F, x, α)
, and say that x is feasible if the trace is precisely the triple part of the log of some execution. A feasible trace uniquely determines the rest of the execution and its log. In particular, it decides the pair part of the log i.e. the sequence of (G, y)
for observed values. We define the image y of a trace x to be the sequence of (G, y)
.
A probabilistic program defines a probability distribution over finite feasible traces x .
5. Syntax
5.3 Special Forms
Anglican provides special forms for probabilistic inference:
(sample distribution)
(observe distribution value)
A good way to understand the sample
and observe
forms is to consider the imaginary interpreter previously described. Under this interpreter, the idealised semantics are:
- The
sample
form draws a valuex
from its parameter distributionF
and generates a unique labelα
for this random draw. Then it appends(F, x, α)
, the record of this random draw, to the log of the current execution. - For the
observe
form, the interpreter just appends its two parameters, the distributionG
and observed valuev
, to the log of the current execution, and returnsnil
.
The actual computation involved in handling sample
and observe
depends on the inference algorithm. Different inference algorithms may treat sample
and observe
differently, as long as they compute the same distribution on traces (sequences of sampled values).
The forms sample
and observe
may appear anywhere in the code of an Anglican program. To support these forms, the inference engine must be able to intervene into execution of the program, perform computations related to inference, and control further execution of the program.