Probabilistic Effects. λθ

Design and Implementation of Probabilistic Programming Language Anglican

1. Introduction

Probabilitic programming systems represent generative models as programs (given an observable variable X and a target variable Y, a generative model is a statistical model of the joint probability distribution on X × Y) written in a language that provides syntax for the definition and conditioning of random variables. Typically, inference can be performed for any probabilistic program using inference techniques provided by the system back-end, such as Metropolis-Hastings, Hamiltonian Monte Carlo, Sequential Monte Carlo, and expectation progation.

There are several ways to build a programming language on top of or besides another language.

  1. The easiest way is to implement an interpreter – a program that reads a program in its entirety, and executes it by applying operational semantics of a certain kind to the language.
  2. Another way is to write a compiler – here the whole program is translated from the higher-level source language to a lower-level object language, which can be directly executed either by hardware or an interpreter which can be made simpler and more efficient than an interpreter for the source language.
  3. A third approach is where a new language is implemented inside another language of the same level of abstraction. Different languages provide different means for this; Lisp has a macro facility that allows one to extend the language – by writing macros, one adds new constructs to the existing language. There are several uses of macros – one is to extend the language syntax; another is to keep the existing syntax but alter the operational semantics (the way programs are executed and how their outputs are computed).

Anglican takes the third approach. A macro facility provided by Clojure is used to both extend Clojure with constructs that delimit probabilistic code, and alter the operational semantics (operational semantics ties any type of operation (arithmetic, assignment, etc.) to the computation involved) of Clojure expressions inside probabilistic code fragments.

Inference algorithms execute Anglican programs while trying to answer probabilistic queries queries on those programs. This execution is significantly different from the one described in the standard operational semantics. These algorithms typically run Anglican programs multiple times for a single inference task. The algorithms may make random choices that do not correspond to any statements in the program, and decide which parts of the program code are executed, and how often.

3. Design Outline

An Anglican program, or query, is compiled into a Clojure function. When inference is performed with a provided algorithm, this produces a sequence of samples.

(defquery model "model selection" data
  (let [ dist (sample (categorical [[normal 0.5]
                                    [gamma  0.5]]))
         a    (sample (gamma 1 1))
         b    (sample (gamma 1 1))
         d    (dist a b)]
      (loop [observations data]
        (when (not-empty observations)
          (let [ o (first observations)]
            (observe d o)
          )
          (recur (rest observations))
        )
      )
      [d a b]
  )
)

This query builds a model for the input data, a sequence of data points.

  • It defines a probability distribution on three variables, d ∈ {normal, gamma} for a distribution type, and a and b for positive parameters for the type.
  • Using the sample forms, the query first defines a prior distribution on these three variables.
  • Using the observe form, the query then adjusts the prior distribution based on observations in data.
  • Samples from this conditioned distribution (i.e. the posterior distribution) can be obtained by running the query under one of Anglican’s inference algorithms.

A probabilistic program (or query) mostly runs deterministic code, aside from the special forms sample and observe at which normal deterministic execution is interrupted and Anglican programs must allow the inference algorithm to step in, recording information and affecting control flow. We refer to these points in the execution as checkpoints – handling of checkpoints can be implemented through coroutines, parallel execution, as well as explicit maintenance of program continuations. Anglican uses explicit maintenance of program continuations.

4. Probability of a Program Execution

Anglican programs define probability distributions over sequences of values, implicitly through program execution. A good way to understand this is to imagine the following interpreter of Anglican programs:

Starting from a fixed initial state, the interpreter runs the deterministic parts of a program according to the standard semantics, executes the sample form by generating a random sample, and treats the observe form by skipping.

More importantly, the interpreter keeps a log that records information about all the sample and observe forms encountered during execution.

  • The information recorded for sample is a triple (F, x, α) consisting of:
    • A primitive probability distribution F, for which we have the probability density pF.
    • A value x sampled from the distribution F.
    • A label α that uniquely and systematically identifies the random choice made.
  • The information recorded for observe is a pair (G, y) consisting of:
    • A primitive probability distribution G, just like F.
    • An observed value y. Thus the log is a sequence of triples (F, x, α) and pairs (G, y).

An important property of logs is that they are determined by their projection to triples: when two logs project to the same sequence of triples, they must be the same. This is because the triples in a log contain all the information about random choices made during execution and all the non-sample forms in Anglican programs are deterministic.

We define a trace x to be a sequence of triples (F, x, α), and say that x is feasible if the trace is precisely the triple part of the log of some execution. A feasible trace uniquely determines the rest of the execution and its log. In particular, it decides the pair part of the log i.e. the sequence of (G, y) for observed values. We define the image y of a trace x to be the sequence of (G, y).

A probabilistic program defines a probability distribution over finite feasible traces x .

5. Syntax

5.3 Special Forms

Anglican provides special forms for probabilistic inference:

(sample distribution)

(observe distribution value)

A good way to understand the sample and observe forms is to consider the imaginary interpreter previously described. Under this interpreter, the idealised semantics are:

  • The sample form draws a value x from its parameter distribution F and generates a unique label α for this random draw. Then it appends (F, x, α), the record of this random draw, to the log of the current execution.
  • For the observe form, the interpreter just appends its two parameters, the distribution G and observed value v, to the log of the current execution, and returns nil.

The actual computation involved in handling sample and observe depends on the inference algorithm. Different inference algorithms may treat sample and observe differently, as long as they compute the same distribution on traces (sequences of sampled values).

The forms sample and observe may appear anywhere in the code of an Anglican program. To support these forms, the inference engine must be able to intervene into execution of the program, perform computations related to inference, and control further execution of the program.

Last updated on 13 Nov 2020
Published on 13 Nov 2020