Probabilistic Effects

Probabilistic Language Design

Relevant Existing Probabilistic Programming Libraries/Languages

  • Hakaru

  • Hakaru10

  • Hansei

  • monad-bayes

  • webppl

  • probability

  • Anglican

  • PyMC

Probabilistic Language Design

Usability of Probabilistic Programming Languages

Embedded Domain-Specific Languages for Probabilistic Programming - Oleg Kiselyov’s Page

WebPPL

The Design and Implementation of Probabilistic Programming Languages (Learning Resource) - Noah D. Goodman and Andreas Stuhlmüller

Anglican

MonadBayes

Hansei (Graphical Models)

Hakaru10 (Graphical Models)

Hakaru

probability

General Probabilistic Programming/Inference Implementation Techniques

Inference Metaprogramming

Embedded Probabilistic Programming (Hansei, OCaml) - Oleg Kiselyov, IFIP 2009

Two general techniques for implementing a DSL with less overhead are the finally-tagless (shallow) embedding of object programs, and the direct-style representation of side effects. They use these techniques to build a DSL for probabilistic programming, embedded as an ordinary OCaml library, and represent probability distributions as ordinary OCaml programs. They use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing overhead on deterministic parts of a model. Inference algorithms can be easily embedded in probabilistic programs themselves.

They aim to build a DSL for discrete probability distributions. A fundamental and popular way to represent a probability distribution is as a graphical model (a Bayesian network), whose nodes represent random variables and whose edges indicate conditional dependencies. However, graphical models do not scale well to real problems involving thousands (or more) of random variables and dependencies.

It helps to embed a language of probability distributions in a host language such as Haskell - such an embedded language is called a toolkit for probabilistic reasoning, because it is structured as a library of data types for graphical models and random variables, along with inference procedures that operate on them. Similarly, distributions can be represented and reasoned about in a logic using formulas that talk about probabilities. However, a toolkit defines its own data types: e.g. random integers are distinct from regular integers, and cannot be added using the addition operation of the host language.

The main drawback of a standalone probabilistic language is that it can’t rely on the infrastructure of an existing host language. In contrast an embedded probabilistic language can piggyback on its host language’s compilers to remove some interpretive overhead.

Their Approach: A Very Shallow Embedding

They combine the advantages of embedded and standalone probabilistic languages by embedding a language of probabilistic distributions into a general-purpose language in a very shallow way. That is, probabilistic functions are embedded as host-language functions, calls to them are embedded as host function calls, and random integers can be added using the addition operation of the host language.

Contributions

The paper describes the design of a practical DSL for probabilistic programming in OCaml, via a novel combination of two existing techniques for eliminating notational and interpretive overhead in embedded languages:

  1. The direct-style representation of side effects using delimited control operators lets them reify a probabilistic program as a weighted search tree (in the form of a search-tree monad); a minimal sketch of such a tree follows this list. They also use continuation-passing style via the Cont monad to eliminate run-time and notational overhead.
  2. The finally-tagless embedding of object programs using combinator functions allows multiple inference algorithms to operate on the same model without the run-time overhead of tagging values with their types. They apply these general techniques to probabilistic programming and demonstrate how they make embedded DSL programs faster to run and easier to write.
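As a rough illustration, a weighted search tree of this kind can be sketched in Haskell as follows (the paper works in OCaml with delimited continuations; the types and names here are illustrative, not the paper's API):

    -- A probabilistic program reified as a lazy weighted search tree: a
    -- leaf is a finished run, a branch is a random choice whose subtrees
    -- carry their probabilities. Haskell's laziness means subtrees are
    -- only built when an inference algorithm actually explores them.
    data Tree a
      = Leaf a
      | Branch [(Double, Tree a)]

    -- Exact inference by exhaustive traversal, multiplying the weights
    -- along each path from the root to a leaf.
    exact :: Tree a -> [(Double, a)]
    exact (Leaf x)    = [(1, x)]
    exact (Branch bs) = [ (w * w', x) | (w, t) <- bs, (w', x) <- exact t ]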

Then, they show how to implement interpreters for their language that perform exact and approximate inference efficiently. Specifically:

  1. They implement variable elimination, an existing algorithm for exact inference.
  2. They develop importance sampling, a general algorithm for approximate inference (a rough sketch follows this list).
  3. They support lazy evaluation in probabilistic programs.
  4. Users of the library are free to implement their own inference algorithms.
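For instance, importance sampling over such a tree can be sketched as follows (again illustrative Haskell rather than the paper's OCaml; the Tree type is repeated so the snippet stands alone):

    import System.Random (randomRIO)

    data Tree a = Leaf a | Branch [(Double, Tree a)]

    -- Draw one weighted sample: at each branch, pick a subtree with
    -- probability proportional to its weight, and compensate for that
    -- choice by multiplying the returned importance weight by the
    -- branch's total mass.
    sample1 :: Tree a -> IO (Double, a)
    sample1 (Leaf x)    = return (1, x)
    sample1 (Branch bs) = do
        r <- randomRIO (0, total)
        (w', x) <- sample1 (pick r bs)
        return (total * w', x)
      where
        total = sum (map fst bs)
        pick _ [(_, t)]       = t
        pick r ((w, t) : rest)
          | r <= w            = t
          | otherwise         = pick (r - w) rest
        pick _ []             = error "sample1: empty Branch"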
Probabilistic Programming Language and its Incremental Evaluation (Hakaru10, Haskell) - Oleg Kiselyov, APLAS 2016

This paper introduces Hakaru10, a language for expressing and performing inference on general graphical models. It supports discrete and continuous distributions, mixture distributions, and conditioning. It is a DSL embedded in Haskell and supports MCMC inference.

It is designed to address two main challenges of probabilistic programming: performance and correctness. In the presence of conditional branches, Hakaru10 solves the non-trivial problems of maintaining dependencies and correctly computing the acceptance ratio. Hakaru10 is typed, so that its type system prevents meaningless conditioning, enforcing that the values to condition upon must come from outside the model.

The reasoning we wish to perform on the model is finding out the probability distribution of some of its random variables. Often we are interested in the distribution conditioned on the fact that some random variables have been observed to be a certain value. We are thus inferring the likelihood of an unobservable variable from observed variables.

The nature of this research (and much of Oleg’s) is driven by two main challenges: performance, and correctness.

  1. An expressive, well-abstracted probabilistic programming language may be all for nothing if doing inference with realistic models takes unreasonable time. For example, the probability monad (which adds weights to the List monad; sketched below) is straightforward and easy to understand, but is disastrously inefficient, failing on even toy problems. Hence it is common in machine learning to tailor the model to a specific inference method and tune the inference code for a specific model.
  2. Correctness is a challenge that may come as a surprise. Despite the long history of probabilistic programming, problems are still being found in published work, and even widely used systems such as Stan turn out to give plainly wrong answers.
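For reference, a minimal version of that probability monad (the standard weighted-list construction, not Hakaru10 code) looks like this:

    -- The naive probability monad: the List monad with a weight attached
    -- to each outcome. Every bind enumerates all combinations of paths,
    -- so the outcome space grows exponentially with the number of random
    -- choices - hence "disastrously inefficient".
    newtype Dist a = Dist { runDist :: [(a, Double)] }

    instance Functor Dist where
      fmap f (Dist xs) = Dist [ (f x, w) | (x, w) <- xs ]

    instance Applicative Dist where
      pure x = Dist [(x, 1)]
      Dist fs <*> Dist xs =
        Dist [ (f x, wf * wx) | (f, wf) <- fs, (x, wx) <- xs ]

    instance Monad Dist where
      Dist xs >>= k =
        Dist [ (y, w * w') | (x, w) <- xs, (y, w') <- runDist (k x) ]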

Related Work: Wingate et al. deserve special mention for proposing a technique for adding probabilistic programming facilities to just about any language.

Contributions:

  1. It presents the probabilistic programming language Hakaru10, embedded as a DSL in Haskell.
  2. It describes the design of Hakaru10, specifically its type system which ensures both that a model is well-typed, and that it is also well-conditioned. That is, the values used for conditioning really come from the sources external to the model, rather than being produced from random sources and computations within the model.
  3. It describes the implementation of Metropolis-Hastings sampling over the graphical model.
  4. It presents a method to improve the efficiency of Metropolis-Hastings by avoiding redundant re-computations. The idea is that, upon resampling, only the parts of the model that depend on the changed value are recomputed (a rough sketch follows this list). This is simple in principle, but the challenge is to minimize the overhead of determining the dependencies and their order, mostly in the presence of conditional branching.
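A minimal sketch of the dependency bookkeeping (purely illustrative; nothing like Hakaru10's actual implementation): given a map from each node of the model to its immediate dependents, a resampled node dirties exactly its transitive dependents, and only those parts are recomputed.

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Node = String

    -- Collect a node and all of its transitive dependents; after
    -- resampling n, these are the only parts of the model that must
    -- be recomputed.
    dirtySet :: M.Map Node [Node] -> Node -> S.Set Node
    dirtySet deps n = go S.empty [n]
      where
        go seen []       = seen
        go seen (x : xs)
          | x `S.member` seen = go seen xs
          | otherwise         = go (S.insert x seen)
                                   (M.findWithDefault [] x deps ++ xs)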
Functional Programming for Modular Bayesian Inference (Monad Bayes, Haskell) - Adam Scibior, ICFP 2018

They present an architectural design of a library for Bayesian modelling and inference in modern functional programming languages. The novel aspect of their approach is the modular implementation of existing inference algorithms. Their design relies on three inherently functional features:

  1. Higher-order functions
  2. Inductive data-types
  3. Support for type-classes or an expressive module system

In this architecture, they use the following core abstractions:

  1. Inference representations
  2. Inference transformations
  3. Inference representation transformers

They then implement concrete instances of these abstractions for particle filters and Metropolis-Hastings samplers, which form the basic building blocks of their library. By composing these building blocks, they obtain state-of-the-art inference algorithms.

Existing Problems They Engage With

  • Approximate Bayesian inference deals with approximations to the model evidence and posterior distribution. Traditionally, such inference is done by hand: the descriptions of the prior and likelihood are constructed manually, the equations of the inference algorithm are derived with pen and paper, and the algorithm is implemented as a module of the intended application.

In an attempt to partially automate this process, many libraries automatically derive selected inference algorithms from a suitable intermediate representation of the model. For example, PyMC is a Python module that implements Bayesian statistical models and fits them with inference algorithms such as Markov chain Monte Carlo. (These are eDSLs.)

To relieve users from the burden of manually constructing the intermediate representation, probabilistic programming systems such as Stan provide a special-purpose, human-readable modelling language, whose compiler automatically generates the required simulations that approximate the model evidence or sample predictions from the posterior distribution. (These are non-embedded DSLs.) The problem with non-embedded DSLs is that, while users express their models as probabilistic programs, the expressive power of the language they use is limited, and the inference process cannot be directly incorporated into larger applications, which must resort to external, file-based communication.

  • One of the goals of probabilistic programming is to separate probabilistic modelling from inference. The idea is that domain experts can specify probabilistic models according to their knowledge of the underlying process, and then an automated inference engine makes probabilistic inferences using this model.

Approach & Contributions

They describe a Haskell library constituting a probabilistic programming system that extends an existing language with probabilistic effects. It provides a monadic typeclass with probabilistic effects that can be used to construct probabilistic programs using arbitrary pure Haskell code.

Their high-level view of Bayesian inference is:

  • We have a model, containing both sampling and conditioning operations, written by the user in a probabilistic programming language.
  • We transform the model into a probabilistic program, the “sampler”, containing only sampling operations. While the model is non-executable due to the conditioning operations, the sampler is executable (a minimal sketch of this step follows this list).
  • We can therefore run the sampler, and with some post-processing, approximate the posterior distribution of the original model.
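As a hand-rolled illustration of that model-to-sampler step (the standard weighting construction, not monad-bayes's actual code): interpreting conditioning as weighting turns a non-executable model into an executable sampler that returns a value together with an importance weight.

    import Control.Monad (ap, liftM)

    -- Run the underlying monad m (e.g. a random-number source) and
    -- return a result together with its accumulated weight.
    newtype Weighted m a = Weighted { runWeighted :: m (a, Double) }

    instance Monad m => Functor (Weighted m) where
      fmap = liftM

    instance Monad m => Applicative (Weighted m) where
      pure x = Weighted (pure (x, 1))
      (<*>)  = ap

    instance Monad m => Monad (Weighted m) where
      Weighted mx >>= k = Weighted $ do
        (x, w)  <- mx
        (y, w') <- runWeighted (k x)
        return (y, w * w')

    -- Conditioning becomes executable: multiply the weight of the
    -- current run by the given likelihood.
    score :: Monad m => Double -> Weighted m ()
    score w = Weighted (return ((), w))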

The main novelty in the approach is decomposing the inference step (of converting the model into a sampler) into a sequence of intermediate steps; these intermediate steps consist of inference-specific transformations between inference-specific representations.

Their library provides such inference representations and inference transformers that manipulate these representations. Users can then define custom intermediate representations by composing transformers to obtain inference transformer stacks.

They express the building blocks for inference algorithms in Haskell, and distinguish three types of basic building blocks:

  1. Inference representations (data structures representing distributions), expressed as type-classes. There are three separate type-classes: the sampling representation MonadSample, the conditioning representation MonadCond, and the sampling and conditioning representation MonadInfer.
  2. Inference transformations, mappings between inference representations.
  3. Inference transformers, compositional building blocks of inference representations, expressed as monad transformers.

They then show how to compose these building blocks to obtain advanced inference algorithms.
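A hedged sketch of the three type-classes and of a model written against them (the class names are from the paper; the signatures here are abridged and illustrative rather than the library's exact API):

    -- Sampling representation: primitive random draws.
    class Monad m => MonadSample m where
      random :: m Double                      -- uniform on [0, 1]
      normal :: Double -> Double -> m Double  -- one of many primitives

    -- Conditioning representation: soft conditioning by a likelihood.
    class Monad m => MonadCond m where
      score :: Double -> m ()

    -- Sampling and conditioning together.
    class (MonadSample m, MonadCond m) => MonadInfer m

    -- A model is ordinary monadic Haskell code against these classes,
    -- so any representation implementing MonadInfer can interpret it.
    model :: MonadInfer m => m Double
    model = do
      mu <- normal 0 1                       -- prior
      score (exp (negate ((mu - 0.5) ^ 2)))  -- illustrative likelihood
      return mu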

Practical Probabilistic Programming With Monads (Adam Ścibior, Haskell 2015)
Probabilistic inference by program transformation in Hakaru (Hakaru, Haskell) - Praveen Narayanan, Chung-chieh Shan, 2016

A major challenge faced by every probabilistic programming system is that probabilistic models and inference algorithms do not compose in tandem: just because a model we’re interested in can be built naturally from two submodels does not mean a good inference algorithm for the model can be built naturally from good inference algorithms for the two submodels. Due to this challenge, many systems with good support for model composition resort to a fixed or monolithic inference algorithm and do their best to optimize it.

Hakaru demonstrates a new way to address this challenge. On one hand, Hakaru supports model composition like any other embedded monadic DSL does: on top of primitive combinators such as dirac, bind, and superpose, users can define Haskell functions to express common patterns of samplers and measures. On the other hand, because each inference building block is a transformation on this DSL, Hakaru supports inference composition like a compiler construction kit or computer algebra system does: users can define Haskell functions to express custom pathways from models to inference.
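To make the composition point concrete, here is a tiny first-order stand-in for such a measure language (the combinator names come from the text above; the representation is illustrative, not Hakaru's):

    -- A toy measure language with the three primitive combinators.
    data Measure a
      = Dirac a                          -- a point mass
      | Superpose [(Double, Measure a)]  -- a weighted sum of measures

    dirac :: a -> Measure a
    dirac = Dirac

    bind :: Measure a -> (a -> Measure b) -> Measure b
    bind (Dirac x)      k = k x
    bind (Superpose ms) k = Superpose [ (w, bind m k) | (w, m) <- ms ]

    superpose :: [(Double, Measure a)] -> Measure a
    superpose = Superpose

    -- Users define ordinary Haskell functions for common patterns:
    mixture :: Double -> Measure a -> Measure a -> Measure a
    mixture p m1 m2 = superpose [(p, m1), (1 - p, m2)]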

Contributions: Hakaru is a proof-of-concept probabilistic programming system that allows composable reuse of discrete and continuous distributions, queries, and inference algorithms. It achieves unprecedented modularity by two means:

  1. A language of measures that represents distributions and queries, as well as inference algorithms.
  2. Semantics-preserving program transformations based on computer algebra.

The system implements two automatic and semantics-preserving program transformations:

  1. Disintegration, which computes conditional distributions and probability densities.
  2. Simplification, which subsumes exact inference by computer algebra and supports approximate inference.

All their transformations take input and produce output in the same language, so we can compose them to express inference algorithms.

Design: Hakaru is an embedded DSL in Haskell in finally tagless form, so the code is parsed and type-checked by the Haskell compiler GHC. Hakaru transforms a probabilistic program into other programs in the same language that generate samples or otherwise perform inference. This major design decision contrasts with most other probabilistic programming systems, which handle a probabilistic program either by producing code in a different language, or by generating samples or otherwise performing inference directly, without staging. Hakaru instead uses staged interpreters.

Comparisons with other embeddings

Similar to Oleg Kiselyov's and Adam Scibior's work (in OCaml and Haskell respectively), they embed a probabilistic language in a general-purpose functional language. They also express and compose inference techniques as transformations that produce programs in the same language.

However, Oleg and Adam’s embeddings are “shallower”: the language defines a handful of constructs for manipulating distributions, and reuses the host languages' primitives for all other expressions. Their transformations consequently cannot inspect most of the input source code, notably deterministic computations and the right-hand sides of binds (>>=). In contrast, Hakaru can compute densities and conditional distributions in the face of deterministic dependencies, and can generate Metropolis-Hastings samplers using a variety of proposal distributions.

Workflow: The workflow of a Hakaru user is that of Bayesian inference:

  1. Model the world as a prior probability distribution on what is observed and what is to be inferred. Hakaru defines a language of distributions that formalizes this modeling.
  2. Turn the prior distribution into a conditional distribution, which is a function that maps what is observed to a distribution on what is to be inferred. Hakaru provides transformations that automate this conditioning.
  3. Apply the function to what is actually observed to get the posterior distribution on what is to be inferred. Hakaru can represent this distribution both as a generator of a stream of samples and as a term in the language.

Transformations

Disintegration computes conditional distributions and probability densities from a given distribution. Hakaru’s disintegrate transformation turns a Hakaru program of type HMeasure (HPair a b) into a Hakaru function of type a :-> HMeasure b, i.e. it turns a distribution producing pairs of values (a, b) into a function that, given a value of a, produces the distribution on values of b conditioned on that value.

Simplification subsumes exact inference by computer algebra and supports approximate inference. After applying disintegration to compute a conditional distribution, simplify performs linear-algebra-like reductions to produce a compact representation of the conditional distribution. In this sense, simplification subsumes exact inference. The simplified posterior is a Hakaru program and it can be run as a sampler.

Inference by composable program transformations

As noted under Design, Hakaru transforms a probabilistic program into other programs in the same language that either generate samples or perform inference, rather than producing code in a different language or performing inference directly without staging.

Because Hakaru transformations stay in the same language, we can compose them - they use disintegration and simplification together to generate efficient densities, as well as conditional distributions. They can also compose inference techniques.
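A types-only sketch of that composition (Term and the transformation bodies are illustrative stand-ins, not Hakaru's actual API):

    {-# LANGUAGE TypeOperators #-}

    -- Stand-in types: a Term is a Hakaru program of the given Hakaru type.
    data Term a
    data HMeasure a
    data HPair a b
    data a :-> b

    -- Both transformations map programs to programs in the same language...
    disintegrate :: Term (HMeasure (HPair a b)) -> Term (a :-> HMeasure b)
    disintegrate = undefined  -- stand-in body

    simplify :: Term a -> Term a
    simplify = undefined      -- stand-in body

    -- ...so they compose into custom pathways from models to inference:
    conditionModel :: Term (HMeasure (HPair a b)) -> Term (a :-> HMeasure b)
    conditionModel = simplify . disintegrate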

Expressing Semantic Distinctions by Types

They make crucial use of types to capture semantic distinctions in Hakaru.

  • Distinguishing Hakaru and Haskell types: The foremost distinction between Hakaru and Haskell is their type systems. They distinguish between the universe of Hakaru types, Hakaru, and the universe of Haskell types, *. There are many types in * which we don’t want to allow in Hakaru: Hakaru allows arbitrary user-defined regular recursive polynomial data types, but Haskell’s data types are far richer, so users mustn’t be able to embed arbitrary Haskell data types into Hakaru.

  • Distinguishing values and distributions: Hakaru’s type system draws a hard distinction between individual values (of type a) and distributions on values (of type HMeasure a). To see why this is necessary, consider the program x = uniform 0 1; x + x - it is unclear whether x is a value drawn from the distribution once and added to itself, or whether x denotes the distribution itself, so that two independent samples are drawn and added together.

    Hakaru distinguishes these two meanings by distinguishing between let-bindings and monadic-bindings.

    If x is monadically bound, then it has type HReal.

    sampleOnce  = uniform 0 1 >>= \x -> dirac (x + x)
    

    If x is let-bound, then it has type HMeasure HReal.

    sampleTwice = let_ (uniform 0 1) (\x -> liftM2 (+) x x)
    
  • Distinguishing values in different domains: Hakaru provides four primitive numeric types to distinguish between various numeric domains: HNat, HInt, HProb, and HReal.

  • Distinguishing different interpretations of Hakaru: They use a finally tagless embedding of Hakaru in Haskell. The class they use to represent the syntax is called Mochastic. An interpretation of Mochastic is thus implemented as an instance; they have encoded several such interpretations - a sampler, a pretty-printer, etc.

    data Prob       -- non-negative reals: probabilities and densities
    data Measure a  -- a measure over values of type a
    data Dist a     -- a primitive distribution over values of type a
    
    class Mochastic repr where
      type Type repr a :: Constraint
      -- literals and arithmetic
      real        :: Double -> repr Double
      bool        :: Bool -> repr Bool
      add, mul    :: repr Double -> repr Double -> repr Double
      neg         :: repr Double -> repr Double
      neg         =  mul (real (-1))
      logFloat, logToLogFloat
                  :: repr Double -> repr Prob
      -- constructors and eliminators for booleans, pairs, sums and
      -- lists; the eliminators take Haskell functions (HOAS, see below)
      unbool      :: repr Bool -> repr c -> repr c
                  -> repr c
      pair        :: repr a -> repr b -> repr (a, b)
      unpair      :: repr (a, b) -> (repr a -> repr b -> repr c)
                  -> repr c
      inl         :: repr a -> repr (Either a b)
      inr         :: repr b -> repr (Either a b)
      uneither    :: repr (Either a b) -> (repr a -> repr c) -> (repr b -> repr c)
                  -> repr c
      nil         :: repr [a]
      cons        :: repr a -> repr [a] -> repr [a]
      unlist      :: repr [a] -> repr c -> (repr a -> repr [a] -> repr c)
                  -> repr c
      -- the measure monad
      ret         :: repr a -> repr (Measure a)
      bind        :: repr (Measure a) -> (repr a -> repr (Measure b)) -> repr (Measure b)
      -- using a distribution as a measure, observed or latent
      conditioned, unconditioned :: repr (Dist a) -> repr (Measure a)
      factor      :: repr Prob -> repr (Measure ())
      -- primitive distributions
      dirac       :: (Type repr a) => repr a -> repr (Dist a)
      categorical :: (Type repr a) => repr [(a, Prob)] -> repr (Dist a)
      bern        :: (Type repr Bool) => repr Double -> repr (Dist Bool)
      bern p      =  categorical $
                     cons (pair (bool True) (logFloat p)) $
                     cons (pair (bool False) (logFloat (add (real 1) (neg p)))) $
                     nil
      normal, uniform
                  :: repr Double -> repr Double -> repr (Dist Double)
      poisson     :: repr Double -> repr (Dist Int)

    They use higher-order abstract syntax (HOAS) to encode Hakaru binding.
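    The bind, unpair, uneither, and unlist methods above already use this style: each object-language binder is represented by a Haskell function of type repr a -> repr c. For object-level lambdas, a minimal HOAS sketch (illustrative class and method names, not Hakaru's exact API) looks like:

    class Lambda repr where
      -- an object-language lambda takes a host-language function, so
      -- Hakaru reuses Haskell's variables, scoping and substitution
      lam :: (repr a -> repr b) -> repr (a -> b)
      app :: repr (a -> b) -> repr a -> repr b

    -- example: the object-language identity function, using Haskell's \x
    identity :: Lambda repr => repr (a -> a)
    identity = lam (\x -> x)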

Last updated on 13 Nov 2020
Published on 13 Nov 2020