Lightweight Implementations of Probabilistic Programming Languages
This paper describes a general method of transforming arbitrary programming languages into probabilistic programming languages with straightforward MCMC inference engines. Random choices in the program are “named” with information about their position in an execution trace; these names are used in conjunction with a database holding the values of random variables to implement MCMC inference in the space of execution traces.
Probabilistic programming languages simplify the development of probabilistic models by allowing programmers to specify a stochastic (random) process using syntax resembling modern programming languages. The resulting programs define prior distributions: running the program forward many times results in a distribution over execution traces, with each trace generating a sample of data from the prior. The goal of inference in such programs is to characterize the posterior distribution over execution traces conditioned on a particular program output.
This paper presents a general technique for turning any programming language into a probabilistic programming language with an accompanying MCMC inference engine. The key idea is to give each random choice of a fixed program a unique “name” that depends on its position in a given execution trace. We then convert stochastic functions into deterministic ones which use these names to look up their return values in a database that stores the values of all random choices. By controlling the values in this database, we can control execution traces of the program, which allows us to construct the key operations needed for the metropolis-hastings algorithm.
Different techniques for naming random choices result in different MCMC dynamics. This paper names random choices based on their structural position in the execution trace, which allows fine-grained sharing of random choice values between subsequent MCMC states. The naming information is provided by a side computation constructed by a source-to-source transformation at compile time: the original probabilistic program is augmented with naming information to create a new program that makes the names available. The resulting program can be executed at full speed, with only minimal overhead to track names and interact with the database.
2. Overview Of The Method
We define an unconditioned probabilistic program to be a parameterless function f with a mix of stochastic and deterministic elements. The function must be self-contained, with no external side effects that would affect its execution from one run to another.
The stochastic elements of f must come from a set of known, fixed elementary random primitives, or ERPs. These may be functions such as rand (sample uniformly from [0,1]) or randn (sample from a standard normal distribution). Each ERP type t is a parametric family of distributions p(x | θ_t), i.e. a collection of probability density functions on R^n indexed by the parameter space Θ_t.
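One way to picture an ERP type is as a pair of operations: a sampler and a density. The following is a minimal sketch in Python, not the paper's implementation; the `ERPS` table and its keys are our own illustrative names.

```python
import math
import random

# Hypothetical sketch: each ERP type t is a parametric family p(x | theta_t),
# represented here as a (sample, logpdf) pair. The "rand" and "randn" ERPs
# below mirror the two primitives named in the text.
ERPS = {
    "rand": {
        # Uniform on [0, 1]: no parameters; density is 1 on the support.
        "sample": lambda: random.random(),
        "logpdf": lambda x: 0.0 if 0.0 <= x <= 1.0 else float("-inf"),
    },
    "randn": {
        # Standard normal: fixed parameters (mu = 0, sigma = 1).
        "sample": lambda: random.gauss(0.0, 1.0),
        "logpdf": lambda x: -0.5 * (x * x + math.log(2.0 * math.pi)),
    },
}

x = ERPS["randn"]["sample"]()       # draw a value
ll = ERPS["randn"]["logpdf"](x)     # score it under the same family
```

Keeping the sampler and the density together is what later lets the inference engine both generate fresh values and score values it finds in the database.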
As the function f is executed, it encounters a series of ERPs. For example:
for i = 1:1000
    if (rand > 0.5)
        X(i) = randn
    else
        X(i) = gammarnd
    end
end
Here there are three syntactic ERPs: rand, randn, and gammarnd. During execution, depending on the return value of each call to rand, different paths will be taken through the program and different ERPs will be encountered. We call this path an execution trace. A total of 2000 random choices will be made when executing this f: in each of the 1000 iterations, one rand and then either one randn or one gammarnd.
Let f_k | x_1, ..., x_{k-1} be the k'th ERP encountered while executing f, and let x_k be the value it returns. Note that the parameters passed to the k'th ERP may depend on the previous x_k's. We denote by x all of the random choices made by f, so that f defines the probability distribution p(x). The probability of x is therefore the product of the probabilities of all the ERP choices made:

p(x) = ∏_{k=1}^{K} p(x_k | θ_{t_k}, x_1, ..., x_{k-1})
To simplify notation:
- By f_k, we mean f_k | x_1, ..., x_{k-1}.
- By p(x_k | θ_{t_k}), we mean p(x_k | θ_{t_k}, x_1, ..., x_{k-1}).
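The factorization above can be computed concretely by accumulating log-densities as each ERP is encountered. As a minimal sketch, assume a toy two-choice program where x_1 ~ N(0, 1) and x_2 ~ N(x_1, 1); the model and all function names here are ours, chosen only to show how θ_{t_k} can depend on earlier choices.

```python
import math

def normal_logpdf(x, mu, sigma):
    # log density of a Gaussian ERP, log p(x | mu, sigma)
    return -0.5 * (((x - mu) / sigma) ** 2 + math.log(2.0 * math.pi * sigma ** 2))

def trace_log_prob(xs):
    """Illustrative only: log p(x) for a two-choice trace where
    x_1 ~ N(0, 1) and x_2 ~ N(x_1, 1). The parameters of the second
    ERP depend on the value of the first, i.e. on x_1 .. x_{k-1}."""
    x1, x2 = xs
    return normal_logpdf(x1, 0.0, 1.0) + normal_logpdf(x2, x1, 1.0)
```

Each term in the sum corresponds to one factor p(x_k | θ_{t_k}) in the product over k.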
Generative functions as described above are easy to write. A harder problem, and the goal of this paper, is to reason about the posterior conditional distribution p(x\c | x_c), where x_c is the subset of random choices we condition on and x\c represents the remaining random choices. For example, we may condition our example program on the values of the X(i)'s, and reason about the sequence of rand's most likely to generate those X(i)'s.
This paper proposes an approach to inference based on the following intuition: consider repeatedly running an unconditioned function f. Every time an f_k is encountered (i.e. the k'th ERP encountered while running f), the program samples a value x_k and continues execution. Importantly, if two runs of f sample exactly the same values, their execution traces will be identical.
This suggests the following procedure for controlling the execution of f:

- Give each f_k (the k'th ERP encountered) in every possible execution trace a “name”. This name does not have to be unique, and can depend on previous x_k's or other information about the program state.
- Rewrite the source code of f to generate f', replacing random functions f_k with deterministic functions f'_k. When encountered in the execution trace, these functions f'_k deterministically use their name to look up a current value x_k in a database and return it; if no value exists, they sample x_k from the appropriate ERP parametric family of distributions p(· | θ_{t_k}), store x_k in the database, and then return it. Behind the scenes, these functions also accumulate the likelihood of each random choice.
Thus, the execution trace of f' can be controlled by manipulating the values in the database. This makes MCMC inference possible by enabling proposals, scoring, and the ability to accept or reject proposals. We make a proposal to an execution trace by picking an x_k in the database, modifying it according to a proposal distribution, and then re-executing f' and computing the likelihood of the new trace.
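The proposal cycle can be sketched end to end on a toy model. This is not the paper's code: the two-choice program, the name strings, the proposal width, and the simplified acceptance ratio (symmetric Gaussian proposal, likelihood ratio only) are all our assumptions.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    # log density of a Gaussian ERP
    return -0.5 * (((x - mu) / sigma) ** 2 + math.log(2.0 * math.pi * sigma ** 2))

def f_prime(db):
    # Toy transformed program f': x ~ N(0, 1), y ~ N(x, 1).
    # Values come from the database if present; otherwise they are
    # sampled fresh and stored. Returns the trace log-likelihood.
    ll = 0.0
    for name, mu_fn in [("x", lambda: 0.0), ("y", lambda: db["x"])]:
        mu = mu_fn()
        if name not in db:
            db[name] = random.gauss(mu, 1.0)
        ll += normal_logpdf(db[name], mu, 1.0)
    return ll

def mh_step(db, ll):
    # Pick one stored random choice, perturb it with a symmetric Gaussian
    # proposal, re-execute f' against the modified database, and accept or
    # reject the new trace (simplified Metropolis-Hastings acceptance).
    name = random.choice(list(db))
    proposal = dict(db)
    proposal[name] = db[name] + random.gauss(0.0, 0.5)
    new_ll = f_prime(proposal)
    if math.log(random.random()) < new_ll - ll:
        return proposal, new_ll
    return db, ll

db = {}
ll = f_prime(db)          # first run populates the database
for _ in range(100):
    db, ll = mh_step(db, ll)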
2.1 MCMC in Trace Space
This section describes the MCMC algorithm used. We assume that some naming scheme has been defined, and that a mechanism for computing those names has been implemented.
Let a database D be defined as a mapping N → T × X × L × θ where:
- N is the name of a random choice
- T is its ERP type
- X is the random value
- L is this random value’s likelihood
- θ are the ERP parameters
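One concrete shape for an entry D(n) = (T, X, L, θ) is a small record keyed by name. This is a sketch under our own naming; the field names and the example key "loop-1/rand" are hypothetical, not the paper's.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class DBEntry:
    erp_type: str    # T: the ERP type (e.g. "rand", "randn")
    value: Any       # X: the stored random value x
    logprob: float   # L: log-likelihood of x under p(. | theta)
    params: tuple    # theta: ERP parameters in effect when x was scored

# The database D maps names N to entries (T, X, L, theta).
db: dict[str, DBEntry] = {}
db["loop-1/rand"] = DBEntry("rand", 0.42, 0.0, ())
```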
We can control the execution of f' by controlling the values in the database D. When f' encounters an ERP f'_k, it computes its name n ∈ N, its current parameters θ_c, and its current type t_c ∈ T. It then looks up (t, x, l, θ_db) = D(n). If a value is found and the types match, the return value of f'_k is set to x. Otherwise, the value x is sampled from the appropriate ERP, its likelihood is computed, and the corresponding entry in the database is updated. We accumulate the log-probability ll_fresh of new randomness, and the log-probability ll_stale of randomness that is no longer used.
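The lookup-or-sample behavior can be sketched as a single function. The function and argument names here are ours, and ll_stale is omitted: in the description above it accounts for database entries the new trace no longer visits, which is naturally computed after re-execution rather than inside the lookup itself.

```python
def lookup(db, name, erp_type, params, sample, logpdf, stats):
    """Illustrative sketch (our names, not the paper's API) of the
    deterministic lookup performed by f'_k."""
    entry = db.get(name)
    if entry is not None and entry["type"] == erp_type:
        # A value of the right type exists: reuse it as the return value.
        return entry["value"]
    # No usable value: sample fresh from p(. | theta), score it, store it,
    # and accumulate the log-probability of the new randomness in ll_fresh.
    x = sample(*params)
    ll = logpdf(x, *params)
    db[name] = {"type": erp_type, "value": x, "ll": ll, "params": params}
    stats["ll_fresh"] += ll
    return x

# Hypothetical usage with a stand-in ERP whose sampler is deterministic:
stats = {"ll_fresh": 0.0}
db = {}
v = lookup(db, "flip-1", "rand", (), lambda: 0.25, lambda x: 0.0, stats)
```

On a second call with the same name and type, the stored value is returned unchanged, which is exactly what makes replaying a modified database reproduce most of the previous trace.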