SMC and PMMH
Particle Filter
Sequential Monte Carlo (SMC) methods, also known as Particle Filters, are a family of simulation-based methods (methods that artificially generate data in order to evaluate hypotheses) for computing posterior distributions. The objective is to compute the posterior distributions of the states of a hidden Markov model (HMM), a system consisting of both hidden and observable variables.
The observable variables (observation process) are related to the hidden variables (state process) by some functional form that is known. The dynamical system describing the evolution of the state variables is also known probabilistically.
A generic particle filter estimates the posterior distribution of the hidden states using the observation (measurement) process.
Consider the state-space of a hidden Markov model, shown below.
The filtering problem is to sequentially estimate the values of the hidden states Xk, given the values of the observation process Y0, …, Yk, at each time step k. All Bayesian estimates of Xk follow from the posterior density p(xk | y0, …, yk). The particle filter methodology provides an approximation of these conditional probabilities. (In contrast, an MCMC or importance sampling approach would model the full posterior p(x0, …, xk | y0, …, yk).)
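The recursion behind this filtering posterior can be written out explicitly; a standard two-step form (prediction through the transition model, then a Bayes update with the new observation) is:

```latex
% Prediction: propagate the previous filtering density through the transition model
p(x_k \mid y_{0:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{0:k-1})\, \mathrm{d}x_{k-1}

% Update: weight by the observation likelihood and renormalise
p(x_k \mid y_{0:k}) = \frac{p(y_k \mid x_k)\, p(x_k \mid y_{0:k-1})}
                           {\int p(y_k \mid x_k)\, p(x_k \mid y_{0:k-1})\, \mathrm{d}x_k}
```

The integrals are generally intractable; the particle filter approximates them with a weighted set of samples.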
Particle filters update their prediction in an approximate, statistical manner. The samples drawn from the distribution are represented by a set of particles; each particle is assigned a likelihood weight representing the probability of that particle being sampled from the probability density function.
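As a minimal sketch of this weighted-particle idea (the numbers are illustrative): draw particles from a prior, weight each one by the likelihood of a single observation, and the weighted set then approximates the posterior. With prior N(0, 1) and an observation y = 1 made with unit-variance Gaussian noise, the exact posterior is N(0.5, 0.5), which the weighted particles recover:

```python
import numpy as np

rng = np.random.default_rng(42)

# Particles drawn from the prior N(0, 1)
particles = rng.normal(0.0, 1.0, size=100_000)

# Likelihood weight of each particle for observation y = 1.0 under N(x, 1) noise
y = 1.0
weights = np.exp(-0.5 * (y - particles) ** 2)
weights /= weights.sum()  # normalise so the weights sum to one

# The weighted particle set approximates the posterior N(0.5, 0.5)
post_mean = np.sum(weights * particles)
post_var = np.sum(weights * (particles - post_mean) ** 2)
```

Here the weights do exactly what the text describes: they encode how probable each sample is under the target density.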
Particle Filter Explained
The situation is as such:
- There is something we want to know
- We can measure something, related to what we want to know
- We know something about the relationship between the measurements and what we want to know
For example, consider the flight of an aircraft:
- We can measure the elevation relative to the sea
- We can measure the distance to the ground
- We have the map
But we don’t know where we are on the map.
We will use a particle filter to determine the position of our aircraft.
First, we generate a set of hypotheses; these are our samples/particles generated from a (in this case, uniform) random distribution.
- We have measured the distance to the ground (with some uncertainty)
- We know how high we are flying (with some uncertainty from the measurement device)
- We have the map
For every particle, we evaluate the likelihood of it being the current position of our aircraft on the map, using all the information from our known variables: the measured flight elevation and the measured distance to the ground. For example, the selected particle below fits the measurements quite well.
To record how likely we find each particle, we assign it a weight corresponding to its likelihood.
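A sketch of this weighting step (the terrain profile, noise level, and measurement values are all invented for illustration): each particle's weight is the Gaussian likelihood of the measured distance to the ground, given the ground height the map predicts at that particle's position.

```python
import numpy as np

rng = np.random.default_rng(1)

def terrain(x):
    """Hypothetical map: ground height (m) at position x (km)."""
    return 50.0 + 30.0 * np.sin(x / 10.0)

altitude = 300.0        # measured flight elevation above sea level (m)
dist_to_ground = 230.0  # measured distance to the ground (m)
sigma = 5.0             # assumed measurement noise (m)

# Hypotheses: particle positions spread uniformly over a 100 km map
particles = rng.uniform(0.0, 100.0, size=500)

# The distance the aircraft would measure if it were at each particle's position
predicted = altitude - terrain(particles)

# Gaussian likelihood of the actual measurement under each hypothesis
weights = np.exp(-0.5 * ((predicted - dist_to_ground) / sigma) ** 2)
weights /= weights.sum()
```

Note that several map positions may fit the measurement equally well; this ambiguity is exactly why we keep many weighted hypotheses rather than a single estimate.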
As some particles appear to be much more likely than others, the unlikely particles are largely redundant. We therefore perform a step called resampling, which is needed to avoid depletion among the particles - otherwise we may end up with only unlikely particles. Resampling generates a set of new particles (the same number as before) by drawing from the existing particles in proportion to their weights. All the regenerated particles are given equal weights.
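The resampling step can be sketched as follows, here using systematic resampling (one common scheme; multinomial resampling would also work). The particle values and weights in the usage example are made up:

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Draw len(particles) new, equally weighted particles in proportion to the old weights."""
    n = len(particles)
    # One stratified uniform draw per output slot
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    idx = np.searchsorted(cumulative, positions)
    return particles[idx], np.full(n, 1.0 / n)

rng = np.random.default_rng(3)
particles = np.arange(10.0)
weights = np.array([0.91] + [0.01] * 9)  # particle 0 is far more likely than the rest
new_particles, new_weights = systematic_resample(particles, weights, rng)
```

After resampling, the heavily weighted particle appears many times in the new set, while most of the unlikely particles have been dropped, and all weights are reset to 1/n.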
The aircraft moves on, so we will move our particles as well. Assuming we know the direction of the aircraft, we have an idea of how fast the aircraft can move. For each particle, we pick a random number within the likely range of the speed of our aircraft, and then move each particle with this corresponding speed to the right. This step also involves a minor update of the particle weights depending on how likely the randomly picked speed was.
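The motion step can be sketched as follows (the speed model and time step are invented for illustration). One caveat: in the common bootstrap particle filter the speeds are drawn from the motion model itself, in which case this step leaves the weights unchanged; the weight correction mentioned above is only needed when sampling from a proposal that differs from the motion model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Current position hypotheses (km along the map)
particles = rng.uniform(0.0, 100.0, size=500)
before = particles.copy()

dt = 1.0                           # time between measurements (assumed)
speed_mean, speed_std = 8.0, 1.0   # hypothetical speed model (km per time step)

# Each particle moves with its own randomly drawn speed
speeds = rng.normal(speed_mean, speed_std, size=particles.size)
particles = particles + speeds * dt
```

Drawing a different speed for each particle keeps the particle cloud spread out, which reflects our uncertainty about the aircraft's motion.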
Repeating the particle filter steps
Now we repeat the previous process. We update the weights (likelihood) of our particles using the new measurements of our known variables - the distance to the ground and measured flight elevation.
Again, we can see that some particles turn out to be more likely than others, so we resample based on the updated weights.
The aircraft then moves on - so we propagate our particles in time using randomly selected speeds for each particle like before. Again, we also perform a minor update of the weights depending on the likelihood of the speed selected for each particle.
With our time-propagated particles, we now update the weights using our new measurements of our known variables. Then we resample. Then we propagate. And so on.
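Putting the steps together, a complete bootstrap particle filter for the aircraft example might look like this (the terrain, noise levels, and trajectory are all invented; the terrain is chosen to rise overall so that a single height measurement is informative about position):

```python
import numpy as np

rng = np.random.default_rng(0)

def terrain(x):
    """Hypothetical map: ground height (m) at position x (km), rising overall."""
    return 2.0 * x + 15.0 * np.sin(x / 10.0)

ALTITUDE = 1000.0                   # known flight elevation (m), assumed constant
MEAS_STD = 2.0                      # distance-to-ground measurement noise (m)
SPEED_MEAN, SPEED_STD = 8.0, 1.0    # aircraft speed model (km per step)
N = 2000
STEPS = 25

# Simulate the true trajectory and its noisy distance-to-ground measurements
true_x = 50.0
true_path, measurements = [], []
for _ in range(STEPS):
    true_x += rng.normal(SPEED_MEAN, SPEED_STD)
    true_path.append(true_x)
    measurements.append(ALTITUDE - terrain(true_x) + rng.normal(0.0, MEAS_STD))

# Particle filter: initialise uniformly, then propagate / weight / resample each step
particles = rng.uniform(0.0, 400.0, size=N)
for z in measurements:
    # Propagate: every particle moves with its own sampled speed
    particles = particles + rng.normal(SPEED_MEAN, SPEED_STD, size=N)
    # Weight: likelihood of the measured distance to the ground at each hypothesis
    predicted = ALTITUDE - terrain(particles)
    weights = np.exp(-0.5 * ((predicted - z) / MEAS_STD) ** 2)
    weights /= weights.sum()
    # Resample: draw a fresh, equally weighted set of particles
    particles = particles[rng.choice(N, size=N, p=weights)]

estimate = particles.mean()
```

After a handful of iterations the particle cloud collapses onto the true position, even though it started spread uniformly over the whole map.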
Particle MCMC
PMCMC methods are essentially MCMC algorithms in which SMC acts as the Metropolis-Hastings (MH) proposal (MH itself being an MCMC method).
Particle MCMC (PMCMC) methods are Bayesian approximate inference techniques for estimating posterior distributions over parameters. They are ordinary MCMC algorithms that use particle filters to estimate the likelihood of the data given the parameter generated by the proposal (as opposed to standard Metropolis-Hastings, which computes the likelihood directly from the likelihood function, when that is tractable). We use Metropolis-Hastings to sample from some non-normalised density g(θ); in a Bayesian context this non-normalised density is the prior times the likelihood, and it is the likelihood that is estimated using the particle filter. This algorithm is known as Particle Marginal Metropolis-Hastings (PMMH). (The related Particle-Independent Metropolis-Hastings instead targets the latent states with the parameters held fixed, using a whole SMC run as an independent proposal, whereas PMMH samples the parameters, and optionally the states, jointly.)
Some more elaboration:
PMMH and particle filters are (to the extent of my knowledge) used for Markov models. PMMH uses Metropolis-Hastings (iteratively generating candidate parameters from a proposal distribution and accepting or rejecting them), but rather than using the likelihood function to compute the likelihood P(D | θ) for the accept/reject step,
P(acceptance) = min(1, [P(θt+1) · P(D | θt+1)] / [P(θt) · P(D | θt)])   (assuming a symmetric proposal distribution)
we instead use a particle filter to estimate the likelihood of the data given the parameter generated by the proposal. The particle filter uses the model whose parameters we are trying to learn, by running it during the particle filter process. As the particle filter is simulated, every time the particles are transitioned they are compared to the next corresponding data point (observed state) of the HMM. The particles represent a set of possible latent states, and each transition of these particles represents a transition of the latent state in the HMM; the likelihoods/weights of the particles are updated according to how well they match the current step's corresponding observed state (provided beforehand as a data set/list of observed states).
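A sketch of PMMH on a toy state-space model (everything here is invented for illustration: the model x_t = x_{t-1} + θ + noise, y_t = x_t + noise, the flat prior, and the tuning constants). The particle filter returns an estimate of log P(D | θ), which is then plugged into the usual MH accept/reject ratio:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, T, rng):
    """Generate observations from the toy model x_t = x_{t-1} + theta + N(0,1), y_t = x_t + N(0,1)."""
    x, ys = 0.0, []
    for _ in range(T):
        x = x + theta + rng.normal()
        ys.append(x + rng.normal())
    return np.array(ys)

def pf_loglik(theta, ys, n, rng):
    """Bootstrap particle filter estimate of log P(y_1..y_T | theta)."""
    particles = np.zeros(n)
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the transition model
        particles = particles + theta + rng.normal(size=n)
        # Log-weight: Gaussian observation density (constants cancel in the MH ratio)
        logw = -0.5 * (y - particles) ** 2
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())   # log of the average unnormalised weight
        # Multinomial resampling
        particles = particles[rng.choice(n, size=n, p=w / w.sum())]
    return loglik

ys = simulate(theta=0.5, T=60, rng=rng)

# PMMH: random-walk Metropolis-Hastings on theta, with the particle filter
# estimate standing in for the exact likelihood. A flat prior is assumed, so
# the acceptance ratio reduces to the (estimated) likelihood ratio.
theta = 0.0
ll = pf_loglik(theta, ys, 250, rng)
chain = []
for _ in range(800):
    prop = theta + rng.normal(scale=0.1)   # symmetric random-walk proposal
    ll_prop = pf_loglik(prop, ys, 250, rng)
    if np.log(rng.random()) < ll_prop - ll:
        theta, ll = prop, ll_prop          # accept, and keep the new estimate
    chain.append(theta)

posterior_mean = np.mean(chain[400:])
```

One design point worth noting: the stored estimate `ll` for the current parameter is reused rather than recomputed each iteration; this reuse is what makes the algorithm exact despite the likelihood being only estimated.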