Probabilistic Effects. λθ

Implementing HMM Simulation and Inference (using PMMH)

Simulating Data

We aim to model an HMM and use it to generate data by simulating a sample path.

First, for simplicity, we take our latent states x_i to be integers, and the corresponding observed states y_i to be integers as well.

data ObservedState = Obs { obs :: Int } deriving Show
data LatentState   = Lat { lat :: Int } deriving Show

We then choose the parameters of the HMM (i.e. those which parameterise the transition and observation functions) to consist of two Doubles, one parameterising each function.

data Params = Params {
   transition_p  :: Double,
   observation_p :: Double
} deriving Show

Next, we define the observation function and the transition function. These can be defined in many different ways - there is no single canonical choice.

Here we choose the observation function to sample from a binomial distribution B(n, p) using the latent state as n and the observation parameter as p.

observationModel :: MonadSample m => Params -> LatentState -> m ObservedState
observationModel (Params _ observation_p) (Lat l) =
   fmap Obs (binomial l observation_p)

We choose the transition function to first compute a value dL by sampling from a Bernoulli distribution Ber(p), using the transition parameter as p and converting the resulting boolean to an integer. The new latent state is then computed by adding dL to the current latent state l.

transitionModel :: MonadSample m => Params -> LatentState -> m LatentState
transitionModel (Params transition_p _) (Lat l) =
  do
    dL <- fmap boolToInt (bernoulli transition_p)
    let l' = l + dL
    return (Lat l')
  where
    boolToInt = fromEnum   -- False -> 0, True -> 1

Finally, we define how a step in the HMM is simulated. A step consists of transitioning the current latent state to a new latent state, and then using this to acquire the corresponding new observed state. By using the MonadState constraint with the type [ObservedState] as the state, we can also accumulate all the observed states along the sample path.

simulateStep :: (MonadSample m, MonadState [ObservedState] m)
  => Params
  -> LatentState
  -> m LatentState
simulateStep params latentState = do
  l' <- transitionModel params latentState
  o  <- observationModel params l'
  modify (++ [o])   -- append the new observation to the accumulated path
  return l'

Simulating n steps in an HMM means that we simulate n state transitions and observations. This requires that we give an initial latent state and a set of parameters for the HMM.

repeatFunction :: Monad m => Int -> (a -> m a) -> (a -> m a)
repeatFunction n f = foldl (>=>) return (replicate n f)
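
For example, repeatFunction 3 f unfolds to return >=> f >=> f >=> f, threading the latent state through three Kleisli-composed applications of f.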


simulateModelNSteps :: MonadSample m => LatentState -> Params -> Int -> m [ObservedState]
simulateModelNSteps initialState params nsteps =
  execStateT (repeatFunction nsteps (simulateStep params) initialState) []

Executing this to generate data simply consists of calling sampleIO on simulateModelNSteps.

generateData :: LatentState -> Params -> Int -> IO [ObservedState]
generateData initialState params nsteps =
  sampleIO $ simulateModelNSteps initialState params nsteps
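
For example (the parameter values here are arbitrary), we could generate twenty observations starting from latent state 0:

exampleData :: IO [ObservedState]
exampleData = generateData (Lat 0) (Params 0.5 0.9) 20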

Inference

We aim to use PMMH as an inference method in order to learn the model parameters of an HMM.

First, we create a function scoreModel which is similar to simulateStep - but rather than simulating and accumulating a list of observed states, we instead “score” the sample path: each score call updates the probability of the path we’re on. We intend this function to denote a distribution over all possible sample paths the program can take, parameterised by the parameters we provide.

scoreModel :: MonadInfer m
   => [ObservedState]
   -> LatentState
   -> Params
   -> m Params
scoreModel observedStates latentState params
   = do
       let observe n p y = score (binomialPdf n p y)
           go []     x = return x
           go (y:ys) x = do x' <- transitionModel params x
                            observe (lat x') (observation_p params) y
                            go ys x'

       _ <- go (map obs observedStates) latentState

       return params
  1. As input, we are given some model parameters for the HMM, an initial latent state, and a data set consisting of a sequence of observed states.

  2. We transition the current latent state x to a new latent state x'.

  3. Then we use score to update the probability of the sample path we’re on. This multiplies the current probability by a specified factor - we choose this factor to be determined by the binomial probability density function, binomialPdf. Whereas the function binomial lets us draw a random sample from its distribution, the function binomialPdf tells us how likely a given sample is under given parameters.

    If our new latent state is x' and the observed state is y, we score our sample path by a factor of “how likely we are to observe y given the latent state x'”, where the binomial pdf is given n = lat x' and p = observation_p params. This mirrors exactly the distribution that observationModel samples from during simulation, so the factor is precisely the likelihood of the observation under the HMM’s observation model.

  4. We repeat this process for all of the observed data, transitioning the current latent state and scoring based on the new latent state and its corresponding observed state.

  5. At the end, the function has traced out every sample path the program can take on the observed data, each weighted by its associated probability, and returns the parameters that produced those paths. Hence it denotes a distribution over all sample paths, parameterised by the parameters we provided.
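
Note that binomialPdf is not something monad-bayes itself provides (the library exports normalPdf but, as far as we are aware, no binomial density), so a small helper is needed. One possible sketch, using logChoose from the math-functions package:

import Numeric.Log (Log (Exp))
import Numeric.SpecFunctions (logChoose)

-- Log-space density of B(n, p) at k: C(n, k) * p^k * (1 - p)^(n - k)
binomialPdf :: Int -> Double -> Int -> Log Double
binomialPdf n p k
  | k < 0 || k > n = 0
  | otherwise      = Exp $ logChoose n k
                         + fromIntegral k * log p
                         + fromIntegral (n - k) * log (1 - p)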

Next, we use PMMH as an inference method to learn the parameters of a HMM. This should return a distribution over all the possible parameters of our HMM.

The signature of the monad-bayes function pmmh is given below (implementation omitted). Typically, the type variables b and a are both Params.

-- | Particle Marginal Metropolis-Hastings sampling.
pmmh :: MonadInfer m =>
  -- | number of Metropolis-Hastings steps
  Int ->
  -- | number of time steps (should be same as number of data points we're processing)
  Int ->
  -- | number of particles
  Int ->
  -- | model parameters prior
  Traced m b ->
  -- | model
  (b -> Sequential (Population m) a) ->
  -- | [[(model parameters, the likelihood of the particular parameter values)]]
  m [[(a, Log Double)]]

To perform inference, we elaborate on the parameters we need to provide to pmmh.

  1. The first three parameters are simply integers, specific to how PMMH works (explained later on in inferModel).
  2. The type Traced m b is the prior distribution over our model’s parameters, where b represents the parameters. We can declare this like so:
    paramsPrior :: MonadSample m => m Params
    paramsPrior = do
      p_transition  <- Control.Monad.Bayes.Class.gamma 2 1
      p_observation <- Control.Monad.Bayes.Class.beta 2.0 7.0
      return (Params p_transition p_observation)
    
  3. The type (b -> Sequential (Population m) a) is a function which takes some parameters b, and returns a model which is a distribution over all sample paths using those parameters. For this, we use our previous function scoreModel :: MonadInfer m => [ObservedState] -> LatentState -> Params -> m Params and partially apply it.
    initialLatState :: LatentState
    initialLatState = Lat 0
    
    scoreModel obsData initialLatState :: MonadInfer m => Params -> m Params
    
  4. The return type m [[(a, Log Double)]] represents a distribution over all possible parameters for our model. The type a represents parameters and Log Double represents the corresponding likelihood of those parameters for our model. More precisely, it returns the history of all the particles and their associated probabilities/weightings for each step during Metropolis-Hastings.

Finally, we can put all this together to create a function inferModel which uses pmmh on our HMM.

inferModel :: Int -> Int -> Int -> IO [[(Params, Numeric.Log.Log Double)]]
inferModel n_mhsteps n_timesteps n_particles = do
    ys <- parseFromFile dataParser "data/datafile"
    case ys of
      Left _          -> error "naughty"
      Right obsData   -> sampleIO $
        prior $ pmmh n_mhsteps n_timesteps n_particles paramsPrior
                     (scoreModel obsData initialLatState)
  1. Running pmmh on our HMM returns a value of type:

    pmmh n_mhsteps n_timesteps n_particles paramsPrior (scoreModel obsData initialLatState)
       :: MonadInfer m => m [[(Params, Log Double)]]
    

    The number of Metropolis-Hastings steps is how long we run the Markov chain: running it for longer gives a better estimate, tending towards the true posterior. Similarly, the number of particles is the population size we want to use in our particle filter - as the particle filter is a means of estimating the likelihood, increasing the number of particles increases the accuracy of that estimate, so it can be as large as we can afford. The number of time steps is the number of particle filter steps to run during each call to the particle filter - this must equal the number of data points we provide in our datafile, since each data point holds the true observed value for its time step, against which the particle filter weights the simulated particles. (A concrete invocation is sketched after this list.)

  2. Applying the monad-bayes function prior to an inference representation will execute the program (with runStateT) and return a value from the distribution over values of type a. (This can be imagined as turning the inference representation into a prior distribution from which we can sample parameters.)

    -- | Compute the sample and discard the weight.
    -- This operation introduces bias.
    prior :: Functor m => Weighted m a -> m a
    prior = fmap fst . runWeighted
    

    (Applying the monad-bayes function runWeighted to an inference representation will unwrap its Weighted newtype constructor and execute the program (with runStateT) using the prior distribution, while accumulating the likelihood. This returns a value from a distribution together with its likelihood.)

    -- | Obtain an explicit value of the likelihood for a given value.
    runWeighted :: Weighted m a -> m (a, Log Double)
    runWeighted (Weighted m) = runStateT m 1
    

    In this context, however, using prior on pmmh returns not a single set of parameters but the full history of the particles and their associated probabilities/weightings for each Metropolis-Hastings step: pmmh’s return value already carries its weights explicitly, so prior merely strips off the outer accumulated weight and leaves that history intact.

    prior $ pmmh n_mhsteps n_timesteps n_particles paramsPrior (scoreModel obsData initialLatState)
       :: MonadSample m => m [[(Params, Log Double)]]
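
A concrete invocation might look as follows (the settings are arbitrary; we assume here that data/datafile holds 50 observed states, so the number of time steps is 50):

main :: IO ()
main = do
  -- 2000 Metropolis-Hastings steps, 50 time steps, 100 particles
  results <- inferModel 2000 50 100
  -- the last inner list is the particle population at the final MH step
  print (last results)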
    

PMMH Explanation

PMMH uses Metropolis-Hastings (an iterative procedure of generating candidate parameters from a proposal distribution and accepting or rejecting them), but rather than using the likelihood function to compute the likelihood P(D | θ) exactly for the accept/reject step:

P(acceptance) = min(1, (P(θ_{t+1}) * P(D | θ_{t+1})) / (P(θ_t) * P(D | θ_t)))    (for a symmetric proposal)

we instead use a particle filter to estimate the likelihood of the data given the parameters generated by the proposal. The particle filter runs the very model that we’re trying to learn parameters for: every time the particles are transitioned, they are compared to the next corresponding data point (observed state) of the HMM. The particles represent a set of possible latent states, and each transition of these particles represents a transition of latent state in the HMM - the likelihood/weightings of the particles are updated according to how well they explain the current step’s corresponding observed state (provided beforehand as the list of observed states).
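
To make the accept/reject step concrete, here is a minimal sketch of a single PMMH transition, written independently of monad-bayes and assuming a symmetric proposal. The names estimateLikelihood, priorDensity and propose are hypothetical stand-ins; note that the likelihood estimate for the current parameters is carried along with the chain rather than recomputed at every step.

import System.Random (randomRIO)

-- One PMMH step: the likelihood P(D | θ) is *estimated* by a particle
-- filter. We carry the current parameters paired with the likelihood
-- estimate produced when they were last accepted.
pmmhStep
  :: (theta -> IO Double)   -- particle-filter estimate of P(D | θ)
  -> (theta -> Double)      -- prior density P(θ)
  -> (theta -> IO theta)    -- proposal: draw a candidate from the current θ
  -> (theta, Double)        -- (θ_t, estimated P(D | θ_t))
  -> IO (theta, Double)     -- (θ_{t+1}, estimated P(D | θ_{t+1}))
pmmhStep estimateLikelihood priorDensity propose (theta, like) = do
  theta' <- propose theta
  like'  <- estimateLikelihood theta'
  let ratio = (priorDensity theta' * like') / (priorDensity theta * like)
  u <- randomRIO (0, 1)
  return $ if u < min 1 ratio then (theta', like') else (theta, like)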

Last updated on 13 Nov 2020
Published on 13 Nov 2020