Inlining Monad Bayes

Original benchmark, using a data file containing 500 data points:

scoreModel ::
   MonadInfer m
   => LatentState
   -> [ObservedState]
   -> Params
   -> m Params
scoreModel latentState observedStates params
   = do  let observe n p y = score (binomialPdf n p y)

             go []     x = return x
             go (y:ys) x = do x' <- transitionModel params x
                              observe (getLat x') ((fromIntegral $ getLat x') * (observation_p params)) y
                              go ys x'

         (go $! (map getObs observedStates)) $! latentState

         return params

inferModel :: Int -> Int -> Int -> IO [[(Params, Numeric.Log.Log Double)]]
inferModel n_mhsteps n_timesteps n_particles = do
    ys <- parseFromFile dataParser "data/datafile"
    case ys of
      (Left _)    -> error "naughty"
      (Right obsData) -> sampleIO $ do
              pmmhRes <- prior $ pmmh n_mhsteps n_timesteps n_particles paramsPrior (scoreModel initLatentState (map Obs obsData))
              return pmmhRes

runPmmh :: (Int, Int, Int) -> IO ()
runPmmh (n_mhsteps, n_timesteps, n_particles) = do
  particleWeightings <- inferModel n_mhsteps n_timesteps n_particles
  print particleWeightings

main = defaultMain [
  bgroup "runPmmh" [ bench "(100, 100, 100)"  $ whnfIO $ runPmmh (100, 100, 100)
                   ]
  ]

Original profiling report for PMMH inference on a HMM:
Analysing the dumped core of Population.hs and Weighted.hs of monad-bayes:

It turns out that we should expect to see these definitions, however if we use optimization:2, then we should also expect them to not actually be used. This is because GHC generates a lot of garbage that it doesn’t seem to want to remove, but when we go to places where they would have been used, we should be able to find that they aren’t actually there.
Here is scoreModel from PmmhHmm.hs of probabilistic-programming.

We can see that there aren’t any of the abstractions in there; this core looks pretty good, however logProbability is a numerical computation which could be inlined.
The hope is that we won’t find any >>=s if the optimiser has successfully removed them from the function. If we look at the pmmh function from PMMH.hs of monad-bayes:

This function itself won’t be able to optimise because it needs a concrete instance of MonadInfer, but this isn’t necessarily a problem. What we need to look at is when we use this function with something concrete. So let’s look at inferModel from PmmhHmm.hs of probabilistic-programming which uses pmmh; the MonadInfer m concretises to IO because we called sampleIO of monad-bayes on it.

The hope is that the call to pmmh (represented by the green-highlighted lvl) would have been inlined - however this is not the case, as it simply references pmmh from the monad-bayes library, which has not been marked as INLINE or INLINABLE. This means GHC isn’t specialising it, so the program still performs dictionary lookups when that code is compiled.
The monad-bayes library from hackage was removed as a build dependency, and the local clone of monad-bayes is now used as a dependency instead.
The INLINE pragma was added to pmmh of monad-bayes.

Revisiting the inferModel function, we can now see that the call to pmmh (previously represented by lvl) has been inlined with what is now called $w$spmmh.
- $w is usually prepended to functions that the compiler has unwrapped some of the arguments for (which eliminates a pattern match).
- $s is usually prepended to functions that the compiler has specialised.
Let’s now visit the new version of pmmh, called $w$spmmh.

This immediately looks a lot better than before, as it has been specialised and has had its arguments unwrapped. There still exists some possible optimisations. When seeing occurrences of calls such as $s$fMonadStateT_$c>>=, we can see it’s being passed to mhTrans - this is concerning and suggests that mhTrans along with other functions in similar situations such as pushEvidence and hoist can be inlined.

Placing an INLINE pragma on these functions results in the following core, with the previous calls being inlined:

Additionally, the $w$c>>= (from Bayes.Traced.Static) at the top is suspicious - it looks like one of the arguments is a monadic value, and so it’s forced to use a >>= because it doesn’t know what the monadic value is.
Let’s then inline the definition of >>= for the Traced monad inside Bayes.Traced.Static.

This successfully inlines the call to >>= inside inferModel.
There are two references to applyWeight of Weighted.hs and a reference to mh of Traced.Static inside inferModel, so let’s inline these. The profiling report now looks like:

Added inlines to prior, extractWeight, flatten & applyWeight of Bayes.Weighted.

prior :: Functor m => Weighted m a -> m a
prior = fmap fst . runWeighted

{-# INLINABLE prior #-}

extractWeight :: Functor m => Weighted m a -> m (Log Double)
extractWeight = fmap snd . runWeighted

{-# INLINABLE extractWeight #-}

flatten :: Monad m => Weighted (Weighted m) a -> Weighted m a
flatten m = withWeight $ (\((x, p), q) -> (x, p * q)) <$> runWeighted (runWeighted m)

{-# INLINABLE flatten #-}

applyWeight :: MonadCond m => Weighted m a -> m a
applyWeight m = do
  (x, w) <- runWeighted m
  factor w
  return x

{-# INLINABLE applyWeight #-}

Added inlines to resampleGeneric, resampleSystematic, runWeighted, sir.

resampleGeneric ::
  MonadSample m =>
  (V.Vector Double -> m [Int]) ->
  Population m a ->
  Population m a
resampleGeneric resampler m = fromWeightedList $ do
  pop <- runPopulation m
  let (xs, ps) = unzip pop
  let n = length xs
  let z = sum ps
  if z > 0
    then do
      let weights = V.fromList (map (exp . ln . (/ z)) ps)
      ancestors <- resampler weights
      let xvec = V.fromList xs
      let offsprings = map (xvec V.!) ancestors
      return $ map (,z / fromIntegral n) offsprings
    else
      return pop

{-# INLINABLE resampleGeneric #-}

resampleSystematic ::
  (MonadSample m) =>
  Population m a ->
  Population m a
resampleSystematic = resampleGeneric (\ps -> (`systematic` ps) <$> random)

{-# INLINABLE resampleSystematic #-}

sir ::
  Monad m =>
  (forall x. Population m x -> Population m x) ->
  Int ->
  Int ->
  Sequential (Population m) a ->
  Population m a
sir resampler k n = sis resampler k . Seq.hoistFirst (spawn n >>)

{-# INLINABLE sir #-}

runWeighted :: (Functor m) => Weighted m a -> m (a, Log Double)
runWeighted (Weighted m) = runStateT m 1

{-# INLINABLE runWeighted #-}

The profiling report now looks like:

Looking at scoreModel, it makes a reference call to >> of the Control.Monad.Coroutine module.

Therefore, we locally cloned this module and added an INLINABLE pragma to the definition of >>.

This results in the run-time of our benchmark program dropping further:
scoreModel also makes a reference call to >>= of the Control.Monad.Coroutine module.

Adding an INLINABLE pragma to >>= as well, results in the following benchmarks:

The profiling report now looks like:
Added bang-patterns to Params datatype of PmmhHmm.hs. Added INLINABLE to await of Control.Monad.Coroutine.SuspensionFunctors.
On analysis of part of the core in inferModel, the Trace data type of Bayes.Traced.Common is used a lot.

Unpacking and adding bang-patterns to some of the parameters results in a better looking core but a (very slightly) slower run-time.
```
data Trace a
  = Trace
      { variables :: [Double],
        output    :: !a,
        density   :: {-# UNPACK #-} !(Log Double)
      }
```
Unpacking and adding bang-patterns to all parameters of the Trace datatype results in both a better looking core and a faster run-time. (Only strict, single-constructor types can be unpacked).
```
data Trace a
  = Trace
      { variables :: !([Double]),
        output    :: !a,
        density   :: {-# UNPACK #-} !(Log Double)
      }
```
On analysis of part of the core in inferModel, there are two reference calls to collapse of Bayes.Population.

To try and get rid of this, INLINABLE pragmas were added to runPopulation, explicitPopulation, extractEvidence, proper, evidence, and collapse of Bayes.Population.

Also, the arguments of the Traced datatype from Bayes.Traced.Static were made strict.
```
data Traced m a
  = Traced
      { model     :: !(Weighted (FreeSampler m) a),
        traceDist :: !(m (Trace a))
      }
```
This made things slower, so these changes were undone.

Last updated on 13 Nov 2020
Published on 13 Nov 2020