
Haskell Core

Understanding Haskell Core

The best way to understand why a profiling report outputs certain performance metrics (for Haskell) is to dive into the actual core itself, which is what the surface-level Haskell code compiles down into. To inspect the core in a coherent manner, we can add the following to either our package.yaml file (when using stack):

    ghc-options:
    - -fplugin=DumpCore
    dependencies:
    - dump-core

or our .cabal file (when using cabal):

build-depends: dump-core
ghc-options:   -fplugin=DumpCore

When we build our program (with stack build or cabal build), this creates a folder called dump-core containing HTML files. Opening one of these files gives us a nice visualisation of the core code of our program: the code we see is what our Haskell program actually looks like after desugaring and optimisation.

Usually, if one is confused about something in the core, it is best to compare it with the original program.

regTest :: Parser Int
regTest = newRegister_ (code 7) (\r -> modify_ r (makeQ (succ @Int) [||succ @Int||]) *> (let g = get r in g *> g))

This makes a new register (which in this context is backed by an STRef) containing 7; it then modifies the register by applying succ, and afterwards it gets the value out twice, returning the result of the second read.

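For intuition, here is a rough hand-written ST analogue of what regTest does at runtime. This is only a sketch for comparison (regTestAnalogue is a made-up name), not the code that is actually generated:

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

regTestAnalogue :: Int
regTestAnalogue = runST (do
  r <- newSTRef (7 :: Int)   -- newRegister_ (code 7)
  modifySTRef' r succ        -- modify_ r ... succ ...
  _ <- readSTRef r           -- first get: result discarded
  readSTRef r)               -- second get: result returned

The core generated for regTest looks like this:
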
(32) lvl =
 λs1 ->
   -- Using 'newMutVar', it makes the new register containing a value 'lvl' we expect to be 7.
   let! (#, #) ipv ipv1 = newMutVar# lvl s1 in
   -- Using 'readMutVar', it then reads the 7 to get an Int out in ipv1
   let! (#, #) ipv ipv1 = readMutVar# ipv1 ipv in
   -- It applies the succ function to the value 7 (ipv1) and writes the result back into the register using 'writeMutVar'
   let! s2# = writeMutVar# ipv1 ($fEnumInt_$csucc ipv1) ipv in
   -- We read from the register twice, using 'readMutVar', but we only care about the second value...
   let! (#, #) ipv _ = readMutVar# ipv1 s2# in
   let! (#, #) ipv ipv1 = readMutVar# ipv1 ipv in
   -- ... which is returned the second time after reading.
   (#, #) ipv (Just ipv1)

Is there a way of seeing whether what your code compiles down to is suboptimal? Usually we will spot something we’re unhappy with. In the case above, the core is essentially as good as it could be. If we start seeing unexpected function applications or stray case statements, we should be suspicious.

Consider the following code, which is supposed to take a sequence of alternating a’s and b’s (i.e. repeated "ab"s) and roll them all up into a list of c’s:

manyTest :: Parser [Char]
manyTest = many (string "ab" $> (code 'c'))

In the core version of manyTest, one problem we can observe is that a particular function called exit is defined twice:

  λs1 ->
    ...
    let
      exit =
        λo# eta ipv ->
          case ==# o# ipv of
            DEFAULT ->
              (#, #) eta Nothing
            1 ->
              let! (#, #) ipv ipv1 = readMutVar# ipv1 eta in
              (#, #) ipv (Just (ipv1 []))
      exit =
        λo# eta ipv ->
          case ==# o# ipv of
            DEFAULT ->
              (#, #) eta Nothing
            1 ->
              let! (#, #) ipv ipv1 = readMutVar# ipv1 eta in
              (#, #) ipv (Just (ipv1 []))

The question becomes: “why has the compiler not worked out that these are the same function, and instead duplicated it when it didn’t need to?”. In the body of the let statement in the core, we can see where exit is used - the program checks whether there is enough input, and if so, whether that input is the character ‘b’. So we expect the calls to exit to be wherever we read characters.

in
   let ...
   in case <# ipv dt of
        DEFAULT ->
          exit o# eta ipv
        1 ->
          case indexWideCharArray# input# ipv of
            DEFAULT ->
              exit o# eta ipv
            'b' ->
              ...

These exit variables must have been defined somewhere in our original code. In this case, we can see that two different functions, sat and emitLengthCheck, both use the expression $$bad, which corresponds to the exit variable.

sat ... =
  ... if ... then ... else $$bad

emitLengthCheck ... =
  ... if ... then ... else $$bad

We fail in the case where we don’t have enough input, and we fail in the case where the character doesn’t match. So we ask why these exit variables are two different values when they could have been the same one - these calls to $$bad should really come from the same place. The question then becomes: “how do we work out where they came from, and how do we trace them back to their common source?”.

We find that inside a function evalSat, we have defined a helper maybeEmitCheck that binds the bad variable in a let statement and uses it twice in its body.

evalSat .. = do
  ...
  where
    maybeEmitCheck ... =
      [|| let bad = $$(raise y)
          in $$(emitLengthCheck n (sat (genDefunc p) mk [||bad||]) [||bad||] y)
       ||]

The first usage of bad is the one that gets passed to sat, and the second is the one that gets passed to emitLengthCheck. We can relate this to the earlier core: the first exit corresponds to the bad passed to emitLengthCheck, and the second exit corresponds to the bad passed to sat. In this case, we don’t currently know why GHC has chosen to define exit twice.

We could also be curious as to whether certain operations can be factored out. For example, in the following code we could ask whether the two length checks <# o# dt and <# ipv dt could be combined - why is the program checking the length twice?

  λo# eta ->
    case <# o# dt of
      DEFAULT ->
        exit eta
      1 ->
        ...
        let
          ipv =
            +# o# 1
        in case <# ipv dt of
          DEFAULT ->
            ...

And we can actually fix this by changing the definition of manyTest to:

manyTest :: Parser [Char]
manyTest = many (try (string "ab") $> (code 'c'))

Going back to the core, we can see that the first length check has now been merged with the second, i.e. it has been factored out. Because we always want to read an ‘a’ character followed by a ‘b’ character, this code now checks that at least two characters remain before trying to read anything at all, rather than reading an ‘a’ and then failing when there are no further characters.

  λo# eta ->
    case <# (+# o# 1) dt of
      DEFAULT ->
        exit eta
      1 ->
        ...
        let
          ipv =
            +# o# 1
        in
          ...

We can also make the distinction between lazy and strict values. When we see a let, this is a lazy variable binding, whereas a let! signifies a strict one. We can spot code where values are constructed lazily and ask whether they actually need to be lazy. For example, the following binding is lazy, but it could safely be computed strictly. In theory, we should be able to find where ipv is created in the surface program and bang-pattern it, turning the let into a let! in the core.

  let
    ipv =
      +# o# 1

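As a minimal surface-level sketch (these function names are made up for illustration), a bang pattern is one way to request that strictness in the original program:

{-# LANGUAGE BangPatterns #-}

-- Lazy: 'next' may be bound as a thunk (a lazy 'let' in the core).
nextLazy :: Int -> Int
nextLazy o = let next = o + 1 in next

-- Strict: the bang pattern forces 'next', so the binding can show up as a
-- strict 'let!' (or a case) in the core.
nextStrict :: Int -> Int
nextStrict o = let !next = o + 1 in next
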
This entire process is the general workflow when searching for optimisations by looking at Haskell core:

  • Writing the program
  • Generating the core
  • Analysing the core to see if we are unhappy with how some code has been generated
  • Tweaking the way the code is generated so that the inefficiency disappears. (If we are using staging, this means changing the code from the staging point of view.)

If we are trying to determine what is causing slowness in a certain effect system, one interesting thing to inspect is abstraction overhead - whether there are many function calls that could have been inlined, or whether data structures are being unwrapped and immediately rewrapped.

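As a hypothetical illustration (the Counter type and these functions are invented, not taken from any particular effect library), this is the shape of overhead to look for: a wrapper that only exists to be taken apart again:

data Counter a = Counter (Int -> (a, Int))

runCounter :: Counter a -> Int -> (a, Int)
runCounter (Counter f) = f

tick :: Counter Int
tick = Counter (\n -> (n, n + 1))

-- Two steps chained through the abstraction. If 'tick' and 'runCounter' are
-- not inlined, the core keeps allocating Counter values and result tuples
-- only to take them apart again immediately.
twoTicks :: Int -> (Int, Int)
twoTicks n = let (_, n') = runCounter tick n in runCounter tick n'
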
Another thing to look out for is whether certain arguments have not been unboxed when they could have been. Most types in GHC are boxed, meaning values of that type are represented by a pointer to a heap object (e.g. the representation of an Int is a two-word heap object). An unboxed type is represented by the value itself: no pointers or heap allocation are involved. These correspond to the “raw machine” types we would use in C. Most unboxed types end in the character #, e.g. Int#, Double#, Addr#, and the unboxed tuple (#, #). Similarly, the primitive operations on these types look like, for example, <#, +#, *#.

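As a small sketch of the difference (written against GHC.Exts directly, which we would rarely do by hand), a boxed Int is just the constructor I# wrapped around a raw Int#:

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Boxed: the Int argument and result are pointers to I# heap objects.
plusOneBoxed :: Int -> Int
plusOneBoxed (I# n#) = I# (n# +# 1#)

-- Unboxed: raw machine integers, no pointers or heap allocation.
plusOneUnboxed :: Int# -> Int#
plusOneUnboxed n# = n# +# 1#
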
To illustrate the consequence of boxed versus unboxed types, consider a recursive call which takes an integer as an argument. If this integer has not been unboxed in the core, then the program boxes the integer to pass it to the recursive function; within the function it then unboxes the integer to do the work on it, and afterwards boxes up another integer as the argument to the next recursive call. This is a loop of unnecessary boxing and unboxing.

The following code illustrates a recursive loop where the integer argument passed in (o#) has been successfully unboxed:

  let
    rec
      loop =
        λo# eta ->
          ...
          loop (+# ipv 1) s2#

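To get core like this from surface Haskell, it is usually enough to make the loop arguments clearly strict, so that GHC's worker/wrapper transformation can pass them unboxed. A small sketch (sumTo is a made-up example, and needs -O to be optimised this way):

{-# LANGUAGE BangPatterns #-}

-- Sums the integers 0..n-1 with strict accumulators: GHC typically produces
-- a worker that passes 'acc' and 'i' as raw Int# values, much like the
-- 'loop' above.
sumTo :: Int -> Int
sumTo n = go 0 0
  where
    go !acc !i
      | i >= n    = acc
      | otherwise = go (acc + i) (i + 1)
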
So we are looking for places where a value is wrapped up in a constructor just to be passed to a function that immediately unwraps it again, or rewraps it, on the other end. (This is something monad-bayes might be falling prey to.)

As a helpful mindset, we could (if we were crazy) write the generated core code manually: we look at this hand-written version and figure out what we could do to improve it ourselves. Having identified that something could have been done better, we then wonder why the compiler has not done it. Getting the compiler to play ball is the hard part - we have to convince it that a certain optimisation is possible.

The canonical way of fixing performance issues in effect systems is to remove the monad transformers and hand-roll the entire instance. Although this goes against the point of effect systems, at the same time it can be very helpful. If we hand-roll the instance (which we can do by taking the monad transformer instances and inlining the definitions ourselves until we get to something primitive), we are essentially checking whether the core has generated that same code. This is a pattern of creating the ideal program and working out which bits the compiler has missed that we have recognised. It is a good approach when we don’t yet understand what could be done better in the original program, and it is a way to recognise and understand unfamiliar structures in core, such as (#, #).

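A tiny sketch of what hand-rolling means here (the example uses the mtl State monad and made-up names): take the transformer-based definition and inline the wrappers until only primitive operations remain; the hand-rolled version is what we hope to already see in the core of the abstract one.

import Control.Monad.State.Strict (State, state)

-- Written against the State monad (StateT Int Identity under the hood).
tickAbstract :: State Int Int
tickAbstract = state (\s -> (s, s + 1))

-- The same thing with the StateT/Identity wrappers inlined away: a plain
-- state-passing function. Comparing the core of tickAbstract against this
-- shows which layers of wrapping the compiler failed to remove.
tickHandRolled :: Int -> (Int, Int)
tickHandRolled s = (s, s + 1)
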
The relationship between the profiling information and the core is more obvious than the relationship between the profiling information and the original code. We can use the profiler to guide us to the parts of the core that we want to look at.

Additionally, although memory leaks might not be the cause of our performance issues, they could be. In Haskell this is usually called a space leak rather than a memory leak: a build-up of unevaluated thunks sitting behind a long chain of computation. The memory is held on to by the thunks even though it is potentially never used. If we see lots of heap activity, this could be a sign that laziness in our program is adding overhead. Also, if we see a lot of objects being churned through (even if they pass through quite fast), we should ask “are those objects necessary, or could they have been unboxed and thrown away?”.

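The classic small example of a space leak (unrelated to effect systems, but it shows the kind of thunk build-up the heap profile would reveal):

import Data.List (foldl')

-- Lazy left fold: builds a chain of (((0 + 1) + 2) + ...) thunks a million
-- entries deep before anything is evaluated: a space leak.
leakySum :: Int
leakySum = foldl (+) 0 [1 .. 1000000]

-- Strict left fold: the accumulator is forced at each step, so the heap
-- never holds more than one partial result.
strictSum :: Int
strictSum = foldl' (+) 0 [1 .. 1000000]
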
In the core visualisation we can also hover over certain fragments of code, and the top right corner then shows us the type information along with strictness properties. Even though a binding may appear to be lazy, we can identify that it is actually strict (because it says S rather than L).
