Haskell Core
Understanding Haskell Core
The best way to understand why a profiling report shows the performance metrics it does (for Haskell) is to dive into the actual core itself, which is what the surface-level Haskell code compiles down into. To inspect the core in a coherent manner, we can add the following to either our package.yaml file (when using stack):
dependencies:
- dump-core
ghc-options:
- -fplugin=DumpCore
or our .cabal file (when using cabal):
build-depends: dump-core
ghc-options: -fplugin=DumpCore
When we build our program (with stack build or ghc), this creates a folder called dump-core containing HTML files. Opening one of these files gives us a nice visualisation of the core code of our program. The code we see is what our Haskell program really looks like once compiled.
Usually, if one is confused about something in the core, it is best to compare it with the original program.
regTest :: Parser Int
regTest = newRegister_ (code 7) (\r -> modify_ r (makeQ (succ @Int) [||succ @Int||]) *> (let g = get r in g *> g))
This makes a new register, which in this context is backed by an STRef, containing 7; it then modifies it with succ, and afterwards it gets the value out twice and returns the second read.
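Before reading the core, it can help to see a rough hand-written equivalent of this behaviour using STRef (purely a sketch of the semantics; it is not what parsley generates):

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

regTestByHand :: Int
regTestByHand = runST (do
  r <- newSTRef (7 :: Int)  -- newRegister_ (code 7)
  x <- readSTRef r
  writeSTRef r (succ x)     -- modify_ with succ
  _ <- readSTRef r          -- first get: value discarded
  readSTRef r)              -- second get: this value is returned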
lvl =
λs1 ->
-- Using 'newMutVar#', it makes the new register containing the value 'lvl', which we expect to be 7.
let! (#, #) ipv ipv1 = newMutVar# lvl s1 in
-- Using 'readMutVar#', it then reads the 7 to get an Int out in ipv1.
let! (#, #) ipv ipv1 = readMutVar# ipv1 ipv in
-- It applies the succ function to the 7 (ipv1) and writes the result back into the register using 'writeMutVar#'.
let! s2# = writeMutVar# ipv1 ($fEnumInt_$csucc ipv1) ipv in
-- We read from the register twice, using 'readMutVar#', but we only care about the second value...
let! (#, #) ipv _ = readMutVar# ipv1 s2# in
let! (#, #) ipv ipv1 = readMutVar# ipv1 ipv in
-- ... which is returned after the second read.
(#, #) ipv (Just ipv1)
Is there a way of seeing whether what our code compiles down to is suboptimal? Usually we will notice something we're unhappy with. In the above case, however, the core version of the program could not be any better. If we start seeing unexpected function applications or seemingly random case statements, we should be suspicious.
Consider the following code, which is supposed to take a sequence of alternating a’s and b’s and roll them all up:
manyTest :: Parser [Char]
manyTest = many (string "ab" $> (code 'c'))
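Assuming the usual semantics of many, on an input like "ababab" this parser should succeed with the result "ccc" - one 'c' for each "ab" consumed.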
In the core version of manyTest, one problem we can observe is that a particular function called exit is created twice:
λs1 ->
...
let
exit =
λo# eta ipv ->
case ==# o# ipv of
DEFAULT ->
(#, #) eta Nothing
1 ->
let! (#, #) ipv ipv1 = readMutVar# ipv1 eta in
(#, #) ipv (Just (ipv1 []))
exit =
λo# eta ipv ->
case ==# o# ipv of
DEFAULT ->
(#, #) eta Nothing
1 ->
let! (#, #) ipv ipv1 = readMutVar# ipv1 eta in
(#, #) ipv (Just (ipv1 []))
The question becomes: “why has the compiler not worked out that these are the same function, and duplicated it when it didn’t need to?”. In the body of the let statement in the core, we can observe where exit is used - it tells us that the program checks whether there is enough input and, if so, whether that input is the character ‘b’. So we expect the calls to exit to appear where we read characters.
in
let ...
in case <# ipv dt of
DEFAULT ->
exit o# eta ipv
1 ->
case indexWideCharArray# input# ipv of
DEFAULT ->
exit o# eta ipv
'b' ->
...
These exit variables must have been defined somewhere in our original code. In this case, we can see that the two different functions sat and emitLengthCheck both use the expression $$bad, which represents the exit variable.
sat ... =
... if ... then ... else $$bad
emitLengthCheck ... =
... if ... then ... else $$bad
We fail in the case in which we don’t have enough input, and we fail in the case where the character doesn’t match. So we ask why these exit
variables are two different values, when they could have been the same one - these calls to $$bad
should really come from the same place. The question then becomes: “how do we work out where they came from, and how do we trace them back to their common source?”.
We find that, in some function evalSat, we have defined a function maybeEmitCheck that defines the bad variable in a let statement and uses it twice in its body.
evalSat .. = do
...
where
maybeEmitCheck ... =
[|| let bad = $$(raise y)
in $$(emitLengthCheck n (sat (genDefunc p) mk [||bad||]) [||bad||] y)
||]
The first usage of bad is the one that gets passed to sat, and the second is the one that gets passed to emitLengthCheck. We can match this up with the earlier core: the first exit corresponds to the bad passed to emitLengthCheck, and the second exit corresponds to the bad passed to sat. In this case, we don’t currently know why GHC has chosen to define exit twice.
We could also be curious as to whether certain operations can be factored out. For example, in the following code we could ask whether the length checks <# o# dt and <# ipv dt can be factored into one - why is it doing these twice?
λo# eta ->
case <# o# dt of
DEFAULT ->
exit eta
1 ->
...
let
ipv =
+# o# 1
in case <# ipv dt of
DEFAULT ->
...
And we can actually fix this by changing the definition of manyTest to:
manyTest :: Parser [Char]
manyTest = many (try (string "ab") $> (code 'c'))
Going back to the core, we can see that the first length check has been replaced with the second, i.e. the two checks have been factored into one. Because we always want to read an a character followed by a b character, this code will now check that at least two characters remain before trying to read anything at all, rather than reading an a and then failing when there are no further characters.
λo# eta ->
case <# (+# o# 1) dt of
DEFAULT ->
exit eta
1 ->
...
let
ipv =
+# o# 1
in
...
We can also make the distinction between lazy and strict values. When we see a let, this is a lazy variable declaration, whereas a let! signifies a strict variable declaration. We can spot code where values are constructed lazily and determine whether that value actually needs to be lazy. For example, the following binding is lazy, but it could just as well be computed strictly. In theory, we should be able to find where ipv is created in the surface program and bang-pattern it, turning the let into a let! in the core.
let
ipv =
+# o# 1
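As a hypothetical surface-level sketch (assuming the offset computation lives in code we control), a bang pattern forces the binding at its definition site:

{-# LANGUAGE BangPatterns #-}

-- hypothetical sketch: the bang forces the incremented offset
-- immediately, so it appears in the core as a strict let! rather
-- than as a lazy thunk
step :: Int -> Int
step o =
  let !ipv = o + 1
  in ipv * 2  -- illustrative use of the strict value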
This entire process is the general workflow when searching for optimisations by looking at Haskell core:
- Writing the program
- Generating the core
- Analysing the core to see if we are unhappy about how some code has been generated
- Tweaking the way the code generates so that the inefficiency disappears. (If we are using staging, this means changing the code from the staging point-of-view.)
If we are trying to determine what is causing slowness in a certain effect system, one interesting thing to inspect is the cost of abstraction - whether we have a lot of function calls that could have been inlined, or unnecessary unwrapping and rewrapping of data structures.
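For example (a hypothetical combinator, not taken from any particular effect library), an INLINE pragma asks GHC to inline a small function at an abstraction boundary so the call disappears from the core:

-- hypothetical: without the pragma this may survive in the core as a
-- real call; with it, GHC splices the body in at each use site
{-# INLINE clamp #-}
clamp :: Int -> Int -> Int -> Int
clamp lo hi x = max lo (min hi x)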
Another thing to look out for is whether certain arguments haven’t been unboxed when they could have been. Most types in GHC are boxed: values of the type are represented by a pointer to a heap object (e.g. the representation of Int is a two-word heap object). An unboxed type is represented by the value itself; no pointers or heap allocation are involved. These correspond to the “raw machine” types we would use in C. Most unboxed types end in the character #, e.g. Int#, Double#, Addr#, (#, #). Similarly, the primitive operations on these types look like, for example, <#, +#, *#.
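To make the boxing concrete, here is a small sketch using GHC's primitives (I# and +# are real GHC.Exts exports; the function itself is illustrative):

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

-- Int is a box (the I# constructor) around a raw machine integer Int#;
-- matching on I# unboxes the value, applying I# boxes it back up
addOne :: Int -> Int
addOne (I# x#) = I# (x# +# 1#)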
To illustrate the consequence of boxed versus unboxed types, consider a recursive call that takes an integer as an argument. If this integer has not been unboxed in the core, the program will box the integer to pass it to the recursive function; within the function it will then unbox the integer to do the work on it, and afterwards box up another integer as the argument to the next recursive call. This is a loop of unnecessarily boxing up and unboxing values.
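As a sketch of what we want to write at the surface level, strict (banged) accumulator arguments let GHC's worker/wrapper transformation pass raw Int# values instead (given -O; the function is illustrative):

{-# LANGUAGE BangPatterns #-}

-- with optimisation, the strict Int arguments here are typically
-- unboxed by worker/wrapper, so the compiled loop runs on raw Int#
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)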
The following core illustrates a recursive loop where the integer argument passed in (o#) has already been successfully unboxed:
let
rec
loop =
λo# eta ->
...
loop (+# ipv 1) s2#
So we are looking for constructors where we see something wrapped up to be passed to a function, only for it to be unwrapped again immediately on the other end of the call. (This is something monad-bayes might be falling prey to.)
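As a toy illustration of the shape to look for (hypothetical code, not taken from monad-bayes):

data Pair = MkPair Int Int

-- the pair is allocated only to be taken apart again immediately; with
-- -O, GHC's case-of-known-constructor usually removes this, but across
-- abstraction boundaries the wrap/unwrap pair can survive into the core
sumPair :: Int -> Int -> Int
sumPair a b = unpack (MkPair a b)
  where
    unpack (MkPair x y) = x + y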
As a helpful mindset, (if we were crazy) we could actually write the generated core code manually - we can look at this manual code and figure out what we could do to improve it ourselves. We then identify that we could have done something better, and wonder why the compiler hasn’t done it. Getting the compiler to play ball is then the hard part - we have to convince the compiler that a certain optimisation is possible.
The canonical way of fixing performance issues in effect systems is removing the monad transformers and hand-rolling the entire instance. Although this is against the point of effect systems, at the same time it can be very helpful. If we hand-roll the instance (which we can do by taking the monad transformer instances and inlining the definitions ourselves until we get to something primitive), we are essentially checking whether the core has generated the same thing. This is a pattern of creating the ideal program, and working out which bits the compiler has missed that we have recognised. It is always a good approach when we don’t yet understand what could be done better in the original program, and it is also a way to recognise and understand unfamiliar structures in core, such as (#, #).
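For instance, a hand-rolled version of a stack like StateT Int Maybe might look like the following (a minimal sketch with names of our own invention), which we can then compare against the core GHC produces for the transformer version:

-- obtained by inlining the StateT and Maybe instances by hand until
-- only primitives remain; this is the "ideal program" to compare against
newtype CountM a = CountM { runCountM :: Int -> Maybe (a, Int) }

bindC :: CountM a -> (a -> CountM b) -> CountM b
bindC (CountM m) k = CountM $ \s ->
  case m s of
    Nothing      -> Nothing
    Just (a, s') -> runCountM (k a) s'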
The relationship between the profiling information and the core is more obvious than the relationship between the profiling information and the original code. We can use the profiler to guide us to the parts of the core that we want to look at.
Additionally, although memory leaks might not be the cause of our performance issues, they could be. In Haskell it’s not called a memory leak but a space leak: a build-up of unevaluated thunks representing a long chain of deferred computation. The memory is held on to by the thunks even though it is potentially never used. If we see lots of heap activity, this could be a sign that laziness in our program is adding overhead. Also, if we see a lot of objects being churned through (even if they pass through quite fast), we could ask: “are those objects necessary, or could they have been unboxed and thrown away?”.
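The classic example of a space leak (a textbook one, not specific to this project) is a lazy left fold:

import Data.List (foldl')

-- foldl builds a chain of (+) thunks as long as the list before
-- anything is evaluated: a space leak
leaky :: [Int] -> Int
leaky = foldl (+) 0

-- foldl' forces the accumulator at each step and runs in constant space
fine :: [Int] -> Int
fine = foldl' (+) 0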
In the core visualisation, we can also hover over certain fragments of code, and the top right corner then shows us the type information as well as strictness properties. Even though something may appear to be lazy, we can identify that it is actually a strict binding (because it says S rather than L).