Embedding DSLs

Tagless Final Style

2. Interpreters for first-order languages

2.1 Initial and Final embeddings

We start with a very simple language to be later extended, which has integer literals, negation, and addition. Our running example is the expression:

8 + (- (1 + 2))

The initial embedding of the language encodes the expressions of the language as the values of an algebraic data type.

data Exp = Lit Int
         | Neg Exp
         | Add Exp Exp

Our running example is written as follows:

ti1 = Add (Lit 8) (Neg (Add (Lit 1) (Lit 2)))

The first interpreter of the language is an evaluator which proceeds by pattern-matching:

eval :: Exp -> Int
eval (Lit n) = n
eval (Neg e) = - (eval e)
eval (Add e1 e2) = eval e1 + eval e2

We can embed our language in Haskell differently, using a final embedding. If all we ever need is the value of an expression, we can represent the term in our arithmetic language by its value, or equivalentally by a Haskell expression that computes that value.

We introduce:

A representation type for the meaning of an expression, Int.
Functions computing the meaning of the three expressions forms of the language (literals, negation, and addition).

type Repr = Int

lit :: Int -> Repr
lit n = n

neg :: Repr -> Repr
neg e = - e

add :: Repr -> Repr -> Repr
add e1 e2 = e1 + e2

The computation is compositional: the meaning of addition is computed from the meaning of the summands (i.e. the value of the summands).

Our running example then has form:

tf1 = add (lit 8) (neg (add (lit 1) (lit 2)))

Comparing the initial embedding ti1 to the final embedding tf1, they differ in terms of the identifiers.

The initial embedding seems more general as it permits other interpretations of the language, for example, pretty-printing.

view :: Exp -> String
view (Lit n) = show n
view (Neg e) = "(-" ++ view e ++ ")"
view (Add e1 e2) = "(" ++ view e1 ++ " + " ++ view e2 ++ ")"

The final embedding, on the other hand, hardwires the evaluator into the representation tf1, making it impossible to interpret the object term as something other than an integer.

Using type classes in final embedding

We want final embedding to permit multiple interpretations too; therefore we must find a way to parameterize the “constructor functions” such as lit and neg by the result type. We can use Haskell’s type class system as a means of parameterization.

class ExpSYM repr where
  lit :: Int  -> repr
  neg :: repr -> repr
  add :: repr -> repr

Denotational Semantics of Final Style

The declaration of ExpSYM should remind us of denotational semantics, over the semantic domain repr. As befits denotational semantics, the meaning of an expression (whatever repr happens to be) is computed from the meaning of the components.

The running example has the same final embedded form as before, but the inferred type is different: it is no longer Repr (that is, Int), but rather it is ExpSYM repr => repr, hence being polymorphic over the semantic domain. An object term is not represented by its abstract syntax, but by its meaning (its denotation in a semantic domain).

Interpreting Final-Encoded Expressions

To interpret finally-encoded expressions, we write an instance for ExpSYM, specifying the semantic domain. For example, we may interpret expressions as integers, by mapping object language integers to Haskell integers, and object language negation to Haskell negation, etc:

instance ExpSYM Int where
  lit n     = n
  neg e     = - e
  add e1 e2 = e1 + e2

We write the evaluator as the identity function, so that eval tf1 has the value 5:

eval :: Int -> Int
eval = id

The function eval has a strange type for an evaluator, and an even stranger definition (the identity function). It is more proper to call eval a “selector” of an interpretation – a finally-encoded expression has an indefinite number of interpretations; eval selects one of them. In this case, it selects an interpretation as an integer.

Unlike the initial interpreter, the final interpreter has no pattern-matching on abstract syntax, and so has no syntax-dispatch overhead.

The SYM in ExpSYM stands for Symantics: the class declaration defines the syntax of the embedded language (its expression forms); the class instances define the semantics (the interpretations). Multiple interpretations are now possible through defining different type class instances and corresponding interpreters where their type definition is what selects which instance is used.

Comparing Properties of Initial vs Final Encoded Objects

In the initial embedding, encoded object terms and their interpreters are ordinary monomorphic Haskell values, and hence are first-class. We may collect terms into a list, of the type [Exp], and interpret uniformly by mapping an evaluator.

til1 = [Lit 1, Add (Lit 1) (Lit 3)]

> map eval til1 
> [1, 4]

In the final embedding, encoded object terms are represented as polymorphic Haskell values, which are not fully first-class: storing them in data structures or passing them as arguments generally loses polymorphism.

2.2 Extensibility and the Expression Problem

We have already considered one kind of extensibility: defining different interpreters for the same object language. Now, consider extending the object language itself, by adding a new syntactic form, multiplication.

The initial encoding represents the object language as an algebraic data type. To add new forms to the language, we have to add new variants to the data type.

data Exp = Lit Int
         | ...
         | Mul Exp Exp

We have to change the data type declaration and hence adjust all code that directly refers to that declaration. This affirms the concept of the expression problem. The expression problem, cast in terms of embedded languages, is about 1) defining new interpreters for the existing terms and 2) extending interpreters to handle an enriched language. In basic functional programming, it is easy to add new operations on data but it is hard to add new data variants.

Suppose the final encoding of the original language, the type class ExpSYM and its instances, are imported as F.

class ExpSYM repr where
  lit :: Int  -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

instance ExpSYM Int where
  lit n     = n
  neg e     = - e
  add e1 e2 = e1 + e2

To add multiplication, we define a new type class, just for the new language form:

class MulSYM repr where
  mul :: repr -> repr -> repr

This can be used immediately to write extended language terms:

tfm1 = add (lit 7) (neg (mul (lit 1) (lit 2))  )

Note that we do not have to extend the interpreter eval. We merely define the meaning of mul in the semantic domain of Int. Recall that the evaluator eval is simply a selector of an interpretation in the desired semantic domain (the result type).

Brief conclusion:

The final encoding makes it easy to add not only new interpretations, but also new language forms, making the interpreters extensible by default.
The initial encoding can also be made extensible by using the data types a la carte approach, by using fix points of a constructor signature, and combining constructor signatures by taking a co-product and automating the injections. However, the automation requires overlapping instances and the explicit enumeration of all constructor signatures in the type definition of the interpreters. In contrast, the final encoding enjoys type inference and encodes terms that cannot be represented as fixpoints of a functor.

2.3 The de-serialization problem

The expression problem (in terms of embedded languages) is about:

Defining new interpreters for the existing terms
Extending interpreters to handle an enriched language

In both cases, we are given embedded language terms as input. So far, the process of evaluation has consisted of entering Haskell code that when compiled and run, will produce the values representing the desired terms. This method works well if we know the terms to process beforehand. The method of writing new Haskell code for each new embedded language term does not work well for communicating terms between computers, or storing terms in files and processing them later.

The first direction, the storing and sending of the terms, is unproblematic.

2.4 Pattern-matching in the Final approach

Evaluators and other processors of embedded language terms have been interpreters, which fold over a term. This section describes operations that do not look like folds.

The initial approach lets us write such operations easily, with pattern-matching and general recursion.

The final approach does not “seem” to permit these operations or pattern-matching. However, it is an illusion that this is impossible.

First, lets recall the principle that underlies interpreters - compositionality. This principle states that “the meaning of a complex expression is determined by its structure and the meaning of its constituents”. For example, eval (Add e1 e2) = eval e1 + eval e2.

Our embedded language interpreters have been compositional; they define the language’s denotational semantics, which is required to be compositional. The compositional interpretation of a term is epitomized in a fold. Our interpreters are all folds. In the final approach, the fold is wired in in the definition of the interpreters. Compositionality, i.e. context insensitivity, lets us build the meaning of a larger expression from the bottom-up, from the leaves to the root.

There are, however, many operations that do not seem compositional because the handling of a sub-expression does depend on where it appears in a larger expression, i.e. it is context sensitive.

Folding Domain-Specific Languages: Deep and Shallow Embeddings

Convo with Sam

ugh, sorry poor memory, where did you suggest that i should add some extra type information in this? class ExprSym repr where const_ :: val -> repr var_ :: Var -> repr if_ :: repr -> repr -> repr -> repr app_ :: [repr] -> repr sample_ :: [repr] -> repr observe_ :: [repr] -> repr fn_ :: [Var] -> repr prim_ :: Var

Sam 4:22 PM parameterise repr 4:22 e.g. const_ :: Val -> repr Val 4:23 or var_ :: Var -> repr Var 4:23 its means you can make sure the first arg to the if is a Bool - a repr Bool

Sam 4:24 PM hm 4:24 lemme think 4:24 / check :+1: 1

4:26 yes youd make it a fixed version

Minh 4:26 PM ah, so like a phantom parameter which isn’t used?

Sam 4:26 PM Nick’s paper is the best reference for comparing the embedding styles https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.1702&rep=rep1&type=pdf :+1: 1

4:27 it is used, it marks the recursion points :+1: 1

4:29 actually youre right 4:29 it would be a phantom type cos atm your deep dsl is not typed, which is why it doesnt match with the typed classy embedding 4:30 fixing is different, its just another thing that you can do 4:30 sorry im tired today lol

Minh 4:31 PM ooh, but i could type it! no worries at all, you’ve been super helpful :heart: :heart: New

Sam 4:31 PM im glad

Last updated on 13 Nov 2020
Published on 13 Nov 2020