Embedding DSLs
Tagless Final Style
2. Interpreters for first-order languages
2.1 Initial and Final embeddings
We start with a very simple language to be later extended, which has integer literals, negation, and addition. Our running example is the expression:
8 + (- (1 + 2))
The initial embedding of the language encodes the expressions of the language as the values of an algebraic data type.
data Exp = Lit Int
| Neg Exp
| Add Exp Exp
Our running example is written as follows:
ti1 = Add (Lit 8) (Neg (Add (Lit 1) (Lit 2)))
The first interpreter of the language is an evaluator which proceeds by pattern-matching:
eval :: Exp -> Int
eval (Lit n) = n
eval (Neg e) = - (eval e)
eval (Add e1 e2) = eval e1 + eval e2
We can embed our language in Haskell differently, using a final embedding. If all we ever need is the value of an expression, we can represent the term in our arithmetic language by its value, or equivalentally by a Haskell expression that computes that value.
We introduce:
- A representation type for the meaning of an expression,
Int
. - Functions computing the meaning of the three expressions forms of the language (literals, negation, and addition).
type Repr = Int
lit :: Int -> Repr
lit n = n
neg :: Repr -> Repr
neg e = - e
add :: Repr -> Repr -> Repr
add e1 e2 = e1 + e2
The computation is compositional: the meaning of addition is computed from the meaning of the summands (i.e. the value of the summands).
Our running example then has form:
tf1 = add (lit 8) (neg (add (lit 1) (lit 2)))
Comparing the initial embedding ti1
to the final embedding tf1
, they differ in terms of the identifiers.
-
The initial embedding seems more general as it permits other interpretations of the language, for example, pretty-printing.
view :: Exp -> String view (Lit n) = show n view (Neg e) = "(-" ++ view e ++ ")" view (Add e1 e2) = "(" ++ view e1 ++ " + " ++ view e2 ++ ")"
- The final embedding, on the other hand, hardwires the evaluator into the representation
tf1
, making it impossible to interpret the object term as something other than an integer.
Using type classes in final embedding
We want final embedding to permit multiple interpretations too; therefore we must find a way to parameterize the “constructor functions” such as lit
and neg
by the result type. We can use Haskell’s type class system as a means of parameterization.
class ExpSYM repr where
lit :: Int -> repr
neg :: repr -> repr
add :: repr -> repr
Denotational Semantics of Final Style
The declaration of ExpSYM
should remind us of denotational semantics, over the semantic domain repr
. As befits denotational semantics, the meaning of an expression (whatever repr
happens to be) is computed from the meaning of the components.
The running example has the same final embedded form as before, but the inferred type is different: it is no longer Repr
(that is, Int
), but rather it is ExpSYM repr => repr
, hence being polymorphic over the semantic domain. An object term is not represented by its abstract syntax, but by its meaning (its denotation in a semantic domain).
Interpreting Final-Encoded Expressions
To interpret finally-encoded expressions, we write an instance for ExpSYM
, specifying the semantic domain. For example, we may interpret expressions as integers, by mapping object language integers to Haskell integers, and object language negation to Haskell negation, etc:
instance ExpSYM Int where
lit n = n
neg e = - e
add e1 e2 = e1 + e2
We write the evaluator as the identity function, so that eval tf1
has the value 5
:
eval :: Int -> Int
eval = id
The function eval
has a strange type for an evaluator, and an even stranger definition (the identity function). It is more proper to call eval
a “selector” of an interpretation – a finally-encoded expression has an indefinite number of interpretations; eval
selects one of them. In this case, it selects an interpretation as an integer.
Unlike the initial interpreter, the final interpreter has no pattern-matching on abstract syntax, and so has no syntax-dispatch overhead.
The SYM
in ExpSYM
stands for Symantics: the class declaration defines the syntax of the embedded language (its expression forms); the class instances define the semantics (the interpretations). Multiple interpretations are now possible through defining different type class instances and corresponding interpreters where their type definition is what selects which instance is used.
Comparing Properties of Initial vs Final Encoded Objects
In the initial embedding, encoded object terms and their interpreters are ordinary monomorphic Haskell values, and hence are first-class. We may collect terms into a list, of the type [Exp]
, and interpret uniformly by mapping an evaluator.
til1 = [Lit 1, Add (Lit 1) (Lit 3)]
> map eval til1
> [1, 4]
In the final embedding, encoded object terms are represented as polymorphic Haskell values, which are not fully first-class: storing them in data structures or passing them as arguments generally loses polymorphism.
2.2 Extensibility and the Expression Problem
We have already considered one kind of extensibility: defining different interpreters for the same object language. Now, consider extending the object language itself, by adding a new syntactic form, multiplication.
The initial encoding represents the object language as an algebraic data type. To add new forms to the language, we have to add new variants to the data type.
data Exp = Lit Int
| ...
| Mul Exp Exp
We have to change the data type declaration and hence adjust all code that directly refers to that declaration. This affirms the concept of the expression problem. The expression problem, cast in terms of embedded languages, is about 1) defining new interpreters for the existing terms and 2) extending interpreters to handle an enriched language. In basic functional programming, it is easy to add new operations on data but it is hard to add new data variants.
Suppose the final encoding of the original language, the type class ExpSYM
and its instances, are imported as F
.
class ExpSYM repr where
lit :: Int -> repr
neg :: repr -> repr
add :: repr -> repr -> repr
instance ExpSYM Int where
lit n = n
neg e = - e
add e1 e2 = e1 + e2
To add multiplication, we define a new type class, just for the new language form:
class MulSYM repr where
mul :: repr -> repr -> repr
This can be used immediately to write extended language terms:
tfm1 = add (lit 7) (neg (mul (lit 1) (lit 2)) )
Note that we do not have to extend the interpreter eval
. We merely define the meaning of mul
in the semantic domain of Int
. Recall that the evaluator eval
is simply a selector of an interpretation in the desired semantic domain (the result type).
Brief conclusion:
-
The final encoding makes it easy to add not only new interpretations, but also new language forms, making the interpreters extensible by default.
-
The initial encoding can also be made extensible by using the data types a la carte approach, by using fix points of a constructor signature, and combining constructor signatures by taking a co-product and automating the injections. However, the automation requires overlapping instances and the explicit enumeration of all constructor signatures in the type definition of the interpreters. In contrast, the final encoding enjoys type inference and encodes terms that cannot be represented as fixpoints of a functor.
2.3 The de-serialization problem
The expression problem (in terms of embedded languages) is about:
- Defining new interpreters for the existing terms
- Extending interpreters to handle an enriched language
In both cases, we are given embedded language terms as input. So far, the process of evaluation has consisted of entering Haskell code that when compiled and run, will produce the values representing the desired terms. This method works well if we know the terms to process beforehand. The method of writing new Haskell code for each new embedded language term does not work well for communicating terms between computers, or storing terms in files and processing them later.
The first direction, the storing and sending of the terms, is unproblematic.
2.4 Pattern-matching in the Final approach
Evaluators and other processors of embedded language terms have been interpreters, which fold over a term. This section describes operations that do not look like folds.
The initial approach lets us write such operations easily, with pattern-matching and general recursion.
The final approach does not “seem” to permit these operations or pattern-matching. However, it is an illusion that this is impossible.
First, lets recall the principle that underlies interpreters - compositionality. This principle states that “the meaning of a complex expression is determined by its structure and the meaning of its constituents”. For example, eval (Add e1 e2) = eval e1 + eval e2
.
Our embedded language interpreters have been compositional; they define the language’s denotational semantics, which is required to be compositional. The compositional interpretation of a term is epitomized in a fold
. Our interpreters are all folds. In the final approach, the fold is wired in in the definition of the interpreters. Compositionality, i.e. context insensitivity, lets us build the meaning of a larger expression from the bottom-up, from the leaves to the root.
There are, however, many operations that do not seem compositional because the handling of a sub-expression does depend on where it appears in a larger expression, i.e. it is context sensitive.
Folding Domain-Specific Languages: Deep and Shallow Embeddings
Convo with Sam
ugh, sorry poor memory, where did you suggest that i should add some extra type information in this? class ExprSym repr where const_ :: val -> repr var_ :: Var -> repr if_ :: repr -> repr -> repr -> repr app_ :: [repr] -> repr sample_ :: [repr] -> repr observe_ :: [repr] -> repr fn_ :: [Var] -> repr prim_ :: Var
Sam 4:22 PM parameterise repr 4:22 e.g. const_ :: Val -> repr Val 4:23 or var_ :: Var -> repr Var 4:23 its means you can make sure the first arg to the if is a Bool - a repr Bool
Minh 4:23 PM mmm, this would mean that my original PExpr would also be functor, right (or atleast take a type parameter): data PExpr = PConst Val | PVar Var | PIf PExpr PExpr PExpr | PApp [PExpr] | PFn [Var] PExpr | PPrim Var | PSample [PExpr] | PObserve [PExpr] (edited)
Sam 4:24 PM hm 4:24 lemme think 4:24 / check :+1: 1
4:26 yes youd make it a fixed version
Minh 4:26 PM ah, so like a phantom parameter which isn’t used?
Sam 4:26 PM Nick’s paper is the best reference for comparing the embedding styles https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.1702&rep=rep1&type=pdf :+1: 1
4:27 it is used, it marks the recursion points :+1: 1
4:29 actually youre right 4:29 it would be a phantom type cos atm your deep dsl is not typed, which is why it doesnt match with the typed classy embedding 4:30 fixing is different, its just another thing that you can do 4:30 sorry im tired today lol
Minh 4:31 PM ooh, but i could type it! no worries at all, you’ve been super helpful :heart: :heart: New
Sam 4:31 PM im glad