Generalised Abstract Data Types (GADTs)

Through my self-study and self-experimentation with GADTs, i’ve learnt why this topic is such a page turner and its related (i think) to kind polymorphism (embodied in the {-# LANGUAGE PolyKinds #-} GHC extension) for the reasons that GADTs generalised recursive data types (i.e. data types are defined in terms of itself). In this post, i wanted to share with you what i’ve learnt about GADTs and how it solved a problem for me, which in turn might solve yours as well.

GADTs in 1-line

This topic, is best realized by this one liner from the Haskell documentation:

Generalised Algebraic Data Types generalise ordinary algebraic data types by
allowing constructors to have richer return types.

Admittedly, when i first encountered this, i was scratching my head with fervent intensity. How i learnt this technique is by working on a small-enough problem and re-working that problem in this new technique to examine, for myself, where it solved the issue and of course whether i got anything else FOC. Disclaimer: I constantly remind myself that whatever solution i’ve worked on might not be canonical or even best-in-class.

Having said that, let’s push forward by starting with a small example.

The “Problem”

Imagine the following interpreter of expressions, the usual method of writing evaluation functions is given below and it works great.

data Expr = I Int | Add Expr Expr | Mul Expr Expr

The expression above basically models a very limited featured calculator, minusing some common features you would expect from a COTS calculator. This form of modelling a language is also known as a domain-specific language, otherwise commonly called a DSL. The interpreter of such expressions can be written as follows

eval :: Expr -> Int
eval (I n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2

When the above formulation is run, there’re no surprises and you can the expected results

eval (Add (I 32) (I 10)) -- 42
eval (Add (Add (I 32) (I 10)) (I 33)) -- 75

Adding Scope to the “Problem”

Now, if we expand the former expression to include other data types, we start to discover that we had just opened a can of worms and it seems that Haskell is unable to support what looks like simple expressions…

data Expr = I Int | B Bool | Add Expr Expr | Mul Expr Expr | Eq Expr Expr

This time, eval can capture both Ints and Booleans so i want to express that via the Either data type; soon, i run into an issue w.r.t semantics

eval :: Expr -> Either Int Bool
eval (I n) = Left n
eval (B n) = Right n
eval (Add e1 e2) = undefined -- what does it mean for bool + bool ? or int + bool or bool + int ? clearly, it does not make sense and it does not compute ... so we have to capture that as well
eval (Mul e1 e2) = undefined -- same reasoning here
eval (Eq e1 e2) = undefined

At this point in time, i am stuck because while the code compiles but its not useful to me as well because i am unable to assign an implementation to eval (Add e1 e2), eval (Mul e1 e2) etc because it simply does not make any sense, whatsoever. The next thing i could do is to encapsulate the idea of capturing both to be very explicit with my expressions (see below) and have the pattern-matching perform the heavy-lifting but its prone to errors and i need to be mindful that this DSL is really small.

-- The following is how it might work

eval :: Expr -> Maybe (Either Bool Int)
eval (I n) = Just (Right n)
eval (B n) = Just (Left n)
eval (Add (I e1) (I e2)) = Just . Right $ (e1 + e2)
eval (Add (B e1) (B e2)) = Just . Left  $ (e1 && e2)
eval (Mul (I e1) (I e2)) = Just . Right $ (e1 * e2)
eval (Mul (B e1) (B e2)) = Just . Left  $ (e1 && e2)
eval (Add e1 e2) = (<*>) (fmap (liftA2 (+)) (eval e1)) (eval e2)
eval (Mul e1 e2) = (<*>) (fmap (liftA2 (*)) (eval e1)) (eval e2)
eval (Eq e1 e2)  = let a = fmap (fromLeft False) (eval e1)
                       b = fmap (fromLeft False) (eval e2)
                    in liftM Left ((<*>) (fmap (==) a) b)

The above expressions have been crafted because i wanted to capture both the ideas of Integers and Booleans; the downside of this approach is that i did find that it was a lot of work given that the DSL i’m using here is small.

In any case, the interpretation seems to work; see below

eval (Add (I 32) (I 10)) -- 42
eval (Add (Add (I 32) (I 10)) (I 33)) -- 75
eval (Eq (Add (B False) (B True)) (B False)) -- Just (Left True), hint: the implementation is correct
eval (Eq (Add (I 32) (I 10)) (I 42)) -- Just (Left True), hint: the implementation is wrong, can you see why?

A solution!

The question i have (in addition to the fact that the implementation for Eq is incorrect) is whether there’s a better way to do it? Seems like its a problem where GADTs are designed to solve since i want to write code that is correct and the logic should just be correct, readable and explicit immediately. Below is how you would model the previous problem to take the form of a GADT-syntax.

data Expr a where
  I   :: Int -> Expr Int
  B   :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int
  Mul :: Expr Int -> Expr Int -> Expr Int
  Eq  :: Eq a => Expr a -> Expr a -> Expr Bool

The important thing to remember about the above is that the constructors have now a richer way to return data types. You can see it in I, B etc and with this new encoding, i am able to write the eval family of functions in this manner where the logic/implementation is in plainsight!

eval :: Expr a -> a
eval (I n) = n
eval (B n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq  e1 e2) = eval e1 == eval e2

The immediate thing that stood out from this implementation is that some expressions are no longer permissible - which is a much relief! The Haskell compiler has delivered me from my dilemma!

eval (Eq (Add (B False) (B True)) (B False)) -- no longer allowed, makes a lot of sense
eval (Eq (Mul (B False) (B True)) (B False)) -- no longer allowed

And now, when i re-ran the hypotheses its correct.

eval (Add (I 32) (I 10)) -- 42
eval (Add (Add (I 32) (I 10)) (I 33)) -- 75
eval (Eq (Add (I 32) (I 10)) (I 42)) -- True
eval (Eq (Add (I 33) (I 10)) (I 42)) -- False

The astute readers amongst you must have gathered that this technique has quite a few particular benefits and you might wonder whether it has other applications besides crafting DSLs; afterall, you might wonder (as i did) : “most of us might not ever needed to do that”. The way i think about the use of DSLs in these examples is that the grammar of the mini-language should presents an intuition of how an expression should behave before we get to actually coding the solution; the second part about using DSLs is that it makes it easier to reason about the validity of scenarios because we are guided by what we understood as the boundaries of the DSL (e.g. I know that it makes absolute no sense to add or multiply 2 booleans unless we choose to interpret them as conjunctions or disjunctions (i.e. the logical operators && or ||)).

Having said that, let me share with you another example of how i wish to serialise the results of the operations into bit-strings; represented by the data type Bit.

data Bit = O | Z deriving Show

serialise :: Expr a -> [Bit]
serialise (I x) = foldl (\acc -> \e -> if e == '1' then O:acc else Z:acc) [] $ showIntAtBase 2 ("01" !!) x ""
serialise (B True) = [O]
serialise (B False) = [Z]
serialise (Add e1 e2) = serialise $ I ((eval e1) + (eval e2))
serialise (Mul e1 e2) = serialise $ I ((eval e1) * (eval e2))

In the interpretation of such values, i have considered 2 scenarios:

When i run into True, it would return a 1 (encoded by O)
When i run into False, it would return a 0 (encoded by Z)
When i run into anything else , i would serialise their individual parts and merge those parts back.

serialise (Add (Add (I 33) (I 10)) (I 42)) -- gives [O,Z,O,Z,O,Z,O]
serialise (Mul (Add (I 33) (I 10)) (I 42)) -- gives [Z,O,O,O,Z,Z,Z,Z,O,O,O]

When i evaluate those expressions, it gives me back the expected bit-string representations (see below); again, this is not by magic but due to the type inference of the haskell compiler that it safe-guards the validity (i.e. type-safety) so that i do not accidentally inject expressions that are not type-safe. What is not explicit throughout this exercise i’ve shown here is that pattern-matching causes type refinement e.g. in the equation

serialise :: Expr a -> [Bit]
serialise (I x) = ...
... -- omitted

the a is refined to Int because of the definition of the smart constructor I.

The full code is below

{-# LANGUAGE GADTs #-}
import Numeric

data Expr a where
  I :: Int -> Expr Int
  B :: Bool -> Expr Bool
  Add :: Expr Int -> Expr Int -> Expr Int
  Mul :: Expr Int -> Expr Int -> Expr Int
  Eq :: Eq a => Expr a -> Expr a -> Expr Bool

eval :: Expr a -> a
eval (I n) = n
eval (B n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Eq  e1 e2) = eval e1 == eval e2

data Bit = O | Z deriving Show

serialise :: Expr a -> [Bit]
serialise (I x) = foldl (\acc -> \e -> if e == '1' then O:acc else Z:acc) [] $ showIntAtBase 2 ("01" !!) x ""
serialise (B True) = [O]
serialise (B False) = [Z]
serialise (Add e1 e2) = serialise $ I ((eval e1) + (eval e2))
serialise (Mul e1 e2) = serialise $ I ((eval e1) * (eval e2))

main :: IO ()
main = do
  putStrLn . show $ eval (Add (I 32) (I 10))
  putStrLn . show $ eval (Add (Add (I 32) (I 10)) (I 33))
  putStrLn . show $ eval (Eq (Add (I 32) (I 10)) (I 42))
  putStrLn . show $ eval (Eq (Add (I 33) (I 10)) (I 42))
  putStrLn . show $ serialise (Add (Add (I 33) (I 10)) (I 42))
  putStrLn . show $ serialise (Mul (Add (I 33) (I 10)) (I 42))

References (for further reading)

Go back to main site