# Writing a Single-Stepping Interpreter with Monads

One nice thing about Haskell monads - perhaps the nicest - is how they permit separation of concerns. You can write code in a straightforward way, without knowing that underneath it all, all kinds of implicit control flow may be happening. Usually, when we use monads in Haskell, we use the standard ones: `Maybe`

, `Either`

, lists, and so on. When we design our programs, use tend to use compositions of `Reader`

, `Writer`

, `State`

, and probably `IO`

somewhere. This is fine. Going overboard with odd monadic effects is probably not good for readability. But, sometimes it is important to remember that defining your own idiosyncratic monad can allow an elegant implementation of something seemingly very complicated. In this Literate Haskell program, we will see how to implement an single-stepping interpreter for a very simple Lisp dialect, structured in a monadic style. In fact, the single-stepping support is going to be a minuscule part of the overall code. Please note that when we write “Lisp”, we mean a highly cut-down and simplified language that bears only a superficial resemblance to modern industrial Lisp dialects.

This is not a monad tutorial. This program will not be comprehensible unless you already know how to work with monads in Haskell. All I am trying to do is demonstrate a technique that I find cool.

First, we need the usual module boilerplate. Three of the four imports are for the small command line interface we will add near the end.

```
module Main (main) where
import Control.Monad (ap, liftM)
import Data.Char (isSpace, isDigit)
import System.Environment (getArgs)
import Text.ParserCombinators.ReadP
```

## S-Expressions

In Lisp, the same structure is used for representing both code and data values: the so-called *S-expression*. An S-expression is either an *atom* or a *cons cell* (or just *cons*). A cons is just a pair of two other values, called `car`

and `cdr`

for ancient and arcane reasons. An atom is either a symbol or a number. We encapsulate this as one data type:

`data SExp = Cons SExp SExp | Symbol String | Number Int`

We print a cons as `(a . b)`

, where `a`

and `b`

are its constituents. There are two special cases:

if

`b`

is itself a cons`(c . d)`

, we print`(a c . d)`

, and recursively.if

`b`

is the symbol`nil`

, then we print simply`(a)`

.

By convention, the empty list is represented as the symbol `nil`

. These conventions let us print the common case of a linked list as `(a b c)`

instead of `(a . (b . (c . nil)))`

. We implement the printing as an instance of the `Show`

typeclass - I am not normally a fan of using this for human-readable information, but I do not wish to include a prettyprinting library for this program.

```
instance Show SExp where
show (Symbol s) = s
show (Number x) = show x
show (Cons a b) = "(" ++ show a ++ recurse b ++ ")"
where recurse (Symbol "nil") = ""
Cons c d) = " " ++ show c ++ recurse d
recurse (= " . " ++ show x recurse x
```

We can turn any Haskell list of S-expressions into an S-expression that represents that same list, using the `car`

s to contain the head, and the `cdr`

the link to the tail of the list. We use the symbol `nil`

to represent the empty list.

```
toSExp :: [SExp] -> SExp
= Symbol "nil"
toSExp [] :xs) = Cons x (toSExp xs) toSExp (x
```

We can also try to convert a SExp to a Haskell list of SExps, although this is not guaranteed to work.

```
fromSExp :: SExp -> Maybe [SExp]
Symbol "nil") = Just []
fromSExp (Cons car cdr) = do cdr' <- fromSExp cdr
fromSExp (Just $ car : cdr'
= Nothing fromSExp _
```

## The Monad

Before we write the evaluation functions, we’ll define the `InterpM`

monad we will be using. Computation can be in three different states: either we have a value (`Result`

), we have encountered an error during evaluation (`Error`

), or we are at a *stepping point* (`Step`

). This is a labeled breakpoint in execution where we have the option of continuing execution by executing the embedded monadic value. Or we can stop. Or we can save the state for later. This is very similar to continuation-passing style (and indeed, the `InterpM`

monad is just a special case of a continuation monad).

```
data InterpM a = Result a
| Error String
| Step String (InterpM a)
```

The `Monad`

instance is completely straightforward by the types.

```
instance Monad InterpM where
return = Result
fail = Error
Result x >>= f = f x
Error s >>= _ = Error s
Step desc path >>= f = Step desc (path >>= f)
```

The `Functor`

and `Applicative`

instances, which are required in GHC versions newer than 7.8, are completely mechanical.

```
instance Functor InterpM where
fmap = liftM
instance Applicative InterpM where
pure = return
<*> x = f `ap` x f
```

We wrap the `Step`

constructor in a function to create a veneer of abstraction. In principle, the monad could be much more complicated (it would be in most practical implementations), but with a similarly simple interface.

```
step :: String -> InterpM a -> InterpM a
= Step step
```

## The Interpreter

We are now ready to implement our Lisp interpreter. First, we define our variable table, which is a mapping from variable names to their values. Values are, of course, just S-expressions.

`type VarTable = [(String,SExp)]`

Evaluation with the `eval`

function happens in the context of a variable table, and evaluates an S-expression, giving back an `InterpM`

action. Thus, we might get a result (if evaluation finished), an error, or a stepping point, where we can choose to continue execution.

`eval :: VarTable -> SExp -> InterpM SExp`

We will need a few building blocks before we can define the `eval`

function itself. And most importantly, we must understand how Lisp is evaluated. A Lisp *form* is a list `(x y z...)`

. The first element of the list is called the *operator*, and the remaining elements the *operands*. In the general case, the operator is the name of a function, and the operands are recursively evaluated, with the operator applied to the results. However, some forms are *special forms*, which have specialised evaluation semantics. Our Lisp dialect has a few of those, which we will discuss when we get to them.

The most basic evaluation rule is that the operator of a form must be a symbol, so we define a helper function for determining whether an S-expression is a symbol.

```
isSymbol :: SExp -> Maybe String
Symbol v) = Just v
isSymbol (= Nothing isSymbol _
```

The operands of a form must constitute a valid list, which we can check using the previously defined `fromSExp`

function.

```
getOperands :: SExp -> InterpM [SExp]
= maybe (fail "Invalid form") return . fromSExp getOperands
```

Once we have the name of a function, we must also be able to call it. This is the job of `getFunction`

, which returns a Haskell function corresponding to a named Lisp function.

`getFunction :: String -> InterpM (VarTable -> [SExp] -> InterpM SExp)`

Before we give `getFunction`

a definition, let’s define `eval`

. We use `step`

to create a stepping point before any S-expression is evaluated, followed by matching on the structure of the S-expression.

```
= step ("Evaluating " ++ show sexp) $
eval vt sexp case sexp of
```

The simplest case is a number, which evaluates to itself.

`Number x -> return $ Number x `

Symbols are looked up in the variable table.

```
Symbol v -> case lookup v vt of
Nothing -> fail $ "Unknown variable: " ++ v
Just se -> return se
```

The only special form in our Lisp is `quote`

, which produces its single operand *without evaluating it*. Thus, `(quote a)`

evaluates to the symbol `a`

, and `(quote (a b c))`

to the list `(a b c)`

. This special form is how we write literal Lisp data.

```
Cons (Symbol "quote") cdr ->
case fromSExp cdr of
Just [x] -> return x
-> fail $ "Bad arguments to quote: " ++ show cdr _
```

If the operator is not `quote`

, then it must be the name of a function. We use `getFunction`

to get the corresponding Haskell function, evaluate the operands, then apply the function to the evaluated operands. We also create a stepping point, just for good measure.

```
Cons (Symbol operator) rest -> do
<- getFunction operator
operator' <- getOperands rest
args <- mapM (eval vt) args
args' "Applying " ++ operator ++ " to " ++ show (toSExp args')) $
step ( operator' vt args'
```

If none of the previous cases matched, then the S-expression must be malformed.

```
Cons car cdr ->
fail ("Bad form: " ++ show (Cons car cdr))
```

The definition of `getFunction`

is mostly uninteresting. This is where we put built-in functions. One of those is `list`

, which takes any number of arguments, and simply returns them.

`"list" = return $ const $ return . toSExp getFunction `

The `cons`

function takes two arguments, and returns a cons cell containing them.

```
"cons" = return $ \_ args ->
getFunction case args of
-> return $ Cons x y
[x, y] -> fail $ "Wrong number of arguments to cons: " ++ show args _
```

The functions `car`

and `cdr`

are used to access the components of a cons cell.

```
"car" = return $ \_ args ->
getFunction case args of
Cons car _] -> return car
[-> fail $ "Bad arguments to car: " ++ show args
_ "cdr" = return $ \_ args ->
getFunction case args of
Cons _ cdr] -> return cdr
[-> fail $ "Bad arguments to cdr: " ++ show args _
```

We also have the usual arithmetic operators on numbers.

```
"+" = return $ \_ args ->
getFunction case args of
Number x, Number y] -> return $ Number $ x+y
[-> fail $ "Invalid arguments to +: " ++ show args
_ "-" = return $ \_ args ->
getFunction case args of
Number x, Number y] -> return $ Number $ x-y
[-> fail $ "Invalid arguments to -: " ++ show args
_ "/" = return $ \_ args ->
getFunction case args of
Number x, Number y] -> return $ Number $ x `div` y
[-> fail $ "Invalid arguments to /: " ++ show args
_ "*" = return $ \_ args ->
getFunction case args of
Number x, Number y] -> return $ Number $ x*y
[-> fail $ "Invalid arguments to *: " ++ show args _
```

The most interesting built-in function is `apply`

. The `apply`

function takes two arguments: the first represents a function, the second a list of arguments, and `apply`

applies the function to those arguments. The simplest case is the one where the first argument is a symbol - we just call the function with that name.

```
"apply" = return $ \vt args ->
getFunction case args of
Symbol fname, fargs]
[| Just fargs' <- fromSExp fargs -> do
<- getFunction fname
f "Calling " ++ fname ++ " with arguments " ++ show fargs) $
step ( f vt fargs'
```

More interestingly, we also permit the function argument to be a *lambda form*. This is an S-expression with the structure `(lambda (params...) body)`

. When `apply`

is given a lambda form, it binds the named parameters to given arguments (they must match in number) and evaluates the body.

```
Cons (Symbol "lambda") rest, fargs]
[| Just fargs' <- fromSExp fargs,
Just [params, body] <- fromSExp rest,
Just params' <- mapM isSymbol =<< fromSExp params,
length params' == length fargs' -> do
"Calling lambda with parameters " ++ show params ++
step (" bound to " ++ show fargs) $
zip params' fargs' ++ vt) body
eval (-> fail $ "Invalid arguments to funcall: " ++ show args _
```

If `apply`

is given anything else, it reports an error.

`= fail ("Unknown function: " ++ f) getFunction f `

Apart from the calls to `step`

, the above Lisp interpreter is completely bog-standard. The single-stepping capability has not intruded upon the way we structured the functions. Yet, the capability is there, and we can use it to create an interesting frontend, where the user is prompted before every evaluation step.

```
stepIO :: SExp -> IO ()
= loop . eval mempty
stepIO where loop (Result v) = do
putStrLn $ "Evaluation finished. Result: " ++ show v
Error err) = do
loop (error err
Step desc s) = do
loop (putStrLn $ desc ++ " (press Enter to continue)"
<- getLine
_ loop s
```

This frontend could easily be made much more sophisticated - for example, it could allow the user to move backwards in time, or inspect the variable table (although this would require an extension of the monad).

## A Small Parser

To tie it all together, we define a quick little Lisp parser. Note that this one does not support the dotted-pair notation for conses.

```
token :: ReadP a -> ReadP a
= skipSpaces >> p
token p
schar :: Char -> ReadP Char
= token . char
schar
numberOrSymbol :: ReadP SExp
= token $ do s <- munch1 $ \c -> not (isSpace c || c `elem` "()")
numberOrSymbol return $ if all isDigit s then Number $ read s
else Symbol s
readSExp :: ReadP SExp
= numberOrSymbol
readSExp +++ between (schar '(') (schar ')') sexps
where sexps = many readSExp >>= return . toSExp
parseString :: String -> Either String SExp
=
parseString s case readP_to_S (do {e <- readSExp; token eof; return e}) s of
-> Right e
[(e, [])] -> Left "Parse error" _
```

## Command Line Usage

Finally, we create a `main`

function that allows us to run this very file as a Haskell program.

```
main :: IO ()
= do
main <- getArgs
args case args of
-> do s <- readFile f
[f] case parseString s of
Right sexp -> stepIO sexp
Left err -> error err
-> error "Needs an argument." _
```

While this Lisp dialect is quite limited (in particular, dynamic scoping is an anachronism), it is still more expressive than you may think. We can use lambda forms to simulate local variables and named functions. For example, if we put the following program in `double.lisp`

:

```
(apply (quote (lambda (square)
(apply square (list 2))))
(list (quote (lambda (x) (* x x)))))
```

Then we can run our interpreter on it as such:

```
$ runhaskell stepping.lhs double.lisp
Evaluating (apply (quote (lambda (square) (apply square (list 2)))) (list (quote (lambda (x) (* x x))))) (press Enter to continue)
Evaluating (quote (lambda (square) (apply square (list 2)))) (press Enter to continue)
Evaluating (list (quote (lambda (x) (* x x)))) (press Enter to continue)
Evaluating (quote (lambda (x) (* x x))) (press Enter to continue)
Applying list to ((lambda (x) (* x x))) (press Enter to continue)
Applying apply to ((lambda (square) (apply square (list 2))) ((lambda (x) (* x x)))) (press Enter to continue)
Calling lambda with parameters (square) bound to ((lambda (x) (* x x))) (press Enter to continue)
Evaluating (apply square (list 2)) (press Enter to continue)
Evaluating square (press Enter to continue)
Evaluating (list 2) (press Enter to continue)
Evaluating 2 (press Enter to continue)
Applying list to (2) (press Enter to continue)
Applying apply to ((lambda (x) (* x x)) (2)) (press Enter to continue)
Calling lambda with parameters (x) bound to (2) (press Enter to continue)
Evaluating (* x x) (press Enter to continue)
Evaluating x (press Enter to continue)
Evaluating x (press Enter to continue)
Applying * to (2 2) (press Enter to continue)
Evaluation finished. Result: 4
```

## Limitations and Extensions

The interpreter implemented here is a toy. The primary limitation is that the monad is entirely special-purpose. We do not re-use a standard error monad, nor do we use a `Reader`

for passing around the variable table. Futhermore, the stepping points give us too little information. One bit that would be nice to have would be the *evaluation depth*, which would allow us to create a command for saying “evaluate the current expression to completeness without prompting me about the intermediate steps”. This command would be implemented by always following `Step`

s (without prompting) until the starting depth is reached again. Such a facility is crucial for fast-forwarding to the interesting parts of the computation.