Writing a Single-Stepping Interpreter with Monads

One nice thing about Haskell monads - perhaps the nicest - is how they permit separation of concerns. You can write code in a straightforward way, without knowing that underneath it all, all kinds of implicit control flow may be happening. Usually, when we use monads in Haskell, we use the standard ones: Maybe, Either, lists, and so on. When we design our programs, use tend to use compositions of Reader, Writer, State, and probably IO somewhere. This is fine. Going overboard with odd monadic effects is probably not good for readability. But, sometimes it is important to remember that defining your own idiosyncratic monad can allow an elegant implementation of something seemingly very complicated. In this Literate Haskell program, we will see how to implement an single-stepping interpreter for a very simple Lisp dialect, structured in a monadic style. In fact, the single-stepping support is going to be a minuscule part of the overall code. Please note that when we write “Lisp”, we mean a highly cut-down and simplified language that bears only a superficial resemblance to modern industrial Lisp dialects.

This is not a monad tutorial. This program will not be comprehensible unless you already know how to work with monads in Haskell. All I am trying to do is demonstrate a technique that I find cool.

First, we need the usual module boilerplate. Three of the four imports are for the small command line interface we will add near the end.

> module Main (main) where
> 
> import Control.Monad (ap, liftM)
> import Data.Char (isSpace, isDigit)
> import System.Environment (getArgs)
> import Text.ParserCombinators.ReadP

S-Expressions

In Lisp, the same structure is used for representing both code and data values: the so-called S-expression. An S-expression is either an atom or a cons cell (or just cons). A cons is just a pair of two other values, called car and cdr for ancient and arcane reasons. An atom is either a symbol or a number. We encapsulate this as one data type:

> data SExp = Cons SExp SExp | Symbol String | Number Int

We print a cons as (a . b), where a and b are its constituents. There are two special cases:

By convention, the empty list is represented as the symbol nil. These conventions let us print the common case of a linked list as (a b c) instead of (a . (b . (c . nil))). We implement the printing as an instance of the Show typeclass - I am not normally a fan of using this for human-readable information, but I do not wish to include a prettyprinting library for this program.

> instance Show SExp where
>  show (Symbol s) = s
>  show (Number x) = show x
>  show (Cons a b) = "(" ++ show a ++ recurse b ++ ")"
>    where recurse (Symbol "nil")          = ""
>          recurse (Cons c d)              = " " ++ show c ++ recurse d
>          recurse x                       = " . " ++ show x

We can turn any Haskell list of S-expressions into an S-expression that represents that same list, using the cars to contain the head, and the cdr the link to the tail of the list. We use the symbol nil to represent the empty list.

> toSExp :: [SExp] -> SExp
> toSExp []     = Symbol "nil"
> toSExp (x:xs) = Cons x (toSExp xs)

We can also try to convert a SExp to a Haskell list of SExps, although this is not guaranteed to work.

> fromSExp :: SExp -> Maybe [SExp]
> fromSExp (Symbol "nil") = Just []
> fromSExp (Cons car cdr) = do cdr' <- fromSExp cdr
>                              Just $ car : cdr'
> fromSExp _              = Nothing

The Monad

Before we write the evaluation functions, we’ll define the InterpM monad we will be using. Computation can be in three different states: either we have a value (Result), we have encountered an error during evaluation (Error), or we are at a stepping point (Step). This is a labeled breakpoint in execution where we have the option of continuing execution by executing the embedded monadic value. Or we can stop. Or we can save the state for later. This is very similar to continuation-passing style (and indeed, the InterpM monad is just a special case of a continuation monad).

> data InterpM a = Result a
>                | Error String
>                | Step String (InterpM a)

The Monad instance is completely straightforward by the types.

> instance Monad InterpM where
>   return = Result
>   fail = Error
> 
>   Result x       >>= f = f x
>   Error s        >>= _ = Error s
>   Step desc path >>= f = Step desc (path >>= f)

The Functor and Applicative instances, which are required in GHC versions newer than 7.8, are completely mechanical.

> instance Functor InterpM where
>   fmap = liftM
> 
> instance Applicative InterpM where
>   pure = return
>   f <*> x = f `ap` x

We wrap the Step constructor in a function to create a veneer of abstraction. In principle, the monad could be much more complicated (it would be in most practical implementations), but with a similarly simple interface.

> step :: String -> InterpM a -> InterpM a
> step = Step

The Interpreter

We are now ready to implement our Lisp interpreter. First, we define our variable table, which is a mapping from variable names to their values. Values are, of course, just S-expressions.

> type VarTable = [(String,SExp)]

Evaluation with the eval function happens in the context of a variable table, and evaluates an S-expression, giving back an InterpM action. Thus, we might get a result (if evaluation finished), an error, or a stepping point, where we can choose to continue execution.

> eval :: VarTable -> SExp -> InterpM SExp

We will need a few building blocks before we can define the eval function itself. And most importantly, we must understand how Lisp is evaluated. A Lisp form is a list (x y z...). The first element of the list is called the operator, and the remaining elements the operands. In the general case, the operator is the name of a function, and the operands are recursively evaluated, with the operator applied to the results. However, some forms are special forms, which have specialised evaluation semantics. Our Lisp dialect has a few of those, which we will discuss when we get to them.

The most basic evaluation rule is that the operator of a form must be a symbol, so we define a helper function for determining whether an S-expression is a symbol.

> isSymbol :: SExp -> Maybe String
> isSymbol (Symbol v) = Just v
> isSymbol _          = Nothing

The operands of a form must constitute a valid list, which we can check using the previously defined fromSExp function.

> getOperands :: SExp -> InterpM [SExp]
> getOperands = maybe (fail "Invalid form") return . fromSExp

Once we have the name of a function, we must also be able to call it. This is the job of getFunction, which returns a Haskell function corresponding to a named Lisp function.

> getFunction :: String -> InterpM (VarTable -> [SExp] -> InterpM SExp)

Before we give getFunction a definition, let’s define eval. We use step to create a stepping point before any S-expression is evaluated, followed by matching on the structure of the S-expression.

> eval vt sexp = step ("Evaluating " ++ show sexp) $
>   case sexp of

The simplest case is a number, which evaluates to itself.

>     Number x -> return $ Number x

Symbols are looked up in the variable table.

>     Symbol v -> case lookup v vt of
>                     Nothing -> fail $ "Unknown variable: " ++ v
>                     Just se -> return se

The only special form in our Lisp is quote, which produces its single operand without evaluating it. Thus, (quote a) evaluates to the symbol a, and (quote (a b c)) to the list (a b c). This special form is how we write literal Lisp data.

>     Cons (Symbol "quote") cdr ->
>       case fromSExp cdr of
>         Just [x] -> return x
>         _        -> fail $ "Bad arguments to quote: " ++ show cdr

If the operator is not quote, then it must be the name of a function. We use getFunction to get the corresponding Haskell function, evaluate the operands, then apply the function to the evaluated operands. We also create a stepping point, just for good measure.

>     Cons (Symbol operator) rest -> do
>       operator' <- getFunction operator
>       args <- getOperands rest
>       args' <- mapM (eval vt) args
>       step ("Applying " ++ operator ++ " to " ++ show (toSExp args')) $
>         operator' vt args'

If none of the previous cases matched, then the S-expression must be malformed.

>     Cons car cdr ->
>      fail ("Bad form: " ++ show (Cons car cdr))

The definition of getFunction is mostly uninteresting. This is where we put built-in functions. One of those is list, which takes any number of arguments, and simply returns them.

> getFunction "list" = return $ const $ return . toSExp

The cons function takes two arguments, and returns a cons cell containing them.

> getFunction "cons" = return $ \_ args ->
>   case args of
>     [x, y] -> return $ Cons x y
>     _      -> fail $ "Wrong number of arguments to cons: " ++ show args

The functions car and cdr are used to access the components of a cons cell.

> getFunction "car" = return $ \_ args ->
>   case args of
>     [Cons car _] -> return car
>     _            -> fail $ "Bad arguments to car: " ++ show args
> getFunction "cdr" = return $ \_ args ->
>   case args of
>     [Cons _ cdr] -> return cdr
>     _            -> fail $ "Bad arguments to cdr: " ++ show args

We also have the usual arithmetic operators on numbers.

> getFunction "+" = return $ \_ args ->
>   case args of
>     [Number x, Number y] -> return $ Number $ x+y
>     _                    -> fail $ "Invalid arguments to +: " ++ show args
> getFunction "-" = return $ \_ args ->
>   case args of
>     [Number x, Number y] -> return $ Number $ x-y
>     _                    -> fail $ "Invalid arguments to -: " ++ show args
> getFunction "/" = return $ \_ args ->
>   case args of
>     [Number x, Number y] -> return $ Number $ x `div` y
>     _                    -> fail $ "Invalid arguments to /: " ++ show args
> getFunction "*" = return $ \_ args ->
>   case args of
>     [Number x, Number y] -> return $ Number $ x*y
>     _                    -> fail $ "Invalid arguments to *: " ++ show args

The most interesting built-in function is apply. The apply function takes two arguments: the first represents a function, the second a list of arguments, and apply applies the function to those arguments. The simplest case is the one where the first argument is a symbol - we just call the function with that name.

> getFunction "apply" = return $ \vt args ->
>   case args of
>     [Symbol fname, fargs]
>       | Just fargs' <- fromSExp fargs -> do
>           f <- getFunction fname
>           step ("Calling " ++ fname ++ " with arguments " ++ show fargs) $
>             f vt fargs'

More interestingly, we also permit the function argument to be a lambda form. This is an S-expression with the structure (lambda (params...) body). When apply is given a lambda form, it binds the named parameters to given arguments (they must match in number) and evaluates the body.

>     [Cons (Symbol "lambda") rest, fargs]
>       | Just fargs' <- fromSExp fargs,
>         Just [params, body] <- fromSExp rest,
>         Just params' <- mapM isSymbol =<< fromSExp params,
>         length params' == length fargs' -> do
>        step ("Calling lambda with parameters " ++ show params ++
>              " bound to " ++ show fargs) $
>          eval (zip params' fargs' ++ vt) body
>     _ -> fail $ "Invalid arguments to funcall: " ++ show args

If apply is given anything else, it reports an error.

> getFunction f = fail ("Unknown function: " ++ f)

Apart from the calls to step, the above Lisp interpreter is completely bog-standard. The single-stepping capability has not intruded upon the way we structured the functions. Yet, the capability is there, and we can use it to create an interesting frontend, where the user is prompted before every evaluation step.

> stepIO :: SExp -> IO ()
> stepIO = loop . eval mempty
>   where loop (Result v) = do
>           putStrLn $ "Evaluation finished.  Result: " ++ show v
>         loop (Error err) = do
>           error err
>         loop (Step desc s) = do
>           putStrLn $ desc ++ " (press Enter to continue)"
>           _ <- getLine
>           loop s

This frontend could easily be made much more sophisticated - for example, it could allow the user to move backwards in time, or inspect the variable table (although this would require an extension of the monad).

A Small Parser

To tie it all together, we define a quick little Lisp parser. Note that this one does not support the dotted-pair notation for conses.

> token :: ReadP a -> ReadP a
> token p = skipSpaces >> p
> 
> schar :: Char -> ReadP Char
> schar = token . char
> 
> numberOrSymbol :: ReadP SExp
> numberOrSymbol = token $ do s <- munch1 $ \c -> not (isSpace c || c `elem` "()")
>                             return $ if all isDigit s then Number $ read s
>                                      else Symbol s
> readSExp :: ReadP SExp
> readSExp = numberOrSymbol
>        +++ between (schar '(') (schar ')') sexps
>   where sexps = many readSExp >>= return . toSExp
> 
> parseString :: String -> Either String SExp
> parseString s =
>   case readP_to_S (do {e <- readSExp; token eof; return e}) s of
>       [(e, [])] -> Right e
>       _         -> Left "Parse error"

Command Line Usage

Finally, we create a main function that allows us to run this very file as a Haskell program.

> main :: IO ()
> main = do
>   args <- getArgs
>   case args of
>     [f] -> do s <- readFile f
>               case parseString s of
>                 Right sexp -> stepIO sexp
>                 Left err   -> error err
>     _ -> error "Needs an argument."

While this Lisp dialect is quite limited (in particular, dynamic scoping is an anachronism), it is still more expressive than you may think. We can use lambda forms to simulate local variables and named functions. For example, if we put the following program in double.lisp:

(apply (quote (lambda (square)
                 (apply square (list 2))))
 (list (quote (lambda (x) (* x x)))))

Then we can run our interpreter on it as such:

$ runhaskell stepping.lhs double.lisp
Evaluating (apply (quote (lambda (square) (apply square (list 2)))) (list (quote (lambda (x) (* x x))))) (press Enter to continue)

Evaluating (quote (lambda (square) (apply square (list 2)))) (press Enter to continue)

Evaluating (list (quote (lambda (x) (* x x)))) (press Enter to continue)

Evaluating (quote (lambda (x) (* x x))) (press Enter to continue)

Applying list to ((lambda (x) (* x x))) (press Enter to continue)

Applying apply to ((lambda (square) (apply square (list 2))) ((lambda (x) (* x x)))) (press Enter to continue)

Calling lambda with parameters (square) bound to ((lambda (x) (* x x))) (press Enter to continue)

Evaluating (apply square (list 2)) (press Enter to continue)

Evaluating square (press Enter to continue)

Evaluating (list 2) (press Enter to continue)

Evaluating 2 (press Enter to continue)

Applying list to (2) (press Enter to continue)

Applying apply to ((lambda (x) (* x x)) (2)) (press Enter to continue)

Calling lambda with parameters (x) bound to (2) (press Enter to continue)

Evaluating (* x x) (press Enter to continue)

Evaluating x (press Enter to continue)

Evaluating x (press Enter to continue)

Applying * to (2 2) (press Enter to continue)

Evaluation finished.  Result: 4

Limitations and Extensions

The interpreter implemented here is a toy. The primary limitation is that the monad is entirely special-purpose. We do not re-use a standard error monad, nor do we use a Reader for passing around the variable table. Futhermore, the stepping points give us too little information. One bit that would be nice to have would be the evaluation depth, which would allow us to create a command for saying “evaluate the current expression to completeness without prompting me about the intermediate steps”. This command would be implemented by always following Steps (without prompting) until the starting depth is reached again. Such a facility is crucial for fast-forwarding to the interesting parts of the computation.