Abstract Interpretation of a Very Simple Language

This literate Haskell program demonstrates how to implement abstract interpretation, a form of nonstandard interpretation, for a very simple expression-oriented functional language. It is not a Haskell tutorial, and standard programming techniques will not be explained.

Syntax

We start by defining an abstract syntax for the language. For brevity, we will not have a parser, so all terms will be written out as their Haskell representation.

We will represent names as strings.

type Id = String

Our language is extremely simple - it is dynamically typed, the only types of values are booleans, integers, tuples, and functions, and we support only a handful of operations on them. We don't even have value bindings (but we can simulate them with functions), but we do have a limited form of for-loops. We will describe their semantics when we define an evaluation function.

data Exp
  = CstInt Int
  | CstBool Bool
  | Add Exp Exp
  | Sub Exp Exp
  | Mul Exp Exp
  | Eql Exp Exp
  | If Exp Exp Exp
  | Tuple [Exp]
  | Project Exp Int
  | Var Id
  | Lambda Id Exp
  | Apply Exp Exp
  | ForLoop (Id, Exp) (Id, Exp) Exp
  deriving (Eq, Ord, Show)

We next define utilities for maintaining sets of names. We maintain the invariant that these sets are sorted, which means we can easily compare them for equality.

newtype Ids = Ids [Id]
  deriving (Eq, Show)

merge :: Ord a => [a] -> [a] -> [a]
merge [] [] = []
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
  | x < y  = x : merge xs (y:ys)
  | x > y  = y : merge (x:xs) (y:ys)
  | otherwise = x : merge xs ys

instance Semigroup Ids where
  Ids x <> Ids y = Ids $ merge x y

instance Monoid Ids where
  mempty = Ids mempty

idsSingle :: Id -> Ids
idsSingle v = Ids [v]

idsWithout :: Ids -> Id -> Ids
idsWithout (Ids xs) x = Ids $ filter (/=x) xs

unIds :: Ids -> [Id]
unIds (Ids xs) = xs

This is used to define a function for computing the free names of an expression. Pay particular attention to the case for ForLoop, as this defines its scoping rules.

freeIds :: Exp -> Ids
freeIds (CstInt _) = mempty
freeIds (CstBool _) = mempty
freeIds (Add x y) = freeIds x <> freeIds y
freeIds (Sub x y) = freeIds x <> freeIds y
freeIds (Mul x y) = freeIds x <> freeIds y
freeIds (Eql x y) = freeIds x <> freeIds y
freeIds (If x y z) = freeIds x <> freeIds y <> freeIds z
freeIds (Tuple xs) = foldMap freeIds xs
freeIds (Project x _) = freeIds x
freeIds (Var x) = idsSingle x
freeIds (Lambda p x) = freeIds x `idsWithout` p
freeIds (Apply x y) = freeIds x <> freeIds y
freeIds (ForLoop (p,e) (i,ie) body) =
  freeIds e <> freeIds ie <> (freeIds body `idsWithout` p `idsWithout` i)

Environments

Evaluation takes place in a conventional environment mapping variable names to values. An environment is then merely a list of pairs of names and associated information. We define a polymorphic type to represent this.

type Env a = [(Id, a)]

In a real implementation you would use a more efficient representation, but they will be semantically equivalent. We then define some functions for inserting or looking up information in an environment.

envEmpty :: Env a
envEmpty = []

envExtend :: Id -> a -> Env a -> Env a
envExtend v val env = (v, val) : filter ((/=v).fst) env

envLookup :: Id -> Env a -> a
envLookup v env = case lookup v env of
                    Nothing -> error $ "Unknown variable: " <> v
                    Just x -> x

Although evaluation will always associate names with Vals, the polymorphic definition of Env will come in handy later. But for plain evaluation, we need a value representation, and then we can define an evaluation environment.

data Val
  = ValInt Int
  | ValBool Bool
  | ValTuple [Val]
  | ValFun (Env Val) Id Exp
  deriving (Eq, Ord, Show)

type EvalEnv = Env Val

Evaluation

Now we are ready to define evaluation rules. Evaluation is done by the function eval, which accepts an environment and an expression, and produces a value.

eval :: EvalEnv -> Exp -> Val

The cases are straightforward and will not be remarked upon further - this program is about explaining nonstandard interpretation, and this is standard interpretation. We are simplifying a lot to avoid worrying about error handling and what exactly it means for two functions to be equal.

eval _env (CstInt x) = ValInt x
eval _env (CstBool x) = ValBool x
eval env (Var v) = envLookup v env
eval env (Add x y) = case (eval env x, eval env y) of
                       (ValInt x', ValInt y') -> ValInt $ x' + y'
                       _ -> error "Invalid operands"
eval env (Sub x y) = case (eval env x, eval env y) of
                       (ValInt x', ValInt y') -> ValInt $ x' - y'
                       _ -> error "Invalid operands"
eval env (Mul x y) = case (eval env x, eval env y) of
                       (ValInt x', ValInt y') -> ValInt $ x' * y'
                       _ -> error "Invalid operands"
eval env (Eql x y) = ValBool $ eval env x == eval env y
eval env (If c x y) = case eval env c of
                        ValBool True -> eval env x
                        ValBool False -> eval env y
                        _ -> error "Invalid condition"
eval env (Tuple es) = ValTuple $ map (eval env) es
eval env (Project e i) = case eval env e of
                           ValTuple vs -> vs !! i
                           _ -> error "Cannot project non-tuple"
eval env (Lambda p e) = ValFun env p e
eval env (Apply x y) =
  case eval env x of
    ValFun x_env x_p x_e ->
      eval (envExtend x_p (eval env y) x_env) x_e
    _ -> error "Invalid function"

Loops are the only somewhat unusual construct. They model a particular kind of tail recursive function. Specifically, an expression

    ForLoop (p,pe) (i,ie) body

rcorresponds to the pseudocode

p = pe
for i < ie:
  p = body
return p
eval env (ForLoop (p,pe) (i,ie) body) =
  case eval env ie of
    ValInt n -> loop (eval env pe) (0, n)
    _ -> error "Invalid loop bound"
  where loop acc (j, n)
          | j < n = loop (eval (envExtend i (ValInt j) $
                                envExtend p acc env)
                               body)
                         (j+1, n)
          | otherwise = acc

Here an example of evaluating an expression that computes 10!:

> eval envEmpty (ForLoop ("acc", CstInt 1) ("i", CstInt 10)
>                  (Mul (Var "acc") (Add (Var "i") (CstInt 1))))
ValInt 3628800

While this simple language does not have a notion of "top level functions", we can pretend it does by having expressions that make use of free variables, and then evaluate them in environments where those variables are defined. This approach can also let us analyse expressions without evaluating them. After all, that is the core idea of static analysis: saying something about a program (or expression) without running it.

Data dependencies

To start with, let us evaluate an analysis that tells us something about the data dependencies of the result of evaluating an expression. This will essentially be an evaluation function, but instead of associating variables (and evaluations) producing values, they will produce dependency values, which in the extreme case are just sets of identifiers.

type Deps = Ids

A dependency value is similar Val, but represents only the static information we know of a value.

data DepVal
  = DepVal Deps
  | DepTuple [DepVal]
  deriving (Eq, Show)

In some cases, we may know that a value is a tuple (and then we store individual dependency information for the elements), but we can always fall back to the generic case DepVal.

depValDeps :: DepVal -> Deps
depValDeps (DepVal x) = x
depValDeps (DepTuple x) = foldMap depValDeps x

In fact, DepVal forms a lattice, with DepVal as the top element. We can think of depValJoin x y as expressing the case where we know a value will be either x or y.

depValJoin :: DepVal -> DepVal -> DepVal
depValJoin (DepTuple xs) (DepTuple ys)
  | length xs == length ys = DepTuple $ zipWith depValJoin xs ys
depValJoin x y = DepVal $ depValDeps x <> depValDeps y

We also add a function for injecting additional dependencies into all the DepVal leaves. This is useful when we want to add extra dependencies, but we also want to maintain knowledge about the value structure. We will see this used in the case for If.

depValInj :: Deps -> DepVal -> DepVal
depValInj x (DepVal y) = DepVal $ x <> y
depValInj x (DepTuple ys) = DepTuple $ map (depValInj x) ys

A lot of this nonstandard interpretation is just joining the dependencies of the subexpressions.

type DepsEnv = Env DepVal

deps :: DepsEnv -> Exp -> DepVal
deps _env (CstInt _) = DepVal mempty
deps _env (CstBool _) = DepVal mempty
deps env (Var v) = envLookup v env
deps env (Add x y) = deps env x `depValJoin` deps env y
deps env (Sub x y) = deps env x `depValJoin` deps env y
deps env (Mul x y) = deps env x `depValJoin` deps env y
deps env (Eql x y) = deps env x `depValJoin` deps env y

The first interesting case is tuple construction and projection. This is where we actually exploit the fact that we may have information about the structure of the value.

deps env (Tuple xs) = DepTuple $ map (deps env) xs
deps env (Project e i) =
  case deps env e of
    DepTuple x | i < length x -> x!!i
    x -> x

For If, we make use of depValInj. This is because we know that if the program actually terminates, then the condition will be a boolean, and hence not have any structure information.

deps env (If c x y) =
  depValDeps (deps env c) `depValInj`
    (deps env x `depValJoin` deps env y)

First class functions are notoriously tricky to handle in a static analysis, so we simply assume that the dependencies are all the free variables.

deps _env (Lambda p body) = DepVal $ freeIds $ Lambda p body
deps env (Apply x y) = deps env x `depValJoin` deps env y

The most interesting case is the one for ForLoop. Here we perform a fixed point iteration until the dependencies stop growing. We know this will terminate because the dependency set taken from the universe of variable names in the program, which is finite. (Actually termination is not quite true in our instance, but it would be if we knew that our programs were conventionally type-correct - try and see if you can figure out the edge case.)

deps env (ForLoop (p,pe) (i,ie) body) =
  depValDeps (deps env ie) `depValInj` loop (deps env pe)
  where loop loop_deps =
          let env' = envExtend i (DepVal mempty) $
                     envExtend p loop_deps env
              loop_deps' = deps env' body
          in if loop_deps' == loop_deps
              then loop_deps
              else loop $ loop_deps `depValJoin` loop_deps'

We define a utility function for constructing a 'DepsEnv' corresponding to the free variables of some given expression.

freeDeps :: Exp -> DepsEnv
freeDeps e = map (\v -> (v, DepVal $ idsSingle v)) (unIds (freeIds e))

And now we can easily apply our dependency analysis to a variety of functions.

fun1 :: Exp
fun1 = If (Var "c") (Tuple [Add (Var "x") (Var "y"), Var "z"])
                    (Tuple [Sub (Var "x") (Var "y"), Var "z"])
> deps (freeDeps fun1) fun1
DepTuple [DepVal (Ids ["c","x","y"]),DepVal (Ids ["c","z"])]

And here is one where a loop rotates the elements around a tuple. Note how the final result depends on all parameters - that is due to the fixed point iteration.

fun2 :: Exp
fun2 = Project
         (ForLoop
          ("p", Tuple [Var "x0",Var "x1",Var "x2",Var "x3",Var "x4"])
          ("i", Var "bound")
          (Tuple [Project (Var "p") 1,
                  Project (Var "p") 2,
                  Project (Var "p") 3,
                  Project (Var "p") 4,
                  Project (Var "p") 0]))
         0
> deps (freeDeps fun2) fun2
DepVal (Ids ["bound","x0","x1","x2","x3","x4"])

One downside of our approach is that it does not tell us anything about inner dependencies, but this can be solved by storing information from the intermediate environments, perhaps by using an appropriate state monad.