Abstract Interpretation of a Very Simple Language
This literate Haskell program demonstrates how to implement abstract interpretation, a form of nonstandard interpretation, for a very simple expression-oriented functional language. It is not a Haskell tutorial, and standard programming techniques will not be explained.
Syntax
We start by defining an abstract syntax for the language. For brevity, we will not have a parser, so all terms will be written out as their Haskell representation.
We will represent names as strings.
type Id = String
Our language is extremely simple - it is dynamically typed, the only
types of values are booleans, integers, tuples, and functions, and we
support only a handful of operations on them. We don't even have value
bindings (but we can simulate them with functions), but we do have a
limited form of for
-loops. We will describe their semantics when we
define an evaluation function.
data Exp
= CstInt Int
| CstBool Bool
| Add Exp Exp
| Sub Exp Exp
| Mul Exp Exp
| Eql Exp Exp
| If Exp Exp Exp
| Tuple [Exp]
| Project Exp Int
| Var Id
| Lambda Id Exp
| Apply Exp Exp
| ForLoop (Id, Exp) (Id, Exp) Exp
deriving (Eq, Ord, Show)
We next define utilities for maintaining sets of names. We maintain the invariant that these sets are sorted, which means we can easily compare them for equality.
newtype Ids = Ids [Id]
deriving (Eq, Show)
merge :: Ord a => [a] -> [a] -> [a]
= []
merge [] [] = xs
merge xs [] = ys
merge [] ys :xs) (y:ys)
merge (x| x < y = x : merge xs (y:ys)
| x > y = y : merge (x:xs) (y:ys)
| otherwise = x : merge xs ys
instance Semigroup Ids where
Ids x <> Ids y = Ids $ merge x y
instance Monoid Ids where
mempty = Ids mempty
idsSingle :: Id -> Ids
= Ids [v]
idsSingle v
idsWithout :: Ids -> Id -> Ids
Ids xs) x = Ids $ filter (/=x) xs
idsWithout (
unIds :: Ids -> [Id]
Ids xs) = xs unIds (
This is used to define a function for computing the free names of an
expression. Pay particular attention to the case for ForLoop
, as
this defines its scoping rules.
freeIds :: Exp -> Ids
CstInt _) = mempty
freeIds (CstBool _) = mempty
freeIds (Add x y) = freeIds x <> freeIds y
freeIds (Sub x y) = freeIds x <> freeIds y
freeIds (Mul x y) = freeIds x <> freeIds y
freeIds (Eql x y) = freeIds x <> freeIds y
freeIds (If x y z) = freeIds x <> freeIds y <> freeIds z
freeIds (Tuple xs) = foldMap freeIds xs
freeIds (Project x _) = freeIds x
freeIds (Var x) = idsSingle x
freeIds (Lambda p x) = freeIds x `idsWithout` p
freeIds (Apply x y) = freeIds x <> freeIds y
freeIds (ForLoop (p,e) (i,ie) body) =
freeIds (<> freeIds ie <> (freeIds body `idsWithout` p `idsWithout` i) freeIds e
Environments
Evaluation takes place in a conventional environment mapping variable names to values. An environment is then merely a list of pairs of names and associated information. We define a polymorphic type to represent this.
type Env a = [(Id, a)]
In a real implementation you would use a more efficient representation, but they will be semantically equivalent. We then define some functions for inserting or looking up information in an environment.
envEmpty :: Env a
= []
envEmpty
envExtend :: Id -> a -> Env a -> Env a
= (v, val) : filter ((/=v).fst) env
envExtend v val env
envLookup :: Id -> Env a -> a
= case lookup v env of
envLookup v env Nothing -> error $ "Unknown variable: " <> v
Just x -> x
Although evaluation will always associate names with Val
s, the
polymorphic definition of Env
will come in handy later. But for
plain evaluation, we need a value representation, and then we can
define an evaluation environment.
data Val
= ValInt Int
| ValBool Bool
| ValTuple [Val]
| ValFun (Env Val) Id Exp
deriving (Eq, Ord, Show)
type EvalEnv = Env Val
Evaluation
Now we are ready to define evaluation rules. Evaluation is done by the
function eval
, which accepts an environment and an expression, and
produces a value.
eval :: EvalEnv -> Exp -> Val
The cases are straightforward and will not be remarked upon further - this program is about explaining nonstandard interpretation, and this is standard interpretation. We are simplifying a lot to avoid worrying about error handling and what exactly it means for two functions to be equal.
CstInt x) = ValInt x
eval _env (CstBool x) = ValBool x
eval _env (Var v) = envLookup v env
eval env (Add x y) = case (eval env x, eval env y) of
eval env (ValInt x', ValInt y') -> ValInt $ x' + y'
(-> error "Invalid operands"
_ Sub x y) = case (eval env x, eval env y) of
eval env (ValInt x', ValInt y') -> ValInt $ x' - y'
(-> error "Invalid operands"
_ Mul x y) = case (eval env x, eval env y) of
eval env (ValInt x', ValInt y') -> ValInt $ x' * y'
(-> error "Invalid operands"
_ Eql x y) = ValBool $ eval env x == eval env y
eval env (If c x y) = case eval env c of
eval env (ValBool True -> eval env x
ValBool False -> eval env y
-> error "Invalid condition"
_ Tuple es) = ValTuple $ map (eval env) es
eval env (Project e i) = case eval env e of
eval env (ValTuple vs -> vs !! i
-> error "Cannot project non-tuple"
_ Lambda p e) = ValFun env p e
eval env (Apply x y) =
eval env (case eval env x of
ValFun x_env x_p x_e ->
eval (envExtend x_p (eval env y) x_env) x_e-> error "Invalid function" _
Loops are the only somewhat unusual construct. They model a particular kind of tail recursive function. Specifically, an expression
ForLoop (p,pe) (i,ie) body
rcorresponds to the pseudocode
p = pe
for i < ie:
p = body
return p
ForLoop (p,pe) (i,ie) body) =
eval env (case eval env ie of
ValInt n -> loop (eval env pe) (0, n)
-> error "Invalid loop bound"
_ where loop acc (j, n)
| j < n = loop (eval (envExtend i (ValInt j) $
envExtend p acc env)
body)+1, n)
(j| otherwise = acc
Here an example of evaluating an expression that computes 10!:
> eval envEmpty (ForLoop ("acc", CstInt 1) ("i", CstInt 10)
> (Mul (Var "acc") (Add (Var "i") (CstInt 1))))
ValInt 3628800
While this simple language does not have a notion of "top level functions", we can pretend it does by having expressions that make use of free variables, and then evaluate them in environments where those variables are defined. This approach can also let us analyse expressions without evaluating them. After all, that is the core idea of static analysis: saying something about a program (or expression) without running it.
Data dependencies
To start with, let us evaluate an analysis that tells us something about the data dependencies of the result of evaluating an expression. This will essentially be an evaluation function, but instead of associating variables (and evaluations) producing values, they will produce dependency values, which in the extreme case are just sets of identifiers.
type Deps = Ids
A dependency value is similar Val
, but represents only the static
information we know of a value.
data DepVal
= DepVal Deps
| DepTuple [DepVal]
deriving (Eq, Show)
In some cases, we may know that a value is a tuple (and then we store
individual dependency information for the elements), but we can always
fall back to the generic case DepVal
.
depValDeps :: DepVal -> Deps
DepVal x) = x
depValDeps (DepTuple x) = foldMap depValDeps x depValDeps (
In fact, DepVal
forms a lattice, with DepVal
as the top element.
We can think of depValJoin x y
as expressing the case where we know
a value will be either x
or y
.
depValJoin :: DepVal -> DepVal -> DepVal
DepTuple xs) (DepTuple ys)
depValJoin (| length xs == length ys = DepTuple $ zipWith depValJoin xs ys
= DepVal $ depValDeps x <> depValDeps y depValJoin x y
We also add a function for injecting additional dependencies into all
the DepVal
leaves. This is useful when we want to add extra
dependencies, but we also want to maintain knowledge about the value
structure. We will see this used in the case for If
.
depValInj :: Deps -> DepVal -> DepVal
DepVal y) = DepVal $ x <> y
depValInj x (DepTuple ys) = DepTuple $ map (depValInj x) ys depValInj x (
A lot of this nonstandard interpretation is just joining the dependencies of the subexpressions.
type DepsEnv = Env DepVal
deps :: DepsEnv -> Exp -> DepVal
CstInt _) = DepVal mempty
deps _env (CstBool _) = DepVal mempty
deps _env (Var v) = envLookup v env
deps env (Add x y) = deps env x `depValJoin` deps env y
deps env (Sub x y) = deps env x `depValJoin` deps env y
deps env (Mul x y) = deps env x `depValJoin` deps env y
deps env (Eql x y) = deps env x `depValJoin` deps env y deps env (
The first interesting case is tuple construction and projection. This is where we actually exploit the fact that we may have information about the structure of the value.
Tuple xs) = DepTuple $ map (deps env) xs
deps env (Project e i) =
deps env (case deps env e of
DepTuple x | i < length x -> x!!i
-> x x
For If
, we make use of depValInj
. This is because we know that if
the program actually terminates, then the condition will be a boolean,
and hence not have any structure information.
If c x y) =
deps env (`depValInj`
depValDeps (deps env c) `depValJoin` deps env y) (deps env x
First class functions are notoriously tricky to handle in a static analysis, so we simply assume that the dependencies are all the free variables.
Lambda p body) = DepVal $ freeIds $ Lambda p body
deps _env (Apply x y) = deps env x `depValJoin` deps env y deps env (
The most interesting case is the one for ForLoop
. Here we perform a
fixed point iteration until the dependencies stop growing. We know
this will terminate because the dependency set taken from the universe
of variable names in the program, which is finite. (Actually
termination is not quite true in our instance, but it would be if we
knew that our programs were conventionally type-correct - try and see
if you can figure out the edge case.)
ForLoop (p,pe) (i,ie) body) =
deps env (`depValInj` loop (deps env pe)
depValDeps (deps env ie) where loop loop_deps =
let env' = envExtend i (DepVal mempty) $
envExtend p loop_deps env= deps env' body
loop_deps' in if loop_deps' == loop_deps
then loop_deps
else loop $ loop_deps `depValJoin` loop_deps'
We define a utility function for constructing a 'DepsEnv' corresponding to the free variables of some given expression.
freeDeps :: Exp -> DepsEnv
= map (\v -> (v, DepVal $ idsSingle v)) (unIds (freeIds e)) freeDeps e
And now we can easily apply our dependency analysis to a variety of functions.
fun1 :: Exp
= If (Var "c") (Tuple [Add (Var "x") (Var "y"), Var "z"])
fun1 Tuple [Sub (Var "x") (Var "y"), Var "z"]) (
> deps (freeDeps fun1) fun1
DepTuple [DepVal (Ids ["c","x","y"]),DepVal (Ids ["c","z"])]
And here is one where a loop rotates the elements around a tuple. Note how the final result depends on all parameters - that is due to the fixed point iteration.
fun2 :: Exp
= Project
fun2 ForLoop
("p", Tuple [Var "x0",Var "x1",Var "x2",Var "x3",Var "x4"])
("i", Var "bound")
(Tuple [Project (Var "p") 1,
(Project (Var "p") 2,
Project (Var "p") 3,
Project (Var "p") 4,
Project (Var "p") 0]))
0
> deps (freeDeps fun2) fun2
DepVal (Ids ["bound","x0","x1","x2","x3","x4"])
One downside of our approach is that it does not tell us anything about inner dependencies, but this can be solved by storing information from the intermediate environments, perhaps by using an appropriate state monad.