If you wish, this assignment may be completed by groups of two. If working in a group, only one partner should submit to Gradescope, and should choose the other partner as a "Team Member" on Gradescope when submitting.
To submit your code, first commit and push your repository to GitHub. To see the output of the auto-grader tests, click 'Pull Requests' and select Feedback. You can see the output of the tests under the Checks tab.
When you have completed the assignment, submit your repo to gradescope by selecting your repository and the branch containing your completed assignment. Unless you created your own branches, the branch is called 'main'. You can access Gradescope through the Canvas assignment page.
The overall objective of this assignment is to better understand the notions of type checking, type inference, and polymorphism by implementing a type inference algorithm for Nano.
The assignment is in the file:
and
- tests/Test.hs has some sample tests, and testing code that you will use to check your assignments before submitting.
You should only need to modify the parts of the files which say:
error "TBD: ..."with suitable Haskell implementations.
Note: Start early! Type inference has many subtle points that can take a while to get right.
Most of the points, will be awarded automatically, by evaluating your functions against a given test suite.
Tests.hs contains a very small suite of tests which gives you a flavor of these tests. When you run
$ stack testYour last lines should have
All N tests passed (...)
OVERALL SCORE = ... / ...
or
K out of N tests failed
OVERALL SCORE = ... / ...
If your output does not have one of the above your code will receive a zero
If for some problem, you cannot get the code to compile,
leave it as is with the error ... with your partial
solution enclosed below as a comment.
The other lines will give you a readout for each test. You are encouraged to try to understand the testing code, but you will not be graded on this.
This assignment builds upon our previous implementation of Nano from HW4,
so most data types in Types.hs are the same.
We have added the following definitions to support type inference.
We will use the following Haskell datatypes to represent Nano's types and poly-types:
data Type
= TInt -- Int
| TBool -- Bool
| Type :=> Type -- function type: T1 -> T2
| TVar TVar -- type variable: a, b, c
| TList Type -- list type: [T]
data Poly = Mono Type -- mono-type
| Forall TVar Poly -- polymorphic typewhere TVar is just a type alias for String used to represent type variable names:
type TVar = StringFor example, a Nano type Int -> Int is represented inside your type checker as:
TInt :=> TIntwhereas a polymorphic Nano type forall a . List a -> a is represented as:
Forall "a" (Mono (TList (TVar "a") :=> TVar "a"))(or simply forall "a" (list "a" :=> "a") using convenience functions forall and list, also defined in Types.hs).
A type environment is simply a dictionary that maps program variables to poly-types. As before, the dictionary is implemented as a list of pairs:
type TypeEnv = [(Id, Poly)]A type substitution is also a dictionary, but it maps type variables to types they should be replaced with:
type Subst = [(TVar, Type)] Throughout the assignment, make sure that the keys (type variables) in the substitution are unique. You also get to assume this property when you implement operations on substitutions.
In this problem, you will implement some helper functions for your type-checker which you will need later on.
First implement two functions that compute the list containing the set of
free type variables in a type and in a poly-type. Even though a list allows for
duplicate elements, you want to make sure that the elements of your lists
for freeTVars are unique.
Note: to enable overloading (using a function with the same name for types and poly-types),
this function is defined as part of a type class HasTVars.
instance HasTVars Type where
freeTVars t = error "TBD: ..."
instance HasTVars Poly where
freeTVars s = error "TBD: ..."As always, you are allowed (and encouraged) to replace freeTVars t and freeTVars s with multiple equations for different patterns.
Throughout this assignment, you are allowed to use any library functions,
as long as you don't add new import statements.
In particular, some useful functions from the Data.List library include
delete (remove an element), nub (a.k.a "dedupe" from the midterm).
Some of the functions
in Data.List mimic set operations. In particular, you may find it useful
to learn about union and \\ (set difference).
You can read all about the Data.List library
on Hackage here.
You're free to use anything you find in there.
You may also look up functions on Hoogle.
Note: the Data.List library is imported qualified:
import qualified Data.List as LSo whenever you want to use a function from that library, you have to prefix its name with L. (e.g. L.delete).
When you are done you should get the following behavior:
>>> freeTVars TInt
[]
>>> freeTVars (TVar "a")
["a"]
>>> freeTVars (Forall "a" (Mono (TList (TVar "a") :=> (TVar "a"))))
[]
>>> freeTVars (TList (TList (TVar "a")))
["a"]
>>> sort (freeTVars (TVar "a" :=> TVar "b"))
["a", "b"]
>>> freeTVars (TVar "a" :=> TVar "a")
["a"]
>>> freeTVars (Forall "a" (Mono (TList (TVar "a") :=> (TVar "b"))))
["b"]Implement a function to lookup what a type variable maps to in a substitution:
lookupTVar :: TVar -> Subst -> Typeand a function to remove the mapping for a given type variable from a substitution (remember that all variables in a substitution are supposed to be unique):
removeTVar :: TVar -> Subst -> SubstIf your type variable a is not mapped in the substitution, then lookupTVar should
just return (TVar a) -- i.e. it makes no substitution -- and removeTVar should
return the substitution unchanged.
When you are done you should get the following behavior:
>>> lookupTVar "a" [("a", TInt)]
Int -- this is the pretty-printed `TInt`
>>> lookupTVar "a" [("b", TInt)]
a -- this is the pretty-printed `TVar "a"`
>>> lookupTVar "g" [("g", TBool), ("a", TInt)]
Bool -- this is the pretty-printed `TBool`
>>> removeTVar "a" [("a", TInt)]
[]
>>> removeTVar "a" [("b", TInt)]
[("b",Int)]
>>> removeTVar "a" [("a", TInt), ("b", TBool)]
[("b",Bool)]Next, use lookupTVar and removeTVar to write
two functions that apply a substitution
to a type and to a poly-type
(both named apply):
instance Substitutable Type where
apply sub t = error "TBD: ..."
instance Substitutable Poly where
apply sub p = error "TBD: ..."Once you have implemented this functionality and recompiled, you should get the following behavior:
>>> apply [("a",TInt)] (TList (TVar "a"))
[Int] -- i.e., TList TInt
>>> apply [("a",TInt)] (Forall "a" $ Mono (TList (TVar "a")))
forall a . [a] -- i.e., Forall "a" (Mono (TList (TVar "a")))
>>> apply [("a", TList (TInt)), ("b", TInt)] (TVar "b" :=> TVar "a")
Int -> [Int] -- i.e., TInt :=> TList (TInt)Finally, use apply to implement the function extendSubst,
which extends a substitution with a new type assignment:
extendSubst :: Subst -> TVar -> Type -> SubstWhen you are done you should get the following behavior. Pay close attention to the second and third test cases:
>>> extendSubst [("a", TInt)] "b" TBool
[("b",Bool), ("a",Int)]
>>> extendSubst [("a", list "b")] "b" TBool
[("b",Bool), ("a",[Bool])]
>>> extendSubst [("a", TInt)] "b" (TList (TVar "a"))
[("b",[Int]), ("a",Int)]
-- i.e. [("b", TList (TInt)), ("a", TInt)]Remember, you can assume that extendSubst will never be called with a
TVar that is already mapped in the substitution (and you should be careful
not to do this accidentally in Problem 2 -- even though you might not directly
call extendSubst you will likely call extendState which does call extendSubst).
Unification is a crucial part of type inference. In a nutshell, the inference algorithm works by:
- recursively traversing the program, assembling expression types from the types of their sub-expressions;
- when some type is not yet known, it picks a fresh type variable to represent this type;
- whenever two types must be the same, the algorithm tries to unify them and figure out what these type variables actually stand for.
A fresh type variable is a type variable that is distinct from all type variables the algorithm has generated before.
Our implementation will name the fresh type variables a0, a1, ....
In order to remember the index of the first unused fresh variable,
we introduce a new datatype:
data InferState = InferState {
stSub :: Subst -- ^ current substitution
, stCnt :: Int -- ^ number of fresh type variables generated so far
}which represents the "current state" of the unification and inference algorithms. It remembers how many fresh type variables the algorithm has generated so far, alongside the "current substitution".
Implement a function unifyTVar st a t
that tries to unify a type variable a with a type t.
If it succeeds, it records the new type assignment (for a)
by extending the state st.
If it fails, it throws an exception.
unifyTVar :: InferState -> TVar -> Type -> InferStateYou can use the provided helper function extendState
to extend a substitution inside a state -- it's probably simpler to use
that than to call extendSubst directly.
Assumption:
You can assume that a does not appear in the domain of stSub st
(i.e. among the type variables it maps);
remember to guarantee this property whenever you call unifyTVar.
When you are done you should get the following behavior. Note that
initInferState == []:
>>> stSub $ unifyTVar initInferState "a" (list "b")
[("a",[b])]
>>> stSub $ unifyTVar initInferState "a" "a"
[]
>>> stSub $ unifyTVar initInferState "a" (list "a")
*** Exception: Error {errMsg = "type error: cannot unify a and [a] (occurs check)"}Hints: This one is easier to do by not pattern matching on the
constructors for the Type in the input. Rather, there are basically
three different outcomes that can happen that correspond to the
three test cases above: either we learn a constraint (for the type
variable) and add it to our unifying substitution, or it turns out
no new constraint is needed, or we find ourselves in a situation
where we can't possible unify TVar "a" with the input type
and we have to throw an error. Think carefully about when exactly
we end up in each of these three situations and then implement your
unifyTVar function.
Note: When you need to throw an error, use
throw (Error "message goes here"). The tests that are looking for an error
won't pass if you use the error keyword instead. You don't need to reproduce
the whole fancy message above, but you do need to have the phrase "type error"
appear in your message.
Implement a function unify st t1 t2
that tries to unify the types t1 and t2.
If it succeeds, it records the new type assignment by extending the state st.
If it fails, it throws an exception.
unify :: InferState -> Type -> Type -> InferStateAssumption:
You can assume that any type variables in t1 and t2 do not appear in the domain of stSub st;
remember to guarantee this property whenever you call unify (either recursively here or
where you call unify from the infer function).
When you are done you should get the following behavior:
>>> stSub $ unify initInferState TInt TInt
[]
>>> stSub $ unify initInferState TInt TBool
*** Exception: Error {errMsg = "type error: cannot unify Int and Bool"}
>>> stSub $ unify initInferState (TInt :=> TInt) ("a" :=> "a")
[("a",Int)]
>>> stSub $ unify initInferState ("a" :=> TInt) (TBool :=> "b")
[("b",Int),("a",Bool)]
>>> stSub $ unify initInferState (list "a") (list $ list "a")
*** Exception: Error {errMsg = "type error: cannot unify a and [a] (occurs check)"}Hints: This function will need to be recursive, so think about how you might take a recursive approach -- which cases will have subproblems and what are those subproblems?
Now we have everything in place to implement type inference!
Let's first consider the fragment of the language without:
- binary operators (we already implemented
+for you) - conditionals
- let polymorphism
- recursion
- lists
Implement the function infer st gamma e for this fragment.
To be precise, you need to implement the following cases to pass
part (a):
- integer constants (EInt)
- boolean constants (EBool)
- variables (EVar), but you can assume for (a) that whatever you get
from your typing environment doesn't contain
foralls. - lambda abstractions (ELam)
- applications (EApp)
- let bindings without polymorphism (ELet), but don't do any generalizing in part (a)
This function attempts to infer the type of e in type environment gamma
given that the current state is st.
If e imposes constraints on the types of its sub-expressions,
infer calls unify to enforce these constraints.
If all unification steps succeed, infer returns the inferred type of e and a new state
(possibly extended with new type assignments generated during unification).
In this part, you can assume that the type environment maps all variables to mono-types,
i.e. every type in the type environment has the structure Mono t.
In particular, you should observe the following behavior:
>>> typeOfString "True"
Bool -- TBool
>>> typeOfString "1"
Int -- TInt
>>> typeOfString "(\\x -> x) 5"
Int -- TInt
>>> typeOfString "(\\x -> x + 1) True"
*** Exception: Error {errMsg = "type error: cannot unify Int and Bool"}Now let's add support for polymorphism!
-
In this version, variables in the type environment can have polymorphic types, like
forall a . a -> a. -
Whenever we use a variable, we have to instantiate its poly-type, i.e. replace all bound type variable with fresh free type variables.
-
Whenever we define a variable using a
let, we have to generalize its type into a poly-type.
First implement the function generalize gamma t that generalizes t into a poly-type,
i.e. binds all its type variables that are not free in gamma with a Forall:
generalize :: TypeEnv -> Type -> PolyWhen you are done you should observe the following behavior:
>>> generalize [] ("a" :=> "a")
forall a . (a) -> a -- Forall "a" $ Mono $ (TVar "a") :=> (TVar "a")
>>> generalize [("x", Mono "a")] ("a" :=> "a")
(a) -> a -- Mono $ (TVar "a") :=> (TVar "a")Next implement the function instantiate n s that instantiates a poly-type s
by replacing its bound type variables with fresh type variables.
Here n is the index of the first unused fresh type variable.
instantiate :: Int -> Poly -> (Int, Type)Assumption:
In instantiate n s, you can assume that bound type variables in s
cannot clash with ai where i >= n.
Remember to guarantee this property whenever you construct your own polymorphic types.
When you are done you should observe the following behavior:
>>> instantiate 2 (Forall "h" $ Mono $ TList (TVar "h"))
(3, [a2]) -- TList (TVar "a2")
>>> instantiate 2 (Forall "a" $ Mono $ (TVar "a") :=> (TVar "b"))
(3, (a2) -> b) -- (TVar "a2") :=> (TVar "b"))
>>> instantiate 0 (Forall "a" (Forall "b" (Mono ("a" :=> "b"))))
(2,(a0) -> a1)Finally, modify the function infer to support polymorphism.
You'll have to change the EVar and ELet cases you wrote in part (a).
When you are done, your implementation should be able to type-check the "double identity" example from the lecture:
>>> typeOfString "let id = \\x -> x in let y = id 5 in id (\\z -> z + y)"
(Int) -> IntHint: To help figure out where instantiate should be used in infer, think
about the return type of infer.
Instead of implementing separate type inference for binary operators, conditionals, and list, we decided to represent them as built-in functions and reuse the type inference for variables and function application.
Fill in the types of built-in functions inside preludeTypes
(we have pre-filled the one for +).
Remember that you are not allowed to use the names a0, a1, ... as bound type variables
because those are reserved for use as free type variables by the algorithm.
You are not writing any function definitions here: you just need to supply a type (object of type Poly) that expresses the most general type for these built-in functions.
Modify your implementation of infer to support recursive function definitions.
Once you are done, all tests should pass. See tests/input/3dtest1.hs through
tests/input/3dtest5.hs for some examples.