13 Types and Type Checking
| Goals | 
|  — | 
|  — | 
|  — | 
|  — | 
|  — | 
13.1 An Arithmetic Expression Language
We’re going to start with the simplest language that can demonstrate all the phenomena we care about. It’s a language of arithmetic expressions with integers and Booleans. Call it ZB:
Except for , we can understand this language as a subset
of Racket. Like Racket, it has integer literals, boolean literals,
addition, division, less-than-or-equal comparison, and 
.
What is ? A single integer, provided from outside
the program. You can think of a program in this language as a function
that takes a single, integer-valued argument named 
.
Without 
, the result of every expression can be known
immediately; we include it so that what follows won’t be trivial.
Exercise 33. Enumerate and characterize the potential errors in a ZB program.
Exercise 34. Some ZB programs, like
and
, always have errors, regardless of the input. Some, like
and
never have errors, regardless of the input. Find a program that errors on some inputs and not others.
13.2 Adding Types
There are a variety of ways we could statically check a ZB program for
errors. The strongest analysis we could do would tell us the set of
inputs that produce an error—
Here’s a simple recipe for designing a type system:
- First, classify the values of our language in some way; call these classes types. 
- Second, classify expressions in your language by the types of values that they compute. Do this compositionally, meaning that the type of each expression can be computed from the types of its subexpressions. (Typically, if you can’t figure it out for a particular program then you give up and reject that program.) 
When we design a type system, we have a choice
for the granularity of the classifications. We need
to think about what’s relevant to our language’s operations. For example,
to know that  will succeed, we need to know that
 and 
 both evaluate to integers, so it would be helpful
to have a type for that; we’ll call it 
. And we need some other
type for the result of 
 to prevent Booleans from flowing into
additions; we’ll call that type 
. So, we define just two
types 
 for ZB:
(We could increase the granularity by considering more operations and
sets of values— is a simple choice that works
well.)
That’s step one. Step two is to figure out how to classify
ZB expressions by types. We do this by defining a judgment, a relation that connects expressions to types.
The notation will be , which we read as
“expression 
 has type 
.” For example, we
would say that 
 and
, but it would be wrong to
say 
.
The usual way to define a typing judgment like
 is using logical inference rules. A
logical inference rule says that a particular conclusion can
be derived from some premisses. The conclusion—
 and the type of
any Boolean literal is 
:
Note that there are no premisses above the lines because these judgments are
true for every literal of the right form. Similarly, we conclude that
 has type 
:
The remaining rules require premisses. For example, successful addition
expression always produces an integer, but that depends on its
subexpressions producing integers. So we have a rule that says that
an expression  has type 
 whenever both
of its subexpressions have type 
 (and likewise for division):
Yes, you read this correctly: according to the typing rule,
division always produces an . That’s okay, because
if a program actually divides by 0, it gets terminated,
rendering the type of the division expression irrelevant.
Finally, we give two rules for typing conditionals:
Exercise 36. Articulate the
rules in English. Do they make sense? Is there a better way to write the rules?
13.3 Typing More Language Features
Most programming languages have more features than ZB, such as:
- More base types, such as strings and inexact numbers 
- Variables 
- Functions 
- Recursion and/or Loops 
- More compound types, such as tuples or records 
The first of these can be done with what you’ve already seen, and the last we will save for exercises. For others, however, we’ll need a few more ideas. In the next three subsections we’ll see how our type system can handle variables, functions, and recursion.
13.3.1 Variables
In order to extend our typed language with variables, we need a way to keep track of their types. We’ll use a type environment, which is a mapping from the currently bound variable names to their types. You can think of it like a partial function from names to types, but we’re going to define it syntactically:
We write  for the type environment whose domain is empty,
and 
 for a type environment that is the same as
 except for all the 
s, which it maps to their
corresponding 
s instead.
For example, the type environment
that variable 
 has type 
 and variable 
 has
type 
.
Then we can extending ZB with variables and single-variable
 expressions:
How does the type system work? The typing judgment now keeps track
of variables and their types using a type environment. We’ll write
 to mean that expression 
 has type
 given the variables in 
 have the types listed there.
Thus, binders (like 
) and variable occurences communicate
via 
. The type of a variable must given by the type
environment:
The rule for  is more complicated:
For an expression  to be well
typed in some type environment 
, the sub-expression 
must have a type in the same type environment. But the sub-expression
 has access to an environment where 
 is extended
with 
, which has the same type that 
 has. And in
that case,
the whole expression has the same type that 
 has.
The remaining rules are the same as for ZB, but they pass the type environment through. For example:
13.3.2 Functions
The next feature we’d like to add is functions. As with local variables,
the new syntax comes in pairs— and
function application:
Now our language has values that it didn’t have before, and that are neither integers nor Booleans, which means we need a new type. The simplest choice is to extend our type syntax to include arrow types:
We want to say that a value has type  when
1) it’s a procedure whose arity is k, and
and 2) if applied to k values of type are 
 through
, the result (if any!) will be a value of type 
.
The typing rule for applications extends this intuition from values
to expressions:
Since a  expression produces a function, it should have an arrow
type that matches its arity. Furthermore, the body of the 
 can
see both the 
’s surrounding context and its formal parameters, so
the body is typed in a type environment where the formal parameters have
types matching the domains of the arrow type. The result type of the body
must, of course, match the codomain of the arrow type. Like this:
The above rule is great! It’s true, it’s correct, and it will never
lead you wrong. The problem, however, is that it might not lead you
anyplace at all. Up until now, I’ve presented the type system from
a logical perspective—) before we added the rule for 
 has an
easy algorithm if you read it as a function from two arguments (
and 
) to one result 
.
Adding the above  rule breaks that property, because you need
to know the 
s to check 
 in the premiss, but the 
s
only appear in an “out” position of the conclusion of the rule.
We’ll adopt two complementary strategies to solve this problem. In this
section, we enrich the syntax of  expressions (diverging from
Racket) to contain the information that we need. Then, in
Bidirectional Type Inference, we will modify our typing relation so that the
context can provide a “hint” to an unnannotated a 
 expression
about what its type should be.
In the first approach, we add require that each of a ’s formal
parameters have a type annotation:
Then, we just use the type annotations as the types of the formal parameters for checking the body:
This rule, for  with type annotations, means that our typing
relation again gives us an algorithm. Next, I’ll show you how a
slightly more sophisticated algorithm can take advantage of contextual
information to let you omit some type annotations.
13.3.3 Recursion
In an untyped, higher-order language (like Racket or the λ calculus), it’s possible to do recursion using a fixpoint combinator (like Y) even if you don’t include any recursive binding forms. But many type systems, including the one we’re working with, can’t give the Y combinator a type. Instead, recursion must be built-in somehow.
One way is with a recursive binding form. For example, we might add
an expression form  that evaluates 
 so that
uses of 
 inside of 
 are bound to the
value of 
. (This means those uses have to be delayed,
typically under a λ, so that the value of 
 isn’t needed before
it’s ready. The Racket equivalent is 
.)
Exercise 38. Using the techniques we’ve seen so far, it’s possible to write a typing rule for
, but like the rule above for unannotated λ, that doesn’t give an algorithm. Write the rule and see that it needs to “guess” the type of
in order to figure out the type of
.
Exercise 39. Show that adding a type annotation to the
form makes it possible to write an algorithmic rule that doesn’t have to guess.
13.4 Bidirectional Type Inference
Above we saw how to type  by adding type annotations
to its formal parameters. Our goal going forward is to
reduce the number of mandatory type annotations.
Often, the information provided by a type annotation can be
acquired from the context instead, and if we can do that, it
can help make our language more pleasant to use.
We can think of types in the type system above as flowing in one direction: upward, to each expression from its sub-expressions. In birectional type inference, we define a second relation by which the type flows downward, from an expression to (some of) its subexpressions. Thus, the new system will be bidirectional.
In particular, we will define the type system as a pair of mutually-recursive relations having different modes:
- infers type - from the given - and - . 
- checks the given - , - , and - . 
Before presenting the algorithm, let’s simplify and restate our
language. Now that we have functions and variables, we can specify
many operations as primitive functions in some initial environment
:
Here is our new language, λZB→, in full:
It has one new form, , which has type 
when 
 also has type 
. This new form isn’t strictly
needed, but as we will see, 
 provides a convenient bridge
between inferring and checking.
Now let’s see the type system, starting with the inference (upward) direction. Some of the rules are the same as before, so let’s get through those first:
The remaining compute-mode rules, on the other hand, have
premisses that switch to check mode. The compute-mode
rule for  checks that condition has type
 and 
 types for the two branches,
succeeding only when they agree:
The final two compute-mode rules are the most interesting, as they can push (non-trivial) expected types into check-mode premisses:
The rule for  checks that 
 has type 
.
The rule for application first infers a type for the operator
(
), which must be an arrow type whose arity matches the
number of operands (
). Then the rule checks each of
the operands against the corresponding type 
.
That completes the rules for the compute-mode judgment. For the
check mode judgment, let’s see the most interesting rule first,
which can check a  without argument type annotations based
on the type it’s expected to have:
We also want rules for propagating an expected type (or
parts thereof) into subexpressions where that might let us
omit some type annotations. For example, if we know the
expected type of a  expression, then we know the
expected type of its body:
Other expression forms, such as , can benefit from
check-mode rules—
Unlike other rules we’ve seen, this delegation rule isn’t limited to a particular syntactic form for the expression in its conclusion, which means that it can apply to any form of expression, including those covered by the other check-mode rules. From a logical perspective, it isn’t harmful (if we’ve been careful) because in every case where compute mode can succeed (via the delegation), the check-mode rules give the same result. Algorithmically, this means that we can avoid considering the delegation rule for expression forms that are covered by more specific check-mode rules.
Exercise 40. Write the missing check-mode rule for
expressions.
Exercise 41. Add string literals and a string type to the language and type system. Then give types for these primitive functions:
,
, and
.