14 Typed Languages with Turnstile
Goals |
— |
— |
— |
14.1 A Preview
This afternoon, we’re going to learn how to implement a type system using macros. The end result will be a #lang that checks while it expands and raises a syntax error if it encounters a type error. Here is an example interaction that makes the type checker happy:
> (+ 3 4) 7
> (λ ([x Bool] [y Int]) (if x y (* 2 y))) #<procedure>
> (let ([x 5]) (let ([x 8] [y x]) y)) 5
And here’s an interaction demonstrating some type errors:
> (+ 3 #t) eval:4.0: #%app: type mismatch: expected Int, given Bool
expression: #t
at: #t
in: (#%app + 3 #t)
> (λ ([x Int] [y Bool]) (if x y (* 2 y))) eval:5.0: if: type mismatch: expected Bool, given Int
expression: x
at: x
in: (if x y (* 2 y))
> (λ (x) (+ x 1)) eval:6.0: λ: no expected type, add annotations
at: (λ (x) (+ x 1))
in: (λ (x) (+ x 1))
parsing context:
while parsing has-expected-type
term: (λ (x) (+ x 1))
location: eval:6.0
The last error message suggests that we can fix the function by adding a type annotation. Annotating either the whole function or just the parameter allows the function to type:
> (ann (λ (x) (+ x 1)) (-> Int Int)) #<procedure>
> (λ ([x Int]) (+ x 1)) #<procedure>
Beyond the rules we saw this morning, the stlc language adds a module-level definition form def, which is visible only to code that follows it. It also supports testing forms, such as check-type and typecheck-fail.These forms originate from the rackunit/turnstile library. Here’s a full-module example using these:
"use-stlc.rkt"
#lang s-exp "stlc.rkt" ;; Subtracts two integers. (def - (λ ([x Int] [y Int]) (+ x (* -1 y)))) (check-type - : (-> Int Int Int)) ; correct type for - (check-type (- 3 8) : Int -> -5) ; correct type and result (typecheck-fail (- 3 #t)) ; bad argument type (typecheck-fail (- 3 8 12)) ; too many arguments (typecheck-fail (- 3)) ; too few arguments ;; Finds the factorial of an integer. (def fact (rec self (-> Int Int) (λ (n) (if (<= n 1) 1 (* n (self (- n 1))))))) (check-type (fact 5) : Int -> 120) ;; (iter n f x) applies `f` to `x`, `n` times. (def iter (-> Int (-> Int Int) Int Int) (rec self (λ (n f x) (if (<= n 0) x (self (- n 1) f (f x)))))) ;; Integer exponentiation using `iter`. (def expt (-> Int Int Int) (λ (n m) (iter m (λ (acc) (* n acc)) 1))) (check-type (expt 2 10) : Int -> 1024)
14.2 Getting Our Theory Closer to Practice
To express turnstile-style rewriting using type inference rules, we’re going to define a second language as that target of our translation. This new language is like the typed source language λZB→, but without types or type annotations:
The compute-mode and check-mode judgments now have a
fourth position between the input expression and the
type—
We need one more change, to type environments. Part of how turnstile works involves replacing variable names with new ones, and to do this, it uses an environment to map each old name () to both a new name () and the variable’s type:
Now we can express the rule for inferring the type of a variable. It has to be present in the type environment, in which case we rewrite it to the new name from the environment and assign the type from the environment:
Some rules, such as the inference-mode rule for function applications, are the same as before, except for rewriting each piece before putting them back together:
Rules that bind variables also need to participate in the renaming. This means generating a fresh name for each bound variable and including that information in the environment for expressions in the variable’s scope. For example, here’s the check-mode rule for :
Finally, some rules need to eliminate syntactic forms that the target language lacks. For example, the inference-mode rules for type annotation expressions and type-annotated λs both erase the annotations in order to translate to the untyped target language:
14.3 Doing It with ’Stile
Before we see how to write those same rules in turnstile, let’s start with the basics. The easiest way to make a turnstile–based language is to write it in the #lang turnstile/quicklang language, which automatically provides #%module-begin, #%top-interaction, and require from our language:
#lang turnstile/quicklang
(If you want to customize the automatically provided forms then there are lower-level ways to set it up.)
Next, before we define the syntax of expressions, we need to define the syntax of types.
14.3.1 Working with Types
The turnstile library represents types as syntax objects. Types have an external form, which you write directly, and an internal form suitable for comparison and pattern matching. Each type written in a user program will be in the external representation, and it’s our responsibility to normalize it to its internal form. To help with this, turnstile provides a syntax class type (at phase 1). The syntax-parse pattern τ:type
matches only well-formed types,
normalizes type τ to its internal form, and
binds an attribute named τ.norm to that internal form for use in subsequent syntax templates. (Note that pattern variable τ gets bound to the original syntax, not the internal form.)
Setting all that up can be pretty complicated, but turnstile provides macros that will define the right things for you. So, let’s start by defining our types. As before, we want two base types for Booleans and integers, and one type constructor for function types:
"stlc.rkt"
19 (define-base-types Bool Int) 20 (define-type-constructor -> #:arity >= 1)
The notation #:arity >= 1 means that -> requires at least one parameter to be well formed.
Each type definition above binds four names. When you define a type named τ, you get:
τ – the external form of the type as it should be written in both user programs and your type checker;
τ- – the internal form of the type, which you won’t usually refer to directly;
τ? – a predicate, defined at phase 1, for recognizing type τ; and
~τ – a pattern expander, also defined at phase 1, that can be used in syntax-parse patterns to match type τ.
Let’s see how constructing and matching types works using the REPL. First we’ll require turnstile and define our types:
> (require turnstile/base) > (define-base-types Bool Int) > (define-type-constructor -> #:arity >= 1)
Now, what can we do with them? At run time, not much, since the operations aren’t available:
> Int Int: Cannot use tag at run time
> #'Int #<syntax:eval:5:0 Int>
> (Int? #'Int) Int?: undefined;
cannot reference an identifier before its definition
module: 'program
The error is because define-base-types defined Int? in phase 1 (the transformer environment), not phase 0 (the run-time environment). This is because turnstile’s types are meant to be used while type checking, which happens at compile time. Further, some turnstile operations, such as type normalization (and thus, the type syntax class), only work during macro expansion.
To make it easy to experiment with macro expansion in the REPL, we’ll make a variant of syntax-parse that pattern matches phase 0 syntax, evaluates the body of the matching clause in 1, and then shifts the resulting value back to phase 0. That’s tricky, but you don’t need to understand how it works in order to experiment with it. Here’s the definition:
> (require (for-syntax syntax/parse))
> (define-syntax (syntax-parse0 stx) (syntax-parse stx #:literals (syntax) [(_ #'e0 [p e1 ... er] ...+) #'(let-syntax ([m (λ (stx) (syntax-parse stx [(_ p) e1 ... #`'#,er] ...))]) (m e0))]))
To see how to use it, let’s start with something simple: Match the syntax of a type and format it as a string (using turnstile’s phase 1 function type->str):
> (syntax-parse0 #'Int ; ↓ note: this isn't quite right [τ:type (type->str #'τ)]) "Int"
That might seem like it’s not doing anything, but let’s try formatting a couple ill-formed types:
> (syntax-parse0 #'Intt ; ↓ still not right… [τ:type (type->str #'τ)]) eval:8.0: m: eval (10:0) not a well-formed type: Intt
at: Intt
in: (m Intt)
parsing context:
while parsing type
term: Intt
location: eval:10.0
> (syntax-parse0 #'(->) ; ↓ also not right… [τ:type (type->str #'τ)]) eval:8.0: m: eval (11:0) not a well-formed type: (->)
eval:11:0: ->: Improper usage of type constructor ->: (->),
expected >= 1 arguments
in: (->)
at: (->)
in: (m (->))
parsing context:
while parsing type
term: (->)
location: eval:11.0
In order to write a type checker, we also need to be able to recognize and decompose types, which is where pattern expanders, like ~Int and ~-> come in. To match type Int with syntax-parse, use the pattern ~Int; to match an one-parameter arrow type and bind the domain and codomain to pattern variables, you want the pattern (~-> dom cod). Let’s try matching an arrow type and returning codomain as a string:
> (syntax-parse0 #'(-> Int Bool) ; ↓ won't match ↑ [(~-> dom cod) (type->str #'cod)]) eval:8.0: m: expected the identifier `#%plain-app'
at: ->
in: (m (-> Int Bool))
What’s the problem now? The turnstile library’s pattern expanders work only on internal types, but types that come directly from the program’s syntax haven’t been normalized to internal types yet. That means we first need to match using the type syntax class, and then use the norm attribute to get the normalized (internal) type and match that using syntax-parse’s pattern directive #:with:
> (syntax-parse0 #'(-> Int Bool) [τ:type ; ↓ this is the solution #:with (~-> dom cod) #'τ.norm (type->str #'cod)]) "Bool"
It’s important know which types have been internalized and which are still external and unchecked. Since tracking this can be tricky, I adopt the following convention: external types are named using Greek letters (e.g., τ, σ), whereas internal types are named using Roman letters (e.g., t, s, dom, etc.).
As a final example of working with types, let’s write a macro that counts all the arrows in a type:
> (define-syntax (count-arrows stx) (syntax-parse stx [(_ τ:type) #:with (~-> dom ... cod) #'τ.norm #'(+ 1 (count-arrows dom) ... (count-arrows cod))] [_ #'0])) > (count-arrows Int) 0
> (count-arrows (-> Int Bool)) 1
> (count-arrows (-> Int Int (-> Int Bool))) 2
> (count-arrows (-> (-> Int Int) Int (-> Int Bool))) 3
14.3.2 Typing Rules in turnstile
Now we’re finally ready to see how to we can implement the rules of our type system using turnstile. We’ll use its define-typed-syntax macro to define each syntactic form in our typed language as a macro that infers or checks the type and expands into untyped Racket code. You can read define-typed-syntax in two complementary ways:
An embedded DSL for expressing bidirectional type inference with rewriting, as in Getting Our Theory Closer to Practice, but in the style of syntax-parse.
An extension to syntax-parse for expressing bidirectional type inference using syntax that resembles the traditional notation that we saw on the blackboard.
As a first example, let’s see the rule for ann, written in both forms:
"stlc.rkt"
29 (define-typed-syntax ann 30 ; to COMPUTE the type of ‘(ann e τ)’, . . . 31 [(_ e:expr τ:type) 32 ≫ 33 ; CHECK that ‘e’ has type ‘τ’ (normalized), calling its 34 ; expansion ‘e-’, . . . 35 [⊢ e ≫ e- ⇐ τ.norm] 36 ---- 37 ; then expand to ‘e-’ and COMPUTE type ‘τ’ (normalized). 38 [⊢ e- ⇒ τ.norm]])
There are three main differences you should notice:
The type environment is implicit in turnstile (though there’s a way to add things to it for a premiss).
- If a typed syntactic form requires multiple type inference rules on paper, then each of those rules becomes a case, à la syntax-parse, in the same macro definition using define-typed-syntax. This means that:
If there are both compute-mode and check-mode rules for the same syntactic form, all of them need to be cases in the single macro that implements that form.
As with syntax-parse, the cases are considered in the order they are written.
When in check mode, compute-mode rules are also considered to match, which means they should come after the check-mode rules.
Racket wraps each literal L as (#%datum . L), so all literals (integers and Booleans, in our case) need to be handled by an implementation of #%datum.
The two notations differ in where the expression-to-be-typed appears. In the traditional notation, appears as part of the judgment in the conclusion, beneath the horizontal line. By contrast, inputs to a define-typed-syntax case appear as patterns at the top of each case, leaving only outputs at the bottom. So the expression-to-be-typed is a syntax-parse pattern at the top of each case, and for check-mode cases, the expected type also appears as a pattern at the top.
Given all that, here’s the basic shape to define a form with one check-mode case followed by one infer-mode case:
(define-typed-syntax ⟨form-name⟩ [⟨typed-form-pattern⟩ ⇐ ⟨expected-type-pattern⟩ ; ← expected type means check mode ≫ ⟨premiss⟩ ... ---- [⊢ ⟨untyped-form-template⟩]] [⟨typed-form-pattern⟩ ≫ ⟨premiss⟩ ... ---- ; computed type ↓ means compute mode [⊢ ⟨untyped-form-template⟩ ⇒ ⟨computed-type-template⟩]])
In order to type a turnstile rule (using your keyboard), you need to be able to enter the special characters that turnstile uses. DrRacket has a mechanism for entering special characters by typing a particular sequence starting with \ and ending with Control-\. Here are the sequences you’ll need:
≫
\gg
⇐
\Leftarrow
⇒
\Rightarrow
≻
\succ
⊢
\vdash
(Alternatively, see turnstile/no-unicode.)
As a final, larger example, let’s consider how to type λ. For reference, here are the relevant blackboard rules:
And here’s the implementation using define-typed-syntax:
"stlc.rkt"
46 (define-typed-syntax λ 47 [(_ (x:id ...) e:expr) 48 ⇐ (~-> s ... t) 49 ≫ 50 #:fail-when (check-duplicate-identifier (stx->list #'(x ...))) 51 "repeated formal parameter name" 52 #:fail-unless (= (stx-length #'(x ...)) (stx-length #'(s ...))) 53 "wrong number of formal parameters for expected arrow type" 54 [[x ≫ x- : s] ... ⊢ e ≫ e- ⇐ t] 55 ---- 56 [⊢ (λ- (x- ...) e-)]] 57 58 [(_ ([x:id σ:type] ...) e:expr) 59 ≫ 60 #:fail-when (check-duplicate-identifier (stx->list #'(x ...))) 61 "repeated formal parameter name" 62 #:with (s ...) #'(σ.norm ...) 63 [[x ≫ x- : s] ... ⊢ e ≫ e- ⇒ t] 64 ---- 65 [⊢ (λ- (x- ...) e-) ⇒ (-> s ... t)]])
Everything else you need to know about turnstile for the exercises is used in the above macro, so let’s consider it line by line:
The form we’re defining, λ, is on line 1. Note that when expanding to Racket’s untyped λ, we’ll have to be careful not to make compilation go into a infinite loop.
Line 2 gives the pattern for the first case, which matches an unannotated λ because the formal parameters are decorated with the syntax class id.
Line 3 is a pattern for the expected type, which matches only in check mode, and only when the expected type is an arrow type. It binds pattern variable s to the expected argument types, and pattern variable t to the expected result type.
Lines 5 and 6 enforce the side condition that the formal parameter names are distinct. The generated error message will include the repeated name, because if check-duplicate-identifier finds a duplicate then returns the duplicate, and when #:fail-when’s condition evaluates to a syntax object then it includes that syntax object in its error message.
On lines 7 and 8 we enforce a second side condition, that the number of formal parameters on the λ equals the number of arguments of the expected type. Without this check, line 9 will still fail when they don’t match, but with an very unfriendly message.
Line 9 expands the body of the λ, expression . To the left of the turnstile (⊢), the syntax [x ≫ x- : s] ... extends the type environment for checking to map each formal parameter x to a fresh name x- and its type s. (We don’t need to generate the x-s, because that’s turnstile’s job—
x- in that context is a pattern variable that turnstile binds to a fresh name.) To the right of the turnstile, the syntax e ≫ e- ⇐ t expands e in check mode using t, binding the result to pattern variable .
The conclusion of the first case, on line 11, gives the expanded result, which uses the renamed x-s as the formal parameters and the rewritten e- as the body. This works because all the occurrences of x ... have been replaced by x- ... in e-.
There’s another subtlety here, which is that it expands to λ- instead of λ. What’s λ-? It’s actually Racket’s ordinary λ, renamed by turnstile to append a hyphen. We need something like this because it wouldn’t work to have our typed λ expand into itself. Yesterday we avoided this problem by giving our definitions a different name and renaming on provide, like so:
(provide (rename-out [my-λ λ] [my-app #%app])) (define-typed-syntax my-λ ... λ ...) ; expands to Racket's ‘λ’ (define-typed-syntax my-app ... #%app ...) ; expands to Racket's ‘#%app’ Turnstile uses a different convention: it re-provides every Racket binding suffixed by -, so you can avoid rename-out when providing your language’s bindings.
The second case, starting on line 13, matches λs whose parameters have type annotations. Note that the pattern variables for those types are a Greek letter, σ; because this pattern is matched against source syntax, those types are haven’t been normalized.
Because there is no expected type before line 14, that means this is an infer-mode case.
Lines 15 and 16 implement the same side condition as lines 5 and 6.
The #:with pattern directive on line 17 binds s ... to the internal forms of the external types σ .... (This isn’t required—
we could just refer to σ.norm on subsequent lines instead of s.) Line 18 differs from line 9 in one character: where the previous case used ⇐ to check the type of body, this case uses ⇒ to compute the type of the body. This makes sense, because without an expected type coming into the rule, we can’t know what type to expect for the body.
The conclusion of the second case, on line 19, gives the untyped expansion, just like the conclusion of the first case. But unlike the first case, it also reports the computed type. The first case has an expected type come in, so no type goes out; the second case has no expected type come in, to the computed type needs to go out.
14.3.3 Making Your Definitions Available
To make your language work, you need to provide the necessary bindings: syntax, types, and values. When making a turnstile-based language, you provide your typed syntax in the way you’ve already seen:
"stlc.rkt"
(provide ann if def λ let rec #%app #%datum (rename-out [λ lambda]))
To export types, you should use type-out, like so:
"stlc.rkt"
5 (provide (type-out Bool Int ->))
To export typed values (usually functions), use typed-out:
"stlc.rkt"
6 (provide (typed-out [not (-> Bool Bool)] 7 [+ (-> Int Int Int)] 8 [* (-> Int Int Int)] 9 [/ (-> Int Int Int)] 10 [<= (-> Int Int Bool)] 11 [zero? (-> Int Bool)]))
And finally, we want to require and re-provide the testing forms from rackunit/turnstile to our new language:
"stlc.rkt"
13 (require rackunit/turnstile) 14 (provide (all-from-out rackunit/turnstile))
Of course, we haven’t implemented all the syntax in that first provide yet, since that will be your job.