17.2 A Weak Type System

There are two significant problems with the language in Untyped, Slow Regular Expressions. First, it does not have a type system and second, it is unnecessarily slow. This section and the next section address the type system and Better Performance through Smarter Compilation address the performance issue. Feel free to tackle the performance first if that is more to your taste. If you choose that route, be sure to add types to the poorly-performing implementation first, as it is significantly easier. Once they are working, consider how to add types to the performant version.

Here are the type rules:

They require a few auxiliary definitions:

The type rule for r^* does not check that the Racket expression has the right type; short of insisting on only Typed Racket expressions that have the type Char, what else could be done to close this soundness hole?

Most of the rules are straightforward and can be turned into uses of turnstile directly. For example, any syntactically well-formed r^range expression has type Char. The type of the r^* expression is the type of its subexpression (as r^* returns the last match), etc.

One rule is worth spending a few minutes discussing, however: r^or. This rule does not offer an obvious translation into an implementation. There are two issues. First, when there are no arguments to r^or, then the rule lets us conclude any type. (Is this sound? Why or why not?) How do we know which one to pick? Second, when there is at least one argument, then all of the argument types must be the same. How should we express this constraint in turnstile?

The first issue can be handled by accepting an r^or only in checking () mode. The symmetric way to handle the second issue is to implement the rule so that it finds the types for all of the subexpression and then, separately, checks that they are the same. Alternatively, the rule implementation could be asymmetric, computing the type for one of the subexpressions and then checking that the others have that type. These issues came up for typechecking lambda expressions and if expressions on Thursday; see Bidirectional Type Inference for more.

Here are some test cases for this version.

Solution