
5 Advanced Racket Macros

Jay McCarthy

Goals

an appreciation of robust macros

an understanding of syntax classes

using built-in syntax classes

defining basic syntax classes

defining advanced syntax classes

Every macro we wrote yesterday was a horrible embarrassment that brings shame to the generations of macrologists upon whose shoulders we stand. Today, we will learn about the tools syntax/parse provides for writing robust macros and how these tools enable an entirely new way to think about where macro computations occur.

5.1 Flawed Macros

Consider the first version of the iteration macro that we wrote yesterday:
(define-simple-macro (simple-for/list0 ([elem-name seq]) computation)
  (map (λ (elem-name) computation) seq))

We intended this macro to present an abstraction of iteration through a sequence where each element of seq is bound to the identifier elem-name inside of computation. However, our implementation of this macro fails to present this abstraction: instead, it exposes its implementation to clients.

First, we expose that we are using map in the implementation by not checking that seq evaluates to a list.
(simple-for/list0 ([x 5]) (add1 x))
; => ERROR: map: contract violation

Second, we expose that we take the elem-name value and put it directly inside the formals positions of a lambda.
(simple-for/list0 ([5 (list 1 2)]) (add1 x))
; => ERROR: λ: not an identifier

Macros are not simply rewriting rules for abstract syntax trees, where the client is expected to understand the details of the rewriting. Instead, they provide new syntactic abstractions. As with other abstractions in our programs, like functions and data-structures, it is essential that we ensure the integrity of these new abstractions by verifying pre-conditions. Some of these pre-conditions are themselves syntactic (like that elem-name must be a valid identifier) and some of them are semantic (like that seq evaluates to a list); all of them need to be checked.

5.2 Macros Robust at Run-Time

As we have seen, Racket macros are full-fledged procedures that can perform arbitrary computations. Clearly, we should be able to integrate checking of invariants into the definition of macros.

First, we will rewrite our macro to use define-syntax directly.
(define-syntax (robust-for/list0 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #'(map (λ (elem-name) computation) seq)]))

Next, we will add a check to the expansion that seq is a list.
(define-syntax (robust-for/list1 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #'(if (list? seq)
         (map (λ (elem-name) computation) seq)
         (error 'robust-for/list1 "Expected list, given ~e" seq))]))

Although this macro now errors appropriately and does not expose that it uses map, it has a fundamental flaw. Consider the following use:
(robust-for/list1 ([x (begin (displayln "Launch the missiles!")
                             (list 1 2 3))])
                  (add1 x))
This program prints Launch the missiles! twice, because seq is evaluated every place it appears in the macro expansion. It is essential that we instead evaluate seq once and bind its value to a name, writing the expansion as:
(define-syntax (robust-for/list2 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #'(let ([seq-v seq])
         (if (list? seq-v)
           (map (λ (elem-name) computation) seq-v)
           (error 'robust-for/list2 "Expected list, given ~e"
                  seq-v)))]))

You may ask why we don’t incorporate this test into a functionalized version of robust-for/list1 (a helper function that the expansion calls), so that the test receives the value of seq rather than the expression. This is a good idea, but we’ll see a much better idea in a moment.
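As a rough sketch of the alternative raised in that question, the check can live in an ordinary function that receives the already-evaluated sequence. The names checked-map and robust-for/list2f are hypothetical and are not part of the macro series in this section.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; checked-map is a hypothetical helper. Because it is a function, it
;; receives the *value* of the sequence expression, evaluated exactly once.
(define (checked-map who f v)
  (unless (list? v)
    (error who "Expected list, given ~e" v))
  (map f v))

;; Hypothetical variant of robust-for/list2 that delegates the check.
(define-syntax (robust-for/list2f stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #'(checked-map 'robust-for/list2f
                    (λ (elem-name) computation)
                    seq)]))

;; seq appears once in the expansion, so the message is printed once.
(robust-for/list2f ([x (begin (displayln "Launch the missiles!")
                              (list 1 2 3))])
                   (add1 x))
```

The expansion stays small because the logic is in the helper, but the error still cannot say which module broke the pre-condition.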

This macro still has an annoying problem, however. It does not reveal which module is at fault for violating the pre-conditions on robust-for/list2. In other words, we are not using the racket/contract system well. Let’s pause for a moment and look at the other problem.

5.3 Macros Robust at Compile-Time

Although robust-for/list2 is robust at run-time, it is not robust at compile-time. We are not enforcing that elem-name is a syntactically valid identifier. We can do this by performing a check before returning the expansion.

(define-syntax (robust-for/list3 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     (unless (identifier? #'elem-name)
       (raise-syntax-error #f "Element name must be identifier"
                           stx #'elem-name))
     #'(let ([seq-v seq])
         (if (list? seq-v)
           (map (λ (elem-name) computation) seq-v)
           (error 'robust-for/list3 "Expected list, given ~e"
                  seq-v)))]))

This code requires some commentary.

Consider the expression (identifier? #'elem-name). Why could we not have written (identifier? elem-name)? Because elem-name is not a variable in the robust-for/list3 program. It is a pattern variable bound by the first pattern clause, so the name is only valid inside syntax templates. To refer to the syntax it matched, we must use syntax (abbreviated #').

Now look at the next line. raise-syntax-error is a special version of error that is specialized for syntactic errors. Like error, it takes as arguments the name of the erroring procedure and a message. (Here the name is #f, which tells Racket to derive the name from the syntax we pass.) However, it also takes two arguments that evaluate to syntax. The first is the "large" piece of syntax that contains the error; in this case, the offending usage of robust-for/list3. The second is the "small" piece of syntax that is at fault; in this case, the actual syntax given for elem-name. It is far better to use raise-syntax-error than to use error, because Racket IDEs (like DrRacket) will refine their error highlighting with this extra information.

However, robust-for/list3 still has a problem. By using unless in the body of the clause, we do not cooperate with syntax-parse to identify syntax that does not match the pattern expected by the macro. This is a subtle point. It is the job of syntax-parse to determine if the input syntax satisfies this clause. If we perform extra tests after the pattern has matched, then we are hiding those conditions from syntax-parse. In this particular case, it is not an issue, but if robust-for/list3 had other clauses, we would want syntax-parse to try them before signaling an error. (For example, in the real for/list, the elem-name position is either a single identifier or a list of identifiers. If it is a list of identifiers, then the sequence is assumed to contain multiple values per iteration. We won’t complicate our example with these details, but we’ll act as though they are there and robustify anyway.)

In this case, it is a fairly trivial change to inform syntax-parse of the condition.

(define-syntax (robust-for/list4 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #:fail-unless (identifier? #'elem-name)
     "Element name must be identifier"
     #'(let ([seq-v seq])
         (if (list? seq-v)
           (map (λ (elem-name) computation) seq-v)
           (error 'robust-for/list4 "Expected list, given ~e"
                  seq-v)))]))

syntax-parse clauses may contain an optional list of pattern directives. #:fail-unless is one such directive. You provide a condition to check and a message to report as an error if the condition does not hold. Since we have now given syntax-parse more information about how to satisfy this clause’s pattern, it will fail to match the clause if elem-name is not an identifier. If no other clause is satisfied, then this clause’s error will be displayed to the macro client.

However, the error will no longer single out elem-name as being at fault (unlike our use of raise-syntax-error). This is because the purpose of the #:fail-unless pattern directive is to find inconsistencies between different pieces of the syntax. syntax-parse provides another mechanism for making more minute specifications about the form of particular syntactic elements: syntax classes.
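To illustrate the kind of cross-piece inconsistency that #:fail-unless is suited to, here is a minimal sketch. The macro pair-up is hypothetical. Each of its two sub-forms is fine on its own; only the relationship between them (equal length) can be wrong, so there is no single small piece of syntax to blame.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; pair-up is a hypothetical macro: it pairs each quoted key with the
;; value expression at the same position.
(define-syntax (pair-up stx)
  (syntax-parse stx
    [(_ (key ...) (value ...))
     ;; The condition relates two different parts of the input.
     #:fail-unless (= (length (syntax->list #'(key ...)))
                      (length (syntax->list #'(value ...))))
     "expected the same number of keys and values"
     #'(list (cons 'key value) ...)]))

(pair-up (a b) (1 2)) ; => '((a . 1) (b . 2))
```

A use such as (pair-up (a b) (1)) is rejected at compile time with the message above, reported against the whole form.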

5.4 Classy Syntax

syntax-parse clause patterns support annotations on pattern variables that specify which syntactic category (i.e. class) they belong to. The notation for these annotations is pattern-variable:syntax-class. The pattern matching algorithm uses these annotations when parsing the macro application. If the invariants are not satisfied, then it generates a high-quality error message, pinpointing the location where the violation occurred.

The simplest, and most common, syntax class is id. By writing elem-name:id in the pattern rather than elem-name, we declare that the position is only for an identifier.

(define-syntax (robust-for/list5 stx)
  (syntax-parse stx
    [(_ ([elem-name:id seq]) computation)
     #'(let ([seq-v seq])
         (if (list? seq-v)
           (map (λ (elem-name) computation) seq-v)
           (error 'robust-for/list5 "Expected list, given ~e"
                  seq-v)))]))

syntax-parse comes with pre-defined syntax classes for most of the core syntactic elements of Racket. For example, number is a class that recognizes literal numbers, regexp is a class that recognizes literal regular expressions, string recognizes strings, and nat recognizes exact non-negative integers.
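As a small sketch of two of these classes in use, the hypothetical macro literal-power below accepts only a literal number and a literal natural number. Because both arguments are guaranteed to be literals, the macro can do the arithmetic during expansion.

```racket
#lang racket/base
(require (for-syntax racket/base syntax/parse))

;; literal-power is a hypothetical macro used only for illustration.
(define-syntax (literal-power stx)
  (syntax-parse stx
    [(_ base:number exponent:nat)
     ;; syntax-e unwraps each literal to a plain number at compile time.
     (define result (expt (syntax-e #'base) (syntax-e #'exponent)))
     ;; The expansion is just the quoted, precomputed result.
     #`(quote #,result)]))

(literal-power 2 10) ; => 1024
```

A use such as (literal-power 2 -1) or (literal-power x 3) is rejected at compile time, and the error points at the offending argument rather than at the whole form.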

Whenever you define a macro that has expectations for the form of syntax components, you should always look for, or define, a syntax class that corresponds to the expectations.

5.5 Analysis in Syntax Classes

Syntax classes do not simply refine the input syntax of macros. They can also analyze that syntax and synthesize new information about it. This feature will allow us to solve the problem with seq elegantly.

As with the transition from define-simple-macro to define-syntax + syntax-parse, by switching to a different notation in our macro, we will be able to do something more powerful. The annotation on elem-name is a short-hand for a #:declare pattern directive.

(define-syntax (robust-for/list6 stx)
  (syntax-parse stx
    [(_ ([elem-name seq]) computation)
     #:declare elem-name id
     #'(let ([seq-v seq])
         (if (list? seq-v)
           (map (λ (elem-name) computation) seq-v)
           (error 'robust-for/list6 "Expected list, given ~e"
                  seq-v)))]))

It is almost never a good idea to use the #:declare directive when the syntax class is a built-in one like this, because there is no extra functionality offered. However, with a #:declare directive, the syntax class position is an expression, not an identifier. (It is unfortunate that our example is for the id syntax class. When we say "identifier" here, we don’t mean the id class, but the fact that id is an identifier, rather than an expression that evaluates to a syntax class.)

syntax-parse comes with a built-in syntax class called expr/c, for expressions that must satisfy a run-time contract. We can use it with #:declare as:

(define-syntax (robust-for/list7 stx)
  (syntax-parse stx
    [(_ ([elem-name:id seq]) computation)
     #:declare seq (expr/c #'list?)
     #'(map (λ (elem-name) computation) seq)]))

expr/c takes as an argument a syntax object containing an expression that evaluates at run time to the contract the expression should satisfy. In this case, that expression is the identifier list?. When you use expr/c, you should generally ensure that the contract is bound to an identifier, because the contract expression may be evaluated each time the expanded code runs.

However, we are not done, because syntax classes cannot change the way that pattern variables are bound. That is, by simply using expr/c like this, we are not actually checking anything yet. To do so, we need to change our expansion, just like before when we added the if and error. In this case, the syntax class provides an attribute on seq called c which is seq protected by a contract.

(define-syntax (robust-for/list8 stx)
  (syntax-parse stx
    [(_ ([elem-name:id seq]) computation)
     #:declare seq (expr/c #'list?)
     #'(map (λ (elem-name) computation) seq.c)]))

An attribute is syntax-parse’s mechanism for communicating the results of the analysis that a syntax class does to the client of the class (in this case, our robust-for/list8 macro). Attributes are accessed by annotating the pattern variable reference with a dot and then the name of the attribute. So, we write seq.c. As usual, we could have written this in a long way:

(define-syntax (robust-for/list9 stx)
  (syntax-parse stx
    [(_ ([elem-name:id seq]) computation)
     #:declare seq (expr/c #'list?)
     #:with seq-w/-ctc (attribute seq.c)
     #'(map (λ (elem-name) computation) seq-w/-ctc)]))

Sometimes it is useful to access attributes via the special attribute form, but not in this case.
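To see the contract from expr/c take effect at run time, the following sketch wraps the same expansion in a hypothetical macro named checked-for/list. The #:name argument is an optional label for the checked expression. The handler only demonstrates that the failure is a contract violation carrying blame information, not the plain error raised by the earlier versions.

```racket
#lang racket/base
(require racket/contract
         (for-syntax racket/base syntax/parse))

;; checked-for/list is a hypothetical name; the body follows robust-for/list8.
(define-syntax (checked-for/list stx)
  (syntax-parse stx
    [(_ ([elem-name:id seq]) computation)
     #:declare seq (expr/c #'list? #:name "sequence expression")
     ;; seq.c is seq wrapped in the list? contract; plain seq is unchecked.
     #'(map (λ (elem-name) computation) seq.c)]))

(checked-for/list ([x (list 1 2)]) (add1 x)) ; => '(2 3)

;; A non-list sequence raises a contract error that records who is to blame.
(with-handlers ([exn:fail:contract:blame? (λ (e) 'blamed)])
  (checked-for/list ([x 5]) (add1 x)))
```

This addresses the earlier complaint that the hand-written check could not say which party violated the pre-condition.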

We are now done with our journey robustifying this macro. It is useful to stop and reflect on where we started and where we ended up. As it turns out, syntax classes like the ones we have used also work with define-simple-macro.

(define-simple-macro (simple-for/list0 ([elem-name seq]) computation)
  (map (λ (elem-name) computation) seq))
 
(define-simple-macro (robust-for/list9 ([elem-name:id seq]) computation)
  #:declare seq (expr/c #'list?)
  (map (λ (elem-name) computation) seq.c))

Thus, the robust version of our macro is extremely close to the original version, but does much, much more. This is the power of Racket macros.

5.6 Defining Syntax Classes

Although the built-in syntax classes (e.g. id and expr/c) are very useful, it is sometimes necessary to define your own syntax classes. You should do so whenever there is a coherent syntactic category that can be defined.

For example, consider the let form. Let’s write our own version of this macro.

(define-simple-macro (our-let0 ([x:id xe:expr] ...) body ...+)
  ((λ (x ...) body ...) xe ...))
 
(our-let0 ([x 3] [y 4]) (+ x y))

The binding pairs inside of our-let are an example of a coherent category. We could define a new syntax class called binding for this. We do so by writing a begin-for-syntax block, then using define-syntax-class.

(begin-for-syntax
  (define-syntax-class binding0
    (pattern [x:id xe:expr])))

If we use this syntax class, then the sub-pattern variables (x and xe) automatically become attributes of the binding, so to use it we write:

(define-simple-macro (our-let1 (b:binding0 ...) body ...+)
  ((λ (b.x ...) body ...) b.xe ...))
 
(our-let1 ([x 3] [y 4]) (+ x y))

One powerful aspect of syntax classes is that we can add variants to the class without modifying the macro clients. For example,

(begin-for-syntax
  (define-syntax-class binding1
    (pattern [x:id xe:expr])
    (pattern [xe:expr #:as x:id])))
 
(define-simple-macro (our-let2 (b:binding1 ...) body ...+)
  ((λ (b.x ...) body ...) b.xe ...))
 
(our-let2 ([x 3] [4 #:as y]) (+ x y))

5.7 Synthesis in Syntax Classes

Syntax classes can mention other classes, as we do here with binding1 mentioning id. Sometimes it can be very useful to build towers of such classes. For example, our our-let2 macro is actually flawed because it is illegal to have a λ with multiple arguments of the same name. Thus, we expose the implementation when a user writes an instance like:

(our-let2 ([x 3] [x 4]) (+ x x))
; => ERROR: lambda: duplicate argument name at: x

We could introduce a new syntax class for a sequence of bindings where all the binders are unique.

(begin-for-syntax
  (define-syntax-class unique-bindings0
    (pattern (b:binding1 ...)
             #:fail-when
             (check-duplicates
              (syntax->list #'(b.x ...))
              free-identifier=?)
             "Duplicate binding")))

However, this class is not very useful because it does not expose the attributes of the binding1 syntax class. In other words, we cannot define our-let3 as:

(define-simple-macro (our-let3 bs:unique-bindings0 body ...+)
  ((λ (bs.x ...) body ...) bs.xe ...))

or even as

(define-simple-macro (our-let3 bs:unique-bindings0 body ...+)
  ((λ (bs.b.x ...) body ...) bs.b.xe ...))

Instead, we must write the unique bindings syntax class differently. It must explicitly define the attributes that it exposes as part of its documented interface. The easiest way to do this is to use a #:with pattern directive.

(begin-for-syntax
  (define-syntax-class unique-bindings1
    (pattern (b:binding1 ...)
             #:with (x ...) #'(b.x ...)
             #:with (xe ...) #'(b.xe ...)
             #:fail-when
             (check-duplicates
              (syntax->list #'(b.x ...))
              free-identifier=?)
             "Duplicate binding")))
 
(define-simple-macro (our-let3 bs:unique-bindings1 body ...+)
  ((λ (bs.x ...) body ...) bs.xe ...))
 
(our-let3 ([x 3] [x 4]) (+ x x))
; => ERROR: our-let3: Duplicate binding
 
(our-let3 ([x 3] [4 #:as y]) (+ x y))

We now have a reusable syntax class that we will find useful in our future macrology. For example, we could write a macro that puts the bindings after the body, like a Haskell-style where.

(define-simple-macro (where body . bs:unique-bindings1)
  ((λ (bs.x ...) body) bs.xe ...))
 
((+ x y)
 . where .
 [x 3]
 [4 #:as y])

5.8 What we did not cover

There is a lot more that we can say about syntax classes. Here’s a very brief summary of a few points.

How do we expose attributes of a syntax class if there are multiple variants that define different sub-pattern variables? The define-syntax-class form allows an explicit list of attributes to be named outside of any pattern and all patterns are checked to ensure they bind these.

Are attributes always bound to sub-pieces of syntax, like in our examples? Absolutely not. We can use attributes to bind syntax that the class synthesized via syntax template substitution. We can even use attributes to hold arbitrary information and data structures, not just syntax objects.

Can we define syntax classes where the pattern is not surrounded by parentheses? Yes, these are called "splicing syntax classes".
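The three points above can be sketched together in one small module. All names here (binding2, counted-ids, maybe-default, our-let4, count-ids, first-or) are hypothetical and chosen only for illustration. The sketch assumes the #:attributes option, the #:attr directive, and define-splicing-syntax-class from syntax/parse.

```racket
#lang racket/base
(require syntax/parse/define
         (for-syntax racket/base syntax/parse))

(begin-for-syntax
  ;; Explicit attribute list: every variant must bind both x and xe.
  (define-syntax-class binding2
    #:attributes (x xe)
    (pattern [x:id xe:expr])
    (pattern [xe:expr #:as x:id])
    ;; A bare identifier has no expression, so one is synthesized.
    (pattern x:id #:with xe #'#f))

  ;; #:attr stores an ordinary value (here a number), not a syntax object.
  (define-syntax-class counted-ids
    (pattern (x:id ...)
             #:attr count (length (syntax->list #'(x ...)))))

  ;; A splicing class matches a run of terms with no parentheses of its own.
  (define-splicing-syntax-class maybe-default
    (pattern (~seq #:default d:expr))
    (pattern (~seq) #:with d #'#f)))

(define-simple-macro (our-let4 (b:binding2 ...) body ...+)
  ((λ (b.x ...) body ...) b.xe ...))

(define-syntax (count-ids stx)
  (syntax-parse stx
    [(_ ids:counted-ids)
     ;; The attribute form retrieves the non-syntax value.
     #`(quote #,(attribute ids.count))]))

(define-simple-macro (first-or lst:expr opt:maybe-default)
  (let ([v lst])
    (if (pair? v) (car v) opt.d)))

(our-let4 ([x 3] [4 #:as y] z) (list x y z)) ; => '(3 4 #f)
(count-ids (a b c))                          ; => 3
(first-or '() #:default 0)                   ; => 0
(first-or '(1 2))                            ; => 1
```

Note that count-ids reads its attribute with the attribute form rather than a template, since the value is a number and not syntax.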

There are many, many other subtle points and extensions that make syntax classes yet another powerful weapon in the macrologist’s arsenal.