9 Modules, Macros, Languages

7.4.0.4

Matthew Flatt

As a new running example for building languages in Racket, let’s look at a toy shell language. You can run external programs in Racket using functions like find-executable-path and system*, but it’s not nearly as convenient as running external programs in a language like bash. We can make a shell language that’s as streamlined as bash for running programs, but that has enough parentheses to make it beautiful. We’ll call it “pfsh” for “parenthesis-friendly shell.” Pronunciation: the “p” in “pfsh” is silent.

"demo.rkt"
#lang pfsh

; List everything in the current directory
(ls -l)

; Hello, world!
(echo "Hello, world!")

; Hello to me
(define me (whoami))
(echo -n Hello to me)

; Count how many things are in the current directory
(define l (ls))
(wc -l < l)

; Run some other program in our path
(racket)

This language looks something like Racket, but identifiers behave differently. When an identifier appears after an open parenthesis, it is normally treated as an external program name, instead of a reference to a definition. When an identifier is in an argument position, it turns into a string argument—but a reference to a defined name gets the string from the output of the command used to define the name. The language is sufficiently like Racket that we’ll be able to mix in some Racket functionality, but it doesn’t start out with any of Racket’s usual constructs; when we write our own language, we get full control.

To get started on this language, we have to first learn about Racket modules and about how #lang turns into a module import.

9.1 Defining Modules

Let’s create a run function that combines find-executable-path and system* so that, for example,

(run "ls" "-l")

lists the contents of the current directory in long format. We’ll put run in its own module, so we can use it in multiple programs. Here’s the module:

"run.rkt"
#lang racket

(provide run)

(define (run prog . args)
  (apply system* (find-program prog) args))

(define (find-program str)
  (or (find-executable-path str)
      (error 'pfsh "could not find program: ~a" str)))

This module defines run and uses the provide form to export it for use by other modules. The module also defines a find-program helper function, but that function is not exported for external use.

To use run, another module imports it with require. Assuming that "use-run.rkt" is in the same directory, we can reference the "run.rkt" module using a relative path (Windows users: try (run "cmd.exe" "/c" "dir") instead):

"use-run.rkt"
#lang racket
(require "run.rkt")

(run "ls" "-l")
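When experimenting, it can be handy to keep both modules in one file. The sketch below assumes Racket submodules in place of separate files: a submodule with the hypothetical name run-lib plays the role of "run.rkt", and (require 'run-lib) plays the role of (require "run.rkt"). As in "run.rkt", only run is exported, and find-program stays private.

```racket
#lang racket
;; Single-file sketch: the submodule run-lib (a hypothetical name)
;; stands in for the separate file "run.rkt".
(module run-lib racket
  (provide run)

  (define (run prog . args)
    (apply system* (find-program prog) args))

  ;; Not provided, so it is visible only inside run-lib.
  (define (find-program str)
    (or (find-executable-path str)
        (error 'pfsh "could not find program: ~a" str))))

;; The submodule counterpart of (require "run.rkt").
(require 'run-lib)

;; Referring to find-program out here would be an unbound-identifier error.

;; The main submodule runs only when this file is run as a program.
(module+ main
  (run "ls" "-l"))
```

Running the file with racket executes the main submodule and lists the directory; requiring the file from another module only makes run available.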

If we want to be able to write (run ls -l), then we can’t implement run as a function, because the run form’s pieces are not expressions in the usual Racket sense. Of course, we can implement the revised run as a macro in terms of the run function that we have:

"pfsh-run.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(run (symbol->string 'prog) (symbol->string 'arg) ...)]))

"use-pfsh-run.rkt"
#lang racket
(require "pfsh-run.rkt")

(run ls -l)

Note how "pfsh-run.rkt" defines pfsh:run but renames it to run when exporting via provide. That way, the implementation of the macro can refer to the run function imported from "run.rkt". The macro system manages bindings properly to ensure that run in the expansion of pfsh:run will always refer to the run function, even if pfsh:run is used in a module like "use-pfsh-run.rkt" where run does not refer to the function.
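To see that binding guarantee at work without launching any programs, here is a single-file sketch. It assumes submodules named run-lib and pfsh-run (hypothetical names) in place of "run.rkt" and "pfsh-run.rkt", and it replaces the run function with a stand-in that returns its arguments instead of running anything.

```racket
#lang racket
;; Hypothetical stand-in for the run function: it returns the strings
;; it receives instead of running a program.
(module run-lib racket
  (provide run)
  (define (run prog . args)
    (cons prog args)))

;; Same macro as "pfsh-run.rkt", exported under the name run.
(module pfsh-run racket
  (require (submod ".." run-lib)
           (for-syntax syntax/parse))
  (provide (rename-out [pfsh:run run]))
  (define-syntax (pfsh:run stx)
    (syntax-parse stx
      [(_ prog:id arg:id ...)
       #'(run (symbol->string 'prog) (symbol->string 'arg) ...)])))

(require 'pfsh-run)

;; Here run names the macro, yet the run in its expansion still refers
;; to the function from run-lib, so expansion does not loop.
(define result (run ls -l))
result ; => '("ls" "-l")
```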

9.2 From Modules to Languages

The #lang that starts a Racket-program file determines what the rest of the file means. Specifically, the identifier immediately after #lang selects a meaning for the rest of the file, and it gets control at the character level. The only constraint on a #lang’s meaning is that it denotes a Racket module that can be referenced using the file’s path.

The implementation of a language doesn’t have to go all the way from characters to the machine-code representation of a module, however. Instead, it compiles the module text to a syntax object that represents a primitive module form.

It turns out that you can write the primitive module form directly in DrRacket. If you leave out any #lang line and write

(module example racket
  (#%module-begin
   (+ 1 2)))

then it’s the same as

#lang racket
(+ 1 2)

and if you write the latter form, then it essentially turns into the former form. Both forms have the same (+ 1 2) because #lang racket uses the native syntax for the module body.

Technically, there’s a difference in intent in the above two chunks of text showing programs. In the second case, with #lang, the parentheses are meant as actual parenthesis characters that reside in a file. In the first case, with module, the parentheses are just a way to write a text representation of the actual value, which is a syntax object that contains a list of syntax objects that contain symbols, and so on. A language implementation has to actually parse the parentheses in the second block of code to produce the first.
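The claim that the two forms mean the same thing can be checked in one file using submodules. In this sketch (the names explicit, implicit, and answer are ours), one submodule writes #%module-begin explicitly and the other leaves it implicit; each exports a value instead of printing one, so that the results are easy to compare.

```racket
#lang racket
;; explicit spells out the primitive module form with #%module-begin;
;; implicit leaves #%module-begin for the module form to add.
(module explicit racket
  (#%module-begin
   (provide answer)
   (define answer (+ 1 2))))

(module implicit racket
  (provide answer)
  (define answer (+ 1 2)))

;; prefix-in keeps the two answer bindings apart.
(require (prefix-in e: 'explicit)
         (prefix-in i: 'implicit))

(list e:answer i:answer) ; => '(3 3)
```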

Other languages show a bigger difference between the #lang and module forms. For example,

#lang scribble/base
Hello, world!

turns into

(module example scribble/base/lang
  (#%module-begin
   (doc-begin doc values () "\n" "Hello, world!" "\n")))

You can see that the Hello, world! text and even the newlines have been turned into syntax-object strings here, but not much else has happened. In general, that’s a good strategy for a #lang: perform just enough parsing to get into syntax objects, and then use macros to finish the language’s compilation.

We’ll define pfsh so that the original pfsh example program corresponds to

(module example pfsh
  (#%module-begin
   (ls -l)
   (echo "Hello, world!")
   (define me (whoami))
   (echo -n Hello to me)
   (define l (ls))
   (wc -l < l)
   (racket)))

For now, we don’t want to bother parsing at the level of parentheses, so we’ll actually write

"example.rkt"
#lang s-exp "pfsh.rkt"
(define me (whoami))
(echo -n Hello to me)

The s-exp language doesn’t do anything but parse parentheses into syntax objects. For this example, it directly generates the syntax object

(module example "pfsh.rkt"
  (#%module-begin
   (define me (whoami))
   (echo -n Hello to me)))

which is half-way to where we want to be: the define and echo syntax objects are still here to be expanded by macros, but we no longer have to worry about parsing characters. (The change from pfsh to "pfsh.rkt" just lets us work with relative paths, for now, instead of installing a pfsh collection.)

Without creating a "pfsh.rkt" file, copy the #lang s-exp "pfsh.rkt" example into DrRacket and click the Macro Stepper button. The stepper will immediately report an error, since there’s no "pfsh.rkt" module, but it will show you the parsed form.
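The parsing step can also be observed from Racket code. This sketch assumes the text of "example.rkt" is supplied as a string; read-syntax with read-accept-reader enabled runs the s-exp reader, and syntax->datum strips the syntax-object wrappers for display. The module name that the reader chooses may differ from example, so we rely only on the result being a module form whose language is "pfsh.rkt". Since reading does not expand anything, "pfsh.rkt" does not need to exist.

```racket
#lang racket
;; The text of "example.rkt", supplied as a string for this sketch.
(define src
  "#lang s-exp \"pfsh.rkt\"\n(define me (whoami))\n(echo -n Hello to me)\n")

;; read-accept-reader must be #t for read-syntax to accept #lang.
;; This only parses: no macros run, so "pfsh.rkt" is never loaded.
(define parsed
  (parameterize ([read-accept-reader #t])
    (syntax->datum
     (read-syntax 'example.rkt (open-input-string src)))))

parsed
```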

9.3 The Core module Form

The core module grammar is

(module name initial-import-module
  (#%module-begin
   form ...))

(module name initial-import-module
  form ...)

The second variant is a shorthand for the first, and it is automatically converted to the first variant by adding #%module-begin.

For a module that comes from a file, the name turns out to be ignored, because the file path acts as the actual module name. The key part is initial-import-module. The module named by initial-import-module gives meaning to some set of identifiers that can be used in the module body. There are absolutely no pre-defined identifiers for the body of a module. Even things like lambda or #%module-begin must be exported by initial-import-module if they are going to be used in the module body’s forms.

If require is provided by initial-import-module, then it can be used to pull in additional names for use by forms. If there’s no way to get at require, define, or other binding forms from the exports of initial-import-module, then nothing but the exports of initial-import-module will ever be available to the forms.

Since every module form has an explicit or implicit #%module-begin, initial-import-module had better provide #%module-begin. If a language should allow the same sort of definition-or-expression sequence as racket, then it can just re-export #%module-begin from racket. As we will see, there are some other implicit forms, all of which start with #%, and initial-import-module must provide those forms if they’re going to be triggered.
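As a sketch of how much a language must export, here is a hypothetical language named tiny, written as a submodule, that exports a handful of Racket’s forms plus the implicit #%app and #%datum forms that applications and literals rely on. A module written in tiny can use exactly those bindings and nothing else.

```racket
#lang racket
;; tiny is a hypothetical language: its users get only what it exports.
(module tiny racket
  (provide #%module-begin ; every module body needs it
           #%app          ; implicit in the application (+ 1 2)
           #%datum        ; implicit in the literals 1 and 2
           define
           provide
           +))

;; A module written in tiny. Using - or lambda here would be a syntax
;; error, because tiny does not export them.
(module use-tiny (submod ".." tiny)
  (provide x)
  (define x (+ 1 2)))

(require 'use-tiny)
x ; => 3
```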

Here is the simplest possible Racket language module:

"simple.rkt"
#lang racket
(provide #%module-begin)

Since "simple.rkt" provides #%module-begin, it’s a valid initial import. You can use it in the empty program

"use-simple.rkt"
#lang s-exp "simple.rkt"

as long as "use-simple.rkt" is saved in the same directory as "simple.rkt" (so that the relative path works). You can add comments after the #lang line, since comments are stripped away by the parser. Nothing else in the body is going to work, though. Actually, (#%module-begin) will work, since #%module-begin is bound and since s-exp relies on the implicit introduction of #%module-begin instead of adding it explicitly. That’s a flaw in s-exp.
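The same experiment fits in one file if we assume a submodule named simple in place of "simple.rkt" (the other submodule names are also ours). Both an empty body and a body that is just (#%module-begin) expand successfully, while any ordinary expression would fail.

```racket
#lang racket
;; simple is a hypothetical submodule standing in for "simple.rkt".
(module simple racket
  (provide #%module-begin))

;; An empty body works. A body such as (+ 1 2) would fail, because the
;; simple language provides no #%app, #%datum, or +.
(module use-simple (submod ".." simple))

;; A body consisting of a single (#%module-begin) also works: module does
;; not wrap a single body form that already expands to a module body.
(module use-simple-2 (submod ".." simple)
  (#%module-begin))

(require 'use-simple 'use-simple-2)
```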

9.4 A First Implementation of pfsh

It’s going to take a few steps to get to the pfsh language in all of its glory. As a first step, let’s create a variant "pfsh0.rkt" that has a run form to run an external program:

#lang s-exp "pfsh0.rkt"
(run ls -l)

Since that’s equivalent to

(module example "pfsh0.rkt"
  (#%module-begin
   (run ls -l)))

then we need to create a "pfsh0.rkt" module that provides #%module-begin and run. The run macro’s job is to treat its identifiers as strings and deliver them to the run function that we defined in "run.rkt":

"pfsh0.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

We’ve wrapped void around the call to run to suppress the success or failure boolean that would otherwise print after the run program’s output.
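Putting the pieces together, here is a single-file sketch of pfsh0 and a program written in it. The submodule names run-lib, pfsh0, and example are assumptions that stand in for "run.rkt", "pfsh0.rkt", and the #lang s-exp "pfsh0.rkt" program; running the file executes ls through the main submodule.

```racket
#lang racket
;; run-lib stands in for "run.rkt" (hypothetical submodule name).
(module run-lib racket
  (provide run)
  (define (run prog . args)
    (apply system* (find-program prog) args))
  (define (find-program str)
    (or (find-executable-path str)
        (error 'pfsh "could not find program: ~a" str))))

;; pfsh0 stands in for "pfsh0.rkt".
(module pfsh0 racket
  (require (submod ".." run-lib)
           (for-syntax syntax/parse))
  (provide #%module-begin
           (rename-out [pfsh:run run]))
  (define-syntax (pfsh:run stx)
    (syntax-parse stx
      [(_ prog:id arg:id ...)
       #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))])))

;; The counterpart of a file containing
;;   #lang s-exp "pfsh0.rkt"
;;   (run ls -l)
(module example (submod ".." pfsh0)
  (run ls -l))

;; Instantiating example runs ls; thanks to void, no #t follows its output.
(module+ main
  (require (submod ".." example)))
```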

9.5 Compile Time and Run Time

Notice that pfsh so far has two parts:

the compile-time part that is about dealing with the syntax of the language, here implemented by the pfsh:run macro in "pfsh0.rkt"; and
the run-time part that is called by generated code, here implemented by the run function in "run.rkt".

Although we happen to have implemented the two parts in different modules, they don’t have to be different. We could just as well have put the run function’s implementation directly in "pfsh0.rkt":

"pfsh0-alt.rkt"
#lang racket
(require (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #'(void (run (symbol->string 'prog) (symbol->string 'arg) ...))]))

(define (run prog . args)
  (apply system* (find-program prog) args))

(define (find-program str)
  (or (find-executable-path str)
      (error 'pfsh "could not find program: ~a" str)))

At this point, it’s worth double-checking that we have appropriately sorted computation into the compile and run phases. Generally, it’s better to perform a computation at compile time instead of run time, if possible. In this case, the pfsh:run macro generates symbol->string expressions to convert symbols to strings at run time, but that conversion could be performed at compile time, instead. (It’s a good idea to let the compiler optimize away computations when it can. Unfortunately, symbol->string is defined to generate a fresh mutable string every time it’s called, and the compiler cannot tell that the freshness is unnecessary here, so it won’t optimize the symbol->string calls to literal strings.) Let’s improve pfsh:run to perform that work at compile time.
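The freshness requirement is easy to observe directly. In this sketch, two calls to symbol->string on the same symbol produce equal but distinct mutable strings, while a string literal is immutable, which is why the compiler cannot simply substitute a literal for the call.

```racket
#lang racket
;; symbol->string allocates a fresh mutable string on every call.
(define a (symbol->string 'ls))
(define b (symbol->string 'ls))

(equal? a b)     ; => #t, same characters
(eq? a b)        ; => #f, two distinct string objects
(immutable? a)   ; => #f, the result is mutable

;; A string literal, which is what a compile-time conversion would leave
;; in the expanded code, is immutable.
(immutable? "ls") ; => #t
```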

The most obvious way to move the computation is to immediately escape back to compile time in the result template for pfsh:run:

"pfsh1a.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #`(void (run #,(symbol->string (syntax-e #'prog))
                  #,@(map symbol->string
                          (map syntax-e
                               (syntax->list #'(arg ...))))))]))

Alternatively, we can stay within the template language in pfsh:run and defer the compile-time escape to a helper macro:

"pfsh1.rkt"
#lang racket
(require "run.rkt"
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

(define-syntax (as-string stx)
  (syntax-parse stx
    [(_ sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))
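To confirm that pfsh1 delivers literal strings to run, the sketch below reuses its macros in a submodule (the name pfsh1-demo is hypothetical) and, as an assumption for testing, swaps in a stand-in run function that returns its arguments instead of running a program.

```racket
#lang racket
;; pfsh1's macros, with a hypothetical stand-in for the run function
;; that returns its arguments instead of running a program.
(module pfsh1-demo racket
  (require (for-syntax syntax/parse))
  (provide (rename-out [pfsh:run run]))

  (define (run prog . args)
    (cons prog args))

  (define-syntax (pfsh:run stx)
    (syntax-parse stx
      [(_ prog:id arg:id ...)
       #`(run (as-string prog) (as-string arg) ...)]))

  ;; Runs at compile time: replaces an identifier with a literal string.
  (define-syntax (as-string stx)
    (syntax-parse stx
      [(_ sym:id)
       #`#,(symbol->string (syntax-e #'sym))])))

(require 'pfsh1-demo)

;; Ends up as a call to run with the literal strings "ls" and "-l".
(define result (run ls -l))
result ; => '("ls" "-l")
```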
