11 More Language Variations

Matthew Flatt

Let’s continue implementing pfsh. We’ll start with a solution to exercise 24:

"pfsh3.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run run]
                     [pfsh:define define]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog:id arg:id ... < stream:id)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

(define-syntax (as-string stx)
  (syntax-parse stx
    [(_ sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]))

11.1 The Application Form

So far, the biggest difference between the pfsh that we’ve implemented and the pfsh that we want is that we have to put run before every program name. Instead of (run ls), we want to write (ls).

Since macros can do any kind of work at compile time, you might imagine changing pfsh so that it scans the filesystem and builds up a set of definitions based on the programs that are currently available via the PATH environment variable. That’s not how scripting languages are meant to work, though. Also, it’s likely to cause trouble to use the filesystem and environment-variable state at such a fine granularity to determine bindings of a module.

Instead, we would like to change the default meaning of parentheses. In Racket, a pair of parentheses means a function call by default. In pfsh, a pair of parentheses should mean running an external program by default. The “by default” part concedes that an identifier after an open parenthesis can change the meaning of the parenthesis, such as when define appears after an open parenthesis. Otherwise, though, it’s as if a function-call identifier appears after the open parenthesis to specify a function-call form... and function-call exists, except that it’s spelled #%app.

In other words, in the racket language, when you write

(+ 1 2)

since + is not bound as a macro or core syntactic form, that expands to

(#%app + 1 2)

The #%app provided by racket is defined as a macro that expands to the core syntactic form for function calls. That core form is also called #%app internally, but in the rare case that we have to refer to the core form, we use the alias #%plain-app.
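
As a minimal sketch of how a language can take over this implicit form, the following self-contained module defines a hypothetical traced language whose #%app logs each call before delegating to the #%app of racket/base. The names traced, traced-app, and client are illustrative assumptions and are not part of pfsh.

```racket
#lang racket/base

;; Hypothetical mini-language: like racket/base, but with a logging #%app.
(module traced racket/base
  (require (for-syntax racket/base))
  (provide (except-out (all-from-out racket/base) #%app)
           (rename-out [traced-app #%app]))

  (define-syntax (traced-app stx)
    (syntax-case stx ()
      [(_ f arg ...)
       ;; The inner #%app is the one from racket/base, because this
       ;; template is written in a racket/base module.
       #'(begin
           (printf "calling ~s\n" 'f)
           (#%app f arg ...))])))

;; A client written in the mini-language: (+ 1 2) implicitly
;; becomes (#%app + 1 2), where #%app is now traced-app.
(module client (submod ".." traced)
  (provide result)
  (define result (+ 1 2)))

(require 'client)
result
```

Running the outer module should log the call to + before showing the sum, even though the client never writes #%app itself.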

To change the default meaning of parentheses for pfsh, then, we can rename pfsh:run to #%app on export:

"pfsh7.rkt"
#lang racket
....
(provide ....
         (rename-out [pfsh:run #%app]
                     ....))
....

After that small adjustment, we conceptually change each run in a pfsh module to #%app, but we don’t actually have to write the #%app, since it’s added automatically by the expander:

"use-pfsh7.rkt"
#lang s-exp "pfsh7.rkt"
(define l (ls))
(wc -l < l)

11.2 More Implicit Forms

You have now seen two implicit forms that a language can adjust, #%module-begin and #%app, so you may wonder how many implicit forms there are. The others are #%datum, #%top, and #%top-interaction.

11.2.1 #%datum

Try including a number in a "pfsh7.rkt" program like this:

#lang s-exp "pfsh7.rkt"
0

The complaint is “literal data is not allowed; no #%datum syntax transformer is bound.”

The #%datum form is implicitly wrapped around a literal constant such as 0, #true, or "apple" when it appears in a place where an expression is expected. Since the #%datum form always has a single subform, it is written internally with parentheses and a ., which corresponds to a non-list pair instead of a list; so, 0 is implicitly (#%datum . 0), and so on.
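
To make the dotted notation concrete, here is a small sketch in plain racket/base (the names n, s, and form are only for illustration). It writes the implicit form explicitly and then inspects the same notation as quoted data:

```racket
#lang racket/base

;; Writing the implicit form explicitly yields the same values
;; as the bare literals 0 and "apple".
(define n (#%datum . 0))
(define s (#%datum . "apple"))

;; As data, the dotted notation is a pair whose cdr is the literal,
;; not a two-element list.
(define form '(#%datum . 0))

(list n s (pair? form) (list? form) (cdr form))
```

The final expression evaluates to '(0 "apple" #t #f 0).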

Let’s not allow numbers in pfsh, but let’s allow literal strings, which can be useful for piping to a program’s input. Since a literal string is useful as a program’s input, let’s also change #%app to allow any expression after a < redirection.

"pfsh8.rkt"
#lang racket
....

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:define define]
                     [pfsh:datum #%datum]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog:id arg:id ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg:id ...)
     #`(void (run (as-string prog) (as-string arg) ...))]))

....

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ . s:string) #'(#%datum . s)]
    [(_ . other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

Now, this program works:

"use-pfsh8.rkt"
#lang s-exp "pfsh8.rkt"
(wc -w < "a b c")

and a program that has a literal number reports a better error message.

11.2.2 #%top

If you use an identifier that isn’t provided by "pfsh8.rkt" and isn’t between parentheses,

#lang s-exp "pfsh8.rkt"
oops

then you’ll get a message that mentions #%top. The #%top form is wrapped around an identifier that has no binding.

We could improve the error for users so that it doesn’t mention the implicit name #%top:

(define-syntax (complain-top stx)
  (syntax-parse stx
    [(_ . x:id)
     (raise-syntax-error 'variable "misplaced" #'x)]))

Or we could go a different direction, which is to treat an unbound identifier anywhere the same as a string. Then, #%app doesn’t need to insert any symbol conversions. That’s the direction we take in "pfsh9.rkt":

"pfsh9.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:top #%top]
                     [pfsh:define define]
                     [pfsh:datum #%datum]))

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

(define-syntax (pfsh:top stx)
  (syntax-parse stx
    [(_ . sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]))

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ . s:string) #'(#%datum . s)]
    [(_ . other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

A benefit of this strategy is that we can now use defined variables as program arguments. A bound identifier as an argument is replaced with its value, while an unbound identifier is converted to a string:

"use-pfsh9.rkt"
#lang s-exp "pfsh9.rkt"
(define me (whoami))
(echo Hello to me)
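
The same bound-versus-unbound behavior can be tried without running external programs. The sketch below assumes a hypothetical mini-language named words (with an illustrative word-top macro) that exports only a handful of racket/base bindings plus a #%top that turns an unbound identifier into a string:

```racket
#lang racket/base

;; Hypothetical mini-language in which an unbound identifier
;; stands for its own name as a string.
(module words racket/base
  (require (for-syntax racket/base))
  (provide #%module-begin #%app #%datum define list provide
           (rename-out [word-top #%top]))

  (define-syntax (word-top stx)
    (syntax-case stx ()
      [(_ . id)
       (identifier? #'id)
       ;; Build a string literal at compile time from the identifier's name.
       (datum->syntax #'here (symbol->string (syntax-e #'id)))])))

(module client (submod ".." words)
  (provide greeting)
  (define name "pfsh")
  ;; hello is unbound, so it becomes "hello";
  ;; name is bound, so its value is used.
  (define greeting (list hello name)))

(require 'client)
greeting
```

Here greeting is '("hello" "pfsh"): the unbound hello was converted, while the bound name was looked up as usual.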

11.2.3 #%top-interaction

Finally, you may have noticed that when you run any of the working programs with "pfsh9.rkt" and earlier variants, DrRacket usually reports “Interactions disabled: language does not support a REPL (no #%top-interaction).”

The #%top-interaction form is wrapped around any expression entered into the interactions window, and DrRacket notices that it will never work in our pfsh implementations, so it doesn’t provide a prompt. We could enable the interactions window to have the same kinds of forms as a program by providing a #%top-interaction that just removes itself:

"pfsh10.rkt"
....

(provide ....
         (rename-out ....
                     [pfsh:top-interaction #%top-interaction]))

(define-syntax (pfsh:top-interaction stx)
  (syntax-parse stx
    [(_ . form) #'form]))

....

Now, when you run a program, you can keep interacting after the script completes:

"use-pfsh10.rkt"
#lang s-exp "pfsh10.rkt"
(echo Ready!)

11.3 Defining Functions

Our pfsh implementation can now run the original example script, but let’s go a little further. An advantage of a parenthesis-friendly shell is that we can mix in more of Racket to better support abstraction in a script. At a minimum, we’d like to be able to define and call functions in pfsh scripts:

"use-pfsh11.rkt"
#lang s-exp "pfsh11.rkt"
(define (double x)
  (string-append x x))

(define l (ls -l))
(wc -l < l)
(wc -l < (double l))

It’s easy to make the string-append function available. It’s also easy to change define to match and distinguish function and stream shapes—but what should a function definition expand to? We have defined #%app so that it treats its “function” position as a string name of an external program. When an open parenthesis is followed by the name of a function that we have defined within the script, then we’d like the application form to mean a function call, instead.

Here are two ways to make the adaptation work:

We can change #%app so that it inspects an identifier in the “function” position to check whether the identifier is bound. If so, the #%app corresponds to a function call.
In this case, a define for a function can expand to a regular define.
We can make define bind a function name as a macro, so that using the name after an open parenthesis triggers a function-specific macro instead of the generic #%app form.
In this case, a define for a function needs to expand to define-syntax to bind the function name as a macro.

Slightly different behaviors fall out from each of these strategies. With the first strategy, an identifier that is bound to a string for a program name cannot be used to run the program, because using the identifier after an open parenthesis would trigger a function call. With the second strategy, a name bound to a string still works as a program name, but a function identifier doesn’t work as an argument to another function (unless we do a little more work to make that possible). Both approaches are viable, and either could be made to fit a preferred behavior, so let’s try both of them.

11.3.1 Detecting Bindings

To try the first strategy, we need #%app to recognize whether an identifier has a binding or not. Since the #%app macro receives only an immediate application form, how can it know what definitions are in the rest of the module? That is, although the #%app macro can do any work it wants at compile time, it doesn’t have a handle on the whole module to inspect it. The macro expander itself must know about bindings, because it uses binding information to determine which macro should handle an expansion. Happily for our #%app, the macro expander shares its binding information with macros in several ways, including through an identifier-binding function.

The identifier-binding function takes an identifier and reports #f if the identifier has no binding. Otherwise, it reports some information about the binding, such as which module (possibly the current one) contains a definition of the identifier. For our purposes, we do not care about the additional details, so we can just check whether identifier-binding returns #f.
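
As a quick way to see identifier-binding at work, the following sketch defines a hypothetical bound? macro (not part of pfsh) that expands to #t or #f depending on whether its argument has a binding at expansion time:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Hypothetical macro: expands to #t when its argument has a binding
;; at expansion time, and to #f otherwise.
(define-syntax (bound? stx)
  (syntax-case stx ()
    [(_ id)
     (identifier? #'id)
     (if (identifier-binding #'id) #'#t #'#f)]))

(define x 1)

(define results
  (list (bound? x)                 ; defined in this module
        (bound? car)               ; imported from racket/base
        (let ([y 2]) (bound? y))   ; local binding
        (bound? no-such-name)))    ; no binding anywhere

results
```

The list is '(#t #t #t #f). Only the truth value matters here, which is the same test that the #:when guard performs.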

Specifically, we add a new clause to the syntax-parse form in pfsh:run, where the clause has a #:when guard so that it only applies when identifier-binding produces a non-#f value:

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog:id arg ...)
     #:when (identifier-binding #'prog)
     #'(prog arg ...)]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

Meanwhile, pfsh:define recognizes a function definition and passes it on to racket’s define:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(define (proc arg ...) expr)]))

The string-append function can be provided as-is:

(provide ....
         string-append)

11.3.2 Macro-Defining Macros

With the strategy where define binds a function name as a macro, we don’t have to change #%app. We just have to change pfsh:define to compile a pfsh function definition into a racket macro definition.

Here’s a first attempt, but it’s not right:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(define-syntax (proc stx)
         (syntax-parse stx
           [(_ arg ...) #'expr]))]))

This broken attempt directly substitutes the body of the pfsh function in place of a function call. That causes the expression to be “inlined” in every call, which is probably a bad idea. Even worse, it causes each argument expression to be copied in place of every use of the argument in the body expression.
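
The copying problem is easy to observe with an argument that has a side effect. This sketch uses plain racket/base rather than pfsh, and the names double-inline, double-fn, and next! are made up for the illustration:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Macro version: the argument expression is pasted in twice.
(define-syntax (double-inline stx)
  (syntax-case stx ()
    [(_ x) #'(string-append x x)]))

;; Function version: the argument is evaluated once, then reused.
(define (double-fn x)
  (string-append x x))

;; A counter whose value changes on every call.
(define count 0)
(define (next!)
  (set! count (add1 count))
  (number->string count))

(define inlined (double-inline (next!)))  ; (next!) runs twice
(define called  (double-fn (next!)))      ; (next!) runs once

(list inlined called)
```

The result is '("12" "33"): the macro evaluated (next!) twice and got two different strings, while the function saw a single value.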

To solve those problems, even though a pfsh function definition needs to expand to a racket macro definition, we also want a racket function. So, a pfsh definition should expand to both:

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(begin
         (define (actual-proc arg ...)
           expr)
         (define-syntax (proc stx)
           (syntax-parse stx
             [(_ arg ...) #'(actual-proc arg ...)])))]))

For example, the definition

(define (double x)
  (string-append x x))

expands to

(define (actual-proc x)
  (string-append x x))

(define-syntax (double stx)
  (syntax-parse stx
    [(_ x) #'(actual-proc x)]))

Since the #'(actual-proc arg ...) form originates from a module in the racket language (i.e., the implementation of pfsh), it uses the normal #%app from racket, so the macro expansion is always a regular function call. Even though the function always is named actual-proc, the macro system will arrange for different bindings for different expansions, so it’s ok to define multiple functions in a pfsh script.
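
To check that repeated uses of the name actual-proc really do stay separate, here is a stand-alone sketch of the same pattern in racket/base. The macro name define-call is an assumption for this example, and syntax-case is used in place of syntax-parse only to keep the imports minimal:

```racket
#lang racket/base
(require (for-syntax racket/base))

;; Same shape as the function clause of pfsh:define: a real function
;; plus a macro that forwards each call to it.
(define-syntax (define-call stx)
  (syntax-case stx ()
    [(_ (proc arg ...) body)
     #'(begin
         (define (actual-proc arg ...) body)
         (define-syntax (proc call-stx)
           (syntax-case call-stx ()
             [(_ arg ...) #'(actual-proc arg ...)])))]))

;; Two definitions, each introducing its own actual-proc.
(define-call (double x) (string-append x x))
(define-call (shout x) (string-append x "!"))

(list (double "ab") (shout "hi"))
```

The result is '("abab" "hi!"), so the two helper functions do not collide even though both expansions spell the name the same way.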

There’s a small catch using this approach. We can’t just export string-append for use in pfsh scripts, because string-append is not a macro. Instead, in the implementation of pfsh, we need to use pfsh:define to define a variant that is implemented with string-append and then provide the new variant:

(provide ....
         (rename-out ....
                     [pfsh:string-append string-append]))

(pfsh:define (pfsh:string-append arg1 arg2)
             (string-append arg1 arg2))

11.4 Installing a Language

Let’s take the last step in defining a language, which will let us switch from #lang s-exp "pfsh11.rkt" to #lang pfsh. To enable writing #lang pfsh, we must do two things:

Adjust our language implementation so that it explicitly specifies S-expression parsing, instead of having S-expression parsing imposed externally.
Install our language as a package so that #lang pfsh will work from anywhere.

The part of a language that specifies its parsing from characters to syntax objects is called a reader. A language’s reader is implemented by a reader submodule (i.e., a nested module) inside the language’s module. That submodule must export a read-syntax function that takes an input port, reads characters from it, and constructs a module form as a syntax object. For historical reasons, the submodule should also provide a read function that does the same thing but returns a plain S-expression instead of a syntax object.

Here’s one way to implement the reader submodule:

(module reader racket
  (provide (rename-out [pfsh:read-syntax read-syntax]
                       [pfsh:read read]))

  (define (pfsh:read-syntax name in)
    (datum->syntax #f `(module anything pfsh
                         (#%module-begin
                          ,@(read-body name in)))))

  (define (read-body name in)
    (define e (read-syntax name in))
    (if (eof-object? e)
        '()
        (cons e (read-body name in))))

  (define (pfsh:read in)
    (syntax->datum (pfsh:read-syntax 'src in))))
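
To see what such a reader produces, the following sketch runs the same read-body helper outside of any reader submodule, with a string port standing in for a script file. The source name 'example and the sample script text are assumptions for the illustration, and the module form is only built and printed, never evaluated, so pfsh does not need to be installed:

```racket
#lang racket/base

;; Same helper as in the reader: read terms until end of input.
(define (read-body name in)
  (define e (read-syntax name in))
  (if (eof-object? e)
      '()
      (cons e (read-body name in))))

;; A string port stands in for the source file of a pfsh script.
(define in (open-input-string "(define l (ls))\n(wc -l < l)"))
(define forms (read-body 'example in))

;; Wrap the terms in a module form, as pfsh:read-syntax does.
(define mod
  (datum->syntax #f `(module anything pfsh
                       (#%module-begin ,@forms))))

(syntax->datum mod)
```

The printed datum is (module anything pfsh (#%module-begin (define l (ls)) (wc -l < l))), which is the module that the expander would then process using the bindings exported by pfsh.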

Notice that pfsh:read-syntax constructs a module that uses pfsh as the initial import. Otherwise, it doesn’t really do anything specific to pfsh, and most of the work is performed by the built-in read-syntax function that reads a single term (such as an identifier or parenthesized form) as a syntax object. In fact, since this pattern is so common, Racket provides a syntax/module-reader language that expects just the pfsh part and builds the rest of the submodule around that:

(module reader syntax/module-reader
  pfsh)

In short, we just need to add those two lines to our current pfsh implementation, and then save it as "main.rkt" in a "pfsh" directory. Here’s the complete implementation:

"pfsh/main.rkt"
#lang racket
(require "run.rkt"
         racket/port
         (for-syntax syntax/parse))

(provide #%module-begin
         (rename-out [pfsh:run #%app]
                     [pfsh:top #%top]
                     [pfsh:define define]
                     [pfsh:datum #%datum]
                     [pfsh:top-interaction #%top-interaction]
                     [pfsh:string-append string-append]))

(module reader syntax/module-reader
  pfsh)

(define-syntax (pfsh:run stx)
  (syntax-parse stx
    #:datum-literals (<)
    [(_ prog arg ... < stream:expr)
     #'(with-input-from-string
         stream
         (lambda ()
           (pfsh:run prog arg ...)))]
    [(_ prog arg ...)
     #`(void (run prog arg ...))]))

(define-syntax (pfsh:top stx)
  (syntax-parse stx
    [(_ . sym:id)
     #`#,(symbol->string (syntax-e #'sym))]))

(define-syntax (pfsh:define stx)
  (syntax-parse stx
    [(_ stream:id expr)
     #'(define stream (with-output-to-string
                        (lambda ()
                          expr)))]
    [(_ (proc:id arg:id ...) expr)
     #'(begin
         (define (actual-proc arg ...)
           expr)
         (define-syntax (proc stx)
           (syntax-parse stx
             [(_ arg ...) #'(actual-proc arg ...)])))]))

(pfsh:define (pfsh:string-append arg1 arg2)
             (string-append arg1 arg2))

(define-syntax (pfsh:datum stx)
  (syntax-parse stx
    [(_ . s:string) #'(#%datum . s)]
    [(_ . other)
     (raise-syntax-error 'pfsh
                         "only literal strings are allowed"
                         #'other)]))

(define-syntax (pfsh:top-interaction stx)
  (syntax-parse stx
    [(_ . form) #'form]))

You’ll also need "run.rkt" in the same "pfsh" directory.

To install this as a package, select Install Package... from the DrRacket File menu, click the Browse button to select a Directory, and select the "pfsh" directory. Alternatively, run

raco pkg install pfsh/

on the command line—and beware that the trailing slash is necessary (otherwise, raco pkg will consult a remote server to look for a registered pfsh package).

After either of those steps, you can run

#lang pfsh
(echo Hello!)
