1 Language-Oriented Programming

Matthias Felleisen

Goals

— what this week is about: language-oriented programming

— how we will approach it: designing languages in Racket

1.1 Welcome to the 2019 Racket School of Programming Languages

1.1.1 Who are we? What are we doing here?

For now, we just say “hello” and at the end of this first lecture, we’ll say what we’re doing here.

1.1.2 Who are you? Why are you here?

Please introduce yourself to the Racket team and your peers. In the past we have found that these introductions often intrigue others about your background and open connections. So in this spirit, please

tell us who you are, perhaps including a special tidbit about yourself
what you do for a living and how it relates to Racket
why you chose to come to Racket week.

1.2 One Project, Many Programming Languages

It has become common for a software project to employ many different programming languages. A “full stack” web application has software running in the web browser and the web server, and this software is rarely written in the same language. Furthermore, the server is hardly ever a monolithic piece of software in one language. It typically glues together code that interprets requests (from the network), turns them into database queries, checks some business logic, and performs many more actions. Each of these pieces might have been written in a different language (and possibly in a different era).

A few years ago a colleague from IBM presented (roughly) the following stack during his keynote address at POPL:

Programming with multiple external programming languages has been a reality for decades. It still is. Here are reasons why projects use many languages:

history: someone started a project in a language, and the language later fell out of favor.
platform: a new platform appears and demands attention. The new platform (terminal, pc, browser, mobile) does not support the language in which the software is currently written.
expressiveness, productivity: on rare occasions, a team of developers can convince the project manager that a new programming language will make them more productive than the one that has been used so far.

There are probably more such reasons but, regardless, adding a new component in a new language comes with one condition: there must be a “natural” way to separate the two components. Usually “natural” means the two components can communicate via an easy-to-use FFI or they are forced to communicate via some form of input/output anyway (say, the network for a view written for a web browser).

While one could call this form of programming “language-oriented programming,” this is a wider notion than the one we care about. What we do care about is the insight that different languages make solving different problems easier. Programming language designers recognized this fact long ago, as a look at any programming language from the last 30 or 40 years shows.

1.3 One Programming Language, Many Languages

Almost every modern programming language comes with several distinct sub-languages that deal with distinct programming domains. Let’s look at a couple of simple ones in Racket.

The first one is the familiar one of format strings:

"~a :: ~a\n"

By itself, such a format string is pointless. But a programming language supplies interpreters for such strings to facilitate the rendering of values into strings for output devices:

; formatting strings to prepare for printing
(printf "~a :: ~a\n" "hello" 'String)

The printf function really plays the role of an interpreter for a program written as a string, whose input is a sequence of arbitrary Racket values. Of course, neither the Racket compiler nor the IDE understands the program because it is a string. Hence they can’t statically analyze it (well) and the developer is left without much assistance.

In Racket there are several interpreters for such embedded string-programs:

(format "~a :: ~a\n" "hello" 'String)

And this is a common phenomenon.
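
The lack of static assistance is easy to observe. In the following minimal sketch, the helper names render-pair and broken are made up for illustration. The second function supplies one argument too few for its format string, yet the module compiles; the mistake surfaces only when format interprets the string at run time.

```racket
#lang racket

; render-pair and broken are hypothetical helpers, not part of the lecture

; Any Any -> String
(define (render-pair label type)
  (format "~a :: ~a\n" label type))

(render-pair "hello" 'String) ; "hello :: String\n"

; two ~a directives but only one argument: nothing complains at compile
; time, and format raises exn:fail:contract when this call is evaluated
(define (broken)
  (format "~a :: ~a\n" "hello"))

(with-handlers ([exn:fail:contract? (lambda (e) 'caught-at-run-time)])
  (broken))
```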

The language of regular expressions is a second string-based example that is equally common in modern languages. Many (all?) modern programming languages come with functions that interpret certain strings as regular expressions and match such expressions against strings:

; String -> False or [List String Char-String Char-String]
(define (extract-digits year)
  (regexp-match "20(.)(.)" year))

This is a function that extracts the last two digits of a string that represents a 21st-century year:

(extract-digits "2018")
(extract-digits "1999")

Again, regexp-match serves as an interpreter for an embedded program here. Racket comes with different embedded languages of regular expressions, and some facilitate solving this problem even more than plain regular-expressions-as-strings:

; String -> False or [List String Digit-String Digit-String]
(define (extract-digits-version-2 year)
  (regexp-match #px"20(\\d)(\\d)" year))

In the “#px” language of regular expressions, we know that “\d” really matches just digits so this version of the function is “more correct” than the previous one.
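
To make the difference concrete, here is a small sketch that repeats both definitions and runs them on a few inputs. The input "20xy" is made up for illustration: the plain string pattern accepts it because “.” matches any character, while the “#px” pattern rejects it.

```racket
#lang racket

; String -> False or [List String String String]
(define (extract-digits year)
  (regexp-match "20(.)(.)" year))

; String -> False or [List String String String]
(define (extract-digits-version-2 year)
  (regexp-match #px"20(\\d)(\\d)" year))

(extract-digits "2018")           ; '("2018" "1" "8")
(extract-digits "1999")           ; #f
(extract-digits "20xy")           ; '("20xy" "x" "y"), although x and y are not digits
(extract-digits-version-2 "2018") ; '("2018" "1" "8")
(extract-digits-version-2 "20xy") ; #f
```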

A programming language does not have to use strings to represent embedded programs. The Racket-based teaching languages for “How to Design Programs” supply a domain-specific language for dealing with events. We can use this language inside of regular Racket programs:

; dealing with events from the environment

(require 2htdp/universe)
(require 2htdp/image)

(define (main s0)
  (big-bang s0
    [on-tick   sub1]
    [stop-when zero?]
    [to-draw   (lambda (s) (circle (+ 100 (* s 10)) 'solid 'red))]))

Run (main 40) and watch how this program deals with clock-tick events.

Writing down a keyword such as on-key or even a complete on-key clause outside a big-bang context is a syntactic error:

[on-key (lambda (s ke) (if (key=? " " ke) (stop-with s) s))]

Moving this clause inside the above big-bang allows us to stop the shrinking-circle animation in mid-sequence. Note how the JavaScript world has developed many such domain-specific embedded framework-languages to deal with events.

Finally, Racket—like many modern functional languages—supports (algebraic) tree-matching:

(define simple-tree '(a 1 2 3))

(match simple-tree
  [`(a ,(? number? x) ,y) (+ x y)]
  [`(a ,x ,y ,z) (* (+ x y) z)]
  [else "error"])

A match expression consists of an expression followed by a sequence of match clauses. Just like big-bang clauses, these match clauses are a brand-new category of syntactic things that Racket programmers can write down once match becomes available. Each match clause consists of a match pattern followed by any number of Racket expressions. And again, patterns are a new syntactic category, not comparable to anything that exists in Racket. But they allow escapes to arbitrary Racket code, as the (? number? x) pattern shows. (Naturally, the Racket code in such sub-patterns could use match again.) Again, the JavaScript world supports its own ways to match and extract elements of the most important tree, the DOM, without writing manual traversal functions; for example, jQuery treats the DOM as a database from which programs may retrieve certain classes of nodes.
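
As a small illustration of how patterns and Racket code interleave, here is a sketch of a recursive function over nested trees. The name sum-tree and the tree shape are assumptions made for this example, not part of the lecture. The (? number? n) and (? symbol?) sub-patterns escape to ordinary Racket predicates, and the body of the second clause gets back to match through recursion.

```racket
#lang racket

; sum-tree is a hypothetical example function
; Tree = Number | (cons Symbol [Listof Tree])
; Tree -> Number
(define (sum-tree t)
  (match t
    ; a leaf: the predicate number? is plain Racket code inside a pattern
    [(? number? n) n]
    ; an inner node: a symbol followed by any number of subtrees
    [`(,(? symbol?) ,kids ...) (apply + (map sum-tree kids))]))

(sum-tree '(a 1 (b 2 3) 4)) ; 10
```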

Of course there are other embedded languages that most programming languages have to support in this day and age, with database queries being an important one.

Think: What kind of embedded domain-specific languages does your favorite programming language support?

Language designers accept that code communicates ideas (about problems and solutions) across time and that clear expression of ideas greatly facilitates communication. They therefore include these sub-languages because they know that these niche problem areas in programming—from preparing a string for printing to querying a database—have their own ways of describing solutions.

The advantage of sub-languages over external languages is clear: combining such special-purpose languages into a coherent whole is much easier than linking programs via input/output code:

Composition is a mere syntactic act.
Computation is accomplished via translation into the host.
Communication is easy because embedded programs compute host values. Of course, this form of communicating poses its own problems.

In short, internal languages take away a lot of the pain of program linking.

But normally language designers do not enable software developers to create languages for niche application areas. Racketeers do, because they trust programmers.

1.4 One Racket Programmer, Many Languages

Racket translates these insights into an explicit design goal:

Racket empowers developers to add (sub)languages, and the process of adding these languages to the existing eco-system is free of any friction.

We call this language-oriented programming (LOP).

Racket supports a large spectrum of LOP in a reasonably friction-free and productive manner. The key is its API for the front-end of its implementation, that is, the syntax system, the ability to write compile-time functions, and the possibility to hook such functions into the compiler.

As a result, Racket is easy to extend. Adding new syntactic forms is just a matter of writing compile-time functions. You would write such functions because you want to abstract over recurring patterns in your code that cannot be abstracted over with functions (or other means of conventional abstraction).

Here is an example of two similar syntactic phrases for which functional abstraction doesn’t work:

(define (bigger-string x)
  (number->string (add1 x)))

(provide
  (contract-out
    [bigger-string (-> number? string?)]))

(define (smaller-than-5 x)
  (< x 5))

(provide
  (contract-out
    [smaller-than-5 (-> number? boolean?)]))

But syntactic abstraction will work, and we will teach you how.
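
To give a first impression, here is a minimal sketch of one possible syntactic abstraction over the two phrases. The form name define-provided is hypothetical, and the sketch relies on syntax-parse, which later lectures cover properly; it is not necessarily the solution the school develops.

```racket
#lang racket
(require (for-syntax syntax/parse))

; define-provided is a hypothetical form: it defines a function and
; exports it with a contract in one step
(define-syntax (define-provided stx)
  (syntax-parse stx
    [(_ (name:id param:id ...) ctc:expr body:expr ...+)
     #'(begin
         (define (name param ...) body ...)
         (provide (contract-out [name ctc])))]))

(define-provided (bigger-string x)
  (-> number? string?)
  (number->string (add1 x)))

(define-provided (smaller-than-5 x)
  (-> number? boolean?)
  (< x 5))
```

Each use expands into the same define-plus-provide pair that was written out by hand. Note that contract-out protects only uses from other modules, so calls inside this module do not go through the contract.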

Racket’s notion of language extension goes back to the primitive Lisp macros from 1964. Unsurprisingly, the idea has been thoroughly studied in the intervening 55 years, and Racketeers have advanced it more than any other language community.

One direction of advancement concerns the creation of language modules. Like all modern languages, Racket supports modules and a modular style of programming. Unlike in other languages, a Racket programmer chooses the programming language for each module in a software base independently of what languages the other components are written in. Conversely, a language is just a module. So to construct languages we write modules and to write modules we use languages. This notion of language-modules is key to writing software systems in Racket.

Here are two modules that use languages other than plain Racket:

#lang datalog

edge(a, b).
edge(b, c).
edge(c, d).
edge(d, a).

path(X, Y) :- edge(X, Y).
path(X, Y) :-
  edge(X, Z),
  path(Z, Y).

path(X, Y)?

#lang typed/racket

(provide string->er)

(: string->er (String -> (U Exact-Rational False)))
(define (string->er s)
  (define r
    (parameterize ([read-decimal-as-inexact #f])
      (string->number s)))
  (and (rational? r) (exact? r) (ann r Exact-Rational)))

Creating and experimenting with such languages has become straightforward in the Racket eco-system.
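
The idea that a language is just a module can be sketched in a single file with submodules. Everything named here (loud-lang, loud-app, client, answer) is made up for illustration. The first submodule re-exports all of racket but swaps in its own function application; the second submodule names the first one in the position where a module normally names its language.

```racket
#lang racket

; loud-lang is a hypothetical language: all of racket, except that every
; function application announces itself before it runs
(module loud-lang racket
  (provide (except-out (all-from-out racket) #%app)
           (rename-out [loud-app #%app]))
  (define-syntax-rule (loud-app f arg ...)
    (begin
      (printf "calling ~a\n" 'f)
      (f arg ...))))

; a client module written in loud-lang instead of racket
(module client (submod ".." loud-lang)
  (provide answer)
  (define answer (+ 1 2)))

(require 'client)
answer ; 3, after the client has printed "calling +"
```

The client never mentions loud-app. It picks up the new meaning of application merely by choosing its language, which is the sense in which writing a module means using a language.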

Where there are languages, people will ask for types. The Racket story of typed domain-specific languages is still in flux but we have one now. The Turnstile system allows programmers to write down the type system within the syntax extension system, and they get a typed language. But, this is where the Racket world is pushing the boundaries. Turnstile is a research prototype, and that’s why this week is a research summer school.

The final stage of LOP concerns the creation of embedded languages. This lecture demonstrated two languages embedded at the fine-grained level of expressions: dealing with events and matching algebraic patterns. These languages exist in all kinds of programming languages, but in all but one they have to be built into the compiler. In Racket, such languages are libraries.

Building languages that interleave with Racket expressions is possible but our infrastructure remains somewhat primitive for this area. Like the above language efforts, we want the creator of such embedded languages to inherit as many “good things” from Racket as possible—we dub this linguistic inheritance—because this reduces the overhead of creating such languages. One particular idea embedded languages benefit from is extensibility. Yes, we want embedded domain-specific languages to be as extensible as Racket itself, and we can achieve this in some ways.

Here is an example concerning algebraic matching. The grammar production of match patterns is extensible:

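; syntax-parse runs at compile time, so it is required for-syntax
(require (for-syntax racket/base syntax/parse))

(define (private-adder x) (map add1 x))

(define-match-expander adder
  ; meaning of (adder x ...) in a match pattern context
  (lambda (stx)
    (syntax-parse stx
      [(_ x ...) #'(? (curry equal? (private-adder (list x ...))))]))
  ; meaning of (adder x ...) in an expression context
  (lambda (stx)
    (syntax-parse stx
      [(_ x ...) #'(private-adder (list x ...))])))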

This extension allows the expression (adder 1 2 3) to mean one thing in a Racket expression context:

(adder 1 2 3)

and something completely different in a Racket pattern context:

(match '(2 3 4)
  [(adder 1 2 3) "success!"]
  [else "** failure **"])

Now we can explain what the following days, lectures and labs, will teach you.

1.5 Who are we? What are we doing here?

This summer school presents the tools for creating languages in a bottom-up, back-to-front manner:

Day 1 Matthias will review (for some participants, introduce) a simple model of Racket’s front-end implementation.
Jay will show how to use it to define language extensions.
Day 2 Jay will cover more advanced techniques using Racket’s syntax system, including issues of scope and phasing.
Day 3 Matthew will use the tools from the previous two days to build domain-specific languages.
Day 4 Jesse will show you how to equip domain-specific languages with type systems, in theory and practice.
Day 5 Robby will set up an extended lab that will give you a glimpse at fine-grained embedded languages.
We will wrap up the summer school with a presentation of a language gem by Robby and some concluding words by Matthias.
