17.4 Better Performance through Smarter Compilation

The compilation strategy used in Untyped, Slow Regular Expressions constructs regular expressions as part of the matching process. That is, it calls Racket’s regexp function only after it gets the string to match against. The regexp function, however, is expensive when compared to the cost of actually matching the string, so we would like to be able to call it earlier, in preparation for matching, and save the result to use multiple times. Happily, Racket’s regular expressions are values that can cross phases (just like numbers and booleans), so we can call the regexp function during the macro expansion process and then simply expand into code that contains the regular expression value.

Computing the regular expression at compile time results in about a 30x speed up for this expression.
(time
 (for ([x (in-range 100000)])
   (r^match (r^seq (r^range #\a #\d) (r^range #\x #\z)
                   (r^range #\a #\d) (r^range #\x #\z))
            "bycz")))

To achieve this speed up, first go back to the untyped version of the regular expression matcher and focus on to-r^. Instead of a macro that pushes itself down into its arguments, we can turn it into a compile-time function that constructs something similar to the r^ struct, but where the procedure field is a syntax object containing code that will extract the result from the given match directly.

The test cases from the original solution should also all pass here.

Solution