Friday, September 25, 2009

Recursive Compiling

Years ago, back when GCC was still on version 1.x, I went looking through the code. I don't remember why, but I was looking for a way to "just make a function." I had some notion that down in the bowels of the code there would be a function called "create_function" or some such, and I could pass it a return type and a name and who-knows-what-else, and it would return a function object. Or maybe it would automatically emit the function object. I don't remember what I was thinking at the time, I just remember being frustrated by the nest of snakes I uncovered, each function requiring a huge initialized data structure as input, making a small update to the tree, and returning.

The other day I was working on Close, and I needed to create a function for the namespace init code. And I thought, "what I need is a function that will just create this one thing for me." Deja vu!



So I wrote that function. And with the caveat that I knew exactly what the return type was going to be (void), and what the parameters were (none), and what the content would be (none), it still took more than 40 lines of NQP code to produce the result I wanted.

Why? Well, the compiler is built around the assumption that it is parsing input, so each function requires a node, operates on that node and maybe a few more, and so forth. So my "create a function" function had to create a declarator, then create a type specifier, then look up the type specifier in the symbol table, then link the declarator and specifier with a symbol, then convert the resulting function declaration into a function definition, et cetera, et cetera.

Eventually, the created function would be added to the namespace as the very first function in the namespace. And even later, every single one of those functions would be automatically discarded by the code marshaller because they had no code inside.

Wouldn't it be nice if ...
I spent some time on IRC, watching people who are way smarter than me make progress on Parrot, and I happened to mention to WhiteKnight that I wished I could just put the declaration of my function into a string and invoke the parser on the string. It was kind of a joke, and WhiteKnight cautioned me that misery and evil lay down that path, and that I should avoid it.

Well, as mentioned, I sat down and grunted out the fortysomething lines of NQP code that it took to create the namespace init subs. And boy, did that suck. (A whole bunch of functions, every single one of which is being used in a way that was not intended. What could go wrong?)

Today I changed my mind.

It turns out that the way Close processes #include directives is by reading in the contents of the file to a String, then passing that String to the compiler's 'compile' method with a note saying "don't do more than PAST generation."

Here's two functions from the Close include-file module that illustrate how it works. Ultimately, it's parse_text that does the work. And that function is a whopping 15 lines, most of which is scaffolding. Thanks, PCT, for making life easy.

sub parse_text($text) {
    my $result := Q:PIR {
        .local pmc parser
        parser = compreg 'close'
       
        .local string source
        $P0 = find_lex '$text'
        source = $P0
       
        %r = parser.'compile'(source, 'target' => 'past')
    };

    DUMP($result);
    return $result;
}

sub parse_include_file($node) {
    get_file_contents($node);
    my $contents := $node<contents>;

    if Scalar::defined($contents) {
        push($node.name());
        close::Compiler::Scopes::push($node);
       
        # Don't worry about the result of this. The grammar
        # updates the current scope, etc.
        parse_text($contents);
       
        close::Compiler::Scopes::pop('include_file');
        pop_include_file();
    }
   
    return $node;
}


It turns out that invoking the compiler recursively is the right thing to do. The code I had to write for include files, and the code I had to refactor to provide a callable interface for special-purpose internal naughtiness, is nowhere near as brittle as the code for creating a function directly. To say nothing of being way more maintainable. The include file code looks like "open a file, slurp the file, call the parser," which is nicely understandable. The create-a-function-from-scratch code looked like a horrible mess, especially in the light of a new day.

Working the kinks out of this code encouraged me to find a bunch of other places in the code where I was building data structures "by hand" to see if I could replace them with recursive parsing. So far, so good.

No comments:

Post a Comment