tag:blogger.com,1999:blog-69268384955791608362023-10-19T13:31:44.828-07:00CloseNotes about Close compiler development.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.comBlogger18125tag:blogger.com,1999:blog-6926838495579160836.post-23413772399875317152010-03-22T08:04:00.000-07:002010-03-22T08:04:45.470-07:00I'm back!With the completion of Kakapo release-10 a while back, I've been focusing on compiler work again. I have to say, Kakapo is making it smooth.<br />
<br />
The change to NQP-rx is causing all kinds of problems. Some of them are voluntary: I'm moving to attribute based classes, and I'm taking advantage of a bunch of features of the new -rx engine. But some of them are involuntary, like climbing the learning curve for how optable parsing works.<br />
<br />
I put up a <a href="http://trac.parrot.org/parrot/wiki/NQP-rx%20Operator%20Precedence%20Parsing">wiki page</a> about that over on the Parrot.org trac site, but even after writing out a bunch of stuff, it's still not totally obvious to me how some parts work. That's part of the learning process, I guess.<br />
<br />
One thing I'm doing somewhat differently now is testing. Before, I was writing tests for the compiler. Now I'm writing tests <i>of </i>the compiler. <br />
<br />
<a name='more'></a> I've got this much of the TDD approach figured out: find out what needs to be done, write some code to do it, and then bury it in tests.<br />
<br />
In theory, this is a bad idea because it leads to "test sclerosis," a condition where the system becomes unmodifiable because of the cost of updating the test cases. That's not an issue for me (yet) because I'm testing parsing rules. First, if things get a little sclerotic I don't think I'll mind, because the testing has been <b>working </b>- in the sense that it is finding places where my code doesn't do what I think it does. (Switching to a new language that uses the same syntax makes this a particularly acute problem.)<br />
<br />
Second, although it's still early in the process, I'm testing stuff that I don't expect to change much. Recognizing this or that token should produce pretty much exactly the same PAST output. As I get higher in the grammar, this may not be true any more - maybe when I'm up high, I'll want to quickly change the grammar and the tests will be a drag.<br />
<br />
But for now, it's smooth sailing. The Kakapo library has a little bit of syntactic sugar added just for this purpose: the Matcher::PctNode class and its friend Matcher::PAST::Node. Here's what a testcase looks like:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">method test_binary() {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> my $matcher := val( :value(1) );</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> my $code := "0b01";</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> my $slam := Slam::Compiler.new;</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> my $past := $slam.compile: $code, </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> :rule<EXPR>, </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> :target<past>,</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> ;</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> assert_match($past, $matcher,</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> 'Failed to parse {' ~ $code ~ '} as expected');</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">}</span></span> </blockquote>In fact, there are so many test cases that the processing is more scripted than this. But this represents what one test would look like. The sugar I've built is in the "my $matcher := " line - that expression is generating a matcher.<br />
<br />
Anyway, this next stage is going to be mostly me migrating the grammar over to -rx, and building test scaffolding around it. I've started working on a private branch, instead of trunk, so there won't be so many bogus commits.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-65949391965982547192010-02-15T17:46:00.000-08:002010-02-15T17:46:13.991-08:00Kakapo release-2I released a limited version of Kakapo today. Then, I released again.<br />
<br />
I created the release-1 tag, and felt good for a bit. Then I noticed that some common documentation files were missing, so I added them and created release-2.<br />
<br />
Release-2 only provides the _base library, with PMC type extensions. But it's something. Documentation is over at GoogleCode: http://code.google.com/p/kakapo-parrot/Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com2tag:blogger.com,1999:blog-6926838495579160836.post-80553638702651061022010-02-12T06:48:00.000-08:002010-02-12T06:48:41.646-08:00There's more than one way to do it...In the process of paying the taxes, I've come across one of the important classes in Kakapo that needs "updating."<br />
<br />
In this case, "updating" means "complete rewrite," since I've got the new class system working. So I thought I'd show a before/after version of the code.<br />
<br />
<a name='more'></a><br />
Here's the before version: <br />
<blockquote>module DependencyQueue;<br />
# A queue that orders its entries according to their prerequisites.<br />
<br />
_pre_initload();<br />
<br />
method added(*@value) { self._ATTR_HASH('added', @value); }<br />
method already_done(*@value) { self._ATTR_HASH('already_done', @value); }<br />
method cycle(*@value) { self._ATTR_HASH('cycle', @value); }<br />
method cycle_keys(*@value) { self._ATTR_ARRAY('cycle_keys', @value); }<br />
method open(*@value) { self._ATTR_HASH('open', @value); }<br />
method pending(*@value) { self._ATTR_HASH('pending', @value); }<br />
method queue(*@value) { self._ATTR_ARRAY('queue', @value); }<br />
<br />
method add_entry($name, $value, :@requires?) {<br />
unless @requires { @requires := Array::new(); }<br />
<br />
my @entry := Array::new($name, $value, @requires);<br />
self.pending{$name} := @entry;<br />
}<br />
<br />
method already_added($name) {<br />
return self.already_done{$name} || self.added{$name};<br />
}<br />
<br />
method init(@args, %opts) {<br />
for @args {<br />
self.mark_as_done(~ $_);<br />
}<br />
<br />
self.open(1);<br />
}<br />
<br />
method is_empty() { <br />
if self.open {<br />
return self.pending.elements == 0;<br />
}<br />
<br />
return self.queue.elements == 0;<br />
}<br />
<br />
method mark_as_done($label) {<br />
self.already_done{$label} := 1;<br />
}<br />
<br />
method marked_done($label) {<br />
return self.already_done{$label};<br />
}<br />
<br />
method next() {<br />
if self.open {<br />
self.tsort_queue();<br />
}<br />
<br />
if self.queue.elements {<br />
my $node := self.queue.shift;<br />
self.mark_as_done($node[0]);<br />
return $node[1];<br />
}<br />
<br />
return my $undef;<br />
}<br />
<br />
sub _pre_initload() {<br />
if our $_Pre_initload_done { return 0; }<br />
$_Pre_initload_done := 1;<br />
<br />
use('Dumper');<br />
<br />
Class::SUBCLASS('DependencyQueue', <br />
'Class::HashBased');<br />
}<br />
<br />
method reset() {<br />
self.open(1);<br />
self.pending(Hash::empty());<br />
}<br />
<br />
method tsort_queue() {<br />
self.open(0);<br />
self.cycle_keys(Array::empty());<br />
self.cycle(Hash::empty());<br />
self.added(Hash::empty());<br />
<br />
self.tsort_add_keys(self.pending.keys);<br />
}<br />
<br />
method tsort_add_keys(@list) {<br />
# Visits a list of keys, adding the attached calls to the queue in topological order.<br />
<br />
for @list {<br />
my $key := $_; <br />
<br />
unless self.already_added($key) {<br />
## First, check for cycles in the graph.<br />
my $cycle_elts := self.cycle_keys.elements;<br />
self.cycle_keys.push($key);<br />
<br />
if self.cycle.exists($key) {<br />
my @slice := self.cycle_keys.slice(:from(self.cycle{$key}));<br />
<br />
Opcode::die("Cycle detected in dependencies: ",<br />
@slice.join(', '),<br />
);<br />
}<br />
<br />
self.cycle{$key} := $cycle_elts;<br />
<br />
## Put everything $key depends on ahead of $key<br />
my $node := self.pending{$key};<br />
<br />
my @prerequisites := $node[2];<br />
self.tsort_add_keys(@prerequisites);<br />
<br />
## Finally, it's my turn.<br />
self.added{$key} := 1;<br />
self.queue.push($node);<br />
}<br />
}<br />
}</blockquote> The first few lines are the 'module' declaration and the call to _pre_initload. The call is a waste, since Kakapo.nqp calls this one in the "super-duper-early" section, since the dep-q is used to implement the ordering for all the <i>other </i>modules.<br />
<br />
Looking at the _pre_initload sub, the class is declared as a direct subclass of HashBased, which was my old root behavior. I can rewrite this class header using the 'class' keyword and no call to _pre_initload. I should put my copyright block in:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"># Copyright (C) 2009-2010, Austin Hastings. See accompanying LICENSE file, or <br />
# http://www.opensource.org/licenses/artistic-license-2.0.php for license.<br />
<br />
class DependencyQueue;<br />
# A queue that orders its entries according to their prerequisites.</span></span></blockquote><br />
The next block is a bunch of attribute-methods. Those are automatically generated using the P6metaclass 'has' sub:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">INIT {<br />
use( 'P6metaclass' );<br />
<br />
has( '%!added',<br />
'%!already_done',<br />
'%!cycle',<br />
'@!cycle_keys',<br />
'%!open',<br />
'%!pending',<br />
'@!queue'<br />
);<br />
}</span></span> </blockquote>The next bunch of methods is generally not going to change. The automatically generated methods should have the same names, and the same behaviors, as the hand-coded ones they replaced. The 'init' method is different, though. That method has been replaced by a series of methods, one of which fills the bill here:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">method _init_positional_(@pos) {<br />
for @pos {<br />
self.mark_as_done(~ $_);<br />
}<br />
<br />
self.open(1);<br />
}</span></span></blockquote>Plus, of course, the _pre_initload sub goes away entirely, since the class creation has moved to the class header. If this were not such an important class, I would probably add a call to the INIT header to tell the system that there is no further initload processing to be done. But dep-q runs before the Program:: module does, which is where that information is tracked. (Program:: creates a DependencyQueue to do the tracking.)<br />
<br />
Anyway, here's the rewritten code:<br />
<blockquote># Copyright (C) 2009-2010, Austin Hastings. See accompanying LICENSE file, or <br />
# http://www.opensource.org/licenses/artistic-license-2.0.php for license.<br />
<br />
class DependencyQueue;<br />
# A queue that orders its entries according to their prerequisites.<br />
<br />
INIT {<br />
use( 'P6metaclass' );<br />
<br />
has( '%!added',<br />
'%!already_done',<br />
'%!cycle',<br />
'@!cycle_keys',<br />
'%!open',<br />
'%!pending',<br />
'@!queue'<br />
);<br />
}<br />
<br />
method add_entry($name, $value, :@requires?) {<br />
unless @requires { @requires := Array::new(); }<br />
<br />
my @entry := Array::new($name, $value, @requires);<br />
self.pending{$name} := @entry;<br />
}<br />
<br />
method already_added($name) {<br />
return self.already_done{$name} || self.added{$name};<br />
}<br />
<br />
method _init_positional_(@pos) {<br />
for @pos {<br />
self.mark_as_done(~ $_);<br />
}<br />
<br />
self.open(1);<br />
}<br />
<br />
method is_empty() { <br />
if self.open {<br />
return self.pending.elements == 0;<br />
}<br />
<br />
return self.queue.elements == 0;<br />
}<br />
<br />
method mark_as_done($label) {<br />
self.already_done{$label} := 1;<br />
}<br />
<br />
method marked_done($label) {<br />
return self.already_done{$label};<br />
}<br />
<br />
method next() {<br />
if self.open {<br />
self.tsort_queue();<br />
}<br />
<br />
if self.queue.elements {<br />
my $node := self.queue.shift;<br />
self.mark_as_done($node[0]);<br />
return $node[1];<br />
}<br />
<br />
return my $undef;<br />
}<br />
<br />
method reset() {<br />
self.open(1);<br />
self.pending(Hash::empty());<br />
}<br />
<br />
method tsort_queue() {<br />
self.open(0);<br />
self.cycle_keys(Array::empty());<br />
self.cycle(Hash::empty());<br />
self.added(Hash::empty());<br />
<br />
self.tsort_add_keys(self.pending.keys);<br />
}<br />
<br />
method tsort_add_keys(@list) {<br />
# Visits a list of keys, adding the attached calls to the queue in topological order.<br />
<br />
for @list {<br />
my $key := $_; <br />
<br />
unless self.already_added($key) {<br />
## First, check for cycles in the graph.<br />
my $cycle_elts := self.cycle_keys.elements;<br />
self.cycle_keys.push($key);<br />
<br />
if self.cycle.exists($key) {<br />
my @slice := self.cycle_keys.slice(:from(self.cycle{$key}));<br />
<br />
Opcode::die("Cycle detected in dependencies: ",<br />
@slice.join(', '),<br />
);<br />
}<br />
<br />
self.cycle{$key} := $cycle_elts;<br />
<br />
## Put everything $key depends on ahead of $key<br />
my $node := self.pending{$key};<br />
<br />
my @prerequisites := $node[2];<br />
self.tsort_add_keys(@prerequisites);<br />
<br />
## Finally, it's my turn.<br />
self.added{$key} := 1;<br />
self.queue.push($node);<br />
}<br />
}<br />
}</blockquote>Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-2486183311877203462010-02-12T05:02:00.000-08:002010-02-12T06:22:00.939-08:00Progress!My taxes aren't paid yet.<br />
<br />
That said, I've at least seen some forward progress under the new NQP. The change to <i>not </i>using :init on code at the top level means that I'm getting the chance to write more test cases for every ... single ... file in the kakapo library. And yeah, there's some "why the hell doesn't <i>this </i>work?" going on, as well.<br />
<br />
I'm pleased to report that I have a working <b>UnitTest </b>system. It's modeled on xUnit, but the default also includes a TAP Listener so that test cases report as TAP tests.<br />
<br />
<a name='more'></a><br />
Here's what it looks like:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">class Test::Pmc::Undef<br />
is Test::Pmc::COMMON {<br />
<br />
INIT {<br />
use( 'P6metaclass' );<br />
use( 'UnitTest::Testcase' );<br />
<br />
Program::register_main();<br />
}<br />
<br />
sub main() {<br />
my $proto := Opcode::get_root_global(Opcode::get_namespace().get_name);<br />
$proto.suite.run;<br />
}<br />
<br />
method test_defined() {<br />
verify_that( "Defined returns false" );<br />
my $object := self.class.new;<br />
<br />
if $object.defined { fail( "Undef.defined reports yes" ); }<br />
}</span></span></blockquote><br />
Note that I've created a Test::Pmc::COMMON testcase class to handle some common test cases for all the Pmc types. So each Test::Pmc::<type> gets a bunch of test_... methods inherited.<br />
<br />
The INIT block runs after the class is declared. I import P6metaclass, which Kakapo extends with some class definition subs. You don't see them here, but things like " has( '$!attribute' ); " are there.<br />
<br />
The main sub is magic. Black magic. Secrets man was not meant to know. Kakapo's "register_main" only works with subs - not methods. (For now...PHP anyone?) So there needs to be a sub, and it needs to have some kind of link to the class or namespace of the test case. So each test file includes this boilerplate 'main' that does eldritch mummery to figure out what the current class's proto-object is, and uses that to create and run a test suite from the current test case.<br />
<br />
The test_defined method is a pretty representative test method. It's short and simple, checks a single condition and either returns quietly or throws an exception. The verify_that function sets the $.verify attribute on the current object (it uses dynamic scope to find 'self') so that tests can provide a little more description. Here's the output:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">/usr/local/bin/parrot -Llibrary t/Pmc/Undef.t.pbc<br />
1..5<br />
ok 1 - 'new' returns an object of the right type<br />
ok 2 - Defined returns false<br />
ok 3 - 'isa' returns correct results<br />
ok 4 - Clone returns a different, valid object<br />
ok 5 - A 'can' method exists, and returns known results</span></blockquote><div style="font-family: "Courier New",Courier,monospace;"></div><blockquote><blockquote><blockquote><br />
</blockquote></blockquote></blockquote>Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-70429643412993578162010-02-10T17:09:00.000-08:002010-02-10T17:10:42.784-08:00More snow .. err, code!We got some more snow here in the northeast. Which means I've had plenty of time to code. Sadly, though, while I've been very active on <a href="http://trac.parrot.org/">Parrot's trac</a> submitting tickets, I haven't made much forward progress. This was my tree at about 4:30pm.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5BltBz2mo-HHciAOvjbAVKgqbGAWhBjhow1O5ocPaXpb6bd-mYNFje2AioLQJhlnZ4WcU-9VR2AAX7gT7rrnUCQFkode10Fcg0y1-G_OXC0MV6MyhFEr-4dg4DH9XjHc1SKtejklDFvw/s1600-h/cw2+-+tree+1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5BltBz2mo-HHciAOvjbAVKgqbGAWhBjhow1O5ocPaXpb6bd-mYNFje2AioLQJhlnZ4WcU-9VR2AAX7gT7rrnUCQFkode10Fcg0y1-G_OXC0MV6MyhFEr-4dg4DH9XjHc1SKtejklDFvw/s320/cw2+-+tree+1.png" /></a></div><br />
<a name='more'></a><br />
<br />
I have learned something about the C3 linearizer, and about how P6object classes work. And I was able to make my first Parrot <a href="http://trac.parrot.org/parrot/changeset/43847">commit</a> this week, providing a test case. (Yay! Test cases.)<br />
<br />
This was my tree at about 6:30pm. Notice that there's about another inch and a half of snow on the branches. While I was walking home from dinner, I heard a double "ka-krak!" I turned to look and saw some pine branches falling off their trees, broken under the weight of the snow they were carrying.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZn23XmUUwliQjbatPbC9YIaWMQItV3-Yzafr9jXOmoHUINxApFb2GEotuQQTR2znlmcBXu6l0nly-bX8G-CC0Sh9ISAgy8kYWVvQYjs8mvEDpK6yh5JgTFM5YvvncmnddJeXX7sPZA4c/s1600-h/cw2+-+tree+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZn23XmUUwliQjbatPbC9YIaWMQItV3-Yzafr9jXOmoHUINxApFb2GEotuQQTR2znlmcBXu6l0nly-bX8G-CC0Sh9ISAgy8kYWVvQYjs8mvEDpK6yh5JgTFM5YvvncmnddJeXX7sPZA4c/s320/cw2+-+tree+2.png" /></a></div>Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com2tag:blogger.com,1999:blog-6926838495579160836.post-20481272653405406172010-02-06T12:09:00.000-08:002010-02-06T12:10:50.794-08:00Codin' weather<div class="separator" style="clear: both; text-align: center;"> <a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitu34-QFct2AiDmQLAWvB9geoAxpbqO0QodqW5MrVNIy-ZNnq5E7gTTenmvp6wUUFNSxWHc-EALuck9s3mRSA6m5tN0y0zcTlLvENtDWa_vD0NBRteBaFxEXCPxvMTmnuFnmPLcJGUYp8/s1600-h/Front+steps.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitu34-QFct2AiDmQLAWvB9geoAxpbqO0QodqW5MrVNIy-ZNnq5E7gTTenmvp6wUUFNSxWHc-EALuck9s3mRSA6m5tN0y0zcTlLvENtDWa_vD0NBRteBaFxEXCPxvMTmnuFnmPLcJGUYp8/s320/Front+steps.png" /></a></div><br />
Well, the snowpocalypse is upon us, and I've got 15 inches of snow outside my front door. There's nothing to do (and not really any way to do it) until the Super Bowl tomorrow. Obviously, this is codin' weather. Hopefully I can get a Kakapo release in the can, and start back on Close development.<br />
<div class="separator" style="clear: both; text-align: center;"></div><br />
<a name='more'></a>I think there's about a foot on the roof, with 18 inches or so in the lee of my ornamental shrubbery:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZEKL4XdTYo4d3E7YJdHLYtxxp1hE2wXaHIGqDTOvHnqOhOsxTkyQK4Bv8HrXr1rh4PQKpQqxd5MJkqnYnHnISikdRvHtzvqPz2f6OGV3fJg7pAxgWiwICRNEOCq8OzGWcsQU7WkOsi38/s1600-h/Snowy+roof.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZEKL4XdTYo4d3E7YJdHLYtxxp1hE2wXaHIGqDTOvHnqOhOsxTkyQK4Bv8HrXr1rh4PQKpQqxd5MJkqnYnHnISikdRvHtzvqPz2f6OGV3fJg7pAxgWiwICRNEOCq8OzGWcsQU7WkOsi38/s320/Snowy+roof.jpg" /></a> </div><div class="separator" style="clear: both; text-align: left;"> </div>And there's a little more out by the street, where there are no "wind effects" to mess up the delivery of the snowy goodness. Let's call it two feet:<br />
<div class="separator" style="clear: both; text-align: center;"></div><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx9cXmSClQA6loUZ5LYKB8o1V0JfoDkuF-U6T3QDAKOFlSUdQevY2wCFtIgpqM4IYRQnPaw5dw7nuPqr7FazzQ5f8wSv4uLO2juZZze7pHTD5VMAjrWsYkLZRUZJjRyOl4N0H5SttQ_sA/s1600-h/Sidewalk.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgx9cXmSClQA6loUZ5LYKB8o1V0JfoDkuF-U6T3QDAKOFlSUdQevY2wCFtIgpqM4IYRQnPaw5dw7nuPqr7FazzQ5f8wSv4uLO2juZZze7pHTD5VMAjrWsYkLZRUZJjRyOl4N0H5SttQ_sA/s320/Sidewalk.jpg" /></a></div>Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-9962251469416566332009-11-17T22:59:00.000-08:002009-11-17T23:04:10.801-08:00Paying taxesA long time ago - 3 or 4 months, when I started this project - I tried to update my version of Parrot every few days. I'd get some code working, then the next day I'd <span style="font-family: "Courier New",Courier,monospace;">svn update </span>in the Parrot dir, rebuild, and carry on. But a couple of things happened, and I realized that that was a stupid idea. Once you give out commit privs, it's next to impossible to enforce any kind of discipline without chasing volunteers off your project, so Parrot is stuck with the occasional commit-without-testing.<br />
<br />
So I decided that me updating without any kind of knowledge about stability was a stupid thing to do, and I started just updating after the monthly releases. This is a bad idea, too, because the release process seems to involve a "code freeze" for a little bit just before the release, and then everybody slams in whatever branch they've been holding off on merging for the last week. So I'm basing my stuff on the tagged release, rather than trunk, which is as it should be.<br />
<br />
A few times, there have been failures even then, because the code didn't work, or a bug was introduced, or whatever. I've evolved a procedure for that, too -- I move the old Parrot workspace out of the way until I'm sure the new one is good, and I don't waste any time waiting for fixes. This means sometimes I go two months with no updates, but that's not such a bad thing.<br />
<br />
At any rate, every month or so I "pay the upgrade tax." That is, I invest an hour or more in getting and building and rebuilding, and oh, yeah, I forgot I need to run dos2unix, and re-rebuilding Parrot. My buddy Jesse is now convinced that I've generated a meme, since other folks on <a href="irc://irc.parrot.org/parrot">#parrot</a> keep telling me about paying the tax.<br />
<br />
Today is tax day, and there's this new version of NQP, called NQP-rx. (That's the last sentence in this post that I'll be able to write without having to go back and remove profanity.)<br />
<br />
<a name='more'></a><br />
And the thing is, it's "better" than NQP + PGE, because it promises to finally let me put NQP in-line in the grammar targets, rather than just inline PIR. But there's a price, and the price is a bunch of changes. Like a stricter POD parser.<br />
<br />
IMO, pod6 leaves a fair amount to be desired. Damian delivered something that looks a whole lot like xPmOlD, and for documenting source code it's *way* more verbose than it should be. I wasn't too unhappy with the NQP pod parser, since it would accept code like:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">method foo() { <br />
=method<br />
Blah blahblah<br />
=end<br />
<br />
code_for_foo();<br />
}</span><br />
</blockquote>But now there's a new sheriff in town, and I'd have to say something like "=begin method" at the top. Which wouldn't suck too bad (it's only 14 characters or so, compared with the <ahem> 1 character required for docstrings in elisp, or the <cough> 3 characters required in python). But then I'd also have to say "=end method" at the end, because <i>matching end-tags have been such a great f.....reaking idea in Ada and Xml, </i>so I decided that "ctrl-q" - one "character" - was a much better solution. So I've converted my POD comments into block comments (# at beginning-of-line) which <a href="http://notepad-plus.sourceforge.net/uk/site.htm">Notepad++</a> does with a keystroke. The suboptimal part is that I'm having to do that over and over again, whereever I have written any documentation. I'm doing this in Kakapo, which (I'm sorry, it's true) I'm still working on. And sometimes I wind up just deleting the documentation. Hell, *I* know what the stuff does, and isn't that enough?<br />
<br />
Another change which irritates me (but probably only me) is the $-variable interpolation in NQP. Yesterday it wasn't there, today it is. And since I've been doing a lot of PIR-generation, I'm caught replacing " with ' in a thousand places.<br />
<br />
I'm sure there will be some stuff that doesn't irritate me until run-time, too. But I'm going off to pay the upgrade tax now, and this month the taxes are high. Should I blame Obama for this, too?Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-73233362472772358992009-11-13T07:41:00.000-08:002009-11-13T07:41:14.113-08:00No Close! Kakapo!<img align="absmiddle" class="ife_marker" id="null_ife_marker_2" src="chrome://informenter/skin/marker.png" style="border: 0pt none; cursor: pointer; height: 19px; width: 14px;" title="Max field length is unknown" /><img align="absmiddle" class="ife_marker" id="null_ife_marker_0" src="chrome://informenter/skin/marker.png" style="border: 0pt none; cursor: pointer; height: 19px; width: 14px;" title="Max field length is unknown" /><img align="absmiddle" class="ife_marker" id="null_ife_marker_1" src="chrome://informenter/skin/marker.png" style="border: 0pt none; cursor: pointer; height: 19px; width: 14px;" title="Max field length is unknown" />I've been silent for a while, and that's pretty bad.<br />
<br />
The reason is that I've been busy not working on Close at all. Instead, I stopped to turn a bunch of the code I had already written into a general-purpose library, called <a href="http://code.google.com/p/kakapo-parrot/">Kakapo.</a><br />
<br />
Kakapo is a runtime library for NQP, that helps developers (like me!) do useful stuff without having to drop into PIR every couple of lines. Since this is code that has been extracted from Slam, obviously Slam will be using it. I hope that other coders find it useful, too.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-90104854133698311742009-10-19T10:34:00.000-07:002009-11-13T07:38:17.604-08:00rtype not setI just got this diagnostic from NQP, and boy, is that not a helpful error message. Of course, since it's in PAST->POST compilation, there's no line number or token output.<br />
<br />
FWIW, the problem was that I was doing this:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">self.declarator := '' ~ $name;</span><br />
</blockquote>when, as anyone can plainly see, I should have been doing this:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">self.declarator('' ~ $name);</span><br />
</blockquote>because .declarator is a method, not an attribute. D'oh.<br />
<br />
I've seen this error once before, and I didn't remember what caused it. I hope that next time I'll remember it, or that <a href="http://www.google.com/">The All-Knowing Oz</a> will be able to remind me.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-25840165241946557682009-10-16T16:53:00.000-07:002009-10-16T16:56:40.698-07:00Parrot Multi-Subs and InheritanceOne of the things I've added to Slam (the Close compiler) is support for declaring multi-subs. In another language, these would be called "overloaded methods" or "overloaded functions," but in Close, it's all about the underlying hardware, and that means MultiSubs.<br />
<br />
Before I write too much, let me make this clear: Parrot is not the same as Perl6, or Python, or whatever your favorite language is. And so, the way Parrot (and Close) handle things like object dispatch -- the "Object Meta-model," as it is called -- is different from the way other languages handle these things. This is one of those times where the difference matters. (And specifically, the stuff I write below about inheritance isn't true for Perl6. You have been warned.)<br />
<br />
<a name='more'></a>In Parrot's default object model (P6object), each class can have one <i>or more </i>parent classes, which can have one or more grandparent classes, etc. And in order to dispatch method calls "optimally," an algorithm called the <a href="http://192.220.96.201/dylan/linearization-oopsla96.html">C3 algorithm</a> is used. <br />
<br />
The subject in question is <i>method resolution order,</i> and it has to do with picking between a method, say "bark," that might be inherited from two different parents. If you have a class Tree, with a "bark" method, and a class Dog, with a "bark" method, and a child class DogWood that inherits from both Tree and Dog, which "bark" method runs when you invoke it? That's method resolution.<br />
<br />
And when you "order" the inheritance hierarchy, so that methods from one class get chosen in preference to methods from another class, you have a method resolution order, or MRO. In most class hierarchies, MRO is simple. For instance, in the Parrot Compiler Toolkit (PCT) libraries, there are classes like this:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">PCT::Node - the root of the class hierarchy</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::Block - a lexical scope</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::Stmts - a collection of statements, but not a lexical scope</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::Var - a declaration or reference to a variable</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::Val - a literal value</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::Op - an operation, like add or multiply</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">PCT::VarList - a collection of variable declarations</span><br />
</span><br />
</blockquote>Conveniently, every one of those classes (except Node) is derived from PCT::Node. So the class hierarchy looks like a fan. But that makes the MRO trivial. For each class, the MRO is "search my methods, then search PCT::Node, then quit."<br />
<br />
So what does this have to do with MultiSubs, you ask? Well, it turns out that MultiSubs are magical in two ways. First of all, a MultiSub is a single entity. Get that through your head, and let it sink in. It took me a while to understand it. A MultiSub is a PMC that is created by the compiler when you use the <span style="font-family: "Courier New",Courier,monospace;">:multi</span> modifier on a sub declaration (in PIR). Likewise in Close -- if you declare a function with <span style="font-family: "Courier New",Courier,monospace;">:multi,<span style="font-family: inherit;"> it will be created as a MultiSub.</span></span><br />
<br />
<span style="font-family: "Courier New",Courier,monospace;"><span style="font-family: inherit;">These MultiSub PMCs are treated somewhat like Sub PMCs, meaning that they get compiled into PBC files, loaded, and they respond to the <span style="font-family: "Courier New",Courier,monospace;">.invoke()</span></span></span> method call. But that's where things get ugly. Because a MultiSub secretly encodes a list of ordinary Subs. (I don't think it's possible to "nest" MultiSubs, but I haven't done any research on it.) Each of the ordinary Subs is associated with a <i>Signature </i>that indicates how many arguments, and of what types, are to be used in <i>MultiMethod Dispatch (MMD). </i>And yeah, it's called MMD even though it can be done on plain old non-method Subs. Go figure.<br />
<br />
So when a MultiSub gets invoke()'ed, it takes its list of arguments and tries to match the args against a Sub using the signatures. There's an algorithm for that, too, that uses the "manhattan distance" -- essentially (but not exactly) the sum of the counts of how many "steps away" each argument is from the corresponding parameter in its own MRO list.<br />
<br />
For example, since DogWood inherits from Tree and Dog, it might be one step, or two steps, from Tree. So a function declarated to take a single Tree parameter will have a manhattan distance of 1 or 2 from the DogWood argument. If a function of the same name were declared to take a DogWood parameter, it would have a distance of 0, and so be a "closer" match.<br />
<br />
That's where the simplicity ends, though, because when dispatch is done on <i>two </i>parameters, you have the case where a function with one very close match and another very far match might be a better or worse match than a function with two "medium close" classes. For example, say you have a class hierarchy A -> B -> C -> D, where A is the "root" of the hierarchy. In this example, we have two subs "foo" declared as multi, with parameter lists (B, B) and (D, A). If I call foo(d, d), which is closer? How about foo(d, c)?<br />
<br />
First of all, this is the stuff that gives academics nightmares. But it's okay, really, because if you find yourself in that position, you probably are getting what you deserve. (If the two subs do radically different things, what the hell were you thinking by overloading them? And if they do similar things, you're probably okay.)<br />
<br />
While all that is interesting, it's not particularly relevant. In fact, it all "just works." What's more interesting is what happens when you <i>don't </i>match. That is, when you call a MultiSub with a set of arguments that just don't match any candidate. Consider an obvious (and recently bothersome) example: the compiler builds a tree of nodes. The Slam compiler extends the PCT::Node class hierarchy as I mentioned in my post on <a href="http://close-parrot.blogspot.com/2009/09/multiple-inheritance.html">Multiple Inheritance,</a> so there are a bunch of different node types to support. While I was trying to implement the <i><a href="http://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a> </i>pattern on the syntax tree, I implemented a Class::MULTISUB function in NQP that dynamically generates and compiles <a href="http://en.wikipedia.org/wiki/Trampoline_%28computers%29"><i>trampolines</i></a> to convert a set of sub declarations into a multisub. The resulting Visitor class had overrideable methods for each possible node type (every class in the hierarchy). So if I have a visitor, $v, and call <span style="font-family: "Courier New",Courier,monospace;">$v.visit($node)</span>, multidispatch invokes the right method based on the type of the first argument.<br />
<br />
<br />
But what happens when I add a class to the hierarchy? Well, I usually forget to re-run the script that generates all the visitor methods. And so there is no multisub with a signature that matches. Whoops.<br />
<br />
An even better question is this: what happens when I overload some of the visit methods in a new Visitor class?<br />
<br />
Here's the problem: Parrot does not dispatch "beyond" a MultiSub. That is, when you call a method like 'visit', and Parrot's dispatch mechanism <i>finds </i>a method entry 'visit' that points to a MultiSub, <i>it is done. </i>Parrot hands control to the MultiSub, says "Here ya go. Good luck, buddy!" and doesn't wait around to see what happens. What this means is that by default, calling a MultiSub with an argument list that doesn't match any of the available signatures is a fatal error. Yikes!<br />
<br />
<br />
So what happens if, say, I want to specify a particular action for handling nodes of a single type -- symbol declarations, say -- but I want the "default" handling to be inherited from some ancestor?<br />
<br />
<br />
Well, now it gets tricky. First, remember that I'm not using multisubs or multimethods directly. Instead, I'm using these automatically generated trampolines. So the behavior you want to specify lives in a method called something like <span style="font-family: "Courier New",Courier,monospace;">_visit_Slam_Symbol_Declaration()</span> and it gets <i>called </i>by a trampoline that is part of some parent class' <span style="font-family: "Courier New",Courier,monospace;">visit</span> MultiSub. So the simple answer is, if you replace the right method, it will just work. Because calling <span style="font-family: "Courier New",Courier,monospace;">$child.visit($node)</span> results in a call to the inherited <span style="font-family: "Courier New",Courier,monospace;">Ancestor::visit($child, $node)</span> which in turn invokes <span style="font-family: "Courier New",Courier,monospace;">$child._visit_Slam_Symbol_Declaration($node)</span> which has been overridden in the Child class. Score!<br />
<br />
But there's a problem. There's always a problem, or it wouldn't be worth writing about. The problem is that the host of little per-node-type visit methods is coded waaaay down at the root of the Visitor class hierarchy. And there are a few generations of other classes between the root of the hierarchy and where most code lives. (And here, let me put in a plug for an <a href="http://ftp.cwi.nl/CWIreports/SEN/SEN-R0210.pdf">excellent paper</a> by van Deurssen and Visser that serves as a solid introduction to Visitor/Combinators and the <a href="http://www.meta-environment.org/doc/api/JJTraveler/">JJTraveler</a> class library.)<br />
<br />
The intervening layers of parent classes are not necessarily a problem, except when they override the <span style="font-family: "Courier New",Courier,monospace;">.visit() </span>method. Because a parent class that replaces the MultiSub with a simple Sub method suddenly eliminates the "it just works" mechanism -- the MultiSub dispatcher down at the root of the tree -- that makes per-type method overriding possible.<br />
<br />
So what to do? Well, clearly I needed to replace the "root" MultiSub with one in the Child class. Okay, I've got code that does that automatically. But if the Child class only provides one MultiSub method -- because the Child only wants to process symbol declaration nodes, remember? -- there won't be any other methods available. And if there are no valid candidates in a MultiSub, it's a fatal error. Yikes.<br />
<br />
One obvious solution was to implement a second multi-method that matched Slam::Node or PCT::Node or something, so that there was always a fall-back multimethod available. But that's boilerplate, and shouldn't that be generated automatically? Well, yes, IMO. So I implemented that, too.<br />
<br />
Creating a multisub now does two things. First, it seeks out all the matching method names, and creates trampolines for them (so a call to <span style="font-family: "Courier New",Courier,monospace;">Class::MULTISUB($class, 'visit', :starting_with('_visit_'))</span> creates a MultiSub named 'visit', and automatically creates trampolines for each existing method that starts with _visit_ in the class' namespace. Each method is assumed to start with whatever prefix you specify, and then contain a class name, with '::' replaced by '_', so that a visit method that takes a Slam::Symbol::Declaration parameter would be named _visit_Slam_Symbol_Declaration.<br />
<br />
The second thing is the automatic generation of a "default" multi. This is a multi-method with a signature of (_) -- that is, it is dispatched only on the 'self' parameter and considers any other parameters to be extras. This apparently works because the manhattan sort gives some kind of preference to "right number of parameters", so that while a signature of (_) does match, a signature of (_, Some::Parent::Class) is still "closer."<br />
<br />
The "default" multi is generated in one of two ways. The multi compiler searches the class' MRO (available via the 'all_parents' argument to the inspect method on the class object) for the first ancestor with a matching method -- be it a MultiSub or plain old method -- and creates the default multi to call that method if it finds one. Otherwise, the default multimethod just <span style="font-family: "Courier New",Courier,monospace;">die</span>'s with a diagnostic message indicating failure.<br />
<br />
Okay, that's it for today. FMTYEWTKA MultiSubs. The short summary: multisubs are great, but that don't play well with inheritance. The longer summary: you have to do your inheritance chaining yourself. And if you need to know more, I recommend you do what I do - get on the <a href="irc://irc.parrot.org/parrot">#parrot</a> IRC channel and pester PMichaud with questions. That dude knows just about everything.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-44728920633442787812009-10-02T06:11:00.000-07:002009-10-02T06:16:12.233-07:00Top versus BottomLast time, I posted on multiple inheritance, and on "class"-ifying the Slam code in general. Sadly, that is a "rat-hole" that I've gone down, and other than "still working on it" there hasn't been much to post. :-(<br />
<br />
So I thought I'd dig out this little gem of a topic that I've had sitting in reserve: the difference between top-down and bottom-up parsing, and what it means to you.<br />
<br />
<a name='more'></a><br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">Top-down parsing</span></b><br />
First, top-down parsing is what you're doing when you write code in PGE, or Perl6, or in ANTLR or if you just do it yourself. In a top-down parser, you describe structures in terms of their sub-components:<br />
<ul><li>A program is a sequence of zero or more statements.</li>
<li>A statement is an expression followed by a semicolon.</li>
</ul>These rules turn into code like:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">sub program() {<br />
</span><span style="font-size: x-small;"> my @program := Array::empty();<br />
</span><span style="font-size: x-small;"> repeat {</span><span style="font-size: x-small;"><br />
</span><span style="font-size: x-small;"> @program.push(statement());<br />
</span><span style="font-size: x-small;"> }<br />
</span><span style="font-size: x-small;"> until (eof());</span><span style="font-size: x-small;"><br />
</span><span style="font-size: x-small;"> return @program;</span><span style="font-size: x-small;"><br />
</span><span style="font-size: x-small;">}<br />
</span><span style="font-size: x-small;"><br />
sub statement() {<br />
</span><span style="font-size: x-small;"> my $statement := expression();<br />
</span><span style="font-size: x-small;"> match( ';' );</span><span style="font-size: x-small;"><br />
</span><span style="font-size: x-small;"> return $statement;</span><span style="font-size: x-small;"><br />
</span>}<br />
</blockquote>This is a very intuitive way to write programs, and <i>if </i>you know what your language looks like in advance, it can be really, really fast to code. (If you're stuck wandering in a maze of features, like me, then maybe not so much.)<br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">Bottom-up parsing</span></b><br />
On the other hand there is something called bottom-up parsing. To get your head around this, first you need to understand that most parsing is broken into two separate phases, called "lexical analysis" (or "lexing") and "parsing." The distinction is that the lexing phase takes input characters and turns them into <i>tokens. </i>A token is an abstraction of the input characters, identifying what role it plays in the language. This can be a slippery concept to grasp: in written human languages, for example, we talk about "words." So is a word a token? Well, if you're trying to process the text, sure. But if you're trying to convert the words into a different tense, maybe you want to separate out prefixes and suffixes as tokens. Or if you're trying to understand the text, maybe you want the tokens to be parts of speech: noun, verb, adjective. Finally, if you're processing Thai, you're doomed - they don't put spaces between the words, so how will your lexer know where one starts and the next one stops?<br />
<br />
At any rate, lexing isn't 100% necessary. It's a convenience that you will almost immediately re-invent for yourself if you don't have it - kind of like a "print" statement. But tokens are generally what parsers operate on, whether they have to create them or whether a lexer does it for them.<br />
<br />
That said, bottom-up parsing defines a higher-level structure in terms of compositions of a sequence of low-level tokens. It's the reverse of top-down, and it can look almost exactly the same if you aren't careful:<br />
<blockquote>A sequence of zero or more statements is a program.<br />
An expression followed by a semicolon is a statement.<br />
</blockquote>See the difference? Yeah, it's tough. And it's made tougher by the fact that the tool writers, who build automatic parser generators, generally all choose a similar syntax:<br />
<blockquote>program: statement*<br />
statement: expression SEMICOLON<br />
</blockquote><br />
Is that top-down or bottom-up? It's impossible to say, really. So you just have to <i>know </i>what kind of tool you're using, and adjust accordingly. But if you're using a bottom-up parser, you'll hear the terms "shift" and "reduce" a lot, so that usually helps. Here's why.<br />
<br />
A bottom-up, or shift/reduce, parser uses a separate stack to keep track of the tokens it sees. So when you type in "y = m * x + b;" it receives a sequence of tokens like:<br />
<blockquote><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">IDENT(y) EQUALS IDENT(m) STAR IDENT(x) PLUS IDENT(b) SEMI</span><br />
</div></blockquote>By default, the parser will 'shift' these tokens onto its internal stack, in this order, so that the right-most token is at the "top" of the stack. As it shifts tokens onto the stack, it matches the top-most few tokens against its internal database of rules, to see if any rule has a pattern that the stack matches. If a "production" (a rule) matches what is on the stack, the matching tokens are taken off the stack and replaced with whatever the production makes.<br />
<br />
In this example, maybe you have an expression rule that looks like:<br />
<blockquote><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">term : IDENT ;</span><br />
</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">op : EQUALS | PLUS | STAR ; </span><br />
</div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"><br />
</span><br />
</div><div style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">expression : expression op term | term;</span><br />
</div></blockquote><br />
(Note that I am ignoring things like operator precedence. If you want to know more, Google is your friend.)<br />
<br />
In our example, the parser will do the following:<br />
<br />
<table><tbody>
<tr><th>Stack<br />
</th><th>Next token<br />
</th><th>Action?<br />
</th></tr>
<tr><td><empty><br />
</td><td>IDENT(y)<br />
</td><td>shift<br />
</td></tr>
<tr><td>IDENT(y)<br />
</td><td>EQUALS<br />
</td><td>reduce <br />
</td></tr>
<tr><td>term(y)<br />
</td><td>EQUALS<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y)<br />
</td><td>EQUALS<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y), EQUALS<br />
</td><td>IDENT(m)<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y), op(=)<br />
</td><td>IDENT(m)<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y), op(=) IDENT(m)<br />
</td><td>STAR<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y), op(=), term(m)<br />
</td><td>STAR<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m)<br />
</td><td>STAR<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y=m), STAR<br />
</td><td>IDENT(x)<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m), op(*)<br />
</td><td>IDENT(x)<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y=m), op(*), IDENT(x)<br />
</td><td>PLUS<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m), op(*), term(x)<br />
</td><td>PLUS<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m*x)<br />
</td><td>PLUS<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y=m*x), PLUS<br />
</td><td>IDENT(b)<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m*x), op(+)<br />
</td><td>IDENT(b)<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y=m*x), op(+), IDENT(b)<br />
</td><td>SEMI(b)<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m*x), op(+), term(b)<br />
</td><td>SEMI(b)<br />
</td><td>reduce<br />
</td></tr>
<tr><td>expression(y=m*x+b)<br />
</td><td>SEMI(b)<br />
</td><td>shift<br />
</td></tr>
<tr><td>expression(y=m*x+b), SEMI<br />
</td><td>??<br />
</td><td>reduce<br />
</td></tr>
<tr><td>statement(y=m*x+b;)<br />
</td><td>??<br />
</td><td>??<br />
</td></tr>
</tbody></table><br />
Beware: the expression produced will not be what you think it should be without all that operator precedence stuff. Instead, you'll get an expression tree like (((y=m)*x)+b), wrong in nearly every language.<br />
<br />
<div style="font-family: "Trebuchet MS",sans-serif;"><b>So what's the difference?</b><br />
</div>Well, one difference is that bottom-up parsers can recognize a broader set of languages than top-down parsers. Another difference is that bottom-up parser generators are generally easier to code than top-down generators. And the bottom-up parsers generally perform better than their top-down brethren, because they are using small, simple data structures (a big table of rules, and a small stack).<br />
<br />
But more importantly, the difference is in the "legacy" of bottom-up versus top-down parsing. Technically, top-down parsers are called LL, and bottom-up parsers are called LR -- usually both terms are surrounded by other letters and numbers, but if you look hard you'll find an LL or LR in there somewhere. And the difference there -- L versus R -- is in 'left' versus 'right' parsing. When a top-down parser is given a sequence of tokens, it consumes from the left. When a top-down parser is given the same sequence, it consumes from the right.<br />
<br />
One place you can see that difference is in C versus Perl 4 or 5 parsing. A variable declaration and function declaration in C look like:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">int x = 1;</span><br />
<span style="font-size: x-small;">int foo() { return x; }</span><br />
</blockquote>In perl, you get:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">our $x = 1;</span><br />
<span style="font-size: x-small;">sub foo() { return $x; }</span><br />
</blockquote>Notice that in C, the "differences" between the two are on the right, while in Perl they are on the left. Larry Wall is generally a pretty people-oriented person, so he may have done that to make reading declarations easier for coders. But I'd also bet that the first perl parser was a top-down parser, that consumed from the left.<br />
<br />
<div style="font-family: "Trebuchet MS",sans-serif;"><b>At last!</b><br />
</div>Because this is the point: top-down parsers have different operating characteristics. And one of those differences is that they want to 'commit' early to pursuing a particular path. As a result, top-down parsers are going to excel at recognizing languages where the major decisions are encoded to the left. Declaring "sub foo" puts the fact that this is a function declaration right up front. A top-down parser doesn't have to ask a single question about what's going on: it knows from the very first token that this is a subroutine.<br />
<br />
One of the more frequent questions on the ANTLR support list was about parsing C-like declarations, where the information that a function declaration is taking place didn't show up until the very end. The only difference between an external declaration of a function and the function declaration itself is that the final semicolon is replaced by a block:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">int foo(void) ;</span><br />
<span style="font-size: x-small;">int foo(void) { return 1; }</span><br />
</blockquote><br />
<div style="font-family: "Trebuchet MS",sans-serif;"><b>How does that affect me?</b><br />
</div>If you are coding a top-down parser, you hate C syntax for this reason. Almost all of the grammar specifications for languages like C are built expecting a bottom-up parser. So blindly converting the grammar from one syntax to the other will produce a parser that occasionally does a lot of work to parse a declaration, then discards it all, then redoes the exact same work to parse a function definition. How much fun is that?<br />
<br />
Getting around this requires changing the structure of the grammar definition. Instead of having two rules for declarations and function definitions, you want a single rule:<br />
<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">declaration: specifiers declarator ';'</span><br />
<span style="font-size: x-small;">definition: specifiers declarator block</span><br />
</blockquote>becomes:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">any_decl: specifiers declarator [ ';' | block ]</span><br />
</blockquote><br />
This avoids the parse-forget-reparse problem, at the expense of making your grammar actions heavier. <br />
<br />
In turn, rewriting your rules like this means you tend to change your grammar. If your declarations and definitions are all the same rule, why not accept more than one? <br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">int foo(void) { return 1;}, bar(void) { return 2; }</span><br />
</blockquote><br />
Anyway, that's enough for today.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-39735579944462863462009-09-27T22:10:00.000-07:002009-09-27T22:24:11.371-07:00Multiple InheritanceAs an update to my <a href="http://close-parrot.blogspot.com/2009/09/subclassing-pastnode-in-nqp.html">previous post </a>on subclassing PAST::Node classes, I ran into another problem. It turns out that the PAST -> POST compiler knows about, and depends on the differences between, the different Node subclasses.<br />
<br />
For example, there are multi-methods that match on the types of their parameters, so that a PAST::Block gets processed by a different method than a PAST::Var. You get the idea.<br />
<br />
So my original idea, of deriving a set of roughly parallel Block, Var, Val classes from a new root that was a child of PAST::Node won't work. Because while a Slam::Block may be a Slam::Node, which may be a PAST::Node, it turns out that it also has to be a PAST::Block for the other stuff to work.<br />
<br />
<a name='more'></a><br />
<br />
So I'm stuck needing to "insert" my class behavior into several different places. I can't get between PAST::Block and PAST::Node, and I shudder to think about getting behind PAST::Node as a parent class. <br />
<br />
Deriving from PAST::Block, Node, Var, Val, Stmts, Op, etc. seems like the only reasonable way to go, except that I would have to reimplement all the Slam::Node methods in each child, and would lose the convenience of common ancestry.<br />
<br />
<b style="font-family: "Trebuchet MS",sans-serif;">P6object to the Rescue!</b><br />
But all is not lost, because among all the other cool things about Parrot is this: Parrot doesn't care about how many parents an object has. What's more, the P6object library supports adding parents willy-nilly. <br />
<br />
So the solution, in this case, was pretty easy:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> my $base := $meta.new_class('Slam::Node');<br />
<br />
my $block := $meta.new_class('Slam::Block', :parent($base));<br />
$meta.add_parent($block, PAST::Block);<br />
<br />
my $test := Slam::Block.new();<br />
if $test.isa(PAST::Block) {<br />
say("I'm a blockhead, ma!");<br />
}<br />
if $test.isa(Slam::Node) {<br />
say("I'm a slammer, too.");<br />
}<br />
</span><br />
</blockquote>Et voila! Multiple inheritance. Or, if you prefer, a "<i>slatheron</i>."Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-85191821605983261492009-09-27T08:42:00.000-07:002009-09-27T22:23:53.779-07:00Subclassing PAST::Node in NQPOne of the problems with NQP is that it's not quite perl6. And that means if you do much development, you eventually run up against a corner of the language where the sidewalk just ends.<br />
<br />
The support for objects is an example of this. There's no problem with defining methods, or defining a class name. There's no problem with creating a new instance of the class. But that's where it stops being easy. Because the syntax for extending another class is missing.<br />
<br />
One thing I'd like to do is subclass the PAST::Node class(es) in my own code, so I can use method invocations to call functions in a different namespace. This would change code like:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">close::Compiler::Type::merge_specifiers($node1, $node2)<br />
</span><br />
</blockquote>into something like:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> $node1.merge($node2);<br />
</span><br />
</blockquote>which would make my fingers happy, if nothing else.<br />
<a name='more'></a><br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">Automatically generated, but wrong</span></b><br />
One problem is that when I declare a class in NQP, like:<br />
<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">class close::Compiler::Type;<br />
</span><br />
</blockquote>NQP emits an initload block that creates a new class. The problem is that it creates the class with no parent. Whoops. So the first thing to do is to change from using a '<span style="font-family: "Courier New",Courier,monospace;">class</span>' keyword to using '<span style="font-family: "Courier New",Courier,monospace;">module</span>.'<br />
Now I won't have any class definition at all, which is actually good. All I need now is to make my own initload that creates the class.<br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">Make your own initload sub</span></b><br />
The trick to rolling your own initload sub is not to do it -- you can't get there from here in NQP. What you <i>can</i> do, though, is take advantage of the fact that any code that is at package scope in your NQP source code is put into the initload sub for the class or module you are defining.<br />
<br />
So, since package scope is an initload sub, my solution is to define a sub and call it from package scope:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> _onload();<br />
<br />
sub _onload() {<br />
say("Hello, from _onload");<br />
}<br />
</span><br />
</blockquote><b><span style="font-family: "Trebuchet MS",sans-serif;">Creating a class, the right way</span></b><br />
Now that we know how to get a class created, it's time to dig into the PCT source code to see how it creates the classes. I looked in $parrot/compilers/pct/src/PAST/Node.pir, and found this:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> p6meta = new 'P6metaclass'<br />
base = p6meta.'new_class'('PAST::Node', 'parent'=>'PCT::Node')<br />
p6meta.'new_class'('PAST::Op', 'parent'=>base)<br />
</span><br />
</blockquote>Well, that looks pretty simple! Let's convert that to NQP:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">module PAST::Subclass {<br />
_onload();<br />
</span><br />
</blockquote><br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> sub _onload() {<br />
my $meta := Q:PIR { %r = new 'P6metaclass' };<br />
$meta.new_class('PAST::Subclass', :parent('PAST::Node'));<br />
}<br />
}</span><br />
</blockquote>Note that I chose to inherit from PAST::Node, rather than PCT::Node. Once you know the trick, you can subclass just about anything.<br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">The init problem</span></b><br />
The next problem is that the P6metaclass system generates a really dumb 'new' method. So the easiest thing is to replace it. But if I replace it, how will I initialize the superclass data?<br />
<br />
This is where you have to investigate on your own. In this case, I chose to inherit the PAST::Node version of 'new,' because it calls self.init(...), which I could override.<br />
<br />
So I have an init method, but I need to call the PAST::Node::init method as well. That wouldn't be a problem, except that (1) there is no PAST::Node init method, it is inherited from PCT::Node; and (2) the PCT::Node init method wants its parameters flattened. It's more PIR code to the rescue, because NQP doesn't support flattening args:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">sub init(*@children, *%attributes) {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> # do my own stuff ...</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> # ... then call </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> Q:PIR {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .local pmc children, attributes</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> children = find_lex '@children'</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> attributes = find_lex '%attributes'</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P0 = get_hll_global [ 'PCT' ; 'Node' ], 'init'</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> self.$P0(children :flat, attributes :named :flat) </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> };</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> return self;</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote>And now I've got a subclass. Just to bundle it into one big copy/pasteable bunch, here you go:<br />
<br />
<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">module PAST::Subclass {<br />
_onload();<br />
<br />
sub _onload() {<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;">my $meta := Q:PIR { %r = new 'P6metaclass' };<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> $meta.new_class('PAST::Subclass', :parent('PAST::Node'));<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> }<br />
<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;">sub init(*@children, *%attributes) {<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> # do my own stuff ...<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> # ... then call <br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> Q:PIR {<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> .local pmc children, attributes<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> children = find_lex '@children'<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> attributes = find_lex '%attributes'<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> $P0 = get_hll_global [ 'PCT' ; 'Node' ], 'init'<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> self.$P0(children :flat, attributes :named :flat) <br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> };<br />
<br />
</span><span style="font-size: x-small;"> </span><span style="font-size: x-small;"> </span><span style="font-size: x-small;">return self;<br />
</span> <span style="font-size: x-small;">}</span><br />
</blockquote><blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> }</span><br />
</blockquote>Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-74990790091245543002009-09-26T07:42:00.000-07:002009-09-27T22:23:30.118-07:00The Whitespace HackEvery culture has certain rites of passage that they expect members to pass through before they are fully accepted. In some places, they're pretty extreme -- taking knives to your naughty bits, and the like. Fortunately, the Parrot community is a little more laid back than that. There are two rites of passage for new members to participate in. The second one, which I haven't done yet, is to code a new garbage collector. Apparently, when you've coded a new GC, you get to do <br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">svn commit -m "Today, I am a Parrot."</span><br />
</blockquote>and you're totally in. I haven't reached that far, but I'm here to tell you that the <i>first</i> rite of Parrot-hood is pretty easy: filling out a ticket on <a href="http://trac.parrot.org./">http://trac.parrot.org</a>.<br />
<br />
<a name='more'></a><br />
<br />
You have to register, obviously, and a lot of the tickets get closed out with a comment like "You're doing it wrong." or "That's how it's supposed to work. Read the docs." But if you keep trying, eventually you'll get a real ticket in. And then comes the best part -- they fix stuff! There are few things in life more satisfying than watching people work your tickets on #parrot, while chortling "Dance, puppets! Dance while I pull your strings with my ticket submissions!"<br />
<br />
Actually, it turns out there are some things more satisfying than that. But submitting tickets is definitely a good way to get into the community. And the earlier you start, the more bugs there will be, so the easier they are to find. It's like a Multi-Level Marketing program -- the earlier you join, the easier it is, and the more you gain. In the case of Parrot, mostly you're gaining karma on #parrot. But everybody starts somewhere.<br />
<br />
<b style="font-family: "Trebuchet MS",sans-serif;">Anyway...</b><br />
I submitted a ticket (<a href="https://trac.parrot.org/parrot/ticket/1065">TT#1065</a>) today about a problem with PIR that is inserted in-line in PGE rules. The ticket isn't very interesting, but it lets me segue into talking about the Whitespace Hack.<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">token ws {</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | <?{{</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P0 = get_global '$!ws'</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> if null $P0 goto noshort</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P1 = $P0.'to'()</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P2 = match.'to'()</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> if $P1 != $P2 goto noshort</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> .return(1)</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> noshort:</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> set_global '$!ws', match</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> .return(0)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }}></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | <!ws> <.WS_ALL>*</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote>Give that a read-through, and see if it's obvious what is going on.<br />
<br />
It's sure not obvious to me. Even when I went back and read it later, I totally missed some stuff. Pmichaud is a pretty smart guy, and he's solving an important problem, but even when you know what he's doing, it's easy to miss stuff. <br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">The <.ws> rule</span></b><br />
<br />
The first thing to be aware of is that the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule has a special place in Perl6 grammars. If you specify a <span style="font-family: "Courier New",Courier,monospace;">rule</span> (as opposed to a <span style="font-family: "Courier New",Courier,monospace;">token</span> or a <span style="font-family: "Courier New",Courier,monospace;">regex</span>), or if you add some modifiers (<span style="font-family: "Courier New",Courier,monospace;">:sigspace</span>, I think, but you'd do well to check the specs), then the rule automatically matches <b>optional</b> whitespace everywhere you have white space in your grammar.<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">rule my_rule {<br />
my dog has fleas<br />
}<br />
</span><br />
</blockquote>This rule matches a bunch of literal characters - because all "normal" characters match themselves unless you do something to them - and it also <b>optionally </b>matches whitespace in between some of them.<br />
<br />
The way this works is that the regex above is modified by adding calls to a whitespace-matching rule -- you guessed it, <span style="font-family: "Courier New",Courier,monospace;"><ws></ws></span>. The effect is the same as if you wrote:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">rule my_rule {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> <.ws>my<.ws>dog<.ws>has<.ws>fleas<.ws></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote>(Just so you know, putting that period in front means that the results of those matches don't get collected. It's "match <span style="font-family: "Courier New",Courier,monospace;">ws</span> and don't save the contents," which is backwards from old-style regexes, where you had to put parens around all the groups you <i>wanted </i>to keep.)<br />
<br />
One important part of "parsing" -- as opposed to "regexing," I guess -- is that lots of stuff in the input text gets treated as whitespace. For example, a fairly common approach to computer language parsing is to treat the comments as whitespace, so they can be ignored no matter where they appear. (Some old C hacks relied on a quirky implementation of this in the preprocessor to do "token pasting." Since comments were whitespace, a token followed by a comment ended the token. But since comments were replaced by 0 characters, a C comment could "glue" two tokens together with no spaces between them.)<br />
<br />
So we've got two different imperatives for the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule: it will get called <b>a lot </b>because of the automatic whitespace recognition feature; and it needs to potentially recognize a lot of different stuff, like "real" whitespace, comments, heredocs, and whatever else strikes the coder's fancy. Here's the <span style="font-family: "Courier New",Courier,monospace;">WS_ALL</span> rule I use for Close:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">token WS_ALL {</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> [ \h+ # WS</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | \n [ {*} #= start_heredoc</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> [</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> <?{{ $P0 = get_hll_global [ 'close' ; 'Grammar' ; 'Actions' ], '$Heredocs_open'<br />
$I0 = $P0</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> .return($I0)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }}></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> [ $<lines>=[ \h* <ident> \h* [ \n | $ ] ] {*} #= check_for_end</ident></lines></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> || $<lines>=[ \N* \n ]</lines></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ]</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ]*</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> #{*} #= finish_heredoc</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ]</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | '/*' .*? '*/' # C_BLOCK_COMMENT</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | '//' \N* [ \n | $ ] # C_LINE_COMMENT</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> | <.POD></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> ]</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote><b style="font-family: "Trebuchet MS",sans-serif;">Matching nothing<br />
</b>The key fact to remember about the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule is that whitespace is optional. The <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule is being inserted because you, the developer, have whitespace in your rule. Maybe that's because you expect whitespace, or maybe it's because you want to separate two parts of a complicated rule to make things readable. So ultimately, <span style="font-family: "Courier New",Courier,monospace;">ws</span> absolutely <b>must</b> accept a zero-length pattern. <br />
<br />
The <span style="font-family: "Courier New",Courier,monospace;">token ws <span style="font-family: inherit;">regex at the top </span></span>is broken into three alternative paths. The first path is a check for a single-entry cache. I'll come back to the caching theory later -- it's one of those things I missed the first time through.<br />
<br />
The second rule is where most of the "work" of whitespace-ignoring is going to be done. Let's look at that in some detail:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;"> | <!ww> <.WS_ALL>*<br />
</span><br />
</blockquote>It seems pretty simple. The first part of the rule calls another rule, <span style="font-family: "Courier New",Courier,monospace;">ww</span>, that is also built-in to the PGE-generated grammar. The <span style="font-family: "Courier New",Courier,monospace;">ww</span> rule is a zero-width condition test, and it returns true if the current position is between two 'word' characters. That is, if the previous character matches <span style="font-family: "Courier New",Courier,monospace;">\w</span> and if the next character matches <span style="font-family: "Courier New",Courier,monospace;">\w</span>, then <span style="font-family: "Courier New",Courier,monospace;"><ww></ww></span> is true.<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">token ww {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> <?after \w></span></span> <span style="font-size: x-small;"><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> <?before \w></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote>Of course, <span style="font-family: "Courier New",Courier,monospace;">ww</span> uses some magic to do its testing, so <i>it looks nothing like that! </i>But it could look like that, if it were slow.<br />
<br />
So what does <span style="font-family: "Courier New",Courier,monospace;"></span> do? Well, it returns true if <span style="font-family: "Courier New",Courier,monospace;"><ww></ww></span> is false. <span style="font-family: "Courier New",Courier,monospace;"></span> is the logical negation syntax in Perl6 grammars. If both characters are <b>not </b>word characters, then we're in the middle of some spaces or line noise. If both characters <b>are </b>word characters, then we're in the middle of a word. And if one character <b>is </b>while the other character <b>isn't</b> a word character, we're looking at either the start (<span style="font-family: "Courier New",Courier,monospace;">\W\w</span>) or the end (<span style="font-family: "Courier New",Courier,monospace;">\w\W</span>) of a word.<br />
<br />
Using <span style="font-family: "Courier New",Courier,monospace;"></span> matches three of those four cases: it matches the case where we are in the middle of spaces or line noise, the start of a word, and the end of a word. Most users won't care about the start-of-word case -- not many modern languages use words to introduce comments. So <span style="font-family: "Courier New",Courier,monospace;"></span> is a speed optimization, and a filter that eliminates middle-of-the-word invocations of the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule. <br />
<br />
Only when checking for "space" <i>might </i>be relevant -- when <span style="font-family: "Courier New",Courier,monospace;"></span> approves -- does the work of actually checking for whatever it is that Close calls "whitespace" begin.<br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">Caching is a win</span><br style="font-family: "Trebuchet MS",sans-serif;" /></b>One way to speed up any repetitive task is to cache the results. This is a standard dogma for most computer people, but does it apply to whitespace? Especially, does it apply when you're scanning <i>forward </i>through a string doing pattern matching?<br />
<br />
It turns out that it does. Take a look at this code from the Close grammar.<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">rule declaration {<br />
<specifier_list> <declarator_part> [ ',' <declarator_part> ]* {*}<br />
}<br />
<br />
rule declarator {<br />
<dclr_pointer>* <dclr_atom> <dclr_postfix>* {*}<br />
}<br />
<br />
rule declarator_part {<br />
<declarator><br />
<dclr_alias>?<br />
# ...<br />
</span><br />
</blockquote>The entry point for these rules is <span style="font-family: "Courier New",Courier,monospace;">declaration</span>, which will match a specifier list, then check for whitespace, then call <span style="font-family: "Courier New",Courier,monospace;">declarator_part</span>.<br />
<br />
When <span style="font-family: "Courier New",Courier,monospace;">declarator_part</span> is called, it will check for whitespace, then call <span style="font-family: "Courier New",Courier,monospace;">declarator</span>.<br />
<br />
When <span style="font-family: "Courier New",Courier,monospace;">declarator</span> is called, it first will check for whitespace, then call <span style="font-family: "Courier New",Courier,monospace;">dclr_pointer</span>.<br />
<br />
Can you see where this is going? When an outer rule calls <span style="font-family: "Courier New",Courier,monospace;">declaration</span>, the first thing that happens is a whitespace check. The second, third, and so on, things that happen are whitespace checks. Assuming there is some whitespace to consume, the very first invocation of the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule is going to gobble it all up. After that, each subsequent call to the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule do a lot of extensive testing for various ways that it might not fail, before failing.<br />
<br />
So we come to the first alternate branch of the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"></span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> |<?{{</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P0 = get_global '$!ws'</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> if null $P0 goto noshort</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P1 = $P0.'to'()</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> $P2 = match.'to'()</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> if $P1 != $P2 goto noshort</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> .return(1)</span></span><br />
<br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> noshort:</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> set_global '$!ws', match</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> .return(0)</span></span><br />
<span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> }}></span></span><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"></span></span><br />
</blockquote>When a subrule is invoked, even a subrule that is coded in PIR in-line, rather than being written using Perl6 regex syntax, there are two properties that already have a meaningful value. The <span style="font-family: "Courier New",Courier,monospace;">match.to()</span> property is the end of the current match, while the <span style="font-family: "Courier New",Courier,monospace;">match.from()</span> property is the beginning. The results are offsets from the start of the text -- numbers, in other words. <br />
<br />
When a new match is being set up, the start and end are going to point to the same place. And for zero-length matches, like the <span style="font-family: "Courier New",Courier,monospace;">ww</span> rule, they are always pointing at the same place. So what this code does is not use the start of the match, but rather the end. Because the <i>first </i>time a whitespace rule runs, it might actually match some whitespace, and so <span style="font-family: "Courier New",Courier,monospace;">.to()</span> and <span style="font-family: "Courier New",Courier,monospace;">.from()</span> will be different.<br />
<br />
But the next time the <span style="font-family: "Courier New",Courier,monospace;">ws</span> rule gets called in the same location, it will be called with <span style="font-family: "Courier New",Courier,monospace;">.to()</span> equal to <span style="font-family: "Courier New",Courier,monospace;">.from()</span>, and both of them will be pointing at the same place where that first <span style="font-family: "Courier New",Courier,monospace;">ws</span> match <i>ended. </i>So this cache ignores the start of the match, and focuses on comparing the end of the match with the endpoint of the cached result. It essentially checks if the very last whitespace match ended at the current position. If so, then the next call to the same function will also decide to stop here. So don't bother making the call.<br />
<br />
<b><span style="font-family: "Trebuchet MS",sans-serif;">When caching doesn't work</span><br style="font-family: "Trebuchet MS",sans-serif;" /></b>There's two problems with the whitespace hack that you probably can't see. They are related, and they spring from an <a href="http://close-parrot.blogspot.com/2009/09/recursive-compiling.html">earlier post</a>, in which I mentioned that Close calls itself to handle #include processing, as well as some internal voodoo.<br />
<br />
The first problem is that the PGE compiler outputs the inline PIR with no namespace. That's the ticket I just submitted, and while it isn't fatal, it did cause me to spend some time tail-chasing before I figured out where the <span style="font-family: "Courier New",Courier,monospace;">$!ws</span> variable was winding up.<br />
<br />
The second problem is that when there is no whitespace at the beginning of a translation unit -- as happens when the first line is #include -- the whitespace hack caches a match entry with an ending offset of 0. And when the included file starts, it checks for a cache entry, finds one (after all, it's a global), and <i>requires</i> that the beginning of the file be something other than whitespace. Whoops!<br />
<br />
The solution, for me, was to implement a "whitespace hack stack" to go along with my #include'd filename stack. So when I change files, I also save and restore the cached <span style="font-family: "Courier New",Courier,monospace;">$!ws</span> variables. Technically, I could just blow away the contents, but since I had to write all the code to reach into the variable and set it to <i>something,</i> I decided to save and restore it.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com1tag:blogger.com,1999:blog-6926838495579160836.post-55207007973144696402009-09-25T10:42:00.000-07:002009-09-27T22:23:02.552-07:00Recursive CompilingYears ago, back when GCC was still on version 1.x, I went looking through the code. I don't remember why, but I was looking for a way to "just make a function." I had some notion that down in the bowels of the code there would be a function called "create_function" or some such, and I could pass it a return type and a name and who-knows-what-else, and it would return a function object. Or maybe it would automatically emit the function object. I don't remember what I was thinking at the time, I just remember being frustrated by the nest of snakes I uncovered, each function requiring a huge initialized data structure as input, making a small update to the tree, and returning.<br />
<br />
The other day I was working on Close, and I needed to create a function for the namespace init code. And I thought, "what I need is a function that will just create this one thing for me." Deja vu!<br />
<br />
<a name='more'></a><br />
<br />
So I wrote that function. And with the caveat that I knew exactly what the return type was going to be (void), and what the parameters were (none), and what the content would be (none), it still took more than 40 lines of NQP code to produce the result I wanted. <br />
<br />
Why? Well, the compiler is built around the assumption that it is parsing input, so each function requires a node, operates on that node and maybe a few more, and so forth. So my "create a function" function had to create a declarator, then create a type specifier, then look up the type specifier in the symbol table, then link the declarator and specifier with a symbol, then convert the resulting function declaration into a function definition, et cetera, et cetera.<br />
<br />
Eventually, the created function would be added to the namespace as the very first function in the namespace. And even later, every single one of those functions would be automatically discarded by the code marshaller because they had no code inside. <br />
<br />
<b style="font-family: "Helvetica Neue",Arial,Helvetica,sans-serif;">Wouldn't it be nice if ...</b><br />
I spent some time on IRC, watching people who are way smarter than me make progress on Parrot, and I happened to mention to <a href="http://wknight8111.blogspot.com/">WhiteKnight </a>that I wished I could just put the declaration of my function into a string and invoke the parser on the string. It was kind of a joke, and WhiteKnight cautioned me that misery and evil lay down that path, and that I should avoid it.<br />
<br />
Well, as mentioned, I sat down and grunted out the fortysomething lines of NQP code that it took to create the namespace init subs. And boy, did that suck. (A whole bunch of functions, <i>every single one of which</i> is being used in a way that was not intended. What could go wrong?)<br />
<br />
<b><span style="font-family: "Helvetica Neue",Arial,Helvetica,sans-serif;">Today I changed my mind.</span></b><br />
<br />
It turns out that the way Close processes #include directives is by reading in the contents of the file to a String, then passing that String to the compiler's 'compile' method with a note saying "don't do more than PAST generation."<br />
<br />
Here's two functions from the Close include-file module that illustrate how it works. Ultimately, it's parse_text that does the work. And that function is a whopping 15 lines, most of which is scaffolding. Thanks, <a href="http://docs.parrot.org/parrot/latest/html/docs/book/pct/ch03_compiler_tools.pod.html">PCT</a>, for making life easy.<br />
<br />
<blockquote style="font-family: "Courier New",Courier,monospace;"><span style="font-size: x-small;">sub parse_text($text) {<br />
my $result := Q:PIR {<br />
.local pmc parser<br />
parser = compreg 'close'<br />
<br />
.local string source<br />
$P0 = find_lex '$text'<br />
source = $P0<br />
<br />
%r = parser.'compile'(source, 'target' => 'past')<br />
};<br />
<br />
DUMP($result);<br />
return $result;<br />
}<br />
<br />
sub parse_include_file($node) {<br />
get_file_contents($node);<br />
my $contents := $node<contents><contents>;<br />
<br />
if Scalar::defined($contents) {<br />
push($node.name());<br />
close::Compiler::Scopes::push($node);<br />
<br />
# Don't worry about the result of this. The grammar <br />
# updates the current scope, etc.<br />
parse_text($contents);<br />
<br />
close::Compiler::Scopes::pop('include_file');<br />
pop_include_file();<br />
}<br />
<br />
return $node;<br />
}<br />
</contents></span><br />
</blockquote><br />
It turns out that invoking the compiler recursively is the right thing to do. The code I had to write for include files, and the code I had to refactor to provide a callable interface for special-purpose internal naughtiness, is nowhere near as brittle as the code for creating a function directly. To say nothing of being way more maintainable. The include file code looks like "open a file, slurp the file, call the parser," which is nicely understandable. The <i>create-a-function-from-scratch </i>code looked like a horrible mess, especially in the light of a new day.<br />
<br />
Working the kinks out of this code encouraged me to find a bunch of other places in the code where I was building data structures "by hand" to see if I could replace them with recursive parsing. So far, so good.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-22656945993482676642009-09-22T21:37:00.000-07:002009-09-27T22:22:27.559-07:00Output ordering and Initializers<div style="font-family: Verdana,sans-serif;"><span style="font-size: small;"><b>Output ordering</b></span><br />
</div><br />
Today I got Close to emit functions in the same sequence they occur in the input file. That was easy.<br />
<br />
What wasn't easy was automatically creating a namespace init function to handle initialized declarators.<br />
<br />
If you compile code that looks like this:<br />
<blockquote style="font-family: "Courier New",Courier,monospace;">int x = 10;<br />
</blockquote>There's no function involved. Clearly, this is a global variable 'x', and its initial value is 10. But Parrot does not use the same model that *nix uses, of loading a file that contains a partial memory image. Since there's no data segment, any and all initialization have to be done by code.<br />
<br />
<a name='more'></a><br />
<br />
That's where the namespace init functions come in. A namespace may contain variables and functions that depend on the variables. For the case of simple initialization - like the x = 10 case above - I'm willing to take care of that for you. If you want a more complicated initialization, where some data depends on the initialization of other data, well, you can write that code yourself. (Seriously - that's what the ':init' adverb is for.)<br />
<br />
<div style="font-family: Verdana,sans-serif;"><b>Executive decision</b><br />
</div><br />
One decision I had to make was the order of declaration of the _nsp_init functions. Should they be declared (and, by the rules of Parrot, be executed) after other functions, or before them. My decision was that _nsp_init functions will be "declared" the first time the namespace appears. Any initializers in the namespace will (eventually - this doesn't work yet) get appended to the _nsp_init function.<br />
<br />
This means that initializers cannot depend on user-specified :init functions having run. If you need that, then you will have to perform your own initialization.<br />
<br />
<div style="font-family: Verdana,sans-serif;"><b>Results</b><br />
</div><br />
The result of this is all but invisible to you. But it means that this code:<br />
<blockquote><div style="font-family: "Courier New",Courier,monospace;">namespace A::B {<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> int x = 10;<br />
</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> void my_init() :init {<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> say(y);<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> y = 20;<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> }<br />
</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> int y = 100;<br />
</div><div style="font-family: "Courier New",Courier,monospace;"><br />
</div><div style="font-family: "Courier New",Courier,monospace;"> void main() :main {<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> say(y);<br />
</div><div style="font-family: "Courier New",Courier,monospace;"> }<br />
</div><div style="font-family: "Courier New",Courier,monospace;">}<br />
</div></blockquote>Will produce these subs, in this order:<br />
<br />
<blockquote>:: _nsp_init // empty<br />
:: A :: _nsp_init // empty<br />
:: A :: B :: _nsp_init <br />
x = 10<br />
y = 100<br />
:: A :: B :: my_init<br />
:: A :: B :: main<br />
</blockquote>Of course, the empty _nsp_init subs will be silently deleted by the compiler, so they never appear in the output. But if they had content, that was added later, say, they would be emitted in that order.<br />
<br />
The output of running the code above would be:<br />
<blockquote>100<br />
20<br />
</blockquote> Because the _nsp_init function was emitted (and so, would run) before the my_init function.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com2tag:blogger.com,1999:blog-6926838495579160836.post-9843380365643122202009-09-22T04:18:00.000-07:002009-09-27T22:21:46.408-07:00Referencing symbols in another namespaceToday, I got this code to DWIW:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">#include <std/io></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">#include <std/test></span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">namespace test {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> namespace hll: close :: test :: nested {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> namespace deeply {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> void goodbye() {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> say("Adios, amigo.");</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span></span><br />
</blockquote><blockquote><span style="font-size: x-small;"></span><br />
<a name='more'></a><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;"> </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> void test() :main {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> using namespace ::test::nested::deeply;</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> say("namespace-function-say");</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> hello();</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> namespace nested::deeply {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> void hello() {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> say("Hello, world");</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> namespace X {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> void xray() {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> say("Specs!");</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> namespace ::test::nested::deeply {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> void say(string what) {</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> asm(what) {{ say %0 }};</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> }</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">}</span></span><br />
</blockquote><br />
What I want, in this case, is for all of the various invocations of 'say', except the one in xray(), to refer to the say function declared down at the bottom.<br />
<br />
There's a bunch of stuff I did wrong that I'd love to talk about, but it's late and I'm tired. So I'll say that compiler writers have things a lot easier than I thought - it's the guy writing the specs that does most of the heavy lifting. In this case, I'm writing a compiler with no spec, so every detail is a learning experience. (And let's face it: I'm old, and learning is painful.)<br />
<br />
Anyway, here's the output:<br />
<blockquote><span style="font-size: x-small;"><span style="font-family: "Courier New",Courier,monospace;">.namespace []</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "anon" :subid("post15")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.HLL "close"</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.namespace ["test";"nested";"deeply"]</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "say" :subid("10_1253616492")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .param pmc param_12</span><span style="font-family: "Courier New",Courier,monospace;"></span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .lex "what", param_12</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> get_global $P13, "what"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> say $P13 </span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .return ()</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.HLL "close"</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.namespace ["test";"nested";"deeply"]</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "hello" :subid("11_1253616492")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"></span><span style="font-family: "Courier New",Courier,monospace;"> get_hll_global $P15, ["test";"nested";"deeply"], "say"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P16 = $P15("Hello, world")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .return ($P16)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.HLL "close"</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.namespace ["test"]</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "test" :main :subid("12_1253616492")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"></span><span style="font-family: "Courier New",Courier,monospace;"> get_hll_global $P18, ["test";"nested";"deeply"], "say"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P18("namespace-function-say")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> get_hll_global $P19, ["test";"nested";"deeply"], "hello"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P20 = $P19()</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .return ($P20)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.HLL "close"</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.namespace ["test";"nested";"deeply"]</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "goodbye" :subid("13_1253616492")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"></span><span style="font-family: "Courier New",Courier,monospace;"> get_hll_global $P22, ["test";"nested";"deeply"], "say"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P23 = $P22("Adios, amigo.")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .return ($P23)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.HLL "close"</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.namespace ["test";"X"]</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.sub "xray" :subid("14_1253616492")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"></span><span style="font-family: "Courier New",Courier,monospace;"> get_global $P25, "say"</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> $P26 = $P25("Specs!")</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;"> .return ($P26)</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">.end</span><br style="font-family: "Courier New",Courier,monospace;" /></span><br />
</blockquote>It's worth noting that I don't stop on an error - I keep trying to generate stuff. This will probably not be true in production - any error will cause an exit 1, etc. But for now, I need to see what's going on. Also, the order of generation is presently determined by the traversal order of some internal trees I've built. My next step will be to generate functions in the same order they appear in the source code. (With the caveat that variable initializers will get shuffled together.)Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0tag:blogger.com,1999:blog-6926838495579160836.post-86060324041033273362009-09-22T03:56:00.000-07:002009-09-22T03:58:01.843-07:00HowdyEveryone else is doing it, so why can't we?<br /><br />In this case, I'll be blogging about Close - a programming language on and for the Parrot VM.Austinhttp://www.blogger.com/profile/13284330930068467507noreply@blogger.com0