Friday, October 16, 2009

Parrot Multi-Subs and Inheritance

One of the things I've added to Slam (the Close compiler) is support for declaring multi-subs. In another language, these would be called "overloaded methods" or "overloaded functions," but in Close, it's all about the underlying hardware, and that means MultiSubs.

Before I write too much, let me make this clear: Parrot is not the same as Perl6, or Python, or whatever your favorite language is. And so, the way Parrot (and Close) handle things like object dispatch -- the "Object Meta-model," as it is called -- is different from the way other languages handle these things. This is one of those times where the difference matters. (And specifically, the stuff I write below about inheritance isn't true for Perl6. You have been warned.)

In Parrot's default object model (P6object), each class can have one or more parent classes, which can have one or more grandparent classes, etc. And in order to dispatch method calls "optimally," an algorithm called the C3 algorithm is used.

The subject in question is method resolution order, and it has to do with picking between a method, say "bark," that might be inherited from two different parents. If you have a class Tree, with a "bark" method, and a class Dog, with a "bark" method, and a child class DogWood that inherits from both Tree and Dog, which "bark" method runs when you invoke it? That's method resolution.

And when you "order" the inheritance hierarchy, so that methods from one class get chosen in preference to methods from another class, you have a method resolution order, or MRO. In most class hierarchies, MRO is simple. For instance, in the Parrot Compiler Toolkit (PCT) libraries, there are classes like this:
PCT::Node - the root of the class hierarchy
PCT::Block - a lexical scope
PCT::Stmts - a collection of statements, but not a lexical scope
PCT::Var - a declaration or reference to a variable
PCT::Val - a literal value
PCT::Op - an operation, like add or multiply
PCT::VarList - a collection of variable declarations

Conveniently, every one of those classes (except Node) is derived from PCT::Node. So the class hierarchy looks like a fan. But that makes the MRO trivial. For each class, the MRO is "search my methods, then search PCT::Node, then quit."

So what does this have to do with MultiSubs, you ask? Well, it turns out that MultiSubs are magical in two ways. First of all, a MultiSub is a single entity. Get that through your head, and let it sink in. It took me a while to understand it. A MultiSub is a PMC that is created by the compiler when you use the :multi modifier on a sub declaration (in PIR). Likewise in Close -- if you declare a function with :multi, it will be created as a MultiSub.

These MultiSub PMCs are treated somewhat like Sub PMCs, meaning that they get compiled into PBC files, loaded, and they respond to the .invoke() method call. But that's where things get ugly. Because a MultiSub secretly encodes a list of ordinary Subs. (I don't think it's possible to "nest" MultiSubs, but I haven't done any research on it.) Each of the ordinary Subs is associated with a Signature that indicates how many arguments, and of what types, are to be used in MultiMethod Dispatch (MMD). And yeah, it's called MMD even though it can be done on plain old non-method Subs. Go figure.

So when a MultiSub gets invoke()'ed, it takes its list of arguments and tries to match the args against a Sub using the signatures. There's an algorithm for that, too, that uses the "manhattan distance" -- essentially (but not exactly) the sum of the counts of how many "steps away" each argument is from the corresponding parameter in its own MRO list.

For example, since DogWood inherits from Tree and Dog, it might be one step, or two steps, from Tree. So a function declarated to take a single Tree parameter will have a manhattan distance of 1 or 2 from the DogWood argument. If a function of the same name were declared to take a DogWood parameter, it would have a distance of 0, and so be a "closer" match.

That's where the simplicity ends, though, because when dispatch is done on two parameters, you have the case where a function with one very close match and another very far match might be a better or worse match than a function with two "medium close" classes. For example, say you have a class hierarchy A -> B -> C -> D, where A is the "root" of the hierarchy. In this example, we have two subs "foo" declared as multi, with parameter lists (B, B) and (D, A). If I call foo(d, d), which is closer? How about foo(d, c)?

First of all, this is the stuff that gives academics nightmares. But it's okay, really, because if you find yourself in that position, you probably are getting what you deserve. (If the two subs do radically different things, what the hell were you thinking by overloading them? And if they do similar things, you're probably okay.)

While all that is interesting, it's not particularly relevant. In fact, it all "just works." What's more interesting is what happens when you don't match. That is, when you call a MultiSub with a set of arguments that just don't match any candidate. Consider an obvious (and recently bothersome) example: the compiler builds a tree of nodes. The Slam compiler extends the PCT::Node class hierarchy as I mentioned in my post on Multiple Inheritance, so there are a bunch of different node types to support. While I was trying to implement the Visitor pattern on the syntax tree, I implemented a Class::MULTISUB function in NQP that dynamically generates and compiles trampolines to convert a set of sub declarations into a multisub. The resulting Visitor class had overrideable methods for each possible node type (every class in the hierarchy). So if I have a visitor, $v, and call $v.visit($node), multidispatch invokes the right method based on the type of the first argument.


But what happens when I add a class to the hierarchy? Well, I usually forget to re-run the script that generates all the visitor methods. And so there is no multisub with a signature that matches. Whoops.

An even better question is this: what happens when I overload some of the visit methods in a new Visitor class?

Here's the problem: Parrot does not dispatch "beyond" a MultiSub. That is, when you call a method like 'visit', and Parrot's dispatch mechanism finds a method entry 'visit' that points to a MultiSub, it is done. Parrot hands control to the MultiSub, says "Here ya go. Good luck, buddy!" and doesn't wait around to see what happens. What this means is that by default, calling a MultiSub with an argument list that doesn't match any of the available signatures is a fatal error. Yikes!


So what happens if, say, I want to specify a particular action for handling nodes of a single type -- symbol declarations, say -- but I want the "default" handling to be inherited from some ancestor?


Well, now it gets tricky. First, remember that I'm not using multisubs or multimethods directly. Instead, I'm using these automatically generated trampolines. So the behavior you want to specify lives in a method called something like _visit_Slam_Symbol_Declaration() and it gets called by a trampoline that is part of some parent class' visit MultiSub. So the simple answer is, if you replace the right method, it will just work. Because calling $child.visit($node) results in a call to the inherited Ancestor::visit($child, $node) which in turn invokes $child._visit_Slam_Symbol_Declaration($node) which has been overridden in the Child class. Score!

But there's a problem. There's always a problem, or it wouldn't be worth writing about. The problem is that the host of little per-node-type visit methods is coded waaaay down at the root of the Visitor class hierarchy. And there are a few generations of other classes between the root of the hierarchy and where most code lives. (And here, let me put in a plug for an excellent paper by van Deurssen and Visser that serves as a solid introduction to Visitor/Combinators and the JJTraveler class library.)

The intervening layers of parent classes are not necessarily a problem, except when they override the .visit() method. Because a parent class that replaces the MultiSub with a simple Sub method suddenly eliminates the "it just works" mechanism -- the MultiSub dispatcher down at the root of the tree -- that makes per-type method overriding possible.

So what to do? Well, clearly I needed to replace the "root" MultiSub with one in the Child class. Okay, I've got code that does that automatically. But if the Child class only provides one MultiSub method -- because the Child only wants to process symbol declaration nodes, remember? -- there won't be any other methods available. And if there are no valid candidates in a MultiSub, it's a fatal error. Yikes.

One obvious solution was to implement a second multi-method that matched Slam::Node or PCT::Node or something, so that there was always a fall-back multimethod available. But that's boilerplate, and shouldn't that be generated automatically? Well, yes, IMO. So I implemented that, too.

Creating a multisub now does two things. First, it seeks out all the matching method names, and creates trampolines for them (so a call to Class::MULTISUB($class, 'visit', :starting_with('_visit_')) creates a MultiSub named 'visit', and automatically creates trampolines for each existing method that starts with _visit_ in the class' namespace. Each method is assumed to start with whatever prefix you specify, and then contain a class name, with '::' replaced by '_', so that a visit method that takes a Slam::Symbol::Declaration parameter would be named _visit_Slam_Symbol_Declaration.

The second thing is the automatic generation of a "default" multi. This is a multi-method with a signature of (_) -- that is, it is dispatched only on the 'self' parameter and considers any other parameters to be extras. This apparently works because the manhattan sort gives some kind of preference to "right number of parameters", so that while a signature of (_) does match, a signature of (_, Some::Parent::Class) is still "closer."

The "default" multi is generated in one of two ways. The multi compiler searches the class' MRO (available via the 'all_parents' argument to the inspect method on the class object) for the first ancestor with a matching method -- be it a MultiSub or plain old method -- and creates the default multi to call that method if it finds one. Otherwise, the default multimethod just die's with a diagnostic message indicating failure.

Okay, that's it for today. FMTYEWTKA MultiSubs. The short summary: multisubs are great, but that don't play well with inheritance. The longer summary: you have to do your inheritance chaining yourself. And if you need to know more, I recommend you do what I do - get on the #parrot IRC channel and pester PMichaud with questions. That dude knows just about everything.

No comments:

Post a Comment