Multiple dispatch: a fix for some problems of single dispatch (Java etc.)

[2009-05-01] programming languages, pl fundamentals, dev, java, software engineering
(Ad, please don’t block)
Almost all well-known object oriented languages (Java, C#, JavaScript, Python, Ruby, ...) have single dispatch: Methods are chosen depending on the (single) receiver of a message. On the other hand, there is multiple dispatch which is an interesting mix of functional method selection with mutable state. It is used in less-known object-oriented languages such as Common Lisp.

In this article, we'll first look at Java's single dispatch and Java's overloading and then use what we have learned to understand multiple dispatch and how it solves some design dilemmas that can't be solved with single dispatch.

Dynamic dispatch

Let's start with dynamic dispatch:
    public class Empty extends Object {

        @Override
        public String toString() {
            return "Empty";
        }

        public static void main(String[] args) {
            Object empty = new Empty();
            System.out.println(empty.toString());
        }
    }
What is the output of this program? It is “Empty”. If this seems obvious, it is because you are already very familiar with dynamic dispatch: Java determines at runtime what class an instance belongs to and chooses the appropriate, possibly overridden, method. For the example above, this means that we don't use the toString() method from class Object, but the toString() method from class Empty. In some languages, such as C++, you have to explicitly state that you want dynamic dispatch.

Overloading

Now on to overloading, another way of picking a method implementation.
    public class Printer {
        public void print(String str) {
            System.out.println("String: "+str);
        }
        public void print(Object obj) {
            System.out.println("Object: "+obj);
        }
        public static void main(String[] args) {
            Object obj = "abc";
            new Printer().print(obj);
        }
    }
Here the output is “Object: abc”. The method implementation is chosen statically, at compile time: Internally, the compiler uses the static types of the method arguments to disambiguate the method names. This time, the result is unexpected, even really good Java programmers that I've asked get it wrong. Due to dynamic dispatch feeling so natural for the receiver of a message (=“this”), we expect it to work the same for the arguments. There is a kind of asymmetry between the receiver and the arguments of a method and that asymmetry is reflected in the invocation syntax, too.

Multiple dispatch

With multiple dispatch, methods become top-level constructs. This is similar to implementing a single dispatch method “foo” with two arguments as a static method:
    public static void foo(this, arg1, arg2) {
        if (this instanceof A) {
            ...
        } else if (this instanceof B) {
            ...
        } ...
    }
The message receiver “this” becomes just another argument and all variations of the method are united in a single place (having this kind of view on a method helps with understanding implementations that use polymorphism, but I digress). Instead of the myObj.foo(x,y) you now invoke foo as foo(myObj,x,y). This is still single dispatch. Multiple dispatch nests instance-of checks for arg1 and arg2 inside the checks for “this”. Only after checking the types of all arguments do we decide which variation of the method to use. Common Lisp calls foo a generic function and the code snippets inside it methods.

Note that the if statements were for illustration only, languages with multiple dispatch have efficient algorithms for performing the checks and selecting a method.

What advantages does this have?

A generic function can “belong” to several classes. This helps whenever method arguments are not true parameters (data), but rather collaborators (that offer services) for an algorithm. For example, if you have a method Database.export(Filesystem,UserFeedback), this method might contain as many Database invocations as Filesystem invocations. The special case of binary operators as methods exhibits the same difficulty: Should the operator “String + Integer” be considered part of class Integer or part of class String? It is for a reason that UML has special diagrams for collaborations and that these diagrams cross-cut classes.

One more example of collaborating objects is the visitor pattern: It is a clumsy simulation of multiple dispatch with single dispatch. What you have at its core is the object for the algorithm collaborating with the object for the data. With multiple dispatch, things are much simpler, there is less code to write and the data objects do not have to be prepared for visitors. Interestingly, even the explicit object for the algorithm disappears, because the generic function replaces it.

Another area where the asymmetry of single dispatch shows is with the null value. For example, "abc".equals(null) is OK while null.equals("abc") causes an exception (and is not even directly syntactically correct). If you introduce null checks as selection criterion for methods, then handling null values is simple with multiple dispatch.

Extending a class is trivial with multiple dispatch, just create a new generic function that accepts instances of that class as its argument. With Java, people often overlook external static methods that actually extend a given class, because they don't know where to look. For example, if you don't know Java well, you might be puzzled as to why List has no sort() method. If you do, you know that the class Collections has a static method sort(List) that you have to use. In languages with multiple dispatch, one already assumes that in general, a generic function is relevant for several classes. The development tools help one with finding all functions that apply to a given class, making sure that code is re-used instead of re-invented.

Having code tightly integrated with the data is less desirable in settings where you serialize objects. With generic functions, code and data are separate and it is easier to use the same data structures on the server and the client. The server can host a lot of code that generates or modifies data. The client only has to display the data and lets the server handle the more complicated stuff. This is a frequent scenario when doing client-server communication with the Google Web Toolkit. As Java does not have generic functions, the server-only functionality has to be moved to external static helper methods. Consequently, things are even less encapsulated, slightly messy and one loses polymorphism.

Predicate dispatch

A generalization of multiple dispatch is predicate dispatch. With multiple dispatch, the methods of a function are selected based on the types of the arguments. With predicate dispatch, more complex conditions can be specified. This simplifies changing the behavior of an instance depending on its state and obviates the need for the strategy pattern. For example: Let's assume that an instance of class Bar has to behave differently if an error flag is set to true. The relevant generic functions will contain one method that is invoked if the flag is true and another one if the flag is false. Attached to the first method is an explicit selection condition such as myBar.error. The second method has a condition such as !myBar.error. These selection conditions are additional ways of classifying instances. One could say that myBar changes its class depending on the value of its field error.

Conclusion

We have seen that multiple dispatch can do several things that single dispatch can't. But I think that both are complementary. Message passing is a nice and clean metaphor for method invocation that works well with distributed computing. It views objects as components that provide services. On the other hand, objects-as-data result in phenomena (binary operators etc.) that are best implemented with multiple dispatch.

Further reading

The basics in depth

Advanced topics

Languages close to Java with multiple dispatch