Visitor pattern and dynamic dispatch

The Design Patterns book by the infamous Gang of Four contains 23 object oriented design patterns. Some of them are kind of obvious (Adapter), or have become common knowledge since the book was published in 1994 (Iterator). My personal favourite is Visitor.

Visitor "represents an operation to be performed on the elements of an object structure. Visitor lets you define a new operation without changing the classes of the elements on which it operations" [except you need to add an accept method].

Without Visitor

There are plenty of introductions online, so I'll focus on what it really solves, by means of code examples (using Kotlin). Let's start with a simple tree-shaped data structure:

interface Tree {}

class Node(public vararg val children: Tree): Tree {}

class Leaf(): Tree {}

If we want to define an operation on this structure, such as converting it to XML, the obvious way is to add a method on the structure:

interface Tree {
    public fun toXML(): CharSequence
}

// Implementations are left as an exercise to the student.

fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    println("text: " + data.toXML())
}

Sometimes this is the way to go, but other times we want to separate the business data from the XML serialization.

So let's create a class that operates on the structure:

interface Tree {}

class Node(public vararg val children: Tree): Tree {}

class Leaf(): Tree {}

class Serializer {
    final public fun serialize(elem: Tree): CharSequence {
        return when (elem) {
            is Node -> serializeNode(elem)
            is Leaf -> serializeLeaf(elem)
            else -> throw NotImplementedError("Don't know how to deal with "
                    + elem + "; this is a bug")
        }
        // We are basically implementing a vtable, which is error-prone and
        // should not be necessary. This is a hint for what we really want:
        // dynamic dispatch on the `elem` parameter.
    }

    final public fun serializeNode(node: Node): CharSequence {
        val txt = StringBuilder("<node>")
        for (child in node.children) {
            txt.append(serialize(child))
        }
        return txt.append("</node>")
    }

    final public fun serializeLeaf(leaf: Leaf): CharSequence {
        return "<leaf/>"
    }
}

fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    val serializer = Serializer()
    println("text: " + serializer.serialize(data))
}

This works (without visitor pattern):

<node><node><leaf/></node><leaf/><leaf/></node>

Dynamic dispatch

The above approach worked: the data and operation were separated, and we got the desired output.

The problem is the serialize(elem: Tree) method. We're essentially implementing dynamic dispatch by hard-coding a vtable. What does that mean?

Dynamic dispatch is a fancy way of saying that the method to call is determined at runtime, rather than at compile time. (It is called static dispatch at compile time).

In many languages with object oriented elements, such as Python, C++ or the JVM ones, there is dynamic dispatch on the object type.

What I mean is that when we call thing.method(), the method being called is determined at runtime by the type of thing. Have a look at this example:

interface Tree {
    public fun method()
}

class Node(public vararg val children: Tree): Tree {
    override fun method() {
        println("Node.method")
    }
}

class Leaf(): Tree {
    override fun method() {
        println("Leaf.method")
    }
}

public fun treeFactory(): Tree {
    if (Random().nextInt(2) == 0) {
        return Node()
    } else {
        return Leaf()
    }
}

public fun main(args: Array<String>) {
    treeFactory().method()
}

It is not known at compile time what the runtime type of treeFactory() is. Therefore the compiler doesn't know which method() will be called, and it is determined at runtime: dynamic dispatch (dynamic dispatch works for overriden functions).

This does not apply to 'normal' parameters (in some languages like Python, the object is an explicit parameter self, in others like Java it is implicit - when I say 'normal' I mean not-self). Let's try:

class Bla {
    fun method(node: Node) {
        println("Bla.method for node")
    }
    fun method(leaf: Leaf) {
        println("Bla.method for leaf")
    }
}

public fun main(args: Array<String>) {
    Bla().method(treeFactory())
}

We again don't know the type of treeFactory() at compile time, so we want to determine whether to call method(node: Node) or method(leaf: Leaf) at runtime. But this isn't possible in many languages: 'normal' arguments are dispatched statically (at compile time, using static type information). This is function overloading (as opposed to overriding).

The code above does not compile: we don't know whether treeFactory() if Node or Leaf (or something else), we only know that it is Tree, so we can only call method(elem: Tree) (or method(elem: Object)). If we add method(elem: Tree), this method is found statically, and the other overloads will not be used.

A vtable, or virtual function table, is the typical implementation of dynamic dispatch. It connects function implementations to types, much like we did in serialize(elem: Tree). The compiler or virtual machine usually generates these, and we don't have to think about this implementation detail.

With Visitor

So we cannot do dynamic dispatch on 'normal' parameters. But that is exactly what we want to do to solve our problem from before! If we could do dynamic dispatch, we could do:

interface Tree {}

class Node(public vararg val children: Tree): Tree {}

class Leaf(): Tree {}

class Serializer {
    final public fun serialize(node: Node): CharSequence {
        val txt = StringBuilder("<node>")
        for (child in node.children) {
            txt.append(serialize(child))
        }
        return txt.append("</node>")
    }

    final public fun serialize(leaf: Leaf): CharSequence {
        return "<leaf/>"
    }
}

fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    val serializer = Serializer()
    println("text: " + serializer.serialize(data))
}

Then the tree structure doesn't need to know anything about the serializer, we don't need to write a dispatch method and we can't mess up the dispatch method. (Because any class in any package may implement Tree, we might not get a compiler error if we forget a serialize(...) implementation unless we sealed Tree or add a language construct akin to traits).

But alas! Among the top 10 most popular programming languages (arguable), the only language that supports this is C# (>= 4.0). (There are quite some not too far down the list though).

Most can only do single dynamic dispatch, and only on the object type. So evidently, we need to have a method on Tree elements to dispatch dynamically. This method only exists to use dynamic dispatch and to trigger compiler warnings for forgotten implementations, so the implementation just forwards to the serializer.

Lets get to the implementation! The data types now have an accept method:

interface Tree {
    // We also use Composite pattern
    public fun accept(visitor: Serializer): CharSequence
}

class Node(public vararg val children: Tree): Tree {
    final override public fun accept(visitor: Serializer): CharSequence {
        // This is dynamic dispatch on visitor, but static on the Tree element.
        return visitor.serializeNode(this)
    }
}

class Leaf(): Tree {
    final override public fun accept(visitor: Serializer): CharSequence {
        // Pretty much the same as Node; all Tree instances need this.
        return visitor.serializeLeaf(this)
    }
}

Any class that implements Tree will be forced to have this accept (we can't enforce that it's valid, but fingers crossed). Let's implement the Serializer:

class Serializer {
    final public fun serializeNode(node: Node): CharSequence {
        // This is often named `serialize`, but I've used `serializeNode` to
        // emphasize this is statically dispatched, not connected to `serializeLeaf`.
        val txt = StringBuilder("<node>")
        for (child in node.children) {
            txt.append(child.accept(this))
        }
        return txt.append("</node>")
    }

    final public fun serializeLeaf(leaf: Leaf): CharSequence {
        // This is often named `serialize`, but I've used `serializeLeaf` to
        // emphasize this is statically dispatched, not connected to `serializeNode`.
        return "<leaf/>"
    }
}
fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    val serializer = Serializer()
    println("text: " + data.accept(serializer))
}

This works! But it seems kind of elaborate compared to what we had before. Not to mention more strongly coupled. What have we gained here?

  • We do not need to write a dispatch method (serialize(elem: Tree) in the early example).
  • We get a compile error if we forget a serialize...(...) method, rather than a runtime error.

The second advantage is the case because:

# If a Tree subclass is added (say BinaryNode), the interface requires it to implement accept. # The standard implementation of accept is a call to serializeBinaryNode(node: BinaryNode). # This is statically dispatched (in node), giving a compile error if it does not exist. # (In the general case, there may be an interface instead of Serializer. Adding serializeBinaryNode(node: BinaryNode) to the interface requires it on all implementations at compile time.)

A disadvantage is that we get the compile error in the Tree class that we forgot the serialize for, rather than in the Serializer.

Note one important generalization: with minor changes, we could implement more visitors (like Serializer) using the same accept method. So although the Tree elements must know about the fact that operations can be performed on them, they don't need to know what they are or even how many exist.

Here is a code sample for the general case, with two operations (Serializer and Converter). A Visitor interface was added so that the Tree doesn't know about the operations.

/// Data.

interface Tree {
    // We also use Composite pattern
    public fun <T> accept(visitor: Visitor<T>): T
}

class Node(public vararg val children: Tree): Tree {
    final override public fun <T> accept(visitor: Visitor<T>): T {
        // This is dynamic dispatch on visitor, but static on the Tree element.
        return visitor.visitNode(this)
    }
}

class Leaf(): Tree {
    final override public fun <T> accept(visitor: Visitor<T>): T {
        // Pretty much the same as Node; all Tree instances need this.
        return visitor.visitLeaf(this)
    }
}

interface Visitor<T> {
    // All visitors must be able to visit all nodes.
    fun visitNode(node: Node): T
    fun visitLeaf(leaf: Leaf): T

}

/// First operation: Serialize to XML.

class Serializer: Visitor<CharSequence> {
    final override public fun visitNode(node: Node): CharSequence {
        val txt = StringBuilder("<node>")
        for (child in node.children) {
            txt.append(child.accept(this))
        }
        return txt.append("</node>")
    }

    final public override fun visitLeaf(leaf: Leaf): CharSequence {
        return "<leaf/>"
    }
}

/// Second operation: Convert to another tree.

interface NewTree {}

data class NewNode(public val children: Array<NewTree>): NewTree {}

class NewLeaf: NewTree {}

class Converter: Visitor<NewTree> {
    final override fun visitNode(node: Node): NewTree {
        return NewNode(node.children.map { it.accept(this) }.toTypedArray())
    }

    final override fun visitLeaf(leaf: Leaf): NewTree {
        return NewLeaf()
    }
}

/// Testing.

fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    val serializer = Serializer()
    println("text: " + data.accept(serializer))
    val converter = Converter()
    println("new tree: " + data.accept(converter))
}

The output will be something like:

text: <node><node><leaf/></node><leaf/><leaf/></node>
new tree: NewNode(children=[NewNode(children=[blag.NewLeaf@63e31ee]), blag.NewLeaf@68fb2c38, blag.NewLeaf@567d299b])

Because we use generics, the accept method doesn't need to know about the return type (actually it does, but the compiler takes care of it). Another popular choice is to let visit... and accept be void, and store all the output on the visitor class.

Misconceptions

I'd like to focus on what I feel are misconceptions that some people have:

The purpose of the visitor pattern is to separate data and operations.

It's true that this is a purpose.

But as we have seen, that is easy to do without the visitor pattern. Just put the operation in another class. It's not the only purpose.

The visitor pattern is a way to implement double dispatch

First some nitpicking:

  • This has to be understood as 'double dynamic dispatch'; double static dispatch is just calling a method with two arguments.
  • Optimizations aside, any method that is not static, final or private is dispatched dynamically in Java. Let us only count those cases where we don't know the type.

So it is a misconception? Well, no, there can be two dynamically dispatched calls: data.accept(visitor) and visitor.visit...(this). But the second one is only dynamic because we had to throw away the type of the visitor, which we did know originally (in most cases). If there is only one visitor and we make it final, it is no longer dynamic dispatch.

So there is often double dynamic dispatch, but that's not the point. The point is to have dynamic dispatch in a parameter other than ``this``. We want the operation in a separate class, but we want to dynamically resolve based on the data class (which is the argument).

Visit and accept methods should be void.

This was already noted, but in many cases, it is a matter of preference. Void methods are more common I think.

Void methods help the accept method be independent of the data structure that the visitor produces. A lot of that can now also be achieved using generics, as in the above example. Generics have become widespread after the Gang of Four book was published (they have been around since at least 1967, but Java only got them in 2004, 10 years after the book).

Using return values is often short and easy, but in some cases, storing state on the visitor is preferable, for example as discussed in 'The visitor doesn't need to know about the data structure'.

The classes in the data structure don't need to be changed.

As was seen, a shared interface (Composite pattern) is necessary, and it must have something like an accept(Visitor) method.

So it would be fair to say that there is some increase in coupling: without the visitor pattern, the data structure doesn't need to know that operations on it exist at all.

But the data structure needs to know very little about the operations: only the interface with it's visit...(...) methods. So it may be considered acceptable, considering the benefits it brings.

The visitor doesn't need to know about the data structure.

In the examples, the visitor relies on the structure of the data: it explicitly iterates over the node's children.

It is possible to move this explicit iteration (and hopefully with that, the structural knowledge) to the data structure. Indeed, it seems reasonable that the structure of the data structure is contained in the data classes.

If the iteration is in the accept method, and each iteration yields a result, then accept must know how to combine them. This is undesirable, because it must then know the output format, increasing coupling and decreasing reusability. It would be difficult if not impossible for one accept method to work for both operations in the example, if it must also combine results (because one produces a flat stream and the other a tree structure). Therefore, we will have to switch to void methods to do this.

Let's look at the code with void methods and all the output state in the visitor:

import java.util.Stack

/// Data.

interface Tree {
    // We also use Composite pattern
    public fun accept(visitor: Visitor)
}

class Node(public vararg val children: Tree): Tree {
    // This method delegates to childen, because the visitor
    // doesn't know about the structure anymore.
    override fun accept(visitor: Visitor) {
        visitor.startVisitNode(this)
        for (child in children) {
            child.accept(visitor)
        }
        visitor.endVisitNode(this)
    }
}

class Leaf(): Tree {
    override fun accept(visitor: Visitor) {
        visitor.startVisitLeaf(this)
        visitor.endVisitLeaf(this)
    }
}

interface Visitor {
    fun startVisitNode(node: Node)
    fun endVisitNode(node: Node)
    fun startVisitLeaf(leaf: Leaf)
    fun endVisitLeaf(leaf: Leaf)

}

/// First operation: Serialize to XML.

class Serializer: Visitor {

    val xml = StringBuilder()

    override fun startVisitNode(node: Node) {
        xml.append("<node>")
    }

    override fun endVisitNode(node: Node) {
        xml.append("</node>")
    }

    override fun startVisitLeaf(leaf: Leaf) {
        xml.append("<leaf/>")
    }

    override fun endVisitLeaf(leaf: Leaf) {
    }
}

/// Second operation: Convert to another tree.

interface NewTree {}

class NewNode: NewTree {
    fun add(newNode: NewTree) {
        children.add(newNode)
    }

    override fun toString(): String {
        return "NewNode($children)"
    }

    val children: MutableList<NewTree> = mutableListOf()
}

class NewLeaf: NewTree {}

class Converter: Visitor {

    var activeNodes: Stack<NewNode> = Stack<NewNode>()
    init {
        // This node just holds children but isn't part of the tree
        activeNodes.push(NewNode())
    }

    override fun startVisitNode(node: Node) {
        val newNode = NewNode()
        activeNodes.peek().add(newNode)
        activeNodes.push(newNode)
    }

    override fun endVisitNode(node: Node) {
        /// EmptyStackException should not happen in practice
        activeNodes.pop()
    }

    override fun startVisitLeaf(leaf: Leaf) {
        activeNodes.peek().add(NewLeaf())
    }

    override fun endVisitLeaf(leaf: Leaf) {
    }

    fun getTree(): NewTree {
        assert(activeNodes.size == 1)
        return activeNodes.peek().children.first()
    }
}

/// Testing.

fun main(args: Array<String>) {
    val data: Tree = Node(Node(Leaf()), Leaf(), Leaf())
    val serializer = Serializer()
    data.accept(serializer)
    println("text: " + serializer.xml.toString())
    val converter = Converter()
    data.accept(converter)
    println("new tree: " + converter.getTree())
}
  • The iteration over children was indeed moved to the Node.
  • The same void accept method handled both visitors.

Great! and the Serializer stayed simple.

But some acrobatics were needed to make the Converter work. It keeps track of the nesting level as a stack of nodes, whereas the call stack did this implicitly before. And although the iteration over children is no longer in the visitor, it seems debatable whether the structural information was really moved, or merely duplicated. Would the visitor still work if the structure changed? At the same time, it is safe to say that the amount of code and complexity went up.

In my humble opinion, it makes more sense in most cases for the visitor to know the data structure than for the data to know about the visitors. After all, the visitor must already know about the data to do its job (e.g. include fields in the XML based on properties of the data, or transfer properties to the new tree structure).

The End

We have seen that the Visitor pattern can be used to implement multiple operations over a data structure that consists of multiple objects. The operations and the data structure can be separated into kind-of-independent classes. We've seen that this essentially amounts to implementing dynamic dispatch on an argument. And we've found that it catches several problems at compile time that would be runtime errors in the naive approach.

Comments

No comments yet

You need to be logged in to comment.