On Monday, I attended Don Knuth’s annual Christmas Tree Lecture, where he talks about the most interesting thing he has learned about trees in the past year. This year, he talked about ternary trees and planar graphs. Among other things, he showed us a beautiful theorem with a correspondingly beautiful proof, which I shall reproduce here.
A (nonempty) ternary tree is a tree consisting of a root node, and (possibly) several other nodes, so that each node has at most three children (denoted left, middle, and right). We draw ternary trees like this:
In this ternary tree, A is the root node, and it has a left child (B) and a right child (C), but no middle child. B has a left child (D) and a middle child (E), but no right child. C has a left child (F) and a middle child (G), but no right child. D has only a right child (H). Some nodes (E, F, G, and H) have no children.
If we have a ternary tree, we can assign ranks to the vertices, as follows: the root gets rank 0, and if a node has rank , then a left child (if there is one) will have rank , a middle child will have rank , and a right child will have rank . Hence, in the above tree, A and F have rank 0; B, E, and H have rank ; D has rank ; C and G have rank 1.
We call a ternary tree a skew ternary tree if all the ranks are nonnegative. The question we’ll answer is, what proportion of ternary trees with nodes are skew?
Whenever we have such a question and, there is only one correct way to start: calculate a few terms and look up the sequence on the Online Encyclopedia of Integer Sequences. Fortunately for me, after going to the talk, I was able to remember enough of the terms of the total number of ternary trees to find it without having to do any calculation: it is sequence A001764, and it begins 1, 1, 3, 12, 55, 273, 1428, 7752. It tells me that there are of them. This isn’t too surprising, since the famous Catalan numbers enumerate binary trees (see Richard Stanley’s Enumerative Combinatorics Volume 2, Exercise 6.19(c)).
Unfortunately, the skew trees are not represented on OEIS, but since I remember the theorem, I can calculate them easily: the sequence begins 1, 1, 2, 6, 22, 91, 408, 1938. The theorem is as follows:
Theorem. The ratio of the number of skew ternary trees with nodes to the number of total ternary trees with nodes is .
I think this is a rather remarkable theorem, because I don’t see any a priori reason to expect any nice relationship between these two numbers. On the other hand, perhaps it’s not so different from the classical Catalan situation: there are ways to get from to by taking unit steps either up or to the right, and ways of doing this without every going below the diagonal line .
Before moving on to the proof, I want to pose a question to think about while going through the proof: can a variation of this argument be used to write down a formula for the Catalan numbers? I don’t see it, but I’d love to hear it if you do!
In order to prove the theorem, we start by filling in all the potential children, so that every node has three children or phantom-children. In addition, all nodes except the root have parent nodes, so in order to make the root act more like the rest of them, let’s add a phantom-parent for the root. Let us now count the phantom nodes: each node has exactly four neighbors or phantom-neighbors, so there are a total of edge-germs emanating from all the genuine nodes, put together. There a total of edges between genuine nodes, and each one counts as two edge-germs, so there are a total of phantom edge-germs, or $2n+2$ phantom nodes.
Now, we start from the root and trace the outline of the entire tree. Since we are civilised mathematicians, of course we go counterclockwise. As we go, we write down a list of numbers, adding a new one every time we hit a genuine node. It doesn’t matter what number we start with, but Knuth chose , so I’ll follow his lead. Now, as we trace, we hit genuine nodes at various times. When we do, we either increase or decrease our running total by one. If we’ve just gone around a phantom node, then we increase the total by one, whereas if we have just come from a (different) genuine node, we decrease it by one.
Let’s see this in action with the example above. After we add the phantom nodes, we get the following diagram:
We start just before the phantom parent-node to node A with . Then we move around the phantom parent node. This increases the total by 1, to . Continuing along the outline, we next reach node B, which decreases our count to . Then we go on to node D, which decreases our count to , and so on. Here is a table with the first few values.
We can also represent this table in pictorial form, just by keeping track of where we are in the running total after each step. Here is the picture for the above graph.
It’s a bit easier to keep track of what’s going on if we overlay this picture on five horizontal lines (also known as a musical staff), so we can see immediately where we are:
Now it’s easy to see that we start at and end at in this case. But in fact, that’s always true: we pass phantom nodes and edges between genuine nodes. The former decrease the count by one, and the latter increase it by 1, so we have a net by tracing the entire tree. So, we always start at and end at .
The next observation is that the root of the tree shouldn’t be thought of as being fixed in place. Instead, we can pick a new node to be the root and let everything dangle from there. But we should be a little bit more precise on this point: it’s not quite the root node that we want to choose, but the parent phantom node of the root. Once we have done that, if we keep all the other edges in the right order, we obtain another ternary tree. We can do this with each phantom node, and so we obtain a total of different ternary trees in this way.
The next thing to work out is how the list of running totals changes when we choose a new parent phantom node. But this is not so hard: we see the same increases and decreases starting from some point, but then it must wrap around from the beginning at some point. For example, if we pick the new parent phantom node to be the right phantom child of F, we obtain the following chart (marked with the red rectangle):
So far, we haven’t said anything about what distinguishes skew trees from arbitrary ternary trees, but there is a very nice interpretation of them in terms of the counts: they are exactly the trees that never return to . With that observation in mind, the proof of the theorem is now quite easy: since each trace around the outline of the tree increases the count by 4, if we trace the outline of the tree many times, there will be several places where we encounter certain counts for the last time. These will certainly be before phantom nodes, rather than genuine nodes, since genuine nodes decrease the count. Those phantom nodes are exactly the phantom nodes that lead to skew ternary trees if we hang the tree from them. Since the count increases by 4 when we trace the outline once, there will be four such phantom nodes in each trace, out of a total of . Hence, the proportion of skew trees among any one such family is , and so the proportion of all ternary trees which are skew is also .
So, that’s the proof. Can this method be used to come up with a formula for the Catalan numbers as well? It seems as though this should be possible: in a path from to with right and up steps, there are a total of edges and vertices. We can add in the phantom neighbors to obtain phantom nodes. The goal would be to show that it is possible to hang the picture up from each one of the phantom nodes, so that each one leads to a suitable lattice lath, and of those, exactly 4 of them don’t go below the diagonal. But I don’t know how to do this, so I’d be interested to hear suggestions.