Red-Black Tree in C++

Jukka Jylänki

2024-09-26

Table of Contents

1. Introduction
2. Motivation
3. Search
4. Preliminaries
4.1. Root is always Black
4.2. Color Flip
4.3. Rotate + Color Swap
5. Insert
5.1. Case 0. Iterating red Uncles up the tree
5.2. Case 1. A distant black Uncle
5.3. Case 2. The close black Uncle
5.4. Summary
6. Delete
7. Increase_height(N)
7.1. Case 1: Iterating to find red in the neighborhood
7.2. Case 2: Red sibling
7.3. Case 3: Only the parent is red
7.4. Case 4: Distant nephew is black
7.5. Case 5: Distant nephew must be red
7.6. Summary
8. Source Code
9. Conclusion
10. Appendix: About Red-Black Trees elsewhere

Introduction

The previous article outlined the classic Binary Search Tree data structure. It was observed that the BST structure is fundamentally flawed due to its inability to maintain a balanced tree structure (recall that in a search tree, the path from the root to any leaf would ideally be the same length, to make worst-case performance as good as possible). As a result, the BST could degenerate into a slow linked list in the extreme worst case.

Given that the BST was so powerless to stop its tree from becoming arbitrarily unbalanced, the question arises whether it is even possible to create a search tree data structure that is guaranteed to maintain a balanced tree over any sequence of Insert and Delete key operations. Further, even if it is possible, can this guarantee be maintained efficiently, without sacrificing either asymptotic performance or a large amount of constant factor performance (in other words, will it be fast enough in the terms that pragmatic software engineers care about)?

Remarkably, the answer is yes. It is possible to create a self-balancing search tree data structure that ensures that after any sequence of insertions and deletions, the search tree will remain balanced (maybe not strictly balanced, but reasonably balanced, with an upper bound on the longest-vs-shortest path factor).

Even more remarkably, this guarantee can be provided without sacrificing asymptotic performance. Quite the opposite: maintaining a balanced tree provides even better guarantees, namely strict worst-case logarithmic asymptotic performance, as opposed to an unbalanced BST that can degenerate into a linear-time data structure.

Most remarkably, and this is something that textbooks unfortunately do not stress enough, this kind of a balanced search tree data structure can be realized with the highest level of efficiency in practice. That is, maintaining a self-balancing binary search tree can be pulled off with so little engineering overhead that there is arguably little reason to ever use a non-balancing binary search tree. Self-balancing binary search trees will make the unbalanced BST from the previous article look just broken.

Let's take a look at the Red-Black Tree (RBT), which is a highly celebrated balanced Binary Search Tree data structure that leaves the unbalanced BST biting the dust. The Red-Black Tree has everything going for it: it has strict worst-case asymptotic guarantees, it is extremely performant in practice, and it can be implemented in a compact number of lines of code.

The Red-Black Tree was invented by Leo J. Guibas and Robert Sedgewick in 1978.

Motivation

The name Red-Black Tree comes from the idea that each node of a BST is colored either red or black. Here are some examples of Red-Black Trees:

From left to right:

  1. A perfect Binary Tree (someone coined the adjective "perfect" to describe a binary tree that is completely filled up without empty slots, i.e. has 2^N-1 nodes and is fully balanced) that consists of all-black nodes.
  2. A small Red-Black Tree where the black root node has one red left child, and no right child. The absent right child is explicitly depicted here with a NULL pointer, but later we'll leave these NULL pointers out for clarity.
  3. The result of inserting numbers 1,2,3,4,5 in that order into a RBT is a balanced search tree, as opposed to degenerating into a linear linked list as was seen in the previous BST article.
  4. Adding some random numbers into a RBT results in some nodes being red and others black.

At this point, one might have a question:

"Why exactly are the nodes colored red or black?"

The colors themselves are irrelevant. The general idea here is to augment the data structure with one extra bit of information in each node. Using red and black colors (you can think: red=bit 0, black=bit 1) is a mnemonic to give some connotation to these auxiliary bits. This is a common strategy in data structures design. For example, Fibonacci Heaps also use an auxiliary bit, calling it maybe slightly unimaginatively just a mark bit, i.e. nodes in a Fibonacci Heap are either marked or unmarked.

Red or black, marked or unmarked, the name doesn't particularly matter. This bit carries with it certain structural information that ensures the intended performance of the tree. It is convenient that unique mnemonics are used for these different data structures, helping readers separate different ideas from each other.

The real behavior in a Red-Black Tree comes from two rules that are imposed onto the data structure. These rules define what the Red-Black Tree is. They are:

  1. The red-red invariant: A red node cannot have any red children.
  2. The black height invariant: The black height of both the left and right subtrees of each node must be equal.

Here the black height of a subtree means the number of black nodes that exist in any path from the root of a subtree to a leaf. (There can be multiple such paths. The black height invariant says that all node→leaf paths find an equal number of black nodes in them.)

If a left or a right child is missing, the black height of that missing subtree is considered to be zero. (Many articles talk about empty links being "nils" and say that the color of the nils is black, but that often confuses more than it helps, so it is simpler to avoid that formulation.)

A Red-Black tree is still a binary search tree, so the above two rules complement the original invariant that the Binary Search Tree provides:

  1. The search tree invariant: For each Node in a search tree, its Left and Right children (if present) are ordered Left < Node < Right.

Here are two examples that are not valid Red-Black Trees:

  1. The black height of the left subtree of node 3 is one. The black height of the right subtree of node 3 is zero. Therefore the black height invariant is not met.
  2. A red parent node has a red child node, so the red-red invariant is not met.

Compare these two trees to example 2 from before, which depicts a valid Red-Black Tree.

From the above cases we can note that if a node has only one child, that child must be red. Or, put equivalently, if a node has a black child, then it must have a second child as well. We will use this kind of reasoning later when considering how to rebalance a RBT after having deleted a node from it.

"Ok, I can see there are some rules to these colors, but what does all this coloring have to do with balancing a search tree?"

The key observation here is that if one "ignores" (casually speaking) all the red nodes in a Red-Black Tree, then the remaining black nodes in the tree always form a perfectly balanced search tree. That is, the black height invariant enforces that at all times, the number of black nodes from the root to any leaf is the same.

The red nodes do not count towards the black height. So these red nodes can be thought of as being "cheat nodes", or "ghost nodes", or "glue nodes", whatever term one might want to give to them. The motivation in analysis is to get to pretend that traversing the red nodes is "free" ("free" as in "it is just a constant factor overhead", as computer scientists like to say) and so does not cost any CPU time.

Because red nodes do not count towards the analyzed height of the tree, one must constrain how many red nodes there can be, or we definitely cannot pretend to get them traversed for free. That is exactly what the first red-red invariant rule provides: there can never exist two red nodes "in a row" in a RBT. (This lets one pretend in asymptotic complexity analysis that whenever one traverses a black node, one can "bank the time" to traverse at most one red node as well, subsuming the cost of traversing any red nodes during a Search.)

The red nodes are the "accidentals" of a Red-Black Tree, so to say.

In the best case, when one traverses a path from the root to a leaf, all nodes that are walked are black. This provides the shortest walk. In the worst case, we can have at most every second walked node being red (no two reds in a row, remember) and every second node being black. I.e. in the worst case, we will only ever have to walk twice the number (2x == a constant factor) of nodes compared to the best case.
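
This worst case can also be quantified. If b denotes the black height of the root, then every root-to-leaf path contains at most 2b nodes, and a tree with black height b must contain at least 2^b - 1 nodes, i.e. n >= 2^b - 1. Combining the two gives the familiar height bound h <= 2*log2(n+1) for a Red-Black Tree of n nodes.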

For example, here is a Red-Black Tree with numbers 1,2,3,...,64 added to it in order. For clarity, only the node colors are shown.

Notice how the tree has balanced itself to contain these smaller subtrees of perfect binary search trees, with a small number of red "accidental" nodes "gluing" all these perfect sub-parts together? The red nodes are the imperfections in the structure, some of which are always necessary, since perfect binary trees only come in sizes that are one less than a power of two.

The "achilles heel" of BSTs, namely inserting already sorted data, now became an absolute no-problem scenario for Red-Black Trees, which produces neatly packed search trees.

There is such an elegance to this self-balancing behavior that it is remarkable that it even exists.

On the other end of the spectrum, when randomly ordered keys are inserted, the generated tree will contain more red nodes:

However, even then, the search tree stays very efficiently balanced. Note how every path from the root to a leaf hops over the same number of black nodes, plus a varying, smaller number of red nodes.

But we are getting ahead of ourselves, since we haven't yet looked at how exactly insertions into the tree are to be done. Hopefully at this point it is clear what the intuitive purpose of the red and black colors is.

One more observation: whereas for searching and insertion, we would prefer there to be as few of these red "accidental" nodes in our way as possible; for deletion, we actually would prefer there to be as many red nodes around as possible, because the red nodes represent flexibility to restructure the tree. So it does not necessarily mean that "red=bad" or "red=slow". This will be clear later when deletion is discussed.

In C++, one might represent a Red-Black tree with the following code.

template<typename K, typename V>
struct rbtree
{
  struct node
  {
    node *child[2];
    node *parent;                 // Parent pointer to each node, to be able to traverse the tree upwards
    bool black;                   // Color of each node
    K key;
    V value;
  } *root;
};
Listing 1. Red-Black Tree representation.

This is a pragmatic representation, although not the most efficient. For the purposes of this article, it does the job.
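
To make the two invariants concrete in code, here is a minimal sketch of a recursive debug validator for this representation. This is a hypothetical helper for illustration only, not part of the final library: it returns the black height of a subtree, or -1 if either invariant is violated somewhere within it.

int _black_height(node *n)                                // Hypothetical debug helper: returns black height of subtree at N, or -1 on an invariant violation.
{
  if (!n) return 0;                                       // A missing subtree has a black height of zero.
  if (!n->black)                                          // The red-red invariant: a red node cannot have red children.
    for(int i = 0; i < 2; ++i)
      if (n->child[i] && !n->child[i]->black) return -1;
  int l = _black_height(n->child[0]);                     // The black height invariant: the left and right
  int r = _black_height(n->child[1]);                     // subtrees must report equal black heights.
  if (l < 0 || l != r) return -1;
  return l + (n->black ? 1 : 0);                          // This node adds one level if it is black.
}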

Search

In the previous section, it was noted that a Red-Black Tree is a Binary Search Tree that adds two new invariants (the red-red invariant and the black height invariant) on top of the existing search tree invariant.

These two new invariants have no effect on the existing search tree invariant, so searching a Red-Black Tree for a given key is exactly identical to searching a BST. See BST Search for details.
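
For reference, here is what that looks like in code. This is the same search routine as in the full listing at the end of the article; trinary_cmp is a three-way comparison helper whose sign bit picks the child to descend into (a stand-in definition for it is included in the full listing).

V *search(const K &key)                        // Searches if a given key exists. Identical to BST search.
{
  uint32_t c;
  for(node *n = root; n; n = n->child[c>>31])  // Sign bit of the comparison picks the child to descend into.
    if (!(c = (uint32_t)trinary_cmp(key, n->key)))
      return &n->value;                        // Key found.
  return 0;                                    // Key does not exist in the tree.
}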

Preliminaries

In order to successfully manage a Red-Black Tree, there are a couple of helper operations that will be used, which are best examined in isolation before taking a look at the Insert and Delete operations.

Root is always Black

A quirk about Red-Black Trees is that there is a degree of flexibility with the color of the root node. That is, if one has a valid RBT with a red root, it is always possible to just flip the color of the root to black without breaking any invariant.

In an efficient software implementation, we'll want to leverage this flexibility and enforce that the root node of the whole tree is always black. This will provide a natural termination condition for the Insert operation.

Color Flip

An interesting play that is possible with the colors in an RBT is to perform a flip of colors between a node and its children.

If the parent P of node N is black (or doesn't exist), and both L and R are red, then it is possible to "hoist up" the red color from the children up to N.

Likewise, if the children of L and R are all black (or don't exist), then it is possible to demote the color from N back down to the children L and R.

Doing a color flip like this does not disturb the black height invariant, since in both cases, exactly one black node on the path remains. And if the parent P and/or grandchildren are black, then the red-red invariant is not disturbed either.
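
In code, a color flip is nothing more than three recolorings. Below is a minimal sketch of it as a hypothetical helper; the actual insert and delete code later in this article inlines these recolorings directly.

void _color_flip(node *n) // Hypothetical sketch: flips the colors between node N and its two children.
{                         // The caller must ensure both children exist and that the flip cannot cause a red-red clash.
  n->black = !n->black;
  n->child[0]->black = !n->child[0]->black;
  n->child[1]->black = !n->child[1]->black;
}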

Rotate + Color Swap

Binary Search Trees can be mutated via a so-called rotation operation, which reverses the roles of a node N and its parent P, so that N becomes the parent of P, and P the child of N.

The left and right children L and R stay with their respective nodes, but the middle child M gets adopted by the demoted parent P.

This rotation can be performed either way. The important aspect about the rotation is that it preserves the search tree invariant. This is easy to see: performing an in-order traversal (i.e. walking the tree recursively in search order: left child first, then the node itself, then the right child last) of both trees gives the same result: L→N→M→P→R.

In a Red-Black Tree, performing a rotation may violate the black height property. In the above special case that N is red and P is black (or N and P are both red), a rotation in a Red-Black Tree preserves the black height property if we also swap the colors of the nodes N and P.

Later we'll make extensive use of this Rotate + Color Swap operation, so let's build a few helper functions.

void _link(node *parent, bool dir, node *child) // Connects given parent node to a new child.
{
  parent->child[dir] = child;
  if (child) child->parent = parent;
}

void _rotate_and_color_swap(node *n, bool dir) // Rotates N with its child in the given direction and swaps their colors.
{
  node *p = n->parent, *r = n->child[dir];
  std::swap(n->black, r->black);               // Swap the colors between N and the child.
  if (p) _link(p, (p->child[1] == n), r);      // Re-link parent of N to the rotated child node.
  else                                         // If no parent, N must have been the root of the whole tree, so relabel the root.
  {
    r->parent = 0;
    root = r;
    root->black = true;
  }
  _link(n, dir, r->child[!dir]);               // Relink the children.
  _link(r, !dir, n);
}
Listing 2. Rotate + Color Swap helper function.

The above helper functions operate in a generic fashion to account for the possibilities that node N might be the root, or that a child might be null. This is convenient and nice for generality, but not the sharpest choice for performance. In a later article, we'll actually tear these general purpose helpers down to speed up the code. Nevertheless, for now, to get started, these will do more than fine.

Insert

Adding the Red-Black Tree invariants guarantees that the search tree stays balanced. However, these guarantees do not come for free: all modifications to the data structure must be done in a manner that preserves the invariants that have been declared.

So when performing an insertion, it must be ensured that the red-red and black height invariants are not violated when the insertion has finished. How is that achieved?

Recall that with BSTs, the location where a new key is to be inserted is fixed: the Search algorithm specifies where the new key must appear, or otherwise the search tree invariant would not hold.

When adding a new node, the only choice we seem to have is whether to color the new node red or black.

If we color the new node black, the black height invariant will be broken. However, if we color the new node red, then if the parent of the new node was also red, then that would violate the red-red invariant. Seems we can't win either way?

It turns out that coloring the new node black and violating the black height invariant would be quite a costly endeavour to try to fix. But as luck would have it, there does exist a "fix-up" strategy that allows one to always color the new node red, and then fix the tree so that this newly created red-red violation is corrected.

If we get lucky and the parent node is black to begin with (or if there was no parent node, maybe this was the very first node to be added to the tree?), then there will be no violation of the red-red invariant. One can call it a day.

So let's look at the scenario where a new red node N has just been added to the tree, and the parent node P is also red. This would look something like below:

We cannot naively recolor the parent node black, or that would cause a black height invariant violation. However, we do have to make P black somehow, or the red-red conflict between N and P cannot be resolved.

The winning idea is to push the red color of P further up the tree. In order to do that, we need to examine nodes higher up at the local neighborhood of N.

Let's take a look at the grandparent GP of N, and the other child of GP, i.e. the "uncle" of N:

The grandparent GP must exist (i.e. P cannot be the root of the tree), because we took the convention that the root of the tree must be black, and P is red, or we wouldn't have a red-red conflict to fix here.

The Uncle node however might not exist, or if it does, it might be either red or black. This is illustrated in a "tri-state" color in the diagram above.

Case 0. Iterating red Uncles up the tree

If the Uncle node exists and is red, then we can perform the color flip operation from before. Change colors of P, U and GP.

Well, that was simple: the red-red conflict between N and P is now gone. The black height invariant is also not disturbed, since the color flip preserves it. However, there is a potential new problem: the node GP may now have developed a red-red clash of its own with the grand-grandparent node.

If that happens, we can (or rather, are unlucky to have to) iterate the problem "up the tree". I.e. by relabeling GP to be the new node N in question, we can repeat this red-red clash examination at two levels higher up the tree. At worst, this examination will walk all the way up to the root doing color flips (taking logN time), but if that happens, the red-red conflict must finally end, because we maintain the root node of the tree to be always black.
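
In code, this whole case amounts to just the two recoloring lines seen in the insert listing further below, followed by relabeling GP as the new N for the next iteration of the fix-up loop:

gp->black = false;          // Case 0: Color flip between U, P and GP.
u->black = p->black = true; // Then take GP as the new N and iterate up the tree.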

"Ok, that's great, but hold on a sec.. what happens if Uncle node was black or doesn't exist? Then we cannot color flip!"

Yes, exactly. We have to look at the Uncle node being black as a separate case and do something else.

Case 1. A distant black Uncle

Fortunately, this other scenario is computationally even easier: if the Uncle node is black, or does not exist at all, it effectively means that there is free room on the Uncle's side to "receive red color" conflict-free, so we use the other tool in our toolbox: rotate + color swap:

Here P will be promoted to become the parent of GP, and the two will switch colors. Because Uncle is black or absent, the color swap will not cause a red-red conflict on the Uncle's side of the tree.

Also after this rotation, there is no need to continue iterating up the tree or anything like that. There is no more possibility of red-red conflicts in the tree anywhere.

It might seem like we have now covered all cases (black uncle, red uncle), but this is actually not true. The above diagrams have been drawn in a bit of a sneaky way to hide a particular gotcha: recall that all children in a BST are ordered to be either left or right children.

So the above diagram only shows one scenario, where N is a left child of P, P is a left child of GP, and U is a right child of GP. It could also happen for example that N was a right child of P instead of the left one; or that P was a right child of GP.

For completeness, here are all the four possible cases we could have:

That is, the node N could either be a left or a right child of P. And P could either be a left or a right child of GP.

Case 1 is the one that was handled above. Cases 1 and 4 are mirror images of each other, as are cases 2 and 3.

"Do we have to handle the four cases separately? This is getting really messy and not fun at all!"

Don't give up here. Indeed, if one tackled the problem naively head-on, one might end up tediously enumerating all four possible cases. Unfortunately, course literature such as the CLRS book tends to present the material this way as well, i.e. "we show the left-to-right case here, and coding the symmetric right-to-left case is left as an exercise to the reader." One should NOT approach this challenge that way, as that would be a big mistake. As the Finnish folklore epic Kalevala says (probably): "don't code yourself into the swamp."

To maintain sanity in this situation, one should apply the strategy of the indexed child array, which we already explored in the preliminary Binary Search Tree article.

With this strategy, one uses an array node *child[2]; for the children of each node, which enables one to use a variable, e.g. a bool dir; to indicate direction, left or right. The opposite direction can then be calculated simply with an expression !dir (or 1-dir if using an integer variable).

With this kind of a scheme, one does not need to maintain duplicate mirror copies rotateLeft() and rotateRight() which are so often seen in implementations, but a single elegant function rotate(bool dir) suffices. Don't code yourself into the swamp.

With this in mind, the mirror cases 3 and 4 are handled "for free" by flipping the dir variable, and case 1 was already handled above. This means that only case 2 needs manual handling.

Case 2. The close black Uncle

In the second case, N is a child of P, and Uncle is a child of GP, both on the same side of their respective parents. I.e. the Uncle is close to N, as opposed to being farther away from N as in configuration 1 from before.

Handling this case is straightforward: we apply the same rotation tool as before (well what'd you expect? We only know how to do rotations and color flips), by rotating N on top of P, so that N becomes the parent of P.

The result of this rotation is shown above. This did not fix the red-red conflict, but it did something else. This is easier to see by relabeling N as P, and P as N. The problem has been reduced back to the "distant black Uncle" case 1 that was already solved above. Neat!

Summary

The cookbook recipe for inserting a new node to a Red-Black Tree is as follows:

  1. Search the tree for the leaf where the new node N needs to be added, and add it with a red color.
  2. While N and N.parent have a red-red clash:
    1. If N has a red Uncle, color flip Parent, Uncle and GrandParent. Take GP to be the new N and repeat.
    2. If N has a close black Uncle, rotate N with P to make it a distant black Uncle, and continue to next step.
    3. If N has a distant black Uncle, rotate+colorswap P with GP. The red-red clash is now resolved.

Take care to be smart with the code implementation, to prevent the code from degenerating into tedious mirrored case-by-case work duplication. An implementation that avoids this pitfall is presented below.

void insert(const K &key, const V &value)
{
  uint32_t c;
  node **e = &root, *p = 0;
  for(node *n = root; n; p = n, e = n->child + (c>>31), n = *e)    // Search the tree for the spot to insert new key.
    if (!(c = (uint32_t)trinary_cmp(key, n->key)))
    {
      n->value = value;                                            // Replace previous value if key already existed.
      return;
    }
  node *n = *e = new node(key, value, p, 0, 0);                    // New node always starts as red.

  for(node *p = n->parent, *gp, *u; p; n = gp, p = n->parent)      // Fix red-red conflict after inserting a node.
  {
    if (p->black) return;                                          // No red-red clash left?
    gp = p->parent;                                                // Find Grandparent, which must exist since P was red.
    bool ndir = (p->child[1] == n), pdir = (gp->child[1] == p);    // Which direction do N and P lean from their parents?
    if (!(u = gp->child[!pdir]) || u->black)                       // No uncle, or a black uncle?
    {
      if (ndir != pdir) _rotate_and_color_swap(p, ndir), p = n;    // Case 2: Make close uncle a distant uncle
      _rotate_and_color_swap(gp, pdir);                            // Case 1: Handle distant uncle
      return;                                                      // All done, no need to further iterate up the tree
    }
    gp->black = false;                                             // Case 0: Color flip between U, P and GP.
    u->black = p->black = true;
  }
  root->black = true;                                              // If we end up here, we iterated all the way up to the root. Recolor root black.
}
Listing 3. Inserting to a Red-Black tree.

The algorithm can be packed in a quite compact 28 lines of code. See how handling of case 2 is kind of like a "normalization" step before handling the actual case 1, instead of being a code duplicating if ... else ... structure, and also how handling the mirror cases comes for free with no extra code lines overhead. Keep these strategies close in mind, we'll lean on them extensively later.

Delete

Removing a node from a Red-Black Tree is a bit more complicated. The main difficulty here is that if a black node is to be deleted, then the black height invariant of the tree is disturbed. Fixing that disturbance will call for a bit of sophistication.

Just like with removing a node from a BST, we can look at the number of children that the to-be-deleted node N has:

  1. Two children: then just like with BSTs, we can swap the node with its in-order predecessor or successor, which must have at most one child. And then remove the predecessor/successor. When swapping nodes like this, remember to also swap their colors.
  2. One child: then removing it is as easy as with BSTs. Splice N off, and reattach the child to where N was, and recolor it black. This way the black height of the tree is not disrupted.
  3. Zero children, and the color of N is red: In that case, we can splice the node off without any disruption.
  4. Zero children, and the color of N is black: Convert the color of N to red. This will cause the black height of the subtree rooted at N to become deficient by one. Run a special helper utility increase_height(N) to increase the black height of the subtree rooted at N by one, in order to balance out this effect. After this, we can splice the red leaf off the tree.

The most difficult case 4 of removing a node is illustrated in the following picture. Imagine removing any of the leaf keys, e.g. key 12, from this tree.

Program code to perform deletion in a Red-Black Tree is shown below.

void remove(const K &key)
{
  uint32_t c;
  node **e = &root, *s;
  for(node *n = root; n; e = n->child + (c>>31), n = *e)
    if (!(c = (uint32_t)trinary_cmp(key, n->key)))
    {
      if (n->child[0] && n->child[1])                             // Two children? Then swap this node with its successor.
      {
        for(s = n->child[1]; s->child[0]; s = s->child[0]) ;      // Find successor to remove
        n->key = s->key;                                          // Replace key and value with
        n->value = s->value;                                      // those of the successor
        n = s;                                                    // And delete successor node instead
        e = &n->parent->child[n->parent->child[1] == n];          // Reset edge link to the new node to delete
      }
      if (n->child[0] || n->child[1])                             // One child? Replace node with that child, and turn the child black.
      {
        *e = n->child[0] ? n->child[0] : n->child[1];             // Get the only child,
        (*e)->parent = n->parent;                                 // root it to where N was
        (*e)->black = true;                                       // and recolor it black.
      }
      else
      {
        if (n->black) _increase_height(n);                        // Increase height of node N by one to enable removal.
        *e = 0;                                                   // Leaf can now be spliced off from the tree as if it were red.
      }
      delete n;
      return;
    }
}
Listing 4. Removing a node from a Red-Black Tree.

This code is not particularly long either, though it is not quite complete: how does one implement the missing increase_height(n) helper function? Let's take a look at that in the next chapter.

Increase_height(N)

The viability of removing a node depends on being able to implement this increase_height(N) helper. What this function must do is increase the black height of node N by one, or alternatively, reduce the black height of every sibling subtree along the path from the root to N. One can think of the black height of N as being deficient by one, and this function seeks to remedy that imbalance.

Just like in the previous Insert case, what we will want to do is carefully analyze the local neighborhood of node N for what kind of colors the neighboring nodes have. In the insertion case, we were hoping to find black neighbors that would make room for the new red node. However, in the case of deletion, we are hoping for the opposite: to find red nodes in the local neighborhood so that we can restructure that red color in a way that will balance out the height imbalance and increase the black height of N.

The reason that we look at the local neighborhood of N is that it is just a constant number of nodes, so it can be accessed in O(1) time. If we get unlucky and the local neighborhood configuration happens not to be suitable for fixing the black height imbalance, we will revert to looking up the path from N all the way to the root, which is a set of size O(logN). In either case, we at least avoid needing to look at the whole tree (which would be of size O(n)), which is important.

(Sidenote: one might dream of coming up with an increase_height(N) algorithm that would only ever need to look at an O(1) number of neighboring nodes to fix this height imbalance, but sadly that is highly likely not possible.)

One of the challenges in implementing this function is that the local neighborhood that must be analyzed is a bit larger than what we had with the Insert operation, and hence the number of possible color combinations the neighboring nodes can have is a bit higher. However, by tackling the possible scenarios in a carefully selected case-by-case fashion, the possible color configurations can be systematically exhausted.

Just like before, the developer should be careful to avoid being trapped into tediously hand-writing duplicate mirrored implementations. That way lies madness. Instead, a generic convention of using a direction variable that corresponds to the direction of node N with respect to its parent P is advisable. This way the mirror configurations vanish automatically.

To start off, observe that the node N may or may not have a parent node P. If there is no parent P, N must have been the root of the tree. And therefore there is no problem to solve in the first place. The black height of the root does not have to compete against any other sibling node. So if N.parent does not exist, we can let the function be a no-op and just return.

Therefore we can assume that node P must exist. With the help of our direction normalization variable, we can assume to draw N as if it were the left child of P (i.e. what is "left" is dictated by the direction variable). And because P exists and N is black (remember, we would not be in this subroutine in the first place if N was red), it must follow that P has two children (otherwise the black height of P.left would be >= 1 while the black height of P.right would be zero). Let's call the other child of P "the sibling S".

Further, in this analysis, we are going to want to take a look at the children of this sibling S. Let's call these the "close nephew C" and the "distant nephew D". The nodes C and D may or may not exist.

This is starting to look like quite the family tree. To visualize, the neighborhood configuration space of node N, its parent P, sibling S, close nephew C and distant nephew D that we are interested in looks as follows:

N is the node that is (or is about to be, after we delete it) deficient in black height by one, which we want to remedy by increasing its height by one. P is the parent of N, and without loss of generality (due to the use of our direction variable), we can assume that N is the left child of P. Node S is the sibling of N, possibly either red or black. Neither P nor S can be absent. C and D are the Close and Distant nephews of arbitrary color that may or may not be present.
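
In code, locating this neighborhood takes just a few pointer reads, with the direction variable doing the mirroring. This is an excerpt from the increase_height(N) listing further below:

bool dir = (p->child[1] == n);                                     // Establish normalized direction variable to orient which way is left.
node *s = p->child[!dir], *c = s->child[dir], *d = s->child[!dir]; // Find sibling S, close nephew C and distant nephew D.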

The name of the game will be to try to find red color in this neighborhood, so that we could then suitably rotate or recolor the nodes in this neighborhood. As the end result, the black height of node N should grow by one (or height of everything else reduce by one), to make the black height invariant balanced.

Naively counting (an overcount, because due to the red-red invariant many of the configurations could not actually exist), there are two state possibilities for P, two for S, and three each for C and D. And due to the case analysis you will see below, the nodes C.left and C.right will also play a role. And then times two for mirroring. This makes 648 configurations in total. This would be a ridiculous number of combinations to have to address on a naive case-by-case basis. Getting smart is key here.

We will reduce the state space one assumption at a time in a way that gradually collapses the possible states to fewer remaining configurations. Almost beggaring belief, this can actually be achieved by examining only five(!) cases, where processing flows in order from case 1→2→3→4→5. After each case, we will have systematically reduced the number of possibilities that the local neighborhood configuration of N might be in, simplifying the remaining work. Let's get started.

Case 1: Iterating to find red in the neighborhood

When deleting, finding red colored nodes will be our best friend (because red colored nodes allow us to shuffle black height imbalance around left and right via rotations). So let's first tackle the unluckiest worst case scenario possible: all nodes in the neighborhood of N are black. Yikes, we don't have any red color structure to work with.

Well, if this happens, we can at least spontaneously color S red without fear of introducing a red-red clash around it. This has the effect of reducing the black height of subtree S by one, making it level with the black height of N (that was deficient by one to begin with).

This means that subtree P will now be happy with the black height invariant, as its left and right subtrees are now balanced. However, from the perspective of P's parent (the "GrandParent"), P is now deficient by one black level with respect to its sibling (the uncle of N).

So in this case, just like before with insertion, we can (or if you're a "glass is half empty" kind of fella, are unlucky to have to) iterate the problem up the tree. Relabel N→P (recursively thinking, this would be a recursive tail call to increase_height(N.parent)) and take another look at the colors in the neighborhood one level higher up in the tree. This process might iterate all the way up the tree to its root, in which case there is nothing more to do and increase_height(N) returns without proceeding to case 2. If this happens, case 1 will have been iterated at most a logarithmic number of times.

If the opposite happens, that is, one of P, S, C or D is red, then we fall through to process the next case 2 below.

It is worth noting that the work for the remaining cases 2-5 all takes only constant O(1) time. It is only this "hunt for red nodes in the neighborhood of N" case 1 that can possibly take up O(logN) time.

Case 2: Red sibling

The above case 1 handled the situation where the whole local neighborhood of N is black. So after case 1 completes, if/when we fall through to here, we can now assume that there must exist red color in at least one node somewhere among the neighboring nodes.

Question: what if the sibling S was red?

Well, if that is found to be the case, then because N has a black height > 0, both C and D must be present, and each of P, C and D must be black.

In that case, let's try to do a left rotation + color swap operation to bring S on top of P.

The rotate + color swap does not affect the red-red or black height invariants, so essentially "did nothing".

However, after finishing this operation and relabeling our neighboring nodes (S←C, C←S.left, D←S.right; relabeling means that since we have shuffled the nodes around the node N in question, we re-initialize what the new nodes P, S, C and D are with respect to N after the rotation took place), we see that we have shuffled the nodes around to find that our neighborhood configuration is now colored like one of the remaining cases we haven't yet handled (as the parent P is now red and S is black).

In other words, after this rotation, we haven't gone back to case 1, and we cannot be in case 2 anymore either.

This is great, since it means we have effectively collapsed (or say, normalized, or reduced?) the state space we need to handle into fewer remaining possibilities. So we continue execution to case 3 next.

Case 3: Only the parent is red

At this point, we know that the sibling S must be black, and that at least one of P, C or D must be red (case 1 established that there is red somewhere in the neighborhood, and case 2 eliminated the possibility of a red S).

So let's next check if both C and D are either black or absent. If that turns out to be true, then the only remaining spot where we can have red (since we must have red somewhere, as per being past case 1) will be the parent P.

Well, if it is the case that P is red and C and D are black or absent, we can do the simplest transformation possible and bring the red of P down to S. That is, no rotations this time, just swap the colors of P and S. This has the net effect of increasing the black height of N, which is what this routine increase_height(N) is after in the first place.

After this transformation, we are fully finished, as the black height invariant now holds. No need to proceed further to cases 4 or 5, so return;

However, if we checked that C and D are black or absent and found that they weren't, then we must continue onwards to executing case 4.

Case 4: Distant nephew is black

Let's recap so far: the sibling S is black, and at least one of the nephews C and D must be red, or case 3 would already have finished the routine.

So let's next check if the Distant nephew is still black (or absent). This would imply that the Close nephew must be red.

In this scenario, we can do the familiar rotate right + color swap operation to rotate Close nephew on top of the Sibling, like this:

After this transformation, the new Distant nephew (shown in the above picture as S) will now be red, which is exactly the last remaining case that we have yet to cover.

This means that after all of the cases 1, 2, 3 and 4, we are now reduced to this one last possible case, so we continue on to execute case 5.

Case 5: Distant nephew must be red

Let's do one more final recap: the sibling S is black, at least one of the nephews is red, and if the Distant nephew had been black, case 4 just rotated the red Close nephew into the Distant position.

From all of this we can conclude that when we reach here, the Distant nephew must indeed now be red.

The final transformation left to do is to rotate the sibling S on top of parent P, color swap the two, and then turn the Distant nephew black.

This has the effect of pushing one more level of black height over to the side of node N, while keeping the black height of the other nodes intact. Victory at last!

Summary

The process of increasing the black height of a node N was a bit more involved compared to the case-by-case checks that implementing Insert needed. This involved searching the local neighborhood of N for a nearby red node, and then suitably rotating and flipping the colors of these nearby nodes so that the black height of N was increased with respect to its sibling node.

By meticulously exhausting all possibilities, the possible color combinations in the neighborhood of N were reduced into a scenario where the sibling S must be black, and the Distant nephew red, which enabled a suitable rotation to be carried out to increase the black height of N itself.

The full program code for increase_height(n) is shown below.

void _increase_height(node *n) // Rebalances and recolors the tree to add +1 to black height of N.
{
  for(node *p = n->parent; p; n = p, p = n->parent)
  {
    bool dir = (p->child[1] == n);                                     // Establish normalized direction variable to orient which way is left.
    node *s = p->child[!dir], *c = s->child[dir], *d = s->child[!dir]; // Find neighborhood
    if (p->black && s->black && (!c || c->black) && (!d || d->black))  // Case 1: All neighboring nodes P, S, C and D are black?
    {
      s->black = false;
      continue; // Iterate the problem up the tree.
    }
    if (!s->black)                                                     // Case 2: Sibling is red
    {
      _rotate_and_color_swap(p, !dir);
      s = c; c = s->child[dir]; d = s->child[!dir];                    // Relabel
    }
    if ((!c || c->black) && (!d || d->black))                          // Case 3: Close and Distant nephews are black
    {
      p->black = true;
      s->black = false;
      return;
    }
    if (!d || d->black)                                                // Case 4: Distant nephew is black, so Close nephew must be red
    {
      _rotate_and_color_swap(s, dir);
      s = c; c = s->child[dir]; d = s->child[!dir];                    // Relabel
    }
    _rotate_and_color_swap(p, !dir);                                   // Case 5: Distant nephew must be red
    d->black = true;
    return;
  }
}
Listing 5. The increase_height(N) helper function.

What makes Red-Black Trees such a fantastic choice is that even though there were a number of cases to consider, the actual program code to handle these various cases is pleasingly short, only 32 lines (+ 14 lines for rotate_and_color_swap). Each of the cases flows linearly into handling the next one, except for case 1, which is possibly iterated multiple times until leading to case 2, and case 3, which early-outs the routine. That is, the case-by-case flow can be visualized as follows:
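
Case 1: all-black neighborhood → recolor S red, relabel N→P and repeat Case 1 (at most up to the root)
   ↓ (red was found in the neighborhood)
Case 2: red sibling → rotate + color swap S on top of P, relabel
   ↓
Case 3: both nephews black → swap colors of P and S → return
   ↓
Case 4: distant nephew black → rotate + color swap C on top of S, relabel
   ↓
Case 5: distant nephew red → rotate + color swap S on top of P, recolor D black → return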

It is this cleverly designed state normalization/reduction logic that powers the performance of dynamic Red-Black Tree node deletion. Consider: if, for example, the transformation in case 4 could make the state configuration that was examined before in case 2 possible again, none of this would work. This linearity of 1→2→3→4→5 is powerful.

Source Code

The following code block assembles together the code presented above for Red-Black Tree Search, Insert and Delete operations. The code omits C++ boilerplate (copy-ctors etc.) for clarity of presentation. Please visit GitHub(TODOLINK) for the complete library.
#pragma once

#include <stdint.h>
#include <utility>

template<typename K, typename V> struct rbtree
{
  rbtree():root(0){}
  ~rbtree() { _clear(root); }

  struct node
  {
    node(const K &k, const V &v, node *p, node *c0, node *c1):parent(p),black(false),key(k),value(v) { child[0] = c0; child[1] = c1; } // Initializer list matches member declaration order.
    node *child[2], *parent;
    bool black;
    K key;
    V value;
  } *root;
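
  // Three-way comparison helper: a stand-in definition. Any trinary_cmp whose result is
  // positive when a < b, negative when a > b, and zero when the keys are equal will work,
  // so that the sign bit of the result (c>>31) steers descent into child[1].
  static int32_t trinary_cmp(const K &a, const K &b) { return a < b ? 1 : (b < a ? -1 : 0); }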

  V *search(const K &key)                                           // Searches if a given key exists. Identical to BST search.
  {
    uint32_t c;
    for(node *n = root; n; n = n->child[c>>31])
      if (!(c = (uint32_t)trinary_cmp(key, n->key)))
        return &n->value;
    return 0;
  }

  void _link(node *parent, bool dir, node *child)                   // Links given nodes to have a parent <-> child relationship.
  {
    parent->child[dir] = child;
    if (child) child->parent = parent;
  }

  void _rotate_and_color_swap(node *n, bool dir)                    // Rotates N with its child in the given direction and swaps their colors.
  {
    node *p = n->parent, *r = n->child[dir];
    std::swap(n->black, r->black);                                  // Swap the colors between N and the child.
    if (p) _link(p, (p->child[1] == n), r);                         // Re-link parent of N to the rotated child node.
    else                                                            // If no parent, N must have been the root of the whole tree, so relabel the root.
    {
      r->parent = 0;
      root = r;
      root->black = true;
    }
    _link(n, dir, r->child[!dir]);                                  // Relink the children.
    _link(r, !dir, n);
  }

  void insert(const K &key, const V &value)
  {
    uint32_t c;
    node **e = &root, *p = 0;
    for(node *n = root; n; p = n, e = n->child + (c>>31), n = *e)   // Search the tree for the spot to insert new key.
      if (!(c = (uint32_t)trinary_cmp(key, n->key)))
      {
        n->value = value;                                           // Replace previous value if key already existed.
        return;
      }
    node *n = *e = new node(key, value, p, 0, 0);                   // New node always starts as red.

    for(node *p = n->parent, *gp, *u; p; n = gp, p = n->parent)     // Fix red-red conflict after inserting a node.
    {
      if (p->black) return;                                         // No red-red clash left?
      gp = p->parent;                                               // Find Grandparent, which must exist since P was red.
      bool ndir = (p->child[1] == n), pdir = (gp->child[1] == p);   // Which direction do N and P lean from their parents?
      if (!(u = gp->child[!pdir]) || u->black)                      // No uncle, or a black uncle?
      {
        if (ndir != pdir) _rotate_and_color_swap(p, ndir), p = n;   // Case 2: Make close uncle a distant uncle
        _rotate_and_color_swap(gp, pdir);                           // Case 1: Handle distant uncle
        return;                                                     // All done, no need to further iterate up the tree
      }
      gp->black = false;                                            // Case 0: Color flip between U, P and GP.
      u->black = p->black = true;
    }
    root->black = true;                                             // If we end up here, we iterated all the way up to the root. Recolor root black.
  }

  void remove(const K &key)                                         // Deletes a given key from the tree.
  {
    uint32_t c;
    node **e = &root, *s;
    for(node *n = root; n; e = n->child + (c>>31), n = *e)
      if (!(c = (uint32_t)trinary_cmp(key, n->key)))
      {
        if (n->child[0] && n->child[1])                             // Two children? Then swap this node with its successor.
        {
          for(s = n->child[1]; s->child[0]; s = s->child[0]) ;      // Find successor to remove
          n->key = s->key;                                        // Replace key and value with
          n->value = s->value;                                    // those of the successor
          n = s;                                                    // And delete successor node instead
          e = &n->parent->child[n->parent->child[1] == n];          // Reset edge link to the new node to delete
        }
        if (n->child[0] || n->child[1])                             // One child? Replace node with that child, and turn the child black.
        {
          *e = n->child[0] ? n->child[0] : n->child[1];             // Get the only child,
          (*e)->parent = n->parent;                                 // root it to where N was
          (*e)->black = true;                                       // and recolor it black.
        }
        else
        {
          if (n->black) _increase_height(n);                        // Increase height of node N by one to enable removal.
          *e = 0;                                                   // Leaf can now be spliced off from the tree as if it were red.
        }
        delete n;
        return;
      }
  }

  void _increase_height(node *n)                                         // Rebalances and recolors the tree to add +1 to black height of N.
  {
    for(node *p = n->parent; p; n = p, p = n->parent)
    {
      bool dir = (p->child[1] == n);                                     // Establish normalized direction variable to orient which way is left.
      node *s = p->child[!dir], *c = s->child[dir], *d = s->child[!dir]; // Find neighborhood
      if (p->black && s->black && (!c || c->black) && (!d || d->black))  // Case 1: All neighboring nodes P, S, C and D are black?
      {
        s->black = false;
        continue; // Iterate the problem up the tree.
      }
      if (!s->black)                                                     // Case 2: Sibling is red
      {
        _rotate_and_color_swap(p, !dir);
        s = c; c = s->child[dir]; d = s->child[!dir];                    // Relabel
      }
      if ((!c || c->black) && (!d || d->black))                          // Case 3: Close and Distant nephews are black
      {
        p->black = true;
        s->black = false;
        return;
      }
      if (!d || d->black)                                                // Case 4: Distant nephew is black, so Close nephew must be red
      {
        _rotate_and_color_swap(s, dir);
        s = c; c = s->child[dir]; d = s->child[!dir];                    // Relabel
      }
      _rotate_and_color_swap(p, !dir);                                   // Case 5: Distant nephew must be red
      d->black = true;
      return;
    }
  }

  void clear() { _clear(root); root = 0; }
  void _clear(node *n) { if (n) { _clear(n->child[0]); _clear(n->child[1]); delete n; }}
};
Listing 6. rbtree.h: Full code example.

The above code is fairly lean and gives decent performance. Nevertheless, there are still several ways to improve it to run much faster. Later, we will take a look at how to optimize this code further.

Conclusion

In this second part of binary search trees (see part 1 here), an implementation of a Red-Black Tree was presented. This is an elegant data structure that ensures that the search tree stays balanced across insertions and deletions.

The complexity in implementing Red-Black Trees comes from understanding how nodes are added and removed so that the needed color and height balance invariants are preserved. As was illustrated above, this is possible with a careful examination of the local neighborhood of the modified node.

The final implementation, assembled in Listing 6 above, comes to a total of 117 lines of code.

In the next and last part of this series, we will look at how to turbocharge the performance of the code that was presented here, in order to make our tree run as fast as possible. Until then!

Appendix: About Red-Black Trees elsewhere

This section provides a few closing comments about Red-Black Tree material out there.

The "CLRS" book

The book by Cormen, Leiserson, Rivest, and Stein is the de facto source for everything about algorithms. The wealth of information there is invaluable, and the comprehensive survey of publication references makes one appreciate that the book is in general written for both the student and the researcher alike.

On the topic of Red-Black Trees, however, the material that they present is maybe not the best. For some reason, I felt that the CLRS presentation veers a bit too much into the "technical plumbing" aspects of the algorithm, covering pedantic details about nils and sentinels and other unnecessary technical dressing like "doubly black" nodes, and it invites the already discussed duplication pitfall of left/right mirroring. CLRS unfortunately makes Red-Black Tree analysis look complex.

I think that the best presentation by far on Red-Black Trees is this old 2002 video by Shai Simonson from the now defunct ArsDigita University. The whole lecture series is definitely worth watching, as Shai Simonson is an excellent and motivating lecturer.

Left-Leaning Red-Black Trees

In 2008, Robert Sedgewick, co-author of the original Red-Black Tree algorithm, published a paper on a new Left-Leaning Red-Black Tree variant of the algorithm. The paper states:

"In this paper, we describe a new variant of red black trees that meets many of the original design goals and leads to substantially simpler code for insert/delete, less than one-fourth as much code as in implementations in common use."

The author's university courses repeat this takeaway in their lectures, for example in Princeton course material.

Unfortunately, I feel that this has been a great misstep, and that the LLRB paper misrepresents efficient software engineering. The reality rather looks to be that LLRBs gain little, unless one has implemented a Red-Black Tree inefficiently to begin with.

The core problem with LLRBs is that the asymmetric nature of the new left-leaning property invites a developer down the wrong path to "cement" bad structural decisions into their code. In other words, LLRB implements the wrong kind of optimization.

The proper optimization is to exploit the left/right symmetry by using the "indexed child array and a direction variable" trick, like has been highlighted in the code on this webpage. (This idea is not new; e.g. the Wikipedia page on RBTs also describes the use of a direction variable.)

Sedgewick advertises the benefits of Left-Leaning Red-Black Trees by stating that the Insert algorithm in a RBT is "150 lines by experienced professional programmers" and only "30 lines for LLRB." I am not sure where they got those experienced professional programmers from, but a more performant Insert implementation was presented on this web page in just 28 lines of code, even fewer than their LLRB variant (I get how silly comparing LoC as an efficiency measurement is, but, hey, they started it).

Similarly, Sedgewick mentions a full implementation of the Delete code for RBTs would be 400+ lines of code with "case bloat", and with LLRBs only <80 lines.

However, from what was just written on this webpage, the RBT Delete implementation took up just 29 + 32 = 61 lines of code (I am excluding the two general purpose rotate and link helpers that were 20 lines total, since they also excluded helper functions on the slides, even while stating "you just saw all the code"), shorter than the asymmetric handling of left and right cases that LLRBs have. Further, since maintaining the Left-Leaning property requires more rotations, the runtime performance of LLRBs will be worse than that of traditional RBTs.

The conclusion I arrive at is that Left-Leaning Red-Black Trees are best sidestepped as an idea that did not pan out. Another researcher, Eddie Kohler, came to the very same conclusion in his writeup.

Maybe the moral of the story here is that efficient software engineering for real-world x86-64 and ARM architectures should be held in equal importance to the computer science stuff. If that were the case, maybe common knowledge of the "direction variable trick" would have obviated the need to search for LLRBs in the first place. Symmetry is powerful, don't give it up!

The 2-3-4 tree analogy

Some articles covering Red-Black Trees do so by stating how simple and obvious understanding RBTs will become by thinking in terms of the isometry with 2-3-4 trees. Pedagogically, I don't agree that this is a good idea, since a first-time reader will then be met with the task of thinking in terms of two new data structures at once, which confuses more than it helps. Rather, I think it is better to either directly teach Red-Black Trees without 2-3-4 trees, or do the opposite and teach 2-3-4 trees without Red-Black Trees, and only move on to teaching RBTs after 2-3-4 trees are familiar.

Simultaneously providing exposition to both has the effect of confusing students, and it loses the effective train of thought.

For the interested, there exists a very visual introduction to 2-3-4 trees (B-trees of order 4) on YouTube.

AVL Trees

When AVL Trees get mentioned in the context of Red-Black Trees, they are sometimes dismissed with the remark of being "an older data structure that fell out of fashion." After reading Ben Pfaff's research paper, I do not feel so sure whether AVL Trees should be written off like this. One thing is clear: these data structures should *not* be compared in terms of "how many tree rotations each makes" or "how many key comparisons each makes", as such analyses bring little new information to the modern-day software engineering field. Concrete written code with concrete data sets, like Ben Pfaff has done, is the industry standard. In Ben's paper, AVL Trees trade blows well with Red-Black Trees. Their tighter balance is also appealing. This all makes one curious to investigate.