Red-black tree notes

Back to Red-black tree

This section summarizes the main decisions that have been made in the implementation of the red-black tree.

Requirements

Techniques

Symmetry

There is redundancy in the implementation of red-black trees, since for many cases there is a similar mirrored case, which is solved exactly the same as the non-mirrored case, except that left children are replaced with right children and vice versa. This symmetry can be used in advantage to cut off the code in the insertion and erase implementations into half, by using the relative term sibling, rather than the absolute terms left and right.

To take advantage of symmetry, one must make sure that the sibling node can be retrieved efficiently. Directly related to this, the children can be stored either separately (left and right), or in a two-element array. The latter has the unexpected advantage of offering efficient access to the sibling node, since the index of a child i can be denoted by an integer 0 (left) or 1 (right), and its logical negation !i gives the sibling index. I credit this technique to Julienne Walker in her website named Eternally Confuzzled, on her tutorial on red-black tree implementation. Compare this logical negation with needing to compare which child node you have in hand, and then choosing the other one.

Left-leaning red-black trees

Robert Sedgewick has written a paper on left-leaning red-black trees, where no right child can be red. He shows that, using a top-down implementation, both removal and insertion can then be implemented in a few lines of code (compared to a traditional implementation). Unfortunately, in that implementation the removal code needs to do use key-comparisons to reach down from the root, even if we already knew the node to remove (as is usually the case when iterators are offered). However, we require from our implementation that key-comparisons must not be used in removal. Because top-down algorithms can’t be implemented without key-comparisons, it follows that our implementation must use a bottom-up algorithm. A bottom-up removal algorithm for left-leaning red-black trees, which I regrettedly have once implemented, are even more worksome to implement than ordinary red-black trees, since the left-leaning requirement breaks the symmetry of the algorithms. Thus, although the left-leaning property seems tempting at first, we deliberately do not require it.

Sentinel node

The sentinel node plays an essential role in the RedBlackTree implementation. The idea is that every parent link and child link points to some valid node, where the sentinel node is interpreted as an empty node. Thus, for example, if a node does not have children, then the child pointers are assigned the sentinel node. This makes the implementation much more easier, since then we can stop worrying about boundary cases, and whether we are accessing a null-pointer or not. In addition to this, the sentinel node works as the one-past-end iterator, when the tree is interpreted as an ordered sequence. When the parent of the sentinel node points to the maximum node, one is able to naturally get to the precedessor (the maximum node) from the one-past-end iterator. The color of the sentinel node is black, and it does not have any children. The children of the sentinel node point to itself, a unique property which can be used to identify a sentinel node. Using the sentinel node is a standard trick, but its connection to the one-past-end iterator is rarely mentioned.

Logarithmic join

Two red-black trees and can be joined together in time, where and are the number of elements in the larger and the smaller tree, respectively. The cases where either tree is empty is trivial, so we will now assume neither is empty. Without loss of generality, we may assume does not have a smaller black-height than . Let be the minimum (maximum) node of , if ().

The join can be achieved as follows:

Logarithmic split

A red-black tree can be split in two along a given element in logarithmic time, using the logarithmic join as a basic building block. However, it is easy to end up with an algorithm which is instead. To avoid this, the propagation updates must be shared between the joins, so that redundant propagation updates are avoided. The key is to amortize the propagation costs, during the split algorithm, by marking the paths from the join-roots to the root as invalid, and then updating those paths, at the same, at the end of the split algorithm. The nodes in the join algorithm must be tracked incrementally by the split algorithm, and not searched all over again. These nodes are all contained in the left and right spines of the trees. To do this, the tree needs to track its black-height, which is easy since it can only change in the rebalancing algorithms.

Starting a search from the minimum node

In a complete binary tree of height , the expected number of moves in a search starting from the minimum (or maximum) node is

This has an exact solution of

Dividing this by gives the ratio of moves compared to the search starting from the root. The first numbers in this sequence, starting from , are

1.0000    1.2500    1.4167    1.5313    1.6125    1.6719    1.7165    1.7510    1.7782    1.8002 
1.8183    1.8334    1.8462    1.8572    1.8667    1.8750    1.8824    1.8889    1.8947

This sequence is increasing, and its limit is 2. Therefore a good rule of thumb is that the search starting from the minimum node takes roughly twice the time the search takes from the root node.

Filtered searches in red-black trees

We wish to support filtered searches. Here each element is marked or not, and the search is only over the marked elements. For an efficient implementation, a filter needs to be specify

Provide both filters as member functions of an object

The filters are provided as member functions element() and downSet() of an object. The object is constructed by a helper function which takes as arguments both of the filters.

Provide both filters separately

We reject this option because it clutters the interface. In addition, it does not enforce the concept that the filter need to be consistent, and provided together.

Use a three-valued filter

That is, use a 2-bit value, where the first bit tells whether a subtree is marked, and the second bit tells whether the root of the subtree is marked. Since a marked element implies a marked subtree, only the values 0, 1 and 3 are possible.

(((uinteger)node.data().marked) << 1) + (uinteger)node.propagation().marked;

Count the number of marked elements

The subtree-filter can be specified by counting the number of marked elements in a subtree. The element-filter can be recovered from this subtree-filter by whether the count is different compared to the count of both children.

We reject this option because it is not general enough; propagating the presence of a marked element in a subtree by a bit is enough information to perform a filtered search, but the abstraction does not support it.

Node types

The nodes of the red-black tree form the following inheritance hierarchy:

Node
  ^
  |
Propagation_Node --> Propagation_Class 
  ^    ^
  |    |
  |  Sentinel_Node --> SentinelData_Class
  |
Data_Node --> Data_Class 
  ^    ^
  |    |
  |  Sentinel_Node --> SentinelData_Class
  |
Key_Node --> Key_Class

The Node contains the information common to all nodes, which means the children and the parent of the node, the color of the node, and the size of the subtree under that node (for efficient splits). The other node-types are derived from the Node to add additional user-defined data. Inheritance is used for two reasons. First, this way the data-types make use of the empty base-class optimization, in case some data is not used. Second, the sentinel data is not used for a data node, and user data and a key is not used for a sentinel node, which again saves memory. Each allocated node is either of the type Sentinel_Node or Node. The user may also specify the sentinel node to contain user-data, in which case the base class of Sentinel_Node is the Data_Node.

An iterator refers to a node via Node*. Unfortunately, this means that the associated data cannot be directly inspected through an iterator.