Red-black tree

Back to Data structures

A red-black tree is a self-balancing binary search tree. Among other things, it implements a dynamic ordered dictionary, provided the elements can be put in a strict weak order. Using propagation data it can solve several other interesting problems, such as finding any quantile in logarithmic time, while being a completely dynamic set.

Definition

A red-black tree is a binary search tree where

These properties guarantee that the height of the tree is , where is the number of elements stored in the tree. The black-height of a node is the number of black nodes on a downward path from that node to a sentinel node. This is well-defined by the last property of the red-black tree. The black-height of a tree is the black-height of the root of the tree.

Color of the root

It is common to require, in addition, that the root be black; this is also guaranteed by Pastel. However, as far as the theory goes, we allow roots of both colors, so that in a red-black tree every subtree under a given node is a red-black tree. This results in a simpler theory of red-black trees.

Types

Generic data structure
RedBlackTree
Stable settings-aliases
`RedBlackTree_Set, RedBlackTree_MultiSet.

Properties

Let be the number of stored elements, and let be the time it takes to run the propagation function for a single node (usually ). The red-black tree implementation in Pastel has the following complexities:

Operation / Property | Complexity | Amortized complexity ---------------------------------------| --------------------------------------------|------------------------------ Insert/remove an element. |  | Join trees. | | Split tree. | | Find an element. |  | Find the next smaller/greater element. |  | Find an alpha-quantile. | | Find the number of equivalent elements.| | Find the minimum/maximum element. | | Find the root node. | | Find the black-height of the tree. | | Find the size of a subtree. | | Move an iterator in the tree. | | Decrement/increment an iterator. | | Monotonic traversal: Space |  |

In the above, and refer to the number of elements in the larger tree and the smaller tree, respectively. The iterator increment is amortized only over the repetitions of that operation; a sequence of increment and decrement operations has amortized complexity. Similarly for the iterator decrement.

Node data

Each node stores the tree-structure and a propagation. Each normal node additionally stores a user-data, and a key. Each sentinel node additional stores a sentinel data, and a user-data if the user so specifies. The types of these objects are defined by the user.

Key

A key is used to arrange the elements in increasing order with respect to the given strict weak order. The key can not be modified after insertion. Defining the type of the key as void avoids allocating any memory for it. The default order is then given by the less-than operator of the Class<void>, under which every element is equivalent to each other. Thus, for example, a search for a given key always returns the first element of the tree, because that is the first element equivalent to the searched key. Having equivalent elements does not degrade the performance of the tree in any way; the tree still retains its logarithmic height, and has its propagation information correctly updated. In addition, the elements can be freely spliced into new positions, since any hinted position is a correct position. The order need not be trivial even when the the key is void; for example, one can define an order by comparing the addresses of the keys, which are the same as the addresses of the nodes. A nullptr can be passed in place of a key if the key is of type void.

User data

A user data can be modified freely; the tree does not enforce any invariants on it. Defining the type of the user data as void avoids allocating any memory for it. The user may specify the user-data to be available also in the sentinel node.

Propagation

A propagation is the evaluation of a user-defined propagation function at a given node, with the guarantee that the propagation has already been computed for all the nodes in the left and the right subtree of the node. A propagation can only be modified through the propagation function. After each operation, the red-black tree guarantees that the propagation data is up-to-date. The propagation data is also present in the sentinel nodes, because this gets rid of the boundary cases in the propagation function. Definining the type of the propagation data as void avoids allocating any memory for it.

Sentinel data

A sentinel data can be modified freely. It is like user data, except that it is only available on the sentinel nodes.

Color

The color of a node is either red or black. It can be read, but not modified.

Size

The size of a node is the number of nodes in the subtree rooted at that node. It can only be modified by the tree operations, and is an internal propagation. The size of a subtree is needed when splitting red-black trees to track the sizes of the resulting trees in constant time. In addition, the size information can be used to compute any alpha-quantile in logarithmic time.

Propagation functions

A propagation function is used to compute the propagation at a node, subject to knowing that the propagation function has already been computed for all the nodes in the left and the right subtree of the node. The time-complexity of the propagation function acts multiplicatively on the complexity of the insert() and erase() functions of the red-black tree. For example, if the propagation function takes time, then the time-complexity of insert() will be . In most cases, as in the examples below, the propagation function takes only time, in which case it does not affect the time complexities at all.

Examples

Name Propagation type Propagation function Sentinel propagation
Number of elements in a subtree integer node.left().propagation() + 1 + node.right().propagation() 0
The sum of keys in a subtree Key node.left().propagation() + node.key() + node.right().propagation() 0
The maximum of keys in a subtree Key std::max(std::max(node.left().propagation(), node.right().propagation()), node.propagation()) -(Key)Infinity()
Black-height of a node integer node.left().propagation() + node.black() 0

The number of elements in a subtree can be used to find any quantile in time. The sum of keys in a subtree can be used to draw a random sample from any finite distribution with n values in time. Computing the black-height is useful for testing the implementation of the red-black tree.

Black-height

The red-black tree tracks the black-height of the tree, and thus can report it in time. This saves time, for example, when joining two red-black trees.

Sentinel nodes

In addition to the elements, the red-black tree contains two additional nodes, the end node and the bottom node. Together they are called sentinels. The end node represents the missing parent of the root, and the missing right child of the maximum node. This is so that the end node works naturally as the one-past-end iterator. The bottom node represents a missing child, with the exception of the right child of the maximum. For a single red-black tree the bottom node and the end node coincide, unless they are split, explicitly or implicitly, by the user. The concept of a sentinel node is important because it removes boundary cases; the child node and the parent node always exist. This simplifies the algorithms that work with the red-black tree.

While the end node is always unique to its tree, the bottom node can be shared between red-black trees. Sharing a bottom node is required for efficiently joining and splitting red-black trees. The whole concept of a separate bottom node is motivated by this need.

Filtered search and traversal

The red-black tree supports filtered searches, which are searches over subsets of elements. A subset is specified in a search function with a down filter, where the partial order is given by the is-a-descendant-of relation between the nodes. The iterator’s next() and prev() also accept a down filter, and can thus be used to iterate over a subset.

Debugging

The red-black tree is subject to the iterator debugging problem, and so a user-data, a propagation, or an end-data cannot be inspected directly through the debugger.

Theory

A tree is a connected graph without cycles. A tree is called rooted if some node, called the root, is chosen specifically to measure distances in the tree. The root induces a partial order for the nodes by the subset relation for the paths from a node to the root. If for nodes A and B it holds that , then B is called a descendant of A and A is called an ancestor of B. If and there is no other node C such that , then B is called a child of A, and A is called the parent of B. If a node does not have any descendants, it is called a leaf node. The root node is the only node which does not have any ancestors. A tree is called n-ary, if every node has at most n children. The binary tree is a synonym for a 2-ary tree.

Let the children of a node in a binary tree be labeled as left and right. Suppose for each node there is associated a key , and that the keys are ordered by the strict weak order . If it holds that , for every node , then the tree is said to be a binary search tree.

Learn more