2-3 trees

Back to Deprecated

The 2-3-tree is a balanced search tree consisting of nodes with two and three children. Because its nodes are not homogeneous, it is rarely implemented in itself. However, at least two data structures, AA-trees and deterministic 1-2-skip-lists, can be thought to simulate 2-3-trees by a homogeneous data-structure. In these cases knowledge of the algorithms for the maintenance of 2-3-trees directly maps to knowledge of the algorithms for the simulating data-structures. In this section we provide the maintenance algorithms for 2-3-trees.

Notation

Let $S$ be a set together with a strict weak order $<$ in $S$ . If $s in S$ and $(a, b) in S^2$ , then we will define $(a, b) < s$ by $a < s$ and $b < s$ .

Definition

A 2-3-tree of height $h$ over $S$ is a rooted tree $T = (V, E)$ , where

$V subset S cup S^2$ ,
if $h = 0$ , then $V = emptyset$ ,
if $v in S cap V$ , then $v$ has two children $l(v), r(v) in V$ , both roots of 2-3-trees of height $h - 1$ over $S$ , such that $w < v$ for all $w in T(l(v))$ , and $w >= v$ for all $w in T(r(v))$ ,
if $v = (a, b) in S^2 cap V$ , then $v$ has three children $l(v), m(v), r(v) in V$ , all roots of 2-3-trees of height $h - 1$ over $S$ , such that $w < a$ for all $w in T(l(v))$ , $a <= w < b$ for all $w in T(m(v))$ , and $w >= b$ for all $w in T(r(v))$ .

Properties

The leaf nodes are all at the same depth in a 2-3-tree; it is always perfectly balanced. If a 2-3-tree contains $n$ elements in $V$ , then the height $h$ of the tree is bounded by

$ceil(log_3(n + 1)) <= h <= ceil(log_2(n + 1)),$

the bounds corresponding to the height of a balanced 3-ary search tree over $V$ , and the height of a balanced 2-ary search tree over $V$ , respectively. Therefore, if one can show that the property of being a 2-3-tree can be efficiently maintained under insertions and removals, then the 2-3-tree offers an efficient solution to the dynamic dictionary problem.

This turns out to be the case. Even better, the algorithms to do so are relatively simple (compared to a red-black tree, say). However, 2-3-trees are not seen to be implemented very commonly. I suspect that the reason for this is that the nodes are not homogeneous. Indeed, there would need to be a way to distinguish 2-nodes from 3-nodes, and then algorithms would need to branch their action based on the number of children the node has. Furthermore, these inconveniences can be avoided by simulating the 2-3-tree with a data-structure which is homogeneous.

Simulation data-structures

A 2-3-tree can be simulated by an AA-tree (for Arne Andersson, their inventor). An AA-tree is a simpler version of the red-black tree; a red-black tree simulates a 2-3-4-tree. Knowing this, the prominence of the red-black trees immediately becomes questionable; why not always prefer AA-trees over red-black trees? I do not know the answer.

A 2-3-tree can also be simulated by a deterministic 1-2-skip-list.

Since both of these data-structures simulate the 2-3-tree, knowledge of the algorithms for its maintenance provides knowledge for the maintenance of the simulation data-structures. Since these algorithms are also simple, knowledge of the 2-3-tree is useful.

Insertion

In the following, the round brackets denote a sub-tree of the same height, and square brackets denote a node.

The leaf node A to which the new element should be inserted can be found by a tree search.
If the parent-sibling neighborhood contains less than 8 elements, then the new element can be inserted into the tree by a reordering of the neighborhood.
If the parent-sibling neighborhood contains exactly 8 elements (there can not be more), then the tree can be reordered to the form

However, this sub-tree has height one more than its sibling, so the tree needs rebalancing. There are two cases, up to sibling symmetry, to consider.

Case 1

This case

         |
        [4]
       /   \
     /       \
   [2]       (5)
  /   \
(1)   (3)

can be rebalanced to

     |
   [2|4]
  /  |  \
(1) (3) (5)

Case 2

This case

           |
         [4|6]
       /   |   \
     /     |     \
   [2]    (5)    (7)
  /   \
(1)   (3)

can be locally rebalanced by

          |
         [4]
       /     \
     /         \
   [2]         [6]
  /   \       /   \
(1)   (3)   (5)   (7)

Since the height is still one too large, the rebalancing must recurse on the top node of the subtree. The recursion stops at the root at the latest.

Removal

If the removed element is not a leaf node, then replace it with its successor or predecessor.
Remove the element from its leaf node.
If the parent-sibling neighborhood of the removed element contains more than 3 elements, then the neighborhood can be reorganized to a proper 2-3-tree.
Otherwise, if there are exactly 3 elements in the neighorhood, the neighborhood is organized into a single 3-node.

      |
    [1|2]

Since this subtree has height one less than its sibling, the tree needs to be rebalanced as follows.

Case 1 (1-X-1)

This case

      |
     [2]
    /   \
  /       \
(1)       [4]
         /   \
       (3)   (5)

can be locally rebalanced to

     |
   [2|4]
  /  |  \
(1) (3) (5)

Since this subtree is of height one less than its sibling, the rebalancing must recurse upwards.

Case 2 (1-X-2)

This case

      |
     [2]
    /   \
  /       \
(1)      [4|6]
        /  |  \
      (3) (5) (7)

can be rebalanced to

         |
        [4]
       /   \
     /       \
   [2]       [6]
  /   \     /   \
(1)   (3) (5)   (7)

Case 3 (2-X-1-1)

This case

           |
         [2|6]
       /   |   \
     /     |     \
   /       |       \
(1)       [4]       [8]
         /   \     /   \
       (3)   (5) (7)   (9)

can be rebalanced to

          |
         [4]
       /     \
     /         \
   [2]        [6|8]
  /   \      /  |  \
(1)   (3)  (5) (7) (9)

Case 4 (2-X-1-2)

This case

          |
        [2|6]
      /   |   \
    /     |     \
  /       |       \
(1)      [4]       [8|10]
        /   \     /  |  \
      (3)   (5) (7) (9) (11)

can be rebalanced to

              |
            [4|8]
          /   |   \
        /     |     \
      /       |       \
   [2]       [6]       [10]
  /   \     /   \     /   \
(1)   (3) (5)   (7) (9)   (11)

Case 5 (2-X-2-Y)

This case

          |
        [2|8]
      /   |   \
    /     |     \
  /       |       \
(1)     [4|6]    ((9))
       /  |  \ 
     (3) (5) (7)

can be rebalanced to

              |
            [4|8]
          /   |   \
        /     |     \
      /       |       \
   [2]       [6]      ((9))
  /   \     /   \
(1)   (3) (5)   (7)

Case 6 (2-1-X-Y)

This case

            |
          [4|6]
        /   |   \
      /     |     \
   [2]     (5)   ((7))
  /   \
(1)   (3)

can be rebalanced to

           |
          [6]
        /     \
      /         \
   [2|4]       ((9))
  /  |  \
(1) (3) (5)

Case 7 (2-2-X-Y)

This case

            |
          [6|8]
        /   |   \
      /     |     \
   [2|4]   (7)   ((9))
  /  |  \
(1) (3) (5)

can be rebalanced to

              |
            [4|8]
          /   |   \
        /     |     \
      /       |       \
   [2]       [6]      ((9))
  /   \     /   \
(1)   (3) (5)   (7)