PointKdTree design notes

Back to Point kd-tree

This section contains some design notes on the implementation of the PointKdTree.

Invariants

How to use the data structure?

The data structure is traversed using cursors. A cursor is a pointer-like object that points to a node of the kd-tree. As always, this allows for the separation of the data structure and the algorithms that use it.

Should you store points at all nodes or just leaf nodes?

Proposition 1: Store one point at each split node

Advantages:

Disadvantages:

Proposition 2: Store multiple points at leaf nodes

Advantages:

Disadvantages:

Decision

Choose Proposition 2.

How to support different kinds of points?

Proposition 1: an object oriented interface

class PointKdTreePoint
{
public:
    virtual ~PointKdTreePoint()
    {
    }

    virtual real position(integer axis) const = 0;
};

Disadvantages:

Proposition 2: generic programming

Store arbitrary user points and require to supply a functor that gives projection to an axis given a point.

template <typename Real, integer N, typename Point,
typename ProjectionFunctor>
class PointKdtree
{
//...
};

Advantages:

Disadvantages:

Proposition 3: generic programming with a policy

template <typename Real, integer N, typename Locator>
class PointKdTree
{
};

class Locator
{
public:
    using Point = int;

    real position(const Point& point, integer axis) const;
};

Advantages:

Decision

Choose Proposition 3.

Interface for adding points to the tree

Proposition 1: Give the PointKdTree a cursor and a point to add to that node

The problem with this approach is that the point might not be contained in the node that the cursor is pointing to. This would in turn break our invariant. Also, containment can’t be checked without traversing the tree from top to bottom because nodes do not store node bounds for all dimensions.

Proposition 2: Give the PointKdTree a point to add to the tree

The tree is then traversed from top to bottom searching for the correct leaf node to add the point to. Sounds good, however, if there are n points, then the traversals for each point take extra time which could probably be amortized by adding a set of points at the same time.

Proposition 3: Give the PointKdTree a set of points to add to the tree

At each node, the set of points is partitioned into two sets, which are sent down the node hierarchy. This way the traversal cost is amortized among the set of added points.

Decision

Choose Proposition 3.

How to store the points of a leaf node?

Proposition 1: An array or a linked list for each node

Advantages:

Disadvantages:

Proposition 2: An array shared by all nodes

Leaf nodes would refer to points in this array by integer intervals.

Advantages:

Disadvantages:

Proposition 3: A linked list shared by all nodes

Leaf nodes would refer to points in this list by inclusive iterator intervals. Why inclusive intervals? Because the point that is pointed to by an end-iterator of a normal range can get removed.

Advantages:

Disadvantages:

Decision

Random access to the point list is important when parallelizing work that is to be done for each point in the point list (or more properly, for each point iterator in the tree). If one chooses the linked list approach, one can get around this problem by copying the iterators of the point list to a random access array and then using that to carry out the task. One does not have this problem with an array since it already is random-access. However, this advantage of arrays is diminished when considering semidynamic point sets: if most of the points have been hided, then one still has to visit those hided points in the point list. Particularly because of this reason, Proposition 3 is chosen.

The importance of dividing equal points evenly

Consider that there are n points and that they are all at the same position. If you divide the points to child nodes based purely on location (e.g. always store a point on the splitting plane to the negative child), then all the points necessarily end up at the same leaf node. Furthermore, if n exceeds the splitting threshold for a node, the leaf node is the result of splitting until the depth threshold is reached.

This example underlines the importance of distributing the points on the splitting plane evenly to both sides of the plane. When this is done, the number of points at the same position diminishes exponentially. Thus, for our example the height of the tree will be O(log n).

In combination with the sliding midpoint rule, which guarantees that at least 1 point is separated from the others at each split, distributing equal points evenly guarantees that the number of nodes is O(n) in the worst case.

Removal of points

The removal of points requires an efficient way to find out for each point (its iterator) the leaf node that it is stored in.

Proposition 1: Do not allow removal of points

This is the only choice if one decides to store the points in an array.

Proposition 2: Search the tree for the containing leaf node

This approach does not work, because one does not know which node to follow in the case of a point exactly on the splitting plane… Unless one bases the positioning purely on location which however leads to other robustness problems, as discussed above.

Proposition 3: Store with each point its containing leaf node

Advantages:

Disadvantages:

Decision

Choose Proposition 3. The increased memory requirement is of no concern. When a lot of points need to be removed, it would be useful to have a function that removes all points, but leaves the subdivision structure intact.

How should you deal with changing geometry?

Because you can actually be holding only references to the geometry data, it is possible for the user to change geometry without the kd-tree knowing about it. The end-result is that one can invalidate both invariants of the kdtree by modifying point locations. We have no way to prevent this. This problem is similar to what happens with the std::set (for example) if one uses pointers as keys and compares them by their values. We can only educate the user that whenever he wishes to change geometry, he should do it by removing, changing, and inserting back.

Decision

Document that the user should not change geometry, but instead follow the remove-change-insert pattern.

How should you store the nodes?

Proposition 1: Store nodes in an array

Proposition 2: Allocate each node dynamically

You can avoid allocation overhead by using a PoolAllocator or ArenaAllocator.

Decision

Choose Proposition 2 as the only working one.

Null references to children

If there are no points in a child of a split node, you can either create an empty leaf node or give a null reference to the intermediate node. However, empty leaf nodes are helpful when you want to subdivide, because then you can explicitly point to the node you want to subdivide. With null references you have to be indirect from the parent: “subdivide the negative child of this node”. This is very impractical.

Decision

A splitting node always refers to two concrete children.

Should the kd-tree know its bounding box?

The bounding box of the kd-tree (or its contained box) is often needed. Tracing the bounding box is easily done when adding points to the tree. In contrast, if the bounding box wasn’t traced, then it would have to be recomputed manually and this can mean that it can be done many times redundantly.

If the bounding box is traced, then it would be useful to have a function to enlarge it (but not shrink it).

Decision

Trace the bounding box of the tree when adding points. Make it possible to enlarge the bounding box but not shrink it.

Should you use the same data structure for points and arbitrary shapes?

This is possible. However:

Decision

Because of the abstraction penalties in both directions, a better alternative is to separately create a kd-tree that is designed for point sets (PointKdTree) and a kd-tree that is designed for ray tracing (KdTree)

Should the nodes know their parent nodes?

A parent link is a link from a child node to its parent node. In particular, a parent link makes it possible to manage hierarhical information, i.e., information in a node which is dependent on information in its children. This makes it possible to turn the kdtree into a self-adapting tree under removal and addition of points. The addition of parent links allows a considerably expanded design space. Therefore we do not decide the question here, but enumerate a selection of possibilities in a moment.

Enumerations of different kinds of kdtrees

Basic version

Description:

Advantages:

Shortcomings:

Removable version

Additional features to basic version:

Advantages compared to basic version:

Adaptable version

Additional features to removable version:

Advantages compared to removable version:

Bentley’s version

Additional features to adaptable version:

Comments:

Advantages compared to adaptable version:

Shortcomings:

Removal adaptation version

Additional features to adaptable version:

Comments:

Advantages to adaptable version:

Shortcomings:

Multi-level version

Additional features to adaptable version:

Comments:

Advantages compared to adaptable version:

Summary

Implementation experience