Details of the Kozachenko-Leonenko estimator

Back to Kozachenko-Leonenko estimator

This page takes a deeper look at the implementation of the Kozachenko-Leonenko estimator and derives an upper bound to the maximum absolute error introduced by allowing relative error in nearest neighbor searching.

Computation

Let be a point-set and such that is the distance from to its k:th nearest neighbor. The Kozachenko-Leonenko estimator for Shannon differential entropy is computed by

where

In case all are zero, a NaN (Not-A-Number) is returned.

Allowing error in nearest neighbor distances

Because of the guaranteed efficiency of approximate nearest neighbor searching, it would be interesting to see if some relative error can be allowed in without introducing too much error in the result. Assume the distances have a amount of relative error in them, where . Then the estimator is affected as follows:

That is, the absolute error is bounded by

Since in practice needs to be greater than 1 to get the benefits of approximate nearest neighbor searching, the resulting error is unfortunately too large. Therefore we shall always compute exact nearest neighbor distances.