Derivation of the coherent point-drift algorithm

In this section I will rederive the coherent point-drift algorithm, described in: A. Myronenko and X. Song, “Point Set Registration: Coherent Point Drift”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262-2275, Dec. 2010.

Expectation-Maximization algorithm

Input

Meaning | Set
Probability space | $(\Omega, \mathcal{F}, \mathbb{P})$
Actual data; random element | $Z : \Omega \to S_Z$
Measurable space | $(S_Z, \Sigma_Z)$
Measurable space | $(S_X, \Sigma_X)$
Observation function; measurable | $g : S_Z \to S_X$
Observed data | $x \in S_X$
Set of distribution parameters | $\Theta$
Set of probability densities | $\{f_Z(\cdot\,; \theta) : \theta \in \Theta\}$

Maximum-likelihood

The maximum-likelihood estimator for the distribution parameter, given $Z$, is a random element $\hat{\theta} : \Omega \to \Theta$ such that

$$\hat{\theta} \in \underset{\theta \in \Theta}{\operatorname{arg\,max}}\ f_Z(Z; \theta).$$

Since $\log$ is an increasing function, and probability densities are non-negative, this is equivalent to

$$\hat{\theta} \in \underset{\theta \in \Theta}{\operatorname{arg\,max}}\ \log f_Z(Z; \theta).$$
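
As a concrete example: if $Z = (Z_1, \dots, Z_n)$ consists of independent $\mathcal{N}(\theta, 1)$ samples, then $\log f_Z(Z; \theta) = -\frac{1}{2} \sum_{i=1}^{n} (Z_i - \theta)^2 + \text{const}$, and maximizing over $\theta \in \mathbb{R}$ gives the sample mean $\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} Z_i$.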

Iteration

The expectation-maximization algorithm, or the EM algorithm, is an iterative algorithm for finding a maximum-likelihood estimator in a neighborhood of the initial parameter $\theta_0 \in \Theta$, given observed data $x \in S_X$. The EM-iteration is given by

$$\theta_{k+1} \in \underset{\theta \in \Theta}{\operatorname{arg\,max}}\ \mathbb{E}\!\left[\log f_Z(Z; \theta) \,\middle|\, g(Z) = x; \theta_k\right]$$

for all $k \in \mathbb{N}$.

Intuition

Ideally, we would like to compute $\hat{\theta}$ from the maximum-likelihood equation above. However, this is not possible, since we only observe $x = g(Z)$, and not $Z$ itself.

Therefore, we use the conditional expectation to integrate out of $\log f_Z(Z; \theta)$ the information that is not contained in $x$, and then try to maximize the remaining expected log-likelihood. However, this is often not possible in closed form.

Therefore, we change the strategy to an iterative approximation instead. We assume that the conditional expectation is taken with respect to constant parameters: those from the previous step. If this expectation can be maximized in closed form, then we can improve the approximation by repeating the local maximization. This provides us with a local maximum in the limit. The local maximum may or may not be a global maximum.
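
To make the iteration concrete, here is a minimal sketch in Python; the e_step and m_step callbacks are hypothetical placeholders for the model-specific conditional expectation and its maximizer:

```python
def em(x, theta0, e_step, m_step, iterations=50):
    """Generic EM iteration (a sketch). `e_step` evaluates the
    posterior quantities at the constant previous-step parameters;
    `m_step` returns a maximizer of the resulting conditional
    expectation of the complete-data log-likelihood."""
    theta = theta0
    for _ in range(iterations):
        posterior = e_step(x, theta)   # taken at constant theta_k
        theta = m_step(x, posterior)   # yields theta_{k+1}
    return theta
```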

Coherent point-drift

The coherent point-drift algorithm is a point-set registration algorithm. It treats the points in one point-set as the centroids of normal distributions, and then finds the parameters for the normal distributions to best fit the other point-set. This is done by maximizing likelihood using the EM algorithm. This is useful, because the information on which normal distribution generated each point is lost; something the EM algorithm can deal with.

The normal distributions share the same isotropic variance. The coherent point-drift algorithm includes the variance as an estimated parameter. This requires adding yet another component to the mixture model. The choice here is an improper uniform “distribution”. It is here that the probabilistic interpretability of the algorithm suffers somewhat. However, from the viewpoint of likelihood maximization there are no problems.

Input

Meaning | Set
Model point-set | $Y = \{y_1, \dots, y_M\} \subset \mathbb{R}^d$
Scene point-set | $X = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$
Set of transformation parameters | $\Theta$
Set of transformations | $\{T(\cdot\,; \theta) : \theta \in \Theta\}$, with each $T(\cdot\,; \theta) : \mathbb{R}^d \to \mathbb{R}^d$
Initial transformation parameter | $\theta_0 \in \Theta$
Initial variance parameter | $\sigma_0^2 > 0$
Noise tolerance | $w \in [0, 1)$

Output

Name | Set
Transformation parameter | $\hat{\theta} \in \Theta$
Variance parameter | $\hat{\sigma}^2 > 0$

Here $T(Y; \hat{\theta})$ matches $X$ well, in some sense.

Model

Let $Z_n = (X_n, K_n)$, for $n \in \{1, \dots, N\}$, be i.i.d. random elements in $\mathbb{R}^d \times \{1, \dots, M + 1\}$, the actual data, such that

$$f_{X_n \mid K_n}(x \mid m; \theta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x - T(y_m; \theta)\rVert^2}{2\sigma^2}\right)$$

for all $x \in \mathbb{R}^d$ and $m \in \{1, \dots, M\}$. Here

$$\mathbb{P}(K_n = m) = \begin{cases} (1 - w)/M, & m \in \{1, \dots, M\}, \\ w, & m = M + 1. \end{cases}$$

The missing component, with index $M + 1$, is an improper uniform “distribution”:

$$f_{X_n \mid K_n}(x \mid M + 1) = \frac{1}{N}.$$

We assume the $x_n$ to be a realization of the $X_n$, meaning there exists $\omega \in \Omega$ such that $x_n = X_n(\omega)$ for all $n$, and that the observation function is such that $g(Z_n) = g(X_n, K_n) = X_n$; the mixture index $K_n$ is not observed.
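
A sketch of the Gaussian component density above in code; the transformed model point $T(y_m; \theta)$ is assumed to be precomputed:

```python
import numpy as np

def component_density(x, t_y_m, sigma2):
    """f_{X|K}(x | m): isotropic Gaussian centered at the
    transformed model point T(y_m; theta).
    x, t_y_m: arrays of shape (d,)."""
    d = x.shape[0]
    sq = np.sum((x - t_y_m) ** 2)
    return np.exp(-sq / (2.0 * sigma2)) / (2.0 * np.pi * sigma2) ** (d / 2.0)
```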

Location-index distribution

The probability density of $Z_n$ is given by

$$f_{Z_n}(x, m; \theta, \sigma^2) = \frac{1 - w}{M} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x - T(y_m; \theta)\rVert^2}{2\sigma^2}\right)$$

for $m \in \{1, \dots, M\}$, and

$$f_{Z_n}(x, M + 1) = \frac{w}{N}.$$

The logarithm of the former is

$$\log f_{Z_n}(x, m; \theta, \sigma^2) = \log\frac{1 - w}{M} - \frac{d}{2}\log(2\pi\sigma^2) - \frac{\lVert x - T(y_m; \theta)\rVert^2}{2\sigma^2}.$$

Location distribution

As a marginal distribution, we have that

$$f_{X_n}(x; \theta, \sigma^2) = \sum_{m=1}^{M+1} f_{Z_n}(x, m; \theta, \sigma^2) = \frac{1 - w}{M} \sum_{m=1}^{M} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x - T(y_m; \theta)\rVert^2}{2\sigma^2}\right) + \frac{w}{N}$$

for all $x \in \mathbb{R}^d$.
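
A direct sketch of this marginal density, assuming the transformed model points have already been computed:

```python
import numpy as np

def location_density(x, t_y, sigma2, w, n_scene):
    """f_X(x): Gaussian mixture over the M transformed model
    points t_y (shape (M, d)), plus the uniform component w/N.
    x: shape (d,); n_scene is N, the number of scene points."""
    m_count, d = t_y.shape
    sq = np.sum((x - t_y) ** 2, axis=1)   # ||x - T(y_m)||^2
    gauss = np.exp(-sq / (2.0 * sigma2)) / (2.0 * np.pi * sigma2) ** (d / 2.0)
    return (1.0 - w) / m_count * gauss.sum() + w / n_scene
```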

Index given location

Substituting the location-index and location densities,

$$\mathbb{P}(K_n = m \mid X_n = x; \theta, \sigma^2) = \frac{f_{Z_n}(x, m; \theta, \sigma^2)}{f_{X_n}(x; \theta, \sigma^2)} = \frac{\exp\!\left(-\frac{\lVert x - T(y_m; \theta)\rVert^2}{2\sigma^2}\right)}{\sum_{k=1}^{M} \exp\!\left(-\frac{\lVert x - T(y_k; \theta)\rVert^2}{2\sigma^2}\right) + (2\pi\sigma^2)^{d/2} \frac{w}{1 - w} \frac{M}{N}}$$

for $m \in \{1, \dots, M\}$.
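
A vectorized sketch of this posterior computation, which is the E-step of the algorithm:

```python
import numpy as np

def posterior(x, t_y, sigma2, w):
    """P(K_n = m | X_n = x_n) for the Gaussian components,
    evaluated at fixed parameters. x: (N, d) scene points,
    t_y: (M, d) transformed model points. Returns (N, M)."""
    n, d = x.shape
    m_count = t_y.shape[0]
    # Squared distances ||x_n - T(y_m)||^2, shape (N, M).
    sq = np.sum((x[:, None, :] - t_y[None, :, :]) ** 2, axis=2)
    num = np.exp(-sq / (2.0 * sigma2))
    # Mass contributed by the uniform outlier component.
    c = (2.0 * np.pi * sigma2) ** (d / 2.0) * (w / (1.0 - w)) * (m_count / n)
    return num / (num.sum(axis=1, keepdims=True) + c)
```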

Expectation

Conditional expectation

The conditional expectation of $\log f_{Z_n}(Z_n; \theta, \sigma^2)$ given $X_n = x_n$ is given by

$$\mathbb{E}\!\left[\log f_{Z_n}(Z_n; \theta, \sigma^2) \mid X_n = x_n\right] = \sum_{m=1}^{M+1} \mathbb{P}(K_n = m \mid X_n = x_n; \theta, \sigma^2) \log f_{Z_n}(x_n, m; \theta, \sigma^2).$$

Joint conditional expectation

The conditional expectation of the joint log-likelihood $\sum_{n=1}^{N} \log f_{Z_n}(Z_n; \theta, \sigma^2)$ given $X_1 = x_1, \dots, X_N = x_N$ is given by

$$Q(\theta, \sigma^2) = \sum_{n=1}^{N} \sum_{m=1}^{M+1} \mathbb{P}(K_n = m \mid X_n = x_n; \theta, \sigma^2) \log f_{Z_n}(x_n, m; \theta, \sigma^2).$$

Let us denote

$$p_{nm} = \mathbb{P}(K_n = m \mid X_n = x_n; \theta, \sigma^2), \qquad N_P = \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm}.$$

Substituting the logarithm of the location-index density, $Q$ expands to

$$Q(\theta, \sigma^2) = N_P \log\frac{1 - w}{M} - \frac{N_P d}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2 + \log\frac{w}{N} \sum_{n=1}^{N} p_{n, M+1}.$$

Maximization

Since the first and the last term of the joint conditional expectation do not depend on $\theta$ or $\sigma^2$, maximizing it is equivalent to maximizing

$$-\frac{N_P d}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2.$$

However, this does not seem to be possible in closed form, since the probabilities $p_{nm}$ themselves depend on $\theta$ and $\sigma^2$. Instead, following the EM algorithm, we take the conditional expectation with respect to constant parameters, those from the previous step, so that the $p_{nm}$ and $N_P$ become constants, and minimize (dropping the additive constant)

$$E(\theta, \sigma^2) = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2 + \frac{N_P d}{2} \log(\sigma^2).$$
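
A direct sketch of this objective in code, with the frozen posteriors passed in as p:

```python
import numpy as np

def objective(p, x, t_y, sigma2):
    """E(theta, sigma^2): weighted squared distances plus the
    variance penalty, with the posteriors p held constant at the
    previous parameters. p: (N, M), x: (N, d), t_y: (M, d)."""
    d = x.shape[1]
    sq = np.sum((x[:, None, :] - t_y[None, :, :]) ** 2, axis=2)
    return (p * sq).sum() / (2.0 * sigma2) + p.sum() * d / 2.0 * np.log(sigma2)
```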

A necessary condition for minimizing $E$ is that the partial derivative of $E$ with respect to $\sigma^2$ is zero:

$$\frac{\partial E}{\partial (\sigma^2)} = -\frac{1}{2\sigma^4} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2 + \frac{N_P d}{2\sigma^2} = 0.$$

This can be used to solve for $\sigma^2$:

$$\sigma^2 = \frac{1}{N_P d} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2.$$
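
In code, this variance update is a direct transcription of the formula above:

```python
import numpy as np

def sigma2_update(p, x, t_y):
    """sigma^2 = (1 / (N_P d)) sum_{n,m} p_nm ||x_n - T(y_m)||^2.
    p: (N, M) posteriors, x: (N, d), t_y: (M, d)."""
    d = x.shape[1]
    sq = np.sum((x[:, None, :] - t_y[None, :, :]) ** 2, axis=2)
    return (p * sq).sum() / (p.sum() * d)
```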

Substituting back into $E$, we get that we must minimize

$$\frac{N_P d}{2} \log\!\left(\frac{1}{N_P d} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2\right) + \frac{N_P d}{2}.$$

Since $\log$ is increasing, minimizing this is equivalent to minimizing

$$\sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2.$$

Therefore,

$$\hat{\theta} \in \underset{\theta \in \Theta}{\operatorname{arg\,min}} \sum_{n=1}^{N} \sum_{m=1}^{M} p_{nm} \lVert x_n - T(y_m; \theta)\rVert^2.$$

The solution of this weighted least-squares problem for many subgroups of affine transformations can be found here.
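
As an illustration of the simplest such subgroup, the translation-only case $T(y; t) = y + t$ has the closed-form minimizer sketched below; the rigid and affine cases require the corresponding weighted Procrustes-type solutions instead:

```python
import numpy as np

def translation_update(p, x, y):
    """Minimize sum_{n,m} p_nm ||x_n - (y_m + t)||^2 over t.
    Setting the gradient to zero gives a weighted mean of the
    differences: t = (sum_{n,m} p_nm (x_n - y_m)) / N_P."""
    n_p = p.sum()
    wx = p.sum(axis=1) @ x   # sum_n (sum_m p_nm) x_n
    wy = p.sum(axis=0) @ y   # sum_m (sum_n p_nm) y_m
    return (wx - wy) / n_p
```

Combined with the posterior and variance updates sketched earlier, this gives one complete EM step of the algorithm for the translation case.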