Home
 Contact
 People
Research
 Themes
 Projects
 PhD Thesis
 Publications
Activities
 Education
 ESANN
 Neur.Proc.Lett.
Links
 Academic
 Neural Stuff
 Conferences

 
 

Curvilinear Distance Analysis

Introduction

Curvilinear Distance Analysis (CDA in short) is the last development in a well known family of nonlinear projection methods. The story began with the so called Sammon's mapping' method, created as a nonlinear alternative to the widely used Principal Component Analysis, which is a linear projection method. Sammon's mapping is also closely related with Multi-Dimensional Scaling (MDS) used in psychology. The first improvement to Sammon's mapping were brought by Jeanny Hérault and Pierre Demartines: in order to underline the nonlinear capabilities of their method, they call it Curvilinear Component Analysis (CCA), by opposition with PCA. CCA is more powerful and faster than Sammon's mapping. Curvilinear Distance Analysis (CDA) lies one step beyond: the main difference between CCA and CDA is that the latter uses a special kind of distance measurement called Curvilinear Distances.

The problem of nonlinear projection

The problem of nonlinear projection can be stated as follows.  Suppose we have a numerical dataset of N vectors embedded in an n-dimensional space.  Such database may originate from a set of n sensors.  If there exist dependencies between the features of the vector, data are not randomly distributed and the database is said to have a structure.  For example, a plane in the 3-dimensional space is a 2-dimensional structure since there is one dependency linking the 3 coordinates x, y and z.  Very often, the dimension of a structure (say p) is lower than the dimension of its embedding n-dimensional space, so the structure can be projected to a p-dimensional space.  A well known projection method is the Principal Component Analysis (PCA): it works pretty well, but only when the dependencies are strictly linear.  Once you have nonlinear dependencies, an other method is needed, in order to find a nonlinear mapping between the n-dimensional embedding space and the p-dimensional projection space.  

Sammon's mapping

The idea behind Sammon's mapping is Topology Preservation. One way to achieve this goal consists in preserving distances between samples of the numerical database:
Two samples that are close to each other have to stay close when projected;
Two projected samples that are close to each other have to originate from two samples that were close two each other.

This kind of topology conservation is also used in the well known Kohonen's maps. In order to guarantee topology preservation, Sammon defines an error function:

where dni,j and d are the distances between the i-th and j-th vector, respectively in the n-dimensional embedding space, and in the p-dimensional projection space.  Given this error function,an optimal projection can be computed using a gradient descent algorithm.  

Curvilinear Component Analysis

Curvilinear Component Analysis brings some improvements to Sammon's mapping.  Actually, when unfolding a nonlinear structure, Sammon's mapping cannot reproduce all distances.  One way to face this problem consists in favouring local topology: CCA tries to reproduce short distances firstly, long distances being secondary.  Formally, this reasoning leads to the following error function(without normalization):

 

By comparison with ESammon, ECCA has an additional weighting fonction F depending on di,jp and on parameter l.  The factor F is a decreasing function of its argument, so it is used to favour local topology preservation.  For example, F could be a step function of (l-d

Another improvement brought by CCA to Sammon's mapping is the use of Vector Quantization (VQ).  VQ decreases the computational load of the distance calculation (there 0.5*N*(N-1) distance to compute!).  Unfortunately, the use of VQ implies the use of an interpolation in order to project vectors that are not prototypes of the codebook.  

Curvilinear Distance Analysis

Curvilinear Distances Analysis is the last development in the Sammon/CCA family.  The novelty in CDA is the use of true curvilinear distances.  The curvilinear distance di,jn between the i-th and j-th points of a structure is the distance measured along the structure and not through the object, like the Euclidean distance.  In practice, the curvilinear distance is computed as the shortest path between two prototypes of the codebook, after quantization and linking of the prototypes. The curvilinear distance is only a little change in the error function:

But it has great consequences.  Hard nonlinear objects like spirals can now be perfectly unfolded!

Projection of a spiral  (2D to 1D).  

Projection of a ribbon (3D to 2D).  

For more details, please read this article (or the slides).

This webpage is still under construction!

 

Send any comment to : John A. Lee

Last updated : 07/12/2000