An Investigation of Methods for Visualising Highly Multivariate Datasets

## 2. The Projection Pursuit Approach to Visualisation

## 2.1 Context

Suppose, for a set of cases, [...]

Figure 1: Example of point projection (1)

In figure 1 the data points are projected onto a plane to the right. Here the
projected image shows two distinct clusters of points. In figure 2 the same
data points are projected onto a plane above. Here the projected image shows
only a single cluster of points. Obviously in this case the projection is from
$R^3$ to $R^2$, but similar (and sometimes more
complex) phenomena occur when the projection is from $R^m$
with $m > 3$.
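
The effect described for figures 1 and 2 can be reproduced numerically with a small sketch. The coordinates below are made up for illustration and are not the data behind the figures. Projection onto a coordinate plane is done simply by dropping one coordinate, and the helper names (`project`, `largest_gap`, `max_axis_gap`) are hypothetical.

```python
# Six hypothetical points in R^3 forming two groups that differ only in x.
points = [
    (0.0, 0.1, 0.2), (0.1, 0.3, 0.0), (0.2, 0.0, 0.3),   # group A
    (5.0, 0.2, 0.1), (5.1, 0.0, 0.3), (5.2, 0.3, 0.0),   # group B
]

def project(points, keep):
    """Orthogonal projection onto a coordinate plane: keep two of the three axes."""
    return [tuple(p[k] for k in keep) for p in points]

def largest_gap(values):
    """Largest gap between consecutive sorted values (a crude sign of clustering)."""
    s = sorted(values)
    return max(b - a for a, b in zip(s, s[1:]))

def max_axis_gap(projected):
    """Largest gap found along either axis of a 2-D projection."""
    return max(largest_gap([p[i] for p in projected]) for i in range(2))

xy = project(points, keep=(0, 1))   # plane that contains the separating axis
yz = project(points, keep=(1, 2))   # plane orthogonal to the separating axis

print(max_axis_gap(xy))  # large: the two groups stay apart
print(max_axis_gap(yz))  # small: the groups fall on top of each other
```

The same six points look like two clusters in one projection and like a single cluster in the other, which is the situation the two figures depict.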

Figure 2: Example of point projection (2)

The above example demonstrates that different projections of the same data set
can reveal different aspects of the data structure - indeed some projections
can fail to reveal any structure. There are in fact an infinite number of
possible projections to choose from, so which one should be used?

## 2.2 The Projection Pursuit Method

To see how this technique operates, it is first worth noting that
projections from $R^m$ to $R^2$ are linear
mappings. Thus, if $X = \{x_{ij}\}$ is a matrix whose
$(i,j)$th element is the $j$th variable for observation $i$, we
can write the general projection from $R^m$ to
$R^2$ as $(z_1, z_2) = (Xa', Xb')$.
Here $a$ and $b$ are $m$-dimensional row vectors
defining the linear transform, and $z_1$ and $z_2$
are $n$-dimensional column vectors representing the points on the
projection screen. The prime denotes transposition. Choosing a projection is
now a matter of choosing $a$ and $b$.
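
As a minimal sketch of the mapping $(z_1, z_2) = (Xa', Xb')$, the following pure-Python function computes the two projected coordinate vectors. The data matrix and the vectors `a` and `b` are hypothetical values chosen for illustration, and `project_onto` is an assumed helper name, not something taken from the report.

```python
def project_onto(X, a, b):
    """Return (z1, z2) with z1 = X a' and z2 = X b'.

    X is a list of n rows, each holding m variable values;
    a and b are length-m sequences (the row vectors of the text).
    """
    z1 = [sum(x_ij * a_j for x_ij, a_j in zip(row, a)) for row in X]
    z2 = [sum(x_ij * b_j for x_ij, b_j in zip(row, b)) for row in X]
    return z1, z2

# Hypothetical data: n = 3 observations of m = 4 variables.
X = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 2.0],
    [2.0, 0.0, 1.0, 1.0],
]
a = [1.0, 0.0, 0.0, 0.0]   # picks out the first variable
b = [0.0, 0.5, 0.5, 0.0]   # averages the second and third variables

z1, z2 = project_onto(X, a, b)
print(z1)  # [1.0, 0.0, 2.0]
print(z2)  # [1.0, 2.0, 0.5]
```

Each observation becomes one point `(z1[i], z2[i])` on the projection screen, so a different choice of `a` and `b` gives a different view of the same data.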

The next problem is to decide what kind of feature one wishes to detect. When
this decision is made, one attempts to measure the degree to which this feature
is exhibited in ( [...]

At this stage, however, some careful thought should take place. Clearly the
nearest neighbour distance index is scale dependent. Multiplying [...]

The projection pursuit algorithm is thus equivalent to a constrained
optimisation problem. The difficulty with the specification given is that the
constraints are given in terms of [...]

This is a standard form for a constrained optimisation problem.
Computationally, the difficulty of this problem depends on the index function [...]

Figure 3: Minimised MNND projection of census data

In figure 3 the result of applying this technique to the census data is shown. Due to the nature of the index function, the rotation of the point pattern obtained is arbitrary, so there is no clear interpretation of the individual axes in the plot. No obvious clusters exist in the plot, suggesting perhaps that the data is not bimodal in any way detectable by projecting onto a two dimensional plane. However, some features are very clear, most notably a 'spur' in the lower part of the plot.
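
The scale dependence of the nearest neighbour distance index noted in section 2.2 can be checked with a short sketch. It assumes that MNND denotes the mean, over all projected points, of the distance to the nearest other point; the surviving text does not spell out the definition, so this reading is an assumption. The data and the helper names `project_onto` and `mean_nn_distance` are hypothetical.

```python
import math

def project_onto(X, a, b):
    """Return (z1, z2) with z1 = X a' and z2 = X b' (X given as a list of rows)."""
    z1 = [sum(x * w for x, w in zip(row, a)) for row in X]
    z2 = [sum(x * w for x, w in zip(row, b)) for row in X]
    return z1, z2

def mean_nn_distance(z1, z2):
    """Assumed MNND: average distance from each projected point to its nearest neighbour."""
    pts = list(zip(z1, z2))
    nearest = [
        min(math.hypot(p[0] - q[0], p[1] - q[1])
            for j, q in enumerate(pts) if j != i)
        for i, p in enumerate(pts)
    ]
    return sum(nearest) / len(nearest)

# Hypothetical data: n = 4 observations of m = 3 variables.
X = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
    [3.0, 1.0, 1.0],
]
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]

base = mean_nn_distance(*project_onto(X, a, b))
# Multiplying a and b by a constant multiplies the index by the same constant.
doubled = mean_nn_distance(*project_onto(X, [2 * w for w in a], [2 * w for w in b]))
shrunk = mean_nn_distance(*project_onto(X, [0.01 * w for w in a], [0.01 * w for w in b]))

print(base, doubled, shrunk)
```

Because the index scales with `a` and `b`, an unconstrained minimiser could always lower it by shrinking both vectors towards zero, whatever the structure of the data. This is why the text treats projection pursuit as a constrained optimisation problem.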
