# An Investigation of Methods for Visualising Highly Multivariate Datasets

## 1. Introduction

Suppose we have a set of $m$ continuous observed variables for each of $n$
cases, and denote the single variable.

In order to decide how useful a representation is, one needs to consider the
kind of feature in data that one wishes to detect. Three common possibilities
in social science data are
For most types of feature, there is variability in subtlety. For example, an
extremely high or low value of one particular variable would be a fairly crude
type of outlier. This could be detected using a well-established univariate
graphical tool such as a box-and-whisker plot (Velleman and Hoaglin, 1981). On
the other hand, a more subtle outlier might be a point in the centre of a
sphere, when all of the other points are close to its surface. The problem
here is that none of the three coordinate values $(x_1, x_2, x_3)$ are unusual. Thus, no
simple univariate or bivariate representation could detect this outlier. The
problem would become even worse if instead of a sphere, a ten-dimensional
hypersphere were substituted in the previous example! Generally, more subtlety
tends to imply a greater degree of sophistication required in the graphical
representation. This leads to the statement of a fundamental problem: "How
can the interactions between large numbers of variables be represented in a
manageable number of dimensions?".

In this Case Study, two data sets will be used to demonstrate a number of ways of addressing the above problem. The two data sets are described in detail in Appendix A but, briefly, the first is a set of six socio-economic variables for northern England measured at census ward level, taken from the 1991 census, and the second is a simulated data set designed to have a 'pathological' outlier, as discussed above. The following sections each describe a particular approach to visualisation, giving examples using the census data set. After these sections, a number of specific issues are considered, including a comparison of the way each method responds to the synthesised data.
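
The sphere example discussed earlier can be made concrete with a short numerical sketch. The code below is illustrative only and is not the simulated data set of Appendix A: the sample size, random seed, noise level and the 1.5 × IQR box-plot rule are all assumptions chosen for the illustration. It places points close to the surface of a unit sphere, adds one point at the centre, and checks that a univariate box-and-whisker rule flags nothing while the distance from the centre separates the outlier immediately.

```python
import numpy as np

# All numeric settings here are assumptions made for illustration.
rng = np.random.default_rng(0)   # arbitrary seed
n = 500                          # number of "ordinary" cases
noise = 0.02                     # radial scatter around the sphere surface

# Ordinary cases: random directions scaled to a radius close to 1.
directions = rng.normal(size=(n, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
radii = 1.0 + noise * rng.normal(size=(n, 1))
surface_points = directions * radii

# The "pathological" outlier sits at the centre of the sphere.
data = np.vstack([surface_points, np.zeros((1, 3))])

# Univariate check: box-and-whisker fences (1.5 * IQR) on each variable.
q1, q3 = np.percentile(data, [25, 75], axis=0)
iqr = q3 - q1
outside_fences = (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)
flagged_univariate = outside_fences.any(axis=1)

# Multivariate check: distance of each case from the origin.
distance = np.linalg.norm(data, axis=1)

print("cases flagged by any box plot:", int(flagged_univariate.sum()))
print("index of smallest distance:", int(np.argmin(distance)))
print("smallest distance among ordinary cases:", distance[:-1].min())
```

Each coordinate of a point on a sphere ranges over the whole interval between roughly -1 and 1, so a value of 0 lies in the middle of every marginal distribution; only a quantity that combines all three variables, here the distance from the centre, reveals the outlier. Changing the `3` in the array shapes to `10` gives the ten-dimensional variant mentioned in the text.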
