An Investigation of Methods for Visualising Highly Multivariate Datasets

7. Conclusions

This study has shown that there are several possible ways to visualise multidimensional data and that each has merits and pitfalls. The projection pursuit approach, particularly in the autocorrelation form, is able to identify patterns in linear combinations of the data that perhaps would not be unearthed with any of the other techniques. This is particularly true of geographical patterns. These can either be discovered in a very direct way (the autocorrelation approach) or by using packages such as XLISP-STAT to link two-dimensional projections with maps. The technique also offers a great deal of flexibility. The index function I can be chosen in many different ways, and in each case an optimal projection for serving some very specific purpose can be found.
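The flexibility of the index function can be illustrated with a minimal sketch in which the index is simply passed in as an argument. This is not the study's implementation: the random search, the function names (`pursue`, `project`, `variance_index`) and the use of variance as the index are assumptions made for illustration, and an autocorrelation or MNND index could be substituted for `variance_index` without changing the search.

```python
import math
import random


def project(data, direction):
    """Project each row of `data` onto the 1-D axis given by `direction`."""
    return [sum(x * w for x, w in zip(row, direction)) for row in data]


def variance_index(values):
    """Stand-in index function (an assumption): variance of the projection."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def random_unit_vector(dim, rng):
    """Draw a direction from the unit sphere by normalising Gaussian draws."""
    while True:
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            return [v / norm for v in vec]


def pursue(data, index, n_trials=2000, seed=0):
    """Crude random search for the direction that maximises `index`."""
    rng = random.Random(seed)
    dim = len(data[0])
    best_dir, best_score = None, -math.inf
    for _ in range(n_trials):
        direction = random_unit_vector(dim, rng)
        score = index(project(data, direction))
        if score > best_score:
            best_dir, best_score = direction, score
    return best_dir, best_score
```

A call such as `pursue(data, variance_index)` returns the best direction found and its index value. A practical implementation would normally replace the random search with a proper optimiser and search for two-dimensional rather than one-dimensional projections.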

On the negative side, the technique is perhaps one of the hardest to interpret. This is mainly because the projection plots show indices formed from linear combinations of variables, and one then has the problem of assigning meaning to these indices. Interpretation plots such as figures 4 and 6 are of some help, but they still leave some ambiguity. This is perhaps inevitable, as any linear projection from a higher dimensional space onto a lower one necessarily maps many points in the domain space onto the same point in the image. For the method to come into its own, it is usually necessary to use the projection as a single view in a system of linked views in an interactive system, as suggested above.
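The many-to-one behaviour of a linear projection can be seen in a small numerical sketch. The projection axes and the two points are invented for illustration and are not taken from the study's datasets.

```python
def project_2d(point, axes):
    """Map a point onto two index values, one per projection axis."""
    return tuple(sum(p * a for p, a in zip(point, axis)) for axis in axes)


# Two hypothetical projection axes in a three-variable space.
axes = [(0.5, 0.5, 0.0), (0.0, 0.5, 0.5)]

# Two clearly different observations: b = a + 2 * (1, -1, 1),
# and the direction (1, -1, 1) is sent to zero by both axes.
a = (1.0, 2.0, 3.0)
b = (3.0, 0.0, 5.0)

image_a = project_2d(a, axes)
image_b = project_2d(b, axes)
```

Both observations land on the same plotted point, so the plot alone cannot distinguish them. This is one reason why linking the projection to other views helps.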

The RADVIZ approach is also a projective technique, and so many of the comments that might be applied to projection pursuit also apply here, particularly those relating to the interpretation of plotted patterns. However, when exploring compositional data RADVIZ comes into its own. In this instance, the patterns have a very intuitive interpretation, in that points near the vertices of the regular polygon correspond to observations dominated by a particular component of the compositional breakdown. If the vertices are labelled by their corresponding variables, as in figure 8, then it becomes immediately clear which variable dominates. It is also worth noting that the RADVIZ projection of the census data was quite similar to the maximising MNND projection, but required considerably less computational effort to achieve.
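The vertex interpretation follows from the way a RADVIZ point is placed. The sketch below assumes the usual spring-model definition, in which each observation is plotted at the average of the polygon vertices weighted by its variable values. The function names and the example composition are hypothetical, and any scaling applied in the study itself is not reproduced.

```python
import math


def radviz_anchors(n_vars):
    """Vertices of a regular polygon on the unit circle, one per variable."""
    return [
        (math.cos(2 * math.pi * j / n_vars), math.sin(2 * math.pi * j / n_vars))
        for j in range(n_vars)
    ]


def radviz_point(values, anchors):
    """Place an observation at the value-weighted average of the anchors.

    `values` are assumed to be non-negative, e.g. compositional shares.
    """
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no pull from any variable: plot at the centre
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)


anchors = radviz_anchors(4)
dominated = radviz_point([0.97, 0.01, 0.01, 0.01], anchors)  # near vertex 0
balanced = radviz_point([0.25, 0.25, 0.25, 0.25], anchors)   # near the centre
```

An observation dominated by one component is pulled almost entirely towards that component's vertex, while an evenly balanced composition sits at the centre of the polygon. No optimisation is involved, which is consistent with the low computational effort noted above.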

The parallel coordinates approach is perhaps the most intuitive. The labelling of the axes makes it very clear exactly what values individual variables take - a property which none of the other approaches has. As with RADVIZ, the choice of representation for a given data set is not unique, and the problem of choosing an optimal representation is a difficult one. In this case, a choice of the ordering of the parallel axes must be made. However, in the author's experience, non-optimal orderings of the parallel axes can work reasonably well, so that in many cases sub-optimality need not imply unacceptably poor quality. Perhaps a useful compromise is to allow the user to swap the axes interactively, and explore more than one possible axis ordering. Indeed, a similar approach could be applied with RADVIZ.
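The axis-swapping compromise can be sketched as follows. This is an illustrative outline rather than code from the study: the helper names (`scale_columns`, `polylines`, `swap_axes`) and the variable names in the usage note are assumptions, and an interactive system would redraw the plot after each swap rather than return lists.

```python
def scale_columns(rows):
    """Min-max scale each column to [0, 1] so all axes share one height."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; place it mid-axis.
        scaled_cols.append([(v - lo) / span if span else 0.5 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def polylines(rows, names, order):
    """One polyline per observation, as (axis_position, height) pairs."""
    idx = [names.index(name) for name in order]
    return [
        [(pos, row[i]) for pos, i in enumerate(idx)]
        for row in scale_columns(rows)
    ]


def swap_axes(order, a, b):
    """Return a new axis ordering with axes `a` and `b` exchanged."""
    new_order = list(order)
    i, j = new_order.index(a), new_order.index(b)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order
```

For example, with hypothetical variables `["age", "income", "density"]`, calling `swap_axes(names, "age", "density")` gives a second ordering, and `polylines(rows, names, new_order)` gives the line segments to draw for it. Only the horizontal positions of the axes change between orderings; the scaled heights on each axis stay the same.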
