An Investigation of Methods for Visualising Highly Multivariate Datasets
9. Appendix A - Data Sets Used in the Study
In this study, two datasets are used. The first is derived from the 1991 UK census, at ward level for the Northern region of England, and uses the following variables:
LLTI The percentage of persons in households in each ward where a member of the household has some limiting long-term illness (LLTI). This is the response variable. Note that, to control for different age profiles in areas, this is computed only for 45-65 year olds, the age category perhaps most likely to suffer LLTI as a result of working in the extractive industries.
CROWDING This is the proportion of households in each census ward having an average of more than one person per room. This is an attempt to measure the level of cramped housing conditions in each ward.
UNEMP The male unemployment rate in each ward. This is generally regarded as a measure of an area's economic well-being.
SC1 The proportion of heads of households whose jobs are classed in social class I in the Census. These are professional and managerial occupations. Whereas UNEMP measures general economic well-being, SC1 measures affluence.
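As a rough illustration of how these ward-level variables are formed, the sketch below computes them as simple ratios of counts. It is a sketch under assumptions, not the study's own processing: the `WardCounts` fields and the `ward_indicators` function are hypothetical, the exact census table definitions are not given here, and using economically active males as the UNEMP denominator is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WardCounts:
    """Hypothetical raw counts for one census ward (field names are illustrative)."""
    persons_45_65: int                  # persons aged 45-65 living in households
    persons_45_65_llti_households: int  # of those, persons in a household with an LLTI member
    households: int
    crowded_households: int             # households with more than one person per room
    active_males: int                   # assumed denominator for UNEMP
    unemployed_males: int
    household_heads: int
    sc1_heads: int                      # heads of household in social class I


def ward_indicators(c: WardCounts) -> dict:
    """Compute the four study variables for one ward.

    LLTI is returned as a percentage and the others as proportions,
    following the wording of the variable definitions.
    """
    return {
        "LLTI": 100.0 * c.persons_45_65_llti_households / c.persons_45_65,
        "CROWDING": c.crowded_households / c.households,
        "UNEMP": c.unemployed_males / c.active_males,
        "SC1": c.sc1_heads / c.household_heads,
    }


# Made-up counts for a single illustrative ward.
example = WardCounts(
    persons_45_65=200, persons_45_65_llti_households=30,
    households=400, crowded_households=10,
    active_males=250, unemployed_males=25,
    household_heads=400, sc1_heads=40,
)
print(ward_indicators(example))
# → {'LLTI': 15.0, 'CROWDING': 0.025, 'UNEMP': 0.1, 'SC1': 0.1}
```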
The second dataset is a synthesised six-variable dataset, with variables named V1 to V6. Each data point lies on the surface of a six-dimensional hypersphere of radius one, except for one outlier, which lies at the centre of the hypersphere. This outlier is particularly 'pathological' in that, in any five-dimensional subset of the six variables, its value is not particularly unusual. To see why, note that dropping any one variable $V_k$ from a point on the unit hypersphere leaves a five-dimensional point of length $\sqrt{1 - V_k^2}$, which can take any value between 0 and 1, so the projected points fill the interior of a five-dimensional ball and the origin lies among them. While it is uncertain how often this situation arises with 'real life' social science data, it does provide a yardstick for assessing each visualisation method in a worst-case scenario. The data may be generated as follows. Note that a point on the circumference of a unit circle may be parametrised in terms of a single variable $\theta$ by the expression
$$(\cos\theta, \sin\theta).$$
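If $\theta$ is a uniform random variable, random points on the circumference may be generated from this expression. Call this expression $C_2(\theta)$. Now let $C_n(\theta_1, \ldots, \theta_{n-1})$ be a point on the surface of an n-dimensional unit hypersphere. For example, if n = 3, then $C_3(\theta_1, \theta_2)$ is a point on the surface of an ordinary sphere. Recursively, we can parametrise $C_{n+1}$ by
$$C_{n+1}(\theta_1, \ldots, \theta_n) = \left(\cos\theta_n \, C_n(\theta_1, \ldots, \theta_{n-1})\right) * \sin\theta_n,$$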
where * is a vector concatenation operator such that (x, y) * z = (x, y, z). One can check inductively that if the squared elements of $C_n$ sum to one, then the squared elements of $C_{n+1}$ also sum to one. Since this can be checked directly for $C_2$, it holds for all n > 2 as well. Thus, the surface of an n-dimensional sphere can be parametrised by a vector of angles $(\theta_1, \ldots, \theta_{n-1})$. By generating uniform random numbers for the elements of this vector and applying this transform, it is possible to generate points on the surface of the hypersphere. The simulation is then completed by adding the origin as the outlier in the dataset.
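The recursive construction can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the recursion $C_{n+1} = (\cos\theta_n \, C_n) * \sin\theta_n$, draws each angle uniformly from $[0, 2\pi]$, and appends the origin as the last row. The function names `hypersphere_point`, `synthetic_dataset` and `euclidean_norm` are hypothetical.

```python
import math
import random


def hypersphere_point(thetas):
    """Map a list of n-1 angles to a point on the unit hypersphere in n dimensions.

    Starts from C_2(t) = (cos t, sin t) and applies the assumed recursion
    C_{n+1} = (cos t_n * C_n) * sin t_n, where * is concatenation.
    """
    if not thetas:
        raise ValueError("at least one angle is required")
    point = [math.cos(thetas[0]), math.sin(thetas[0])]  # C_2
    for theta in thetas[1:]:
        # Scale the existing coordinates by cos(theta), then append sin(theta).
        point = [math.cos(theta) * x for x in point] + [math.sin(theta)]
    return point


def synthetic_dataset(n_points, dim=6, seed=None):
    """Return n_points on the unit hypersphere in `dim` dimensions,
    followed by one outlier at the origin (the last row)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_points):
        # Assumption: each angle is drawn uniformly from [0, 2*pi].
        thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim - 1)]
        rows.append(hypersphere_point(thetas))
    rows.append([0.0] * dim)  # the outlier at the centre of the hypersphere
    return rows


def euclidean_norm(row):
    """Length of a point measured from the origin."""
    return math.sqrt(sum(x * x for x in row))


if __name__ == "__main__":
    data = synthetic_dataset(500, dim=6, seed=42)
    sphere = data[:-1]
    # Every non-outlier point has norm 1 (up to rounding) in all six variables.
    norms = [euclidean_norm(r) for r in sphere]
    print(min(norms), max(norms))
    # Dropping V6 leaves five-variable norms equal to |cos(theta_5)|, which
    # range over [0, 1], so the outlier (norm 0) no longer stands apart.
    print(min(euclidean_norm(r[:5]) for r in sphere))
```

The angle-based transform places every point exactly on the surface, but for three or more dimensions it does not spread the points uniformly over that surface. If uniform coverage were wanted, a common alternative is to normalise vectors of independent standard normal variates to unit length.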