

The Visualisation of Area-based Spatial Data

1. Introduction

1.1 Background

Data relating to geographically-defined units (hereafter referred to as `spatial data') are an extremely important source of information for many social science disciplines. The clearest example of this is the decennial census of population, which provides the basic information on the socio-economic characteristics of the population of the country. The data are collected for individuals, but because of the necessary confidentiality restrictions are reported as aggregate statistics for small areas (Enumeration Districts) and various larger, standard sets of areas (such as wards, districts and councils) which are aggregates of EDs.

The census is by no means an isolated example: data on employment, unemployment, family expenditure, health, crime, industrial productivity and agricultural productivity are all provided on a regular basis as counts or rates for geographical areas (and much of it is available to researchers from the ESRC Data Archive at Essex).

Geographical location is also an important factor in many areas of social and economic policy, at both national and European level. It is recognised that social and economic conditions vary across the country, and that in some instances assistance and resources can usefully be targeted at problem `areas'. Spatial data are not simply of interest to those working in disciplines with a strong spatial element, such as Geography and Planning, but are of relevance to researchers in Politics, Sociology, Economics, Social and Economic History and Criminology.

Areal spatial data have a number of characteristics which distinguish them from other types of data used in the social sciences, and which have necessitated the development of specialist techniques and software.

There are a very large number of ways to define a set of areal units for the collection or reporting of statistics, and the choice can have important consequences for the later analysis of the data. The problem is compounded by the fact that in many cases the variables being measured cannot be defined independently of the frame of measurement. In the physical sciences, variables such as pressure and temperature can be considered to exist independently of the sampling framework. By contrast, a variable such as population density only has meaning in terms of a particular physical area.

The commonest method for displaying spatial data is the map; in the case of area data, choropleth and point symbol maps are the two commonest forms. Maps are commonly used to look for `patterns' in the data, where the word pattern normally implies regularities in the data. The purpose is often to separate out broad (or smooth) features of the data from local irregularities (or rough features). The problem is that the same data may be mapped in many different ways, each yielding a different visual pattern. This has been the subject of a great deal of research by cartographers and geographers, and is the subject of another of the case studies in this initiative. A particular problem is that the areas vary in their spatial extent, so that physically large areas may visually dominate the map.

Areas tend to vary in their base population. Even though standard reporting areas, such as EDs and wards, are designed to have approximately equal populations, other factors, such as the variation in population density and historical and logistical constraints on boundary placement, mean that base populations within one set of areal units can vary by an order of magnitude. This means that rates calculated using population as the denominator will have different degrees of reliability across the map. Some of the variation in the mapped values may therefore be due to variations in the size of the base populations from which the rates have been calculated.

Even if the purpose of the analysis is not inherently spatial, it may still be necessary to understand some of the particular characteristics of spatial data in order that analytical results will not be misinterpreted. For example, published health statistics might be analysed to look for associations between material deprivation and the incidence of ill health. The areal reporting units provide a ready-made sampling frame for examining this question, splitting the country into a series of samples which can be analysed using regression, for example. However, since the samples have been drawn spatially, the assumption of independence of errors may not hold, undermining, among other things, the usual inferential tests of significance.
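
As an illustration of the last point, one common check on the independence of errors is Moran's I, a measure of spatial autocorrelation. The text above does not name this statistic; it is used here only as an example. The sketch below assumes a simple binary adjacency structure (areas are either neighbours or not), and the function name, the six-area layout and the residual values are all hypothetical. Values of I well above the expected value of -1/(n-1) suggest that similar residuals cluster together, which is the situation in which the usual significance tests become unreliable.

```python
def morans_i(values, neighbours):
    """Moran's I for one value per area and a binary adjacency mapping.

    values     -- list of numbers, one per area (e.g. regression residuals)
    neighbours -- dict mapping each area index to the indices adjacent to it
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    # Sum the cross-products of deviations over every ordered pair of
    # neighbouring areas, counting the pairs as we go (binary weights).
    cross = 0.0
    weight_total = 0
    for i, adjacent in neighbours.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            weight_total += 1

    squared_dev = sum(d * d for d in dev)
    return (n / weight_total) * (cross / squared_dev)


# Hypothetical layout: six areas in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

# Made-up residuals, not taken from any real data set.
clustered = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]    # similar values sit together
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours always differ

i_clustered = morans_i(clustered, chain)      # positive autocorrelation
i_alternating = morans_i(alternating, chain)  # negative autocorrelation
```

In practice the adjacency mapping would be derived from the boundaries of the reporting areas, and a distance-based or row-standardised weighting might be preferred to the binary weights assumed here.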

Given the widespread use of spatial data in social science disciplines where there may not be a strong tradition of expertise in spatial data handling, there is clearly a need for software tools which assist the non-specialist in the analysis of spatial data. In this context, visualization methods may assist in the analysis of both the spatial and non-spatial elements of area-based data in four areas:

1. The identification of unusual data values or errors in the data.
2. The detection of patterns in the arrangement of data values.
3. Hypothesis formulation from the data.
4. The assessment of models.

Areas 1-3 are often referred to as exploratory spatial data analysis (ESDA). Area 4 includes diagnostic checks on models as part of confirmatory spatial data analysis (CSDA).
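
The first of these areas, the identification of unusual values, interacts with the varying base populations discussed in Section 1.1: the same rate can be unremarkable in a small area and highly unusual in a large one. The following is a minimal sketch of that idea, assuming that counts in each area are binomially distributed around a single overall rate. The function name and all the figures are hypothetical, and the threshold of 2 is only a conventional rule of thumb.

```python
import math


def rate_z_scores(cases, populations):
    """Return a (rate, standard_error, z_score) tuple for each area.

    The z-score compares an area's rate with the overall rate, using the
    binomial standard error implied by that area's own base population.
    """
    overall = sum(cases) / sum(populations)
    results = []
    for c, n in zip(cases, populations):
        rate = c / n
        se = math.sqrt(overall * (1 - overall) / n)
        results.append((rate, se, (rate - overall) / se))
    return results


# Made-up counts for four areas; the first two share the same raw rate (8%)
# but their base populations differ by two orders of magnitude.
cases = [4, 400, 900, 700]
populations = [50, 5000, 20000, 15000]

scores = rate_z_scores(cases, populations)
small_rate, small_se, small_z = scores[0]
large_rate, large_se, large_z = scores[1]

# Flag areas whose rate lies more than two standard errors from the overall rate.
unusual = [i for i, (_, _, z) in enumerate(scores) if abs(z) > 2]
```

With these figures the small area is not flagged, because its standard error is large, while the larger area with the same rate is. A choropleth map of the raw rates would shade the two areas identically.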

Visualization methods are not intended to replace numerical methods of analysis but rather complement them. They may also assist in the development and testing of numerical models of data.

Much of the work on the visualization of area-based data has been undertaken by academic researchers in disciplines with a specialist interest in spatial data and reported in the scientific literature in these disciplines. This means that many social scientists will be unaware of the potential use of visualization in analysing area-based data. The purpose of this case study is therefore to review the state of the art in this field to help disseminate knowledge about these techniques to a wider audience. We also identify opportunities for JISC and the ESRC for further work in this area.

1.2 Aim and Structure of Report

The aims of this report are as follows:

1. To assess the state of the art in the visualization of area-based data through an examination of four recently developed pieces of software.

2. To suggest areas where further work is needed, and what form this might take.

To this end, we first provide a brief review of the literature in this area as background. The bulk of the document then consists of a review of four pieces of software designed to assist with the visualization of area-based data. These are reviewed both in terms of the range of graphical and statistical tools provided and in terms of the graphical effectiveness of their visualization tools. The review was undertaken by a researcher who is skilled in social science research using spatial data, but who had very little previous experience of these kinds of software packages, since we felt this would provide a more representative view of how the existing software might be received by the social science community in general. Given this focus, and the limited time available, the review is not a comprehensive assessment of all the facilities provided by these packages.

Visualization may be used for many different types of analysis, and we have decided to focus on the use of visualization for the statistical analysis of spatial data, and in particular Exploratory Spatial Data Analysis or ESDA. As part of the review, we have developed a simple data model for ESDA which suggests the range of facilities one might require in order to undertake a full programme of ESDA in the case of area-based data. By comparing the existing offerings with this theoretical `model' we are able to suggest those areas where additional facilities might be required.
