

The Visualisation of Area-based Spatial Data

3. Software evaluations

The assessment criteria are grouped into three categories. The first covers the functionality of the package: the range of visualization tools and their graphical effectiveness. The second covers ease of use, and the third covers implementation. These categories are typical of those used in more formal comparisons between software packages (Wise 1991, AGOCG 1992, 1993). However, assessing the effectiveness of graphical facilities, as opposed to simply enumerating their range, does not appear to have been undertaken before, and we identify this as one area offering scope for further work.

3.1 Functionality

The functionality of the packages has been assessed against two criteria: (1) the range of graphical and numerical facilities offered and (2) how effectively the graphical tools enable the visualization of data.

3.1.1 Range of tools

We take the view that the range of visualization tools in a software package should not be assessed against a simple list of possible tools, but against an appropriate data model that identifies the generic properties of spatial data and the tools needed to identify those properties. The range of tools within any package is therefore assessed against some definition of "coherence" or "complementarity". Stated differently: does the package offer the analyst the range of tools necessary to undertake specified tasks? A package with a small but coherent set of tools that complement one another effectively with respect to some defined analytical goal would be more valuable to the spatial data analyst than a package with more tools that fails the complementarity test, or that has no clearly defined design criterion for deciding which tools to include.

The authors of all the packages have linked their software to the requirements of exploratory spatial data analysis (ESDA). This implies that the visual tools embedded in the systems should use statistically resistant methods and employ both graphical and cartographic forms of display in order to identify data properties, suggest hypotheses, identify atypical values and (if modelling capability is included) assess model fits. Visualization is an integral part of ESDA.

In the exploratory data analysis (EDA) literature, the data being explored are often considered to consist of two components:

DATA = SMOOTH + ROUGH.
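As a minimal illustration of this decomposition, the Python sketch below smooths a short sequence with a running median of span 3 (one of Tukey's resistant smoothers) and takes the rough as the residual, so that DATA = SMOOTH + ROUGH holds exactly. The function names and data are hypothetical, and copying the end values unchanged is a simplifying assumption rather than Tukey's full end-point rule.

```python
from statistics import median


def running_median_3(values):
    """Resistant smooth: running median of span 3.

    The first and last values are copied unchanged (a simplifying assumption).
    """
    if len(values) < 3:
        return list(values)
    smooth = [values[0]]
    for i in range(1, len(values) - 1):
        smooth.append(median(values[i - 1:i + 2]))
    smooth.append(values[-1])
    return smooth


def decompose(values):
    """Split DATA into SMOOTH and ROUGH, where ROUGH = DATA - SMOOTH."""
    smooth = running_median_3(values)
    rough = [v - s for v, s in zip(values, smooth)]
    return smooth, rough


data = [3, 4, 20, 5, 6, 7]
smooth, rough = decompose(data)
print(smooth)  # [3, 4, 5, 6, 6, 7]
print(rough)   # [0, 0, 15, -1, 0, 0]
```

The spike of 20 passes almost entirely into the rough rather than distorting the smooth, which is the reason for preferring resistant smoothers in ESDA.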

Smooth properties of non-spatial data on a single variable are characteristics such as the centre of a set of values and the degree of spread around it, while outliers constitute the rough element. Smooth properties of two variables might be the trend line describing the relationship between the pairs of values, whilst rough properties refer to the residuals around the fitted line. For spatial data, the data model generates four components: smooth and rough elements of the attribute values (without any reference to the arrangement of these values, and hence equivalent to the non-spatial case), and smooth and rough properties of the first- and second-order arrangement characteristics of the attribute values (which loosely equate to trend and spatial autocorrelation respectively). Table 2 below expands each of these components in more detail and identifies techniques used to identify each property.
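For the univariate, non-spatial case, a boxplot summary separates the smooth (median and quartiles) from the rough (values beyond the fences). The sketch below assumes Tukey's conventional fences at 1.5 interquartile ranges beyond the quartiles and uses the "inclusive" quartile method of Python's `statistics.quantiles`; packages differ in how they compute quartiles, so values close to the fences may be flagged differently. The function name `boxplot_summary` and the data are hypothetical.

```python
from statistics import median, quantiles


def boxplot_summary(values, k=1.5):
    """Smooth (median, quartiles) and rough (outliers beyond the fences) of one variable."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # Tukey-style fences
    outliers = [v for v in values if v < lower or v > upper]
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}


summary = boxplot_summary([2, 3, 3, 4, 5, 5, 6, 7, 30])
print(summary["median"], summary["q1"], summary["q3"])  # 5 3.0 6.0
print(summary["outliers"])                              # [30]
```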

Data	Smooth properties	Smooth techniques	Rough properties	Rough techniques
Non-spatial	Univariate: centre; spread; distribution	Boxplot; dotplot; histogram	Outliers	Boxplot
Non-spatial	Multivariate relationships	Scatterplot plus resistant fit or loess curve; mosaic plot; q-q plot; multi-way dotplot; matrix scatterplot	Relationship outliers	Scatterplot plus resistant fit; mosaic plot
Spatial	Trend	Median filters; Trellis displays	Spatial outliers	Moran plots with resistant fit
Spatial	Autocorrelation	Moran plot with resistant fit	Local clusters	Getis-Ord

Table 2: Classification of ESDA techniques
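The median filter listed under spatial trend can be read in the same smooth-plus-rough terms: the smoothed value for an area is the median of its own value and those of its neighbours, and the rough is the residual. In this sketch the window (the area plus its first-order neighbours), the neighbour lists, the data and the function name are illustrative assumptions. Large residuals are candidate spatial outliers, that is, values atypical of their neighbourhood even if unremarkable overall.

```python
from statistics import median


def spatial_median_filter(values, nbrs):
    """Return (smooth, rough) dicts keyed by area id.

    smooth[a] is the median of area a's value and its neighbours' values;
    rough[a] = values[a] - smooth[a].
    """
    smooth = {a: median([values[a]] + [values[b] for b in nbrs[a]]) for a in values}
    rough = {a: values[a] - smooth[a] for a in values}
    return smooth, rough


# Five hypothetical areas in a chain A-B-C-D-E; C is high for its neighbourhood.
values = {"A": 10, "B": 11, "C": 40, "D": 12, "E": 13}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
smooth, rough = spatial_median_filter(values, nbrs)
print(max(rough, key=lambda a: abs(rough[a])))  # C
```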

Taking this as a model of spatial data, software for ESDA should provide tools that allow the analyst to explore all four components. However, a number of additional difficulties arise in exploring spatial data, and we turn to these now.

In the case of area-based spatial data, data values are not independent of the particular set of areas used to measure them; moreover, there is usually no "natural" or "right" partition. This is referred to as the Modifiable Areal Unit Problem (MAUP), which has two components:

Scale effects - as the average size of the areal units increases, variation in data values between areas is smoothed out. One consequence is that statistical measures such as correlation are affected: correlation values tend to rise as the units become larger.
Aggregation effects - at any given scale there are many ways to subdivide the study region into areal units, and different partitions can produce different analytical results.
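The scale effect can be illustrated with a one-dimensional strip of units merged into progressively larger zones. The sketch below, using hypothetical data and a hypothetical function name, reports the variance of the zone means at each scale. When equal-sized zones are nested in this way, the between-zone variance cannot increase (the finer-scale variance splits into a between-zone part plus a non-negative within-zone part), which is one way to see that variation between areas is smoothed out.

```python
from statistics import pvariance


def aggregate_strip(values, block):
    """Merge runs of `block` consecutive units into zones; return the zone means."""
    if len(values) % block:
        raise ValueError("block size must divide the number of units")
    return [sum(values[i:i + block]) / block for i in range(0, len(values), block)]


# Hypothetical attribute values for eight units laid out in a strip.
units = [1, 3, 2, 6, 5, 9, 8, 10]
for block in (1, 2, 4):
    zone_values = aggregate_strip(units, block)
    # Population variance of the zone means, i.e. variation between areas.
    print(block, pvariance(zone_values))
# 1 9.75
# 2 7.25
# 4 6.25
```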

We suggest that a fundamental tool for ESDA is the ability to construct new spatial partitions (whenever this is possible) in order to explore the sensitivity of ESDA results to the spatial framework. This means incorporating within the software the ability to construct new regional partitions for the data from the set of smallest units stored (Wise et al 1997).
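A minimal sketch of re-partitioning from the smallest stored units: each unit is assigned a zone label, zone means are recomputed, and a correlation is calculated for each partition. The data, the two partitions and the function names are hypothetical, and no contiguity constraint is enforced, which a real regionalisation tool would need. On this toy data the correlation rises on aggregation (the scale effect) and also differs between two partitions at the same scale (the aggregation effect).

```python
from math import sqrt


def zone_means(values, zone_of):
    """Average unit values within each zone; zone_of[i] is unit i's zone label."""
    totals, counts = {}, {}
    for v, z in zip(values, zone_of):
        totals[z] = totals.get(z, 0) + v
        counts[z] = counts.get(z, 0) + 1
    return [totals[z] / counts[z] for z in sorted(totals)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Two variables measured on six smallest units (hypothetical data).
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 2, 6, 4, 5]
partition_a = [0, 0, 1, 1, 2, 2]  # zones {0,1}, {2,3}, {4,5}
partition_b = [0, 1, 1, 2, 2, 0]  # zones {0,5}, {1,2}, {3,4}

print("units", round(pearson(x, y), 3))  # units 0.657
for name, part in (("A", partition_a), ("B", partition_b)):
    r = pearson(zone_means(x, part), zone_means(y, part))
    print(name, round(r, 3))
# A 0.945
# B 0.971
```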

Many techniques of spatial analysis depend on the construction of the connectivity (or W) matrix. This matrix defines the analyst's assumptions about the nature of the spatial relationships between the areas of the chosen regionalisation. Just as there is no "correct" regionalisation, there is no "correct" connectivity matrix. We therefore suggest that, wherever the software provides tools for analysing spatial relationships that depend on how those relationships are defined, it should be possible to repeat the analysis under other assumptions about the matrix W (Haining 1990).
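To show why the choice of W matters, the sketch below builds two common contiguity definitions for a regular grid, rook (shared edges) and queen (shared edges or corners), and computes Moran's I with a row-standardised W under each. Using grid cells to stand in for areas, the function names and the checkerboard data are assumptions for illustration. Areas with no neighbours are given a zero row, with S0 (the sum of all weights) adjusted to match.

```python
def grid_neighbours(rows, cols, rule="rook"):
    """Contiguity neighbours for a rows x cols grid; cell id = r * cols + c."""
    if rule == "rook":
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        steps = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError(f"unknown contiguity rule: {rule}")
    return {
        r * cols + c: [
            (r + dr) * cols + (c + dc)
            for dr, dc in steps
            if 0 <= r + dr < rows and 0 <= c + dc < cols
        ]
        for r in range(rows)
        for c in range(cols)
    }


def morans_i(values, nbrs):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i**2, W row-standardised.

    Each neighbour of area i gets weight 1 / (number of neighbours of i);
    areas with no neighbours get a zero row, so S0 counts non-empty rows only.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross, s0 = 0.0, 0
    for i in range(n):
        if nbrs[i]:
            lag = sum(z[j] for j in nbrs[i]) / len(nbrs[i])  # spatial lag of z
            cross += z[i] * lag
            s0 += 1
    return (n / s0) * cross / sum(zi * zi for zi in z)


# A 3 x 3 checkerboard of 1s and 0s (hypothetical data).
checkerboard = [1 - (r + c) % 2 for r in range(3) for c in range(3)]
for rule in ("rook", "queen"):
    print(rule, round(morans_i(checkerboard, grid_neighbours(3, 3, rule)), 3))
# rook -1.0
# queen -0.233
```

The same perfectly alternating pattern gives I = -1 under rook contiguity but only about -0.233 under queen contiguity, because diagonal neighbours share the same value.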

Whilst not unique to spatial data, the problem of missing data values is a particularly severe one. Missing values have particular significance in a data set where interest focuses on the arrangement properties of the data, because a missing value refers to a particular area rather than, as in classical statistics, simply a missing observation from an experimental set that contains replication. Hence the ability of the software to handle missing cases is potentially very important.
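One simple way software might handle a missing area, sketched below under the assumption that missing values are coded as `None`, is to drop the area and prune it from every neighbour list. Doing so changes the arrangement itself: neighbours of the missing area lose a link and may become islands with no neighbours at all, which affects any statistic that depends on W. The function name and data are hypothetical, and alternatives such as imputation are not shown.

```python
def drop_missing(values, nbrs):
    """Remove areas whose value is None and prune them from neighbour lists.

    Returns the reduced values, the pruned neighbour lists, and the areas that
    had neighbours originally but have none once missing areas are removed.
    """
    kept = {a: v for a, v in values.items() if v is not None}
    pruned = {a: [b for b in nbrs[a] if b in kept] for a in kept}
    islands = sorted(a for a, bs in pruned.items() if not bs and nbrs[a])
    return kept, pruned, islands


# Four hypothetical areas in a chain A-B-C-D; the value for B is missing.
values = {"A": 4.0, "B": None, "C": 7.0, "D": 5.0}
nbrs = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
kept, pruned, islands = drop_missing(values, nbrs)
print(pruned)   # {'A': [], 'C': ['D'], 'D': ['C']}
print(islands)  # ['A']
```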
