2. Issues to consider when area value mapping

In this section we outline and illustrate some of the standard ideas from cartography about good and bad practice in choropleth mapping. These have been grouped into sections on the numbers mapped, the zones used, the classification into class intervals and the symbolism employed.

2.1 Issues concerning the numbers

Figure 2 shows the same data as were used in Figure 1, but instead of mapping numbers that express the areal density of the population we have simply attempted to visualise the total population.

Figure 2: Total population of Leicestershire at the 1991 Census.

In this example the inner city wards that have the highest populations also have small spatial areas, so the pattern remains similar to that shown in Figure 1. In other cases large zones may tend to have large population totals and vice versa, so that the map reflects the zones used and tells us very little about the underlying distribution. The distribution of population is better revealed by mapping a ratio, in which the total count is expressed relative either to the area over which it has been aggregated (giving an areal density) or to some population total (giving a population ratio). Figure 3 shows a population ratio, the percentage of the population aged 0-15. The transformed variable mapped is now a dimensionless quantity whose value does not depend on the areas of the zones used, but the map still depends on the areal units in that physically large areas tend to dominate the display.

Figure 3: Total children aged 0-15 expressed as a proportion of the total ward populations, Leicestershire 1991.
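
The two kinds of ratio described above can be sketched in a few lines of Python. This is only an illustration: the ward names, counts and areas are invented, and the function names are hypothetical rather than taken from any mapping package.

```python
# Hypothetical ward records: (name, total population, children aged 0-15, area in km^2).
# All figures are invented for illustration.
wards = [
    ("Inner A", 12000, 3000, 1.5),
    ("Suburb B", 8000, 1600, 6.0),
    ("Rural C", 3000, 450, 40.0),
]


def areal_density(count, area_km2):
    """Count per unit area, here persons per square kilometre."""
    return count / area_km2


def population_ratio(part, total):
    """Subgroup count as a percentage of the zone's total population."""
    return 100.0 * part / total


# Areal density removes the dependence of the raw total on zone size.
densities = {name: areal_density(pop, area) for name, pop, _, area in wards}

# A population ratio is dimensionless and does not depend on zone area at all.
pct_children = {name: population_ratio(kids, pop) for name, pop, kids, _ in wards}

for name in densities:
    print(f"{name}: {densities[name]:.1f} persons/km^2, {pct_children[name]:.1f}% aged 0-15")
```

Either derived value can then be classed and shaded in place of the raw total.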

Depending on the problem being studied, it is possible to refine the numbers used in several ways to provide more revealing maps. Some options to consider are:

Standardisation of the denominator in the ratios mapped in some problem-relevant way. In spatial epidemiology, for example, use is often made of Standardised Mortality Ratios (SMR), which are the ratios of the number of deaths in each area to the number expected on the basis of some externally specified (typically national) age-sex specific rates. More generally, in a Census Atlas (CRU 1980) produced using the 1971 Population Census of the UK, a team based at Durham University mapped a number of variables using as a measure the so-called signed chi-square statistic. This was defined, as in a conventional chi-square statistic, as the square of the difference between the actual number in the zone and the number expected if the variable in question were uniformly distributed over the entire area, divided by this expected number. Because squaring discards the sign, each chi-square value had to have its positive (more than expected) or negative (less than expected) sign added after the calculation. A simpler alternative is to map the square root of these quantities, $(O_i - E_i)/\sqrt{E_i}$, where $O_i$ and $E_i$ are the observed and expected numbers in zone $i$; this automatically takes care of the sign. Figure 4 shows the distribution of the number of cars in each ward expressed as a chi-square relative to the expected number per capita. The calculation draws our attention to greater than expected car ownership in the larger rural areas, in a clear commuter belt stretching south-eastwards from Leicester, and to less than expected car ownership in the smaller inner city wards. The chi-square value automatically emphasises the greater significance of variation in zones with more cases.
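
As a rough sketch of the two calculations just described, the following Python computes the signed chi-square and its square-root form for a few zones. The populations and car counts are invented, the function names are hypothetical, and the expected values assume a single overall per capita rate, which is one reading of "uniformly distributed".

```python
import math


def expected_counts(populations, observed):
    """Expected count per zone if the overall per capita rate applied everywhere."""
    rate = sum(observed) / sum(populations)
    return [rate * p for p in populations]


def signed_chi_square(o, e):
    """(o - e)^2 / e, with the sign of (o - e) restored after squaring."""
    return math.copysign((o - e) ** 2 / e, o - e)


def signed_chi(o, e):
    """Square-root form (o - e) / sqrt(e); the sign is carried automatically."""
    return (o - e) / math.sqrt(e)


# Invented example data: three zones.
populations = [1000, 2000, 1000]
cars = [200, 1000, 400]

expected = expected_counts(populations, cars)
chi_square_values = [signed_chi_square(o, e) for o, e in zip(cars, expected)]
chi_values = [signed_chi(o, e) for o, e in zip(cars, expected)]

for o, e, c2, c in zip(cars, expected, chi_square_values, chi_values):
    print(f"observed={o} expected={e:.0f} signed chi-square={c2:.1f} chi={c:.2f}")
```

Both measures grow with the size of the expected count for a given proportional difference, which is why they emphasise zones with more cases.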

Figure 4: A chi square map. Total numbers of cars relative to the numbers expected under the assumption of uniform distribution throughout the population. The large map shows the full range with low values dark and high values light. The smaller maps use the full range of grey shading to focus on the areas with fewer (top) and more (bottom) cars than expected given their population totals.
A common problem in any such mapping arises where either the numerator or the denominator in the ratio mapped is very small, which often occurs when the cases being mapped are rare or of very high spatial variability. The result is that the ratios become very sensitive to small changes in these data. In such circumstances it is sensible to map quantities that are more robust to such changes. Instead of mapping the ratio of observed to expected counts, the probability of getting a count more extreme than that actually observed can be mapped. Such a map must be based on some probability model, and that usually employed is the Poisson, as is appropriate for an assumption of a random distribution of the individuals. Details of the approach and its assumptions can be found in Bailey and Gatrell (1995, pages 300-302).
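
A minimal sketch of such a probability calculation is given below, using only the Python standard library. It assumes that "more extreme" means the upper tail when the observed count is at or above the expected count and the lower tail otherwise, and that the observed count itself is included in the tail; the treatment in Bailey and Gatrell may differ in these details. The function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k events under a Poisson model with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_tail_probability(observed, expected):
    """Probability of a count at least as extreme as the one observed.

    Upper tail P(X >= observed) when observed >= expected,
    lower tail P(X <= observed) otherwise (an assumed convention).
    """
    if observed >= expected:
        below = sum(poisson_pmf(k, expected) for k in range(observed))
        return 1.0 - below
    return sum(poisson_pmf(k, expected) for k in range(observed + 1))


# Invented example: 5 cases observed where 2.0 were expected.
p_high = poisson_tail_probability(5, 2.0)
# Invented example: no cases observed where 2.0 were expected.
p_low = poisson_tail_probability(0, 2.0)
print(f"P(X >= 5 | mu = 2) = {p_high:.4f}")
print(f"P(X <= 0 | mu = 2) = {p_low:.4f}")
```

Mapping these probabilities rather than the raw ratios means that an extreme ratio based on one or two cases is not given undue visual weight.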
Another alternative where there are small numbers involved is to adjust the estimated ratios in each zone either away from, or towards an overall global value for the rate according to some prior measure of our confidence in these ratios using concepts from Bayesian statistics. For a very clear introduction to this method see either Marshall (1991) or Langford (1994).
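
The idea can be illustrated with a simple method-of-moments shrinkage estimator, sketched below. This is one common form of global empirical Bayes smoothing and is offered as an assumption about the general approach, not as the exact estimator of either paper cited above; the data are invented and the function name is hypothetical.

```python
def shrink_rates(cases, populations):
    """Shrink raw zone rates towards the global rate (method-of-moments sketch).

    Zones with small populations, whose raw rates are least reliable,
    are pulled most strongly towards the global value.
    """
    total_pop = sum(populations)
    global_rate = sum(cases) / total_pop
    raw = [c / n for c, n in zip(cases, populations)]

    # Population-weighted variance of the raw rates about the global rate.
    weighted_var = sum(
        n * (r - global_rate) ** 2 for r, n in zip(raw, populations)
    ) / total_pop
    mean_pop = total_pop / len(populations)

    # Estimated between-zone variance, truncated at zero.
    between_var = max(weighted_var - global_rate / mean_pop, 0.0)

    shrunk = []
    for r, n in zip(raw, populations):
        denom = between_var + global_rate / n
        weight = between_var / denom if denom > 0 else 0.0
        shrunk.append(global_rate + weight * (r - global_rate))
    return shrunk


# Invented example: one very small zone and two larger ones.
cases = [5, 50, 300]
populations = [100, 5000, 10000]
raw_rates = [c / n for c, n in zip(cases, populations)]
global_rate = sum(cases) / sum(populations)
smoothed = shrink_rates(cases, populations)

for r, s, n in zip(raw_rates, smoothed, populations):
    print(f"population={n}: raw rate={r:.4f}, smoothed rate={s:.4f}")
```

In this example the raw rate of the zone with only 100 people is moved most of the way towards the global rate, while the rate of the largest zone is almost unchanged.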
A number of recent workers have suggested that in many cases it is appropriate to visualise what have been termed local statistics. Typically, in using local statistics we attempt to learn more about each individual zone by relating it in some way to the values in its neighbours. Several local statistics have been suggested and their use illustrated. For example, Getis and Ord (1992; see also Ord and Getis, 1995) define a G-function which gives an index of spatial clustering of a set of observations over a defined neighbourhood as $G_i(d) = \sum_j w_{ij}(d)\, x_j / \sum_j x_j$. Here $x$ is the regional variable and $W(d)$ is a symmetric 0/1 matrix of weights, with 1's for all the areas defined to be within distance $d$ of the given area $i$. All other elements are zero, including the link of $i$ to itself. Computation of $G(d)$ requires a suitable data structure from which to determine $W(d)$. The vector of values for each region, $G(d)$, shows how locally anomalous the region is with increasing distance for the given variable $x$. Each area has its associated G-function, which can be mapped for a given $d$ or plotted as a function of distance. A restriction on this statistic is that, as defined, it is only useful if the variable has a natural origin. It is thus inappropriate for the study of change variables or variables that have negative values. In addition, Anselin (1995) has shown that global coefficients for measuring spatial dependence, or spatial autocorrelation, can be decomposed into local values. These include Moran's index of spatial autocorrelation, I, and Geary's contiguity ratio, C. The local form of Moran's I is the product of the zone value and the weighted sum of the values in the surrounding zones (their average, if the weights are row-standardised), $I_i(d) = z_i \sum_j w_{ij}(d)\, z_j$, where $z_i$ is the value in zone $i$ expressed as a deviation from the overall mean. The local variant of Geary's contiguity ratio is $C_i(d) = \sum_j w_{ij}(d)\,(z_i - z_j)^2$. Almost any of the classical statistics can be calculated as a local value (for example the mean, standard deviation and correlation).
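
The three local statistics above can be sketched directly from their formulas. The following Python uses a small invented example of four zones arranged in a line, with a hypothetical 0/1 weights matrix standing in for $W(d)$. The denominator of the G statistic follows the formula as written in the text, summing over all zones; some formulations exclude zone $i$ from it.

```python
def local_g(x, w):
    """G_i(d): share of the total of x found in the neighbours of each zone i."""
    total = sum(x)
    n = len(x)
    return [sum(w[i][j] * x[j] for j in range(n)) / total for i in range(n)]


def deviations(x):
    """Values expressed as deviations from the overall mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def local_moran(z, w):
    """I_i(d) = z_i * sum_j w_ij(d) z_j."""
    n = len(z)
    return [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]


def local_geary(z, w):
    """C_i(d) = sum_j w_ij(d) (z_i - z_j)^2."""
    n = len(z)
    return [sum(w[i][j] * (z[i] - z[j]) ** 2 for j in range(n)) for i in range(n)]


# Invented example: four zones in a row, each adjacent only to the next.
# The diagonal is zero, so no zone is linked to itself.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [10.0, 20.0, 30.0, 40.0]

g = local_g(x, w)
z = deviations(x)
moran = local_moran(z, w)
geary = local_geary(z, w)

for i in range(len(x)):
    print(f"zone {i}: G={g[i]:.2f} local I={moran[i]:.1f} local C={geary[i]:.1f}")
```

Each list holds one value per zone and so can be mapped in the same way as any other zone attribute.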
This idea has recently been exploited by Fotheringham et al. (1996) who compute maps of how the estimated parameters of a regression model vary spatially over a fine grid of cells. Maps of these estimates provide additional information about the spatial stationarity of the phenomenon being mapped. Figure 5 shows a map of the values of each zone's local departure from the neighbouring values (expressed as a local z-score) again using the data for children aged 0-15.

Figure 5: A map of a local statistic. Maps show percentage of the total population aged 0-15. Variation from the local mean is calculated as a z-score. The main map shows local lows and local highs ranging from light to dark. The small maps show positive and negative outliers mapped with darkness to represent magnitude in the top and bottom maps respectively. Localities are defined by first order adjacency.
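
A local z-score of the kind mapped in Figure 5 might be computed along the following lines. The exact definition used for the figure is not given here, so this sketch assumes that each zone's value is compared with the mean and standard deviation of its first order neighbours, excluding the zone itself. The zone names, values and adjacency lists are invented, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def local_z_scores(values, neighbours):
    """Z-score of each zone relative to its adjacent zones (assumed definition).

    Returns None for a zone with fewer than two neighbours or with
    neighbours that all share the same value, since no spread can be measured.
    """
    scores = {}
    for zone, value in values.items():
        nearby = [values[n] for n in neighbours[zone]]
        if len(nearby) < 2:
            scores[zone] = None
            continue
        spread = pstdev(nearby)
        scores[zone] = (value - mean(nearby)) / spread if spread > 0 else None
    return scores


# Invented example: percentage aged 0-15 in four zones and their adjacencies.
values = {"A": 30.0, "B": 20.0, "C": 22.0, "D": 18.0}
neighbours = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
}

scores = local_z_scores(values, neighbours)
for zone, score in scores.items():
    print(zone, "n/a" if score is None else f"{score:+.2f}")
```

Positive scores mark local highs and negative scores local lows, which correspond to the two ends of the shading range described in the caption.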
