Mapping the Life Course:
Visualising Migrations, Transitions & Trajectories

5. Discussion: Computerising Lifelines

So far, this essay has been concerned with visualisation but not with computer graphics. The remainder of the essay explores the pros and cons of generating lifeline diagrams with software, using the Steam Engine Makers' database as the central example but also drawing on discussions with researchers working on two large-scale modern longitudinal datasets:

The Office of National Statistics' Longitudinal Study (LS) is based upon the census and vital event data (births, cancers, deaths) routinely collected for 1% of the population of England and Wales -- approximately 500,000 individuals at any one census point. The study contains data on LS members present at the 1971, 1981 and 1991 Censuses plus information on other individuals living in the same household at each census. Academic access to the Longitudinal Study is the responsibility of the Social Statistics Research Unit at City University, and we have been in contact with both the Unit and a number of their collaborators at other universities. For further information, see: http://ssru.city.ac.uk/Ls/lshomepage.html
The British Household Panel Study (BHPS) follows all members of a sample of households through repeated interviewing in a series of `Waves', now up to wave 6. The Wave 1 panel consists of some 5,500 households and 10,300 individuals drawn from 250 different areas of Great Britain, although each successive wave loses a few more panel members. It is the responsibility of the ESRC Research Centre on Micro-Social Change at Essex University, and more information is available from: http://www.irc.essex.ac.uk/bhps

From these contacts, a number of general points emerged:

5.1 Life-course datasets are both large and complex.

Whereas time geographers sometimes based their work on a single individual's diary, the Steam Engine Makers' database already contains nearly 200,000 records, and adding the full run of data to 1919 would bring it close to a million. Modern datasets such as the LS and BHPS are even larger. Further, longitudinal datasets necessarily have a complex structure: a varying number of events of different types affecting each individual at irregular intervals over time, often linked to a more conventional set of attribute data (occupation, nationality, height, weight and so on). Such data are often held in many separate tables within a relational framework. In this context, visualisation is as much concerned with extracting data in a usable form as with creating a graphical image per se. This typically involves deriving a more regular data structure which records each individual's location or status at fixed intervals; for example, many analyses and graphics of the SEM database are based on a derived dataset recording whether an individual was on unemployment, sickness or superannuation benefit, or on no benefit, on the first day of each month. The Micro-Social Change Centre calls such a dataset a `calendar'.
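As a minimal sketch of the calendar idea, the Python below converts irregular benefit spells into one status string per member, sampled on the first day of each month. The input layout (member, benefit type, inclusive start and end dates), the function names and the one-letter codes are hypothetical illustrations, not the SEM database's actual schema.

```python
from datetime import date

# Hypothetical one-letter codes for each benefit type; 'N' means no benefit.
CODES = {"unemployment": "U", "sickness": "S", "superannuation": "P"}


def month_starts(first: date, last: date):
    """Yield the first day of every month from first to last inclusive."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def build_calendar(spells, first: date, last: date):
    """Turn irregular benefit spells into one status string per member.

    spells: iterable of (member_id, benefit, start, end), dates inclusive.
    Returns {member_id: "NNUU..."}, one character per month.
    """
    # Group the spells by member so each member is processed once.
    by_member = {}
    for member, benefit, start, end in spells:
        by_member.setdefault(member, []).append((CODES[benefit], start, end))

    months = list(month_starts(first, last))
    calendar = {}
    for member, member_spells in by_member.items():
        row = []
        for day in months:
            code = "N"  # default: on no benefit that day
            for c, start, end in member_spells:
                if start <= day <= end:
                    code = c
                    break
            row.append(code)
        calendar[member] = "".join(row)
    return calendar
```

For instance, a member unemployed from 10 February to 20 April 1850 and sick throughout June, sampled from January to July, yields the string `NNUUNSN`. Members with no recorded spells do not appear in the output, so a real calendar would have to be seeded from the membership table; the nested loop is also deliberately naive and would need sorting or indexing to cope with a million records.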

If much of the work involved in writing a lifeline diagram generator concerns creating a calendar, a task which will necessarily vary depending on the structure of the database, the development of such software may be uneconomic unless the structure of longitudinal databases can be standardised. Further, the computation involved in generating a lifeline diagram covering several thousand individuals from a relational database is such that it is hard to imagine such graphics being generated entirely interactively. If it could be done at all, it would arguably require some combination of a very large in-memory database and parallel processing.

5.2 Researchers using longitudinal datasets are attempting visualisation work, but lack specialised tools.

The Social Statistics Research Unit and the Micro-Social Change Centre have large numbers of full-time researchers but lack specialised graphics facilities. The level of interest in visualisation is therefore striking, but the tools in use are remarkably crude. As in other areas of the social sciences, Microsoft Excel provides a lingua franca; it can be persuaded to create a lifeline diagram, but only once a calendar dataset (see above) has been generated, often by strictly manual methods. Where specialised software does exist, it tends to concentrate on data management. For example, Professor Peter Elias of the Institute for Employment Research at Warwick University has written software for the graphical analysis of event histories from the National Child Development Study (sweep 5), but it is primarily a data management package, which passes its output to SPSS or Excel for plotting.

The most extreme example of unlikely software being pressed into service is the use of the EMACS text editor by Brendan Halpin of the Micro-Social Change Centre! A calendar dataset is created consisting of a series of lines, one for each person, each containing a sequence of letter codes indicating the person's status at each point in time. EMACS is then used to append, globally, escape codes that change the background colour according to which letter appears. The result is a coloured lifeline diagram similar to those in this essay, and so long as you know the escape codes, EMACS is much faster than Excel.
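The same letter-to-colour substitution can be sketched in a few lines of Python. We do not know which escape codes were used in the EMACS version; this sketch assumes ANSI terminal background-colour codes and an invented mapping from status letters to colours.

```python
# Hypothetical mapping from status letter to ANSI SGR background colour code:
# 41 red, 43 yellow, 44 blue, 47 white.
BACKGROUND = {"U": 41, "S": 43, "P": 44, "N": 47}
RESET = "\x1b[0m"  # restore the terminal's normal colours


def colour_line(statuses: str) -> str:
    """Prefix each status letter with an ANSI background-colour escape."""
    cells = []
    for letter in statuses:
        code = BACKGROUND.get(letter, 49)  # 49 = terminal's default background
        cells.append(f"\x1b[{code}m{letter}")
    return "".join(cells) + RESET


def colour_calendar(lines):
    """Colour every line of a calendar dataset (one person per line)."""
    return [colour_line(line) for line in lines]
```

Printing the result of `colour_calendar` in an ANSI-capable terminal gives one coloured band per person, which is essentially a lifeline diagram drawn in text.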

5.3 Lifeline diagrams on paper are generally overloaded; computers could help.

The published lifeline diagrams reproduced in this essay tend to be overloaded, containing a jumble of information in an attempt to provide a full record. For example, figure 5 attempts to distinguish many different places of residence by line style (and the original is not even in colour), while the numbers in the left-hand margin indicate occupations; figure 6 sorts individuals by cause of death, shown in the left-hand margin, and then by age, shown in the right. Different lifeline diagrams also organise time differently: figure 4 uses calendar years, figure 5 years from birth and figure 6 years prior to death.

While the interactive generation of lifeline diagrams covering large numbers of individuals may be impractical, an interactive tool for manipulating such diagrams, and associated information about individual characteristics, seems quite feasible. It should be able to move individual lifelines around within a viewer in two ways: Firstly, it should be able to sort the lifelines vertically by various criteria: by occupation and then by age, or by occupation and cause of death; NB this is not too hard to achieve within Excel. Secondly, it should be able to vary the basis for the time axis, and move lifelines horizontally to fit; for example, if figure 6 could be rearranged to use calendar years, it might be possible to identify epidemics.
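The two manipulations described above, multi-key vertical sorting and re-basing the time axis, could be prototyped along the following lines. The `Lifeline` record, its fields and the axis names are hypothetical; the sketch only reorders and re-expresses the data, leaving the drawing to a separate viewer.

```python
from dataclasses import dataclass, field


@dataclass
class Lifeline:
    """Hypothetical per-person record held by the viewer."""
    member_id: int
    occupation: str
    birth_year: int
    death_year: int
    cause_of_death: str
    events: list = field(default_factory=list)  # (calendar_year, status) pairs


def sort_lifelines(lines, *keys):
    """Vertical ordering by several criteria in turn (a stable sort)."""
    return sorted(lines, key=lambda ln: tuple(k(ln) for k in keys))


def rebase(line, axis):
    """Re-express event years on another time axis, shifting horizontally."""
    if axis == "calendar":
        origin = 0                  # keep calendar years
    elif axis == "age":
        origin = line.birth_year    # years from birth
    elif axis == "before_death":
        origin = line.death_year    # years relative to death (zero or negative)
    else:
        raise ValueError(f"unknown axis: {axis}")
    return [(year - origin, status) for year, status in line.events]


def age_at_death(line):
    return line.death_year - line.birth_year
```

For example, `sort_lifelines(lines, lambda ln: ln.cause_of_death, age_at_death)` reproduces the kind of ordering used in figure 6, while applying `rebase(line, "calendar")` to each lifeline gives the calendar-year rearrangement suggested above for spotting epidemics.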

5.4 Visualisation work may be designed to convince the `consumers' of research of some conclusion, or to enable the researcher better to know their dataset and form hypotheses; the two roles point to different types of tool.

Even with a few thousand lives in the Steam Engine Makers' database, any graphical presentation including all the individuals or even a substantial subset will overwhelm the consumer; this case study has tried to present comprehensible examples, but it would have been easy to include many `spaghetti' diagrams composed of endless intersecting and superimposed lifelines. In practice, users must be presented with a summary in which individual lives are aggregated in some way; arguably, this is best done using relatively conventional statistical methods, although visualisation methods may be relevant to presenting the resulting parameter estimates. One obvious example is the traditional hazard curve, expressing changes in the probability of some transition over time; another is the demographic chart, which summarises individual experience by comparing birth cohorts (see, for example, McKnight (forthcoming) and Anderson (1990)).
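As an illustration of the kind of summary meant here, the sketch below computes a simple discrete-time (life-table) hazard: at each duration, the number of observed transitions divided by the number of spells still at risk. This is a standard estimator swapped in for illustration, not a method taken from the studies cited; censored spells are assumed to count as at risk up to and including their last month.

```python
from collections import Counter


def discrete_hazard(durations, observed):
    """Discrete-time hazard h(d) = transitions at d / spells at risk at d.

    durations: spell lengths in whole months (each at least 1).
    observed:  True if the transition was seen, False if the spell was censored.
    Returns {d: h(d)} for every duration at which some spell was at risk.
    """
    if not durations:
        return {}
    events = Counter(d for d, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # every spell leaves the risk set after its last month
    at_risk = len(durations)
    hazard = {}
    for d in range(1, max(durations) + 1):
        if at_risk == 0:
            break
        hazard[d] = events[d] / at_risk
        at_risk -= exits[d]
    return hazard
```

Plotting `hazard` against duration gives the familiar hazard curve, which aggregates thousands of lifelines into a single line a reader can take in at a glance.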

If this argument is accepted, the main users of lifeline diagrams and similar apparatus should be the researchers themselves. Here the need is to reveal complexity and, for example, the influence of exceptional cases, not to conceal them. One possibility, further explored on our web site, is a drill-down system. For example, figure 4 concerns just the Bolton members of the SEM, and might be revealed when a researcher clicked on the symbol marking Bolton on a map showing mobility rates in the different towns covered by the SEM; the data for such a map exist and show, for example, that men first recorded in Bolton were markedly more mobile than Londoners. Finding that the average rate of mobility of Bolton members reflected a polarisation between men who never left and those who moved repeatedly, the researcher might wish to examine the detailed history of some of the latter. On our web site, clicking on the line within figure 4 marked by the third green arrow down takes users to the life history of James Beardpark, and to figure 2; a further click on Derby within figure 2 brings up a scanned image of Beardpark's death certificate. These samples were prepared manually, but a program already exists which can create a textual life history for a specified member by repeatedly querying the database, and we aim to make this accessible over the web. The cost of such a system may be hard to justify for the SEM database, but would be a relatively small part of the cost of providing researchers with resources such as the British Household Panel Study.
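A minimal sketch of such a text generator, using Python's built-in `sqlite3` module, is shown below. The two-table schema, column names and function name are hypothetical; the real SEM database is spread over many more tables, so a real generator would issue many more queries per member.

```python
import sqlite3

# Hypothetical schema for illustration only; not the SEM database's layout.
SCHEMA = """
CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (member_id INTEGER, event_date TEXT, kind TEXT, place TEXT);
"""


def life_history(conn, member_id):
    """Build a plain-text life history by querying one member's records."""
    row = conn.execute(
        "SELECT name FROM members WHERE id = ?", (member_id,)
    ).fetchone()
    if row is None:
        return []  # unknown member: nothing to report
    lines = [f"Life history of {row[0]} (member {member_id})"]
    # ISO-format date strings sort chronologically as text.
    query = ("SELECT event_date, kind, place FROM events "
             "WHERE member_id = ? ORDER BY event_date")
    for event_date, kind, place in conn.execute(query, (member_id,)):
        lines.append(f"{event_date}: {kind} at {place}")
    return lines
```

In a drill-down system, a click on a lifeline would simply call `life_history` with that member's identifier and display the returned lines, possibly with links to further records such as scanned certificates.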

5.5 The best-resourced developments may be taking place in the private sector.

Several people we met suggested that the high cost of creating visualisation systems which had to be tailored to work with specific databases meant that the most interesting work was likely to be going on in commercial organisations. For example, supermarket loyalty cards, which identify individual shoppers each time they pass through a store's checkouts, are leading to the assembly of vast longitudinal datasets, covering each item purchased by each shopper, with locations and dates, and linked to data on individual characteristics gathered when the customer joined the scheme and, via home addresses, to socio-economic profiles derived from the census and similar sources. What tools are being used to exploit this data?

Academic links to commercial research are much weaker here than in, say, molecular modelling. However, one interesting example was provided by the Institut für Verkehrswesen (Institute for Transport Studies) at the University of Karlsruhe, Germany. Their work on the German Mobility Panel involves specially written software for generating what are essentially lifeline diagrams from very short-term data on individual movements, similar to the data conceptualised in figure 3 (see Chlond and Lipps, 1997).
