Visualisation in the Social Sciences Workshop

Problems and Solutions in Visualisation in the Social Sciences

The workshop was split into four groups, which were asked to look at the problems, and suggest solutions, in the following areas:

Imagery
Visualising networks
Visualising statistical data
Maps

Imagery

The first question raised was 'What do we mean by imagery?' It became clear that this refers not simply to graphics, but also to the relationship between visual and other media, such as text and sound. Images on the Web present particular problems:

Consistency - will the viewer see the image in the same way as the author, or will colours, resolution etc. be changed by technological constraints?
Control - the author 'loses' control of the image, which raises copyright, IPR and ethical issues, since the image can be taken off the Web and used in unforeseen ways.
Formats - are they suitable, and are they used appropriately? Suitable formats must support transparency, as data is often overlaid on top of other data. There is a need to store metadata with images, for example the author, how the visualisation was created and from what dataset. Work in this area is being carried out as part of the Electronic Libraries project, TASI and IMS.
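
As an illustration of the metadata point above, the following is a minimal sketch of a record that could be stored alongside an image. The field names (author, method, source_dataset) and the idea of a JSON sidecar file are assumptions made for this example; they are not taken from the Electronic Libraries, TASI or IMS work.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ImageMetadata:
    """Hypothetical metadata record for one visualisation image."""
    author: str          # who produced the image
    method: str          # how the visualisation was created
    source_dataset: str  # which dataset it was derived from


def to_sidecar_json(meta: ImageMetadata) -> str:
    """Serialise the record as JSON text, e.g. for a file saved next to the image."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)
```

Keeping the record in a separate text file is only one option; some image formats can also carry such fields internally.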

There seems to be a divide between qualitative and quantitative data, perhaps reflecting an 'art/science dichotomy' in the use of visualisation. Qualitative researchers need to be encouraged in their use of computers and visualisation.

Finally, concern was raised about how the user interprets the visualisation or images, reflecting the need to assess the usability of such visualisations.

It was noted in the discussion that in scientific visualisation the author and viewer are the same person, with the visualisation being used as a tool for analysing the data. In the social sciences there seems to be a larger role for visualisations as a method of presenting data and results to an audience.

Visualising Networks

There are a number of different types of networks:

representations of space, e.g., the Euclidean topology of the rail network
non-Euclidean networks, such as global trade networks or representations of the Internet
conceptual networks e.g., family trees or relationships
artefacts - networks of symbolic meaning (see Show and Tell, Diego Jimenez)
programming can also be visualised as a network - e.g., in Explorer, AVS and Khoros

There appears to be a gap between the supply of and demand for visualisation tools and techniques to display these networks. In some cases the tools are available, but in a different discipline, and there are often problems with interdisciplinary communication, particularly between computer scientists and social scientists. Tools are available for showing and developing links within networks, but these are mostly text based.

Problem Areas

Although mapping physical networks and cartogram transformations are relatively easy, visualising very large networks, e.g., journey to work data for Great Britain, is difficult. Visualisation of conceptual networks, e.g., academic collaborators, is also difficult. The addition of a temporal aspect, requiring dynamic visualisation of networks, adds further difficulties.

When a visualisation has been created, we must be confident it is conveying the right message. We can learn from cartographers and graphic designers about the aesthetics of design to ensure that the visualisations are readable.

Visualising Statistical Data

A number of restrictions on the success of visualising data were highlighted:

lack of interdisciplinary communication
lack of understanding of the tools and techniques available
lack of imagination - a restriction we place on ourselves
understanding the problem in the first place
the amount of data to be analysed - too much data can be hard to manage
have we collected the right data?

The need to choose the right visualisation tools for the problem was shown with a simple example: height and weight data from every county in the US. The scatterplot represents a data set with several thousand points, one point for each county. With such a large dataset, a simple scatterplot is effectively meaningless. The main problems are:

most dots overlap, so what you see in the plot is mainly the outliers rather than the bulk of the data
some dots may be more important than others - in this case some counties have much larger populations than others
a minor, but important, part of the pattern can easily be lost

As computer power has increased, and readily available tools such as Microsoft Excel have become able to handle large datasets, the danger of inappropriate tools being used has increased.

Solutions

Kernel density estimation - to produce contours which will show up clustering. This gets round the problem of plot density, and points can be weighted to reflect their importance
Divide the scatterplot into squares and count the number of points in each. Draw a square or circle within the grid whose size relates to this count. This conveys density information and again allows for points to be weighted
Divide the graph into categories of weight and use a box and whisker plot of height (or vice versa) in each category.
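
The first two solutions above can be sketched in a few lines. This is an illustrative sketch only: the function names, the choice of a Gaussian kernel and the fixed bandwidth are assumptions made for the example, not details reported from the workshop.

```python
import math
from collections import defaultdict


def weighted_bin_counts(points, weights, cell_size):
    """Sum the weights of the points that fall in each square grid cell.

    Returns a dict mapping (column, row) cell indices to the total weight.
    With all weights equal to 1 this is a plain count per cell.
    """
    totals = defaultdict(float)
    for (x, y), w in zip(points, weights):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        totals[cell] += w
    return dict(totals)


def weighted_kde(points, weights, x, y, bandwidth):
    """Weighted Gaussian kernel density estimate evaluated at (x, y).

    Each point contributes a Gaussian bump scaled by its weight
    (e.g. county population); the result is normalised by total weight.
    """
    total_weight = sum(weights)
    norm = 2.0 * math.pi * bandwidth ** 2
    density = 0.0
    for (px, py), w in zip(points, weights):
        d2 = (x - px) ** 2 + (y - py) ** 2
        density += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return density / (total_weight * norm)
```

Evaluating weighted_kde over a regular grid of (x, y) positions gives a surface that can be contoured, while the totals from weighted_bin_counts can be used to size the square or circle drawn in each cell.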

All of the above solutions can be easily printed. Other solutions include:

An interactive 'magnifying glass' which when moved over the plot shows more details
Jittering - adding a small random displacement to each point, which allows many points in the same place to be visible
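
Jittering, as described above, might be sketched as follows. The uniform displacement and the parameter names (amount, seed) are assumptions made for this example; other noise distributions would serve equally well.

```python
import random


def jitter(points, amount, seed=None):
    """Return a copy of the points with a small random displacement added.

    Each coordinate is shifted by a value drawn uniformly from
    [-amount, +amount], so coincident points become separately visible.
    """
    rng = random.Random(seed)  # a fixed seed makes the plot reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]
```

The displacement should be small relative to the spacing of the data values, since jittering deliberately trades positional accuracy for visibility.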

For visualisation to be most successful, it is important that visualisation experts are involved right at the start of the research, in a similar way to statisticians. It is also important that the visualisation tool chosen does not confuse the issue.

Maps

The group identified a number of problems, but concentrated on three in particular:

Time and spatial units changing simultaneously
Hypervariate data
Representation of flows

Time and spatial units changing simultaneously

For example, in census data the census wards may change. A number of solutions were proposed:

animation of the data: though this can be difficult to interpret when the spatial units are changing
cartogram: the problem here is linking the cartogram back to the real world, as cartograms on their own can be difficult to interpret. One method of doing this is to link the cartogram to a more traditional Euclidean representation of the space, or to animate the two in parallel.
grid data: reducing the spatial data to a standard form over time. To do this, values within the grid must be estimated, and therefore the technique is only useful if the estimation is accurate.

It was suggested that it would be useful to look at both cartograms and grids, so that any anomalies in one method would be shown up.
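
One simple way to estimate grid values from changing spatial units is areal weighting, sketched below. This is an assumption chosen for illustration, not a method reported from the workshop: it treats each zone as an axis-aligned rectangle (xmin, ymin, xmax, ymax) and assumes the zone's count is spread evenly over its area, which is exactly the kind of estimate whose accuracy needs checking.

```python
def overlap_area(a, b):
    """Area of intersection of two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def estimate_cell_value(zones, cell):
    """Estimate the count in one grid cell by areal weighting.

    zones is a list of (rectangle, count) pairs. Each zone contributes
    its count multiplied by the fraction of its area inside the cell.
    """
    estimate = 0.0
    for rect, count in zones:
        zone_area = (rect[2] - rect[0]) * (rect[3] - rect[1])
        if zone_area > 0:
            estimate += count * overlap_area(rect, cell) / zone_area
    return estimate
```

Applying the same grid to each census year gives values that can be compared over time even though the underlying wards differ.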

Hypervariate Data

This refers to data with more than three dimensions, i.e., not easily represented in normal space. One standard solution to representing many variables is a pie chart map, but this is often hard to read. Other solutions include:

multiple maps
sequential view of variables on a single map. The order of the variables may be changed, and the sequence run through to see if any patterns emerge
reduce the dimensionality and link it to the map, for example a RadViz plot
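
The RadViz idea mentioned above can be sketched as follows: each variable is given an anchor point on a unit circle, and each record is placed at the average of the anchors weighted by its rescaled values. The function name and the min-max rescaling are assumptions made for this example.

```python
import math


def radviz(rows):
    """Project rows of k numeric variables to 2D RadViz coordinates."""
    k = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(k)]
    highs = [max(r[j] for r in rows) for j in range(k)]
    # One anchor per variable, evenly spaced around the unit circle.
    anchors = [
        (math.cos(2.0 * math.pi * j / k), math.sin(2.0 * math.pi * j / k))
        for j in range(k)
    ]
    projected = []
    for r in rows:
        # Rescale each variable to [0, 1]; constant variables get 0.
        scaled = [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(k)
        ]
        total = sum(scaled)
        if total == 0.0:
            projected.append((0.0, 0.0))  # no pull towards any anchor
            continue
        x = sum(s * a[0] for s, a in zip(scaled, anchors)) / total
        y = sum(s * a[1] for s, a in zip(scaled, anchors)) / total
        projected.append((x, y))
    return projected
```

If each row corresponds to a map area, the resulting points can be linked back to the map, for example by highlighting an area when its point is selected.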

Representation of Flows

There are a number of different types of flow, for example individuals, aggregate flows of people/materials, flows between points or areas and flows along routes, but flows are difficult to represent in traditional cartography. Solutions suggested include:

using arrows to represent the flow on the map
animate traditional maps
animate the flow, giving a visual representation of the magnitude of flow
animate the changing state of the end point
