Visualisation in the Social Sciences Workshop
Problems and Solutions in Visualisation in the Social Sciences
The workshop was split into four groups, who were asked to look at the problems and suggest solutions, in the following areas:
- Visualising networks
- Visualising statistical data
The first question raised was 'What do we mean by imagery?' It became clear that this refers not simply to graphics, but also to the relationship between visual and other media, such as text and sound.
Images on the Web present particular problems:
There seems to be a divide between qualitative and quantitative data, perhaps reflecting an 'art/science dichotomy' in the use of visualisation. Qualitative researchers need to be encouraged in their use of computers and visualisation.
- Consistency - will the viewer see the image in the same way as the author, or will colours, resolution etc. be changed by technological constraints?
- Control - the author 'loses' control of the image, which affects copyright, IPR and ethical issues, since the image can be taken off the web and used in unforeseen ways.
- Formats - are they suitable and are they used appropriately? Suitable formats must support transparency, as data is often overlaid on top of other data. There is a need to store metadata with images, for example the author, how the visualisation was created and from what dataset. Work in this area is being carried out as part of the Electronic Libraries project, TASI and IMS.
Finally, concern was raised about how the user interprets the visualisation or images, reflecting the need to assess the usability of such visualisations.
It was noted in the discussion that in scientific visualisation the author and viewer are the same person, with the visualisation being used as a tool for analysing the data. In the social sciences there seems to be a larger role for visualisations as a method of presenting data and results to an audience.
There are a number of different types of networks -
There appears to be a gap between the supply of and demand for visualisation tools and techniques to display these networks. In some cases the tools are available, but in a different discipline, and there are often problems with interdisciplinary communication, particularly between computer scientists and social scientists. Tools are available for showing and developing links within networks, but these are mostly text based.
- representations of space e.g., Euclidean topology of the rail network
- non-Euclidean such as global trade networks or representations of the Internet
- conceptual networks e.g., family trees or relationships
- artefacts - networks of symbolic meaning (see Show and Tell, Diego Jimenez)
- programming can also be visualised as a network - e.g., in Explorer, AVS and Khoros
Although mapping physical networks and cartogram transformations are relatively easy, visualising very large networks, e.g., journey to work data for Great Britain, is difficult.
Visualisation of conceptual networks, e.g., academic collaborators, is also difficult. The addition of a temporal aspect, requiring dynamic visualisation of networks, adds further difficulties.
When a visualisation has been created, we must be confident it is conveying the right message. We can learn from cartographers and graphic designers about the aesthetics of design to ensure that the visualisations are readable.
Visualising Statistical Data
A number of restrictions on the success of visualising data were highlighted:
The need to choose the right visualisation tools for the problem was shown with a simple example: height and weight data from every county in the US. The scatterplot below represents a data set with several thousand points, one point for each county.
With such a large dataset, a simple scatterplot is effectively meaningless, the main problems being:
- lack of interdisciplinary communication
- lack of understanding of the tools and techniques available
- lack of imagination - a restriction we place on ourselves
- understanding the problem in the first place
- the amount of data to be analysed - too much data can be hard to manage
- have we collected the right data?
As computer power has increased, and readily available tools such as Microsoft Excel have become able to handle large datasets, the danger of inappropriate tools being used has increased.
- most dots overlap, so what you see in the plot is mainly the outliers rather than the bulk of the data
- some dots may be more important than others - in this case some counties have much larger populations than others
- a minor, but important, part of the pattern can easily be lost
All of the above solutions can be easily printed. Other solutions include:
- Kernel density estimation - to produce contours which will show up clustering. This gets round the problem of plot density, and points can be weighted to reflect their importance
- Divide the scatterplot into squares and count the number of points in each. Draw a square or circle within the grid whose size relates to this count. This conveys density information and again allows for points to be weighted
- Divide the graph into categories of weight and use a box and whisker plot of height (or vice versa) in each category.
For visualisation to be most successful, it is important that visualisation experts are involved right at the start of the research, in a similar way to statisticians. It is important that the visualisation tool chosen does not confuse the issue.
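The two density-based solutions in the list above can be sketched in a few lines. The example below is only an illustration using the Python standard library: the function names `grid_counts` and `weighted_kde`, the Gaussian kernel, the bandwidth, the cell size and the sample values are all assumptions made for the sketch, not data or methods from the workshop.

```python
import math
from collections import defaultdict


def grid_counts(points, weights, cell):
    """Sum the weights of the points falling in each square cell of side `cell`."""
    totals = defaultdict(float)
    for (x, y), w in zip(points, weights):
        key = (math.floor(x / cell), math.floor(y / cell))
        totals[key] += w
    return dict(totals)


def weighted_kde(points, weights, bandwidth, at):
    """Weighted Gaussian kernel density estimate at the location `at`."""
    ax, ay = at
    norm = 2.0 * math.pi * bandwidth ** 2  # normalising constant of a 2D Gaussian
    density = 0.0
    for (x, y), w in zip(points, weights):
        d2 = (x - ax) ** 2 + (y - ay) ** 2
        density += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return density / (sum(weights) * norm)


# Hypothetical (height, weight) pairs, one per county, weighted by population.
points = [(170.0, 70.0), (170.5, 70.5), (171.0, 69.5), (185.0, 95.0)]
population = [50_000, 20_000, 30_000, 1_000]

cells = grid_counts(points, population, cell=5.0)
dense = weighted_kde(points, population, bandwidth=2.0, at=(170.5, 70.0))
sparse = weighted_kde(points, population, bandwidth=2.0, at=(185.0, 95.0))
```

The totals in `cells` would drive the size of the square or circle drawn in each grid cell, and evaluating `weighted_kde` over a regular lattice would give the surface from which contours are drawn. In both cases a small county at an extreme value contributes far less than the populous counties in the bulk of the data.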
- An interactive 'magnifying glass' which when moved over the plot shows more details
- Jittering - adding a small random displacement to each point, which allows many points in the same place to be visible
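Jittering, as described above, can be sketched very simply. The function name `jitter`, the uniform displacement, the `amount` parameter and the fixed seed are assumptions chosen for illustration; other displacement distributions would serve equally well.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of `points` with each coordinate moved by a small
    uniform random displacement of at most `amount` in either direction."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]


# Fifty hypothetical observations recorded at exactly the same place.
stacked = [(3.0, 4.0)] * 50
spread = jitter(stacked, amount=0.2, seed=1)
```

The displacement should be small relative to the spacing of the recorded values, so that coincident points become visible without suggesting a precision the data does not have.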
The group identified a number of problems, but concentrated on three in particular:
- Time and spatial units changing simultaneously
- Hypervariate data
- Representation of flows
Time and spatial units changing simultaneously
For example, in census data the census wards may change. A number of solutions were proposed:
It was suggested that it would be useful to look at both cartograms and grids, so that any anomalies in one method would be shown up.
- animation of the data
- though this can be difficult to interpret when the spatial units are changing
- the problem here is linking the cartogram back to the real world, as cartograms on their own can be difficult to interpret. One method of doing this is to link the cartogram to a more traditional Euclidean representation of the space, or to animate the two in parallel.
- grid data
- reducing the spatial data to a standard form over time. To do this, values within the grid must be estimated, so the technique is only useful if the estimation is accurate.
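One simple way of estimating grid values is area weighting, sketched below. This is an illustration under strong assumptions: wards are treated as axis-aligned rectangles with non-zero area, the count is assumed to be spread evenly within each ward, and the function names and sample figures are hypothetical. Real ward boundaries would need proper polygon overlay.

```python
import math
from collections import defaultdict


def overlap(a0, a1, b0, b1):
    """Length of the overlap between the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def regrid(wards, cell):
    """Allocate ward counts to square grid cells in proportion to shared area.

    `wards` is a list of (xmin, ymin, xmax, ymax, count) rectangles.
    """
    grid = defaultdict(float)
    for xmin, ymin, xmax, ymax, count in wards:
        area = (xmax - xmin) * (ymax - ymin)
        for i in range(math.floor(xmin / cell), math.ceil(xmax / cell)):
            for j in range(math.floor(ymin / cell), math.ceil(ymax / cell)):
                shared = (overlap(xmin, xmax, i * cell, (i + 1) * cell)
                          * overlap(ymin, ymax, j * cell, (j + 1) * cell))
                grid[(i, j)] += count * shared / area
    return dict(grid)


# Hypothetical wards for two censuses; the boundary moves from x=2 to x=1.5.
first_census = [(0.0, 0.0, 2.0, 1.0, 100), (2.0, 0.0, 3.0, 1.0, 30)]
second_census = [(0.0, 0.0, 1.5, 1.0, 90), (1.5, 0.0, 3.0, 1.0, 60)]

grid_first = regrid(first_census, cell=1.0)
grid_second = regrid(second_census, cell=1.0)
```

Once both censuses are on the same grid, the cells can be compared directly even though the ward boundaries differ. The comparison is only as reliable as the even-spread assumption.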
This refers to data with more than three dimensions, i.e., not easily represented in normal space. One standard solution to representing many variables is a pie chart map, but this is often hard to read. Other solutions include:
- multiple maps
- sequential view of variables on a single map. The order of the variables may be changed, and the sequence run through to see if any patterns emerge
- reduce the dimensionality and link it to the map, for example a RadViz plot
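The RadViz idea mentioned in the list above can be sketched as follows. Each variable is given an anchor on the unit circle, and each record is placed at the average of the anchors weighted by its rescaled values. The function name `radviz`, the min-max rescaling and the sample rows are assumptions made for this illustration.

```python
import math


def radviz(rows):
    """Project rows of n variables to 2D points inside the unit circle."""
    n = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n)]
    highs = [max(r[j] for r in rows) for j in range(n)]
    # One anchor per variable, equally spaced around the unit circle.
    anchors = [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
               for j in range(n)]
    projected = []
    for r in rows:
        # Rescale each variable to [0, 1]; a constant variable contributes 0.
        scaled = [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
                  for j in range(n)]
        total = sum(scaled)
        if total == 0.0:
            projected.append((0.0, 0.0))  # no pull from any anchor
            continue
        x = sum(s * a[0] for s, a in zip(scaled, anchors)) / total
        y = sum(s * a[1] for s, a in zip(scaled, anchors)) / total
        projected.append((x, y))
    return projected


# Hypothetical areas described by four variables each.
rows = [(10, 0, 0, 0), (0, 5, 0, 0), (10, 5, 8, 2), (0, 0, 8, 2)]
positions = radviz(rows)
```

A record dominated by one variable is pulled towards that variable's anchor, while a record that is high on every variable sits near the centre. The resulting positions could then be linked back to the corresponding areas on the map.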
Representation of Flows
There are a number of different types of flow, for example individuals, aggregate flows of people/materials, flows between points or areas and flows along routes, but flows are difficult to represent in traditional cartography. Solutions suggested include:
- using arrows to represent the flow on the map
- animate traditional maps
- animate the flow, giving a visual representation of the magnitude of flow
- animate the changing state of the end point
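As a rough illustration of the arrow-based solution, the sketch below reduces a table of flows between areas to one net arrow per pair and scales the arrow width to the magnitude of the flow. The function names, the linear width scaling, the `max_width` parameter and the sample figures are all assumptions, not a method proposed at the workshop.

```python
def net_flows(od):
    """Reduce {(origin, destination): flow} to one net arrow per pair of areas."""
    nets = {}
    for (a, b), flow in od.items():
        if a == b:
            continue  # flows within an area cannot be drawn as an arrow
        key, sign = ((a, b), 1) if a < b else ((b, a), -1)
        nets[key] = nets.get(key, 0) + sign * flow
    arrows = {}
    for (a, b), value in nets.items():
        if value > 0:
            arrows[(a, b)] = value
        elif value < 0:
            arrows[(b, a)] = -value
    return arrows


def arrow_widths(arrows, max_width=10.0):
    """Scale arrow widths linearly so the largest net flow gets `max_width`."""
    if not arrows:
        return {}
    biggest = max(arrows.values())
    return {pair: max_width * value / biggest for pair, value in arrows.items()}


# Hypothetical flows of people between three areas.
od = {("A", "B"): 120, ("B", "A"): 20, ("A", "C"): 30, ("C", "A"): 80, ("B", "B"): 500}
arrows = net_flows(od)
widths = arrow_widths(arrows)
```

Netting the flows halves the number of arrows to be drawn, at the cost of hiding the gross movement in each direction. Whether that trade is acceptable depends on the question being asked of the data.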