Visualization of Statistics
Visualization Software, Complex Datasets and the Social Sciences — Brian Francis and John Pritchard
This paper examined the issues and complexities in using scientific visualisation systems for the graphical examination of complex social science data, specifically concentrating on individual work and life histories. Previous approaches to graphing such event history data have been unsatisfactory, providing graphs which are static and non-interactive, with difficulty in representing more than a few individuals in one display. A new approach (Francis and Fuller, 1996) is to use a scientific visualisation system such as AVS to display an event history as a multi-faceted pencil-like object in three-dimensional space, with changes of state within a variable being represented by changes in colour, shading and height on each of the faces of the pencil. Viewing a single object allows a detailed history which concentrates on the relationships between the changes of state in many variables to be examined, using the zooming, fly-through, selection and clipping provided by AVS. To view collections of individuals, we can extend the idea of Lexis diagrams (Keiding, 1990) into three dimensions, placing the pencils according to the date of the start event and age. This allows patterns in event histories to be examined. Dissemination of such work can be provided by writing VRML files which can be posted on the web. An example of the relationship between male unemployment and female participation in the labour force was demonstrated, showing that new insights can be obtained into such data by the use of these 'lexis pencils'.
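As a rough illustration of the data behind a single pencil, the sketch below stores one list of spells per face and anchors the pencil by start date and age. It is not the AVS implementation described in the paper: the class names, the layout convention and the example history are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Spell:
    """One period spent in a single state, in years since the start event."""
    start: float
    end: float
    state: str


@dataclass
class Pencil:
    """One event history; each face holds the spells of one variable."""
    start_date: float      # calendar year of the start event
    age_at_start: float    # age in years at the start event
    faces: dict = field(default_factory=dict)  # variable name -> list of Spell

    def base_position(self):
        # Assumed layout: x = date of start event, y = age at that date.
        return (self.start_date, self.age_at_start)

    def state_at(self, variable, elapsed):
        """Return the state of `variable` at `elapsed` years, or None if unobserved."""
        for spell in self.faces.get(variable, []):
            if spell.start <= elapsed < spell.end:
                return spell.state
        return None

    def transitions(self, variable):
        """Elapsed times at which the state of `variable` changes."""
        spells = sorted(self.faces.get(variable, []), key=lambda s: s.start)
        return [b.start for a, b in zip(spells, spells[1:]) if a.state != b.state]


# Invented history for one couple (illustrative values only).
couple = Pencil(
    start_date=1980.0,
    age_at_start=25.0,
    faces={
        "husband_work": [
            Spell(0, 4, "employed"),
            Spell(4, 6, "unemployed"),
            Spell(6, 10, "employed"),
        ],
        "wife_work": [Spell(0, 5, "inactive"), Spell(5, 10, "employed")],
    },
)
```

A renderer would map each state to a colour and draw one face per variable, so that changes of state on different faces can be compared at the same elapsed time.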
Developing a Visualization Gateway to Census Data at MIDAS — Jackie Carter
The new post of Data Visualization Support Officer has recently been created at MIDAS (Manchester Information Datasets and Associated Services). This role has been created to allow the integration of software tools, developed under JISC (Joint Information Systems Committee) initiatives, to develop an intelligent, highly interactive on-line mapping visualization gateway to the 1991 Census and associated digitised boundary data held on MIDAS. One of these software tools is a cartographic data visualizer, cdv, which is the subject of this article. cdv has been developed to provide a toolkit for teaching and learning in the spatial sciences. It was written using the Tcl/Tk scripting language, from Sun Microsystems, which enables applications to be built rapidly.
cdv allows a user to produce different views of the same dataset. For example, a user may choose to look at a single attribute, such as cars per capita, for a county, and display it as a choropleth, a proportional circle map, or a cartogram. Furthermore, cdv can display all these views concurrently, so that a user can see the effects of using different cartographic representations of the same data. Up to three variables at a time can be displayed, with colour symbolism allowing them to be viewed concurrently. In addition, boxplots and scatter plots can be used to help a user determine whether relationships exist between the variables, and to allow classification to take place. cdv is highly interactive, allowing the user to highlight any symbol in one map view and immediately see it in any other linked view. This technique enables users to explore their datasets very rapidly, and to investigate patterns and outliers that might be difficult to detect without visualization. Whilst cdv may not provide all the answers, it enables users to ask questions of their datasets, and facilitates the production of 'soft maps' for exploratory spatial data analysis.
Current work involves enabling users to access the cartographic data visualizer at MIDAS. Promoting the use of cdv in teaching and research is seen as an important task if visualization techniques in the use of census data are to be fully exploited.
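The linked highlighting described for cdv can be illustrated with a shared selection that notifies every registered view. This is only a minimal sketch of the general idea, not cdv's Tcl/Tk code; the class names `LinkedSelection` and `View` and the area identifiers are hypothetical.

```python
class LinkedSelection:
    """Shared set of highlighted area identifiers, observed by several views."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def highlight(self, area_ids):
        selected = frozenset(area_ids)
        # Every linked view is told about the new selection at once.
        for callback in self._listeners:
            callback(selected)


class View:
    """Stand-in for a map or plot; it only records what it should highlight."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = frozenset()
        selection.subscribe(self._on_change)

    def _on_change(self, area_ids):
        self.highlighted = area_ids


selection = LinkedSelection()
choropleth = View("choropleth", selection)
scatter = View("scatter plot", selection)

# Highlighting two areas in one view updates every linked view.
selection.highlight({"ward_12", "ward_40"})
```

Keeping the selection separate from the views means a new kind of display can be linked in without changing the existing ones.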
The visualisation of area-based geographical data using SAGE - Steve Wise, Jingsheng Ma and Bob Haining
Much social science data is available for geographical areas, such as wards or counties, and software for the visualization and analysis of such data is potentially useful in a number of academic disciplines. However, existing visualization software, rooted in the needs of the physical and environmental sciences, does not contain many of the tools needed to visualize such data. We have therefore developed a software package called SAGE (Spatial Analysis in a GIS Environment) which provides a range of tools for the analysis of area-based data. A number of the features of the software relate directly to the special nature of such data and of the analytical questions which are of interest. Firstly, the spatial element of the data is important, since it is often of interest to identify spatial trends and outliers. Visualization can assist in this when a map can be linked to other statistical views of the data, such that values for areas of interest can be identified, or the location of distributional outliers shown. Secondly, the linking of different views of the same data has a more general utility in exploring relationships between variables, and in assessing the results of fitting statistical models. Thirdly, many variables can only be defined in relation to a set of areal units (the population density of a point is a meaningless concept) and hence the results of analysis depend directly on the units used. This means there is a need for tools to assess the sensitivity of findings to changes in areal units, and to design purpose-built areal frameworks. These are provided in SAGE by a set of regionalisation methods. (There may also be a direct analogy here with the analysis of temporal data, in which some variables are defined in terms of fixed temporal frameworks.) Fourthly, spatial data often invalidates the assumptions of classical statistical methods, and requires special methods which can account for this.
The software is based on existing, low cost packages wherever possible (public domain graphics code, and the ARC/INFO GIS which is widely available to the UK academic community). However, the writing of SAGE would have been greatly assisted by the availability of commercial graphical widgets and GUI building tools.
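The dependence of results on the areal units used can be seen in a small worked example. The sketch below aggregates the same four zones into regions in two different ways and recomputes population density. It does not use SAGE's regionalisation methods; the function name and all figures are invented for illustration.

```python
def region_densities(zones, grouping):
    """Aggregate zones into regions and return population density per region.

    zones:    zone id -> (population, area in square km)
    grouping: zone id -> region name
    """
    totals = {}
    for zone_id, region in grouping.items():
        population, area = zones[zone_id]
        pop_sum, area_sum = totals.get(region, (0.0, 0.0))
        totals[region] = (pop_sum + population, area_sum + area)
    return {region: pop / area for region, (pop, area) in totals.items()}


# Invented zones: two small, dense zones and two large, sparse ones.
zones = {"z1": (1000, 1.0), "z2": (200, 4.0), "z3": (900, 1.0), "z4": (100, 4.0)}

# Two alternative areal frameworks built from the same zones.
grouping_a = {"z1": "north", "z2": "north", "z3": "south", "z4": "south"}
grouping_b = {"z1": "urban", "z3": "urban", "z2": "rural", "z4": "rural"}

densities_a = region_densities(zones, grouping_a)
densities_b = region_densities(zones, grouping_b)
```

Under the first grouping the two regions look similar (240 and 200 people per square km); under the second they differ sharply (950 and 37.5), although the underlying zones are unchanged.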
Adding colour to the 1997 General election — Graham Upton
The display of socio-political data, consisting of compositional variables aggregated at the constituency level, requires diagrams of types that are not available in standard graphics packages. This paper presents cartograms to show geographic variations, ternary diagrams to show inter-election change and an unwrapped torus to show the relation between vote, household tenure and social class.
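A ternary diagram places a three-part composition inside an equilateral triangle. The sketch below shows one common way of computing the plotting position, assuming vertices at (0, 0), (1, 0) and (0.5, √3/2); the function name and the vote counts are hypothetical and are not taken from the paper.

```python
import math


def ternary_xy(a, b, c):
    """Map three non-negative parts to (x, y) inside an equilateral triangle.

    Assumed vertices: all-a at (0, 0), all-b at (1, 0), all-c at (0.5, sqrt(3)/2).
    The parts are normalised first, so raw vote counts can be passed in.
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one part must be positive")
    b_share = b / total
    c_share = c / total
    x = b_share + c_share / 2
    y = c_share * math.sqrt(3) / 2
    return x, y


# Invented three-party counts for one constituency at two elections.
before = ternary_xy(18000, 20000, 12000)
after = ternary_xy(22000, 15000, 13000)

# Inter-election change can then be drawn as an arrow from `before` to `after`.
shift = (after[0] - before[0], after[1] - before[1])
```

Because the three shares sum to one, two coordinates carry all the information, which is what allows the composition to be plotted in the plane.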
Facilitating the use of Visualisation by Social Scientists — Robert Inder
Social scientists make extensive use of statistical packages for analysing data and for presenting and justifying results. Differences between these packages mean that users tend to become locked in to one or other of them, and thus into the set of statistical operations and display options it provides. This can impede collaboration, obstruct the exploitation of published collections of data, and limit the analyses that individuals consider. Most researchers can exploit new techniques only after they become available in their package of choice. We believe it is desirable, and feasible, to address this situation by using knowledge-based techniques to create a front-end system that allows users to "mix and match" between the facilities of the various packages. By combining meta-data and knowledge about the available software, the system would automatically sequence and parameterise library programs to convert data between formats, and use specific capabilities of the various packages to analyse the data or create relevant presentations for users. The resulting system would provide a rich environment for supporting the use of the underlying tools.
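One small part of such a front end, the automatic sequencing of conversion programs, can be sketched as a shortest-path search over a table of known converters. This is only an illustration under stated assumptions, not the proposed knowledge-based system; the format names and converter names below are hypothetical.

```python
from collections import deque


def plan_conversion(converters, source, target):
    """Return the shortest list of converter names from `source` to `target`.

    converters: (input format, output format) -> converter name
    Returns [] if no conversion is needed and None if no route exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for (src, dst), name in converters.items():
            if src == fmt and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [name]))
    return None


# Hypothetical converter library: three package formats bridged by CSV.
CONVERTERS = {
    ("package_a", "csv"): "a_to_csv",
    ("csv", "package_b"): "csv_to_b",
    ("csv", "package_c"): "csv_to_c",
}

plan = plan_conversion(CONVERTERS, "package_a", "package_c")
```

A fuller system would also use the meta-data to parameterise each step, for example to carry variable labels and missing-value codes across formats.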
These discussion points, and the conclusions and recommendations which follow, came out of two groups (see the agenda at the end of Section 1 of this report for details of the presentations within groups).
Social scientists have to deal with spatial, temporal and spatio-temporal data. Techniques such as linked windows (spatial and statistical information, as in cdv) work well for spatial data, and techniques such as lexis pencils, animation and glyphs work for temporal data, but we need techniques for the display of spatio-temporal data. Studies of social networks changing through time, economic data, survey panel data and migration data could all benefit from such visualizations.
A range of emerging visualization techniques looks to be of particular interest for the social sciences. cdv presents the user with a range of tools. SPSS and other statistics and database packages offer other tools. These need to be investigated, since many people do not know what the tools can do for them. A possible "statistician's workbench" was discussed, which might provide a common interface to a range of underlying statistical and other techniques and tools drawn from different packages, without the user needing to know about those packages. This was considered to be an attractive idea.
Issues which emerged were:
• how do we make the techniques and tools available to the "novice user", give people knowledge about what is available and persuade them visualization is worth considering?
• exchange issues are important in relation to input, output, metadata.
• colour fidelity and use of colour need to be considered (AGOCG have addressed this issue in some earlier work).
• we need guidelines for graphics (good/bad practice).
• social scientists have special needs: their data are often multivariate, temporal, qualitative, categorical and may also have discontinuities.
• are there useful general displays for social science data? ternary diagrams, parallel plots, lexis pencils?
• should we be delivering the data or standard graphical representations? There is a lot of support for displays to be authored on demand.
• visual techniques have a lot of potential in validating data, increasing understanding, showing behaviour, scenario building and the communication of complexity.
• however, poor graphics will confuse/mislead and have a capacity for time wasting.
• the requirements include: PC solutions, training/dissemination, "open" systems/standards or interfaces, the integration of visualization and statistics packages, case studies.
Conclusions and Recommendations
There is a need for:
• awareness and training programmes, both for computer-literate users and for those with little or no familiarity with IT, to introduce graphics and visualization techniques.
• promotion of the benefits of visualization in gaining understanding of complex datasets.
• the development of techniques for temporal and spatio-temporal statistics. If possible, these should be generic.
• encouragement of the use of visual interfaces to data.
• the discovery or development of easy to use software.
There is a need for:
• a review of existing graphics in statistics and database packages as well as visualization packages.
• identification of good practice and the development of guidelines.
• raising the status of visualization in the social sciences.
• a case study booklet/WWW site.
• a course on the use of visual techniques in the social sciences.
• a trawl for similar/relevant work elsewhere (through SOSIG?).