This report is also available as an Acrobat file.

Approaches to Wide Area Indexing

Martijn Koster

There are a number of approaches which can be taken to indexing. These include:

Manual Indexing

Manual indexing includes both personal hotlists and public hotlists. Both have major problems. Personal hotlists tend to be neither up to date nor comprehensive, and they accumulate references which have become outdated. Public hotlists have a low signal-to-noise ratio, and permission needs to be sought to update or remove information.

Robot Assisted Indexing

An example of this is Lycos. On the positive side, these tools index automatically. However, they do have problems. They tend to overload the network, the host, or both. They can give the person searching a misleading impression of the resources available, and they can return too much information. The indexing is also centralised.

Manual Distributed Indexing

An example of this is ALIWEB, which is described as:
"ALIWEB is a system that automatically combines distributed WWW server descriptions into a single searchable database. ALIWEB basically does for the WWW what veronica does for gopher or Archie does for anonymous FTP. Because the original server descriptions are maintained by server administrators, the information is likely to be correct and up-to-date. It also uses a special format that makes the results look very concise."
ALIWEB is a public service run by NEXOR. See http://web.nexor.co.uk/public/aliweb/aliweb.html

ALIWEB has a number of advantages. It is simple and cheap. Its summaries are of high quality, because they are written by the server administrators. There are likely to be fewer stale references. It does still need manual effort, though, and it relies on centralised or mirrored searching.
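The idea of combining locally maintained server descriptions into one searchable database can be sketched as follows. This is only an illustration, not ALIWEB's implementation: the record layout (blank-line-separated "Field: value" entries with Title, URI and Keywords fields), the host names and every function name are assumptions made for the example, and the descriptions are held in memory rather than fetched from real servers.

```python
from collections import defaultdict

# Hypothetical descriptions, as each server administrator might maintain them.
# The field names and layout are assumed for illustration only.
SITE_DESCRIPTIONS = {
    "www.example.org": """\
Title: Campus Maps
URI: /maps/index.html
Keywords: maps, campus

Title: Library Catalogue
URI: /library/
Keywords: library, books
""",
    "www.example.net": """\
Title: Map Archive
URI: /archive/maps.html
Keywords: maps, history
""",
}


def parse_records(text):
    """Parse blank-line-separated 'Field: value' records into dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        field, sep, value = line.partition(":")
        if sep:
            current[field.strip().lower()] = value.strip()
    if current:
        records.append(current)
    return records


def build_index(site_descriptions):
    """Combine the records from every server into one word -> entries map."""
    index = defaultdict(list)
    for server, text in site_descriptions.items():
        for record in parse_records(text):
            title = record.get("title", "")
            entry = (server, record.get("uri", ""), title)
            searchable = title + " " + record.get("keywords", "")
            # Index each distinct word from the title and keywords once.
            for word in set(searchable.lower().replace(",", " ").split()):
                index[word].append(entry)
    return index


def search(index, term):
    """Return the matching (server, uri, title) entries in sorted order."""
    return sorted(index.get(term.lower(), []))


index = build_index(SITE_DESCRIPTIONS)
print(search(index, "maps"))
```

The sketch shows both sides of the trade-off: the entries are only as good as the hand-written descriptions, and the combined index still lives in one place.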

Automated Distributed Indexing

An example of this is Harvest. See http://harvest.cs.colorado.edu/

Harvest is an integrated set of tools to gather, extract, organise, search, cache and replicate information across the Internet. It is therefore designed both to help users find information and to help with managing it.

Harvest has a number of advantages, not the least of which is that it is available. It is automatic, extensible and scalable. It has the potential to offer a general search interface, automated summarising and distributed searching. On the negative side, it is complex.
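The combination of automated summarising and distributed searching can be illustrated with a minimal sketch. It does not use Harvest's software or formats: the summarising rule (the most frequent non-trivial words), the stop-word list, the URLs and all function names are assumptions chosen for the example. Each site summarises its own documents, and a search component consults only those summaries.

```python
import re
from collections import Counter

# Assumed stop-word list, kept deliberately tiny for the example.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "for"}


def summarise(url, text, max_keywords=3):
    """Summarise a document automatically as its most frequent words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words)
    # Order by descending count, then alphabetically, to stay deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {"url": url, "keywords": [word for word, _ in ranked[:max_keywords]]}


def gather(site_documents):
    """Run at each site: produce summaries without exporting full documents."""
    return [summarise(url, text) for url, text in site_documents.items()]


def search_summaries(summaries_by_site, term):
    """Search the summaries gathered from every site and merge the hits."""
    term = term.lower()
    hits = []
    for summaries in summaries_by_site.values():
        for summary in summaries:
            if term in summary["keywords"]:
                hits.append(summary["url"])
    return sorted(hits)


# Hypothetical documents held at two separate sites.
site_a = {"http://a.example/robots.html":
          "Robots index the web. Robots can overload the web."}
site_b = {"http://b.example/harvest.html":
          "Harvest gathers summaries; brokers index summaries for the web."}

summaries_by_site = {"a.example": gather(site_a), "b.example": gather(site_b)}
print(search_summaries(summaries_by_site, "robots"))
```

Because only the short summaries leave each site, the indexing load is spread across the sites rather than imposed by one central robot.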

In Summary

We need indexing tools which are automated, and the solutions need to be distributed. We need to adjust people's expectations so that they understand the reality of the problems and of the available solutions. Harvest may offer a solution. We also need to accompany the use of tools with high-level manual resources which complement what we can achieve automatically.