
Introduction to the WWW

Introduction

The concept of 'hypertext' has been around a long time, and was first described by Vannevar Bush in 1945. A hypertext document allows the user to navigate through it in a non-linear or non-sequential fashion, by selecting parts of the text which are linked to other parts of the same document or other documents. Hypermedia is a hypertext system that is not restricted to text documents, but includes other media, effectively multimedia hypertext.

The WWW is a platform-independent, distributed hypermedia system that allows the user to access hypermedia documents stored on remote servers around the world from a range of different computing platforms.

History

The World Wide Web was developed by Tim Berners-Lee and Robert Cailliau of CERN Laboratories, Geneva, to allow particle physicists throughout Europe to share information. Initially mainly a text-based system, its cross platform capability and ease of use ensured its uptake into the wider community and its continuing development. It has grown from 50 servers in January 1993 to the millions that now exist throughout the world, and is continuing to grow rapidly.

URLs

The URL (Uniform Resource Locator) specifies an object, such as a file, and how to access it; in effect, it is the 'address' of the file. It consists of two main parts and takes the form:
access-method://address
where the access method is the protocol used to retrieve the file. Access methods include HTTP (Hypertext Transfer Protocol), used for HTML files, and FTP (File Transfer Protocol).
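
To make the access-method://address form concrete, the following minimal Python sketch uses the standard library function urllib.parse.urlsplit to separate a URL into its access method (which the library calls the scheme) and the parts of the address. The helper name describe_url and the FTP address are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into its access method and the parts of its address."""
    parts = urlsplit(url)
    return {
        "access_method": parts.scheme,  # the protocol, e.g. 'http' or 'ftp'
        "host": parts.netloc,           # the machine that holds the file
        "path": parts.path or "/",      # where the file lives on that machine
    }


print(describe_url("http://www.jisc.ac.uk/"))
# Hypothetical FTP address, used only to show a second access method.
print(describe_url("ftp://ftp.example.ac.uk/pub/report.txt"))
```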

HTML

HTML (Hypertext Markup Language) is a simple markup language recognised by all WWW browsers. A standard HTML document is still the most effective way to widely disseminate information on the WWW. An HTML document consists of the normal text of the document, and markup tags that define elements of the document, such as title and paragraph.
<h1>Advisory Group on Computer Graphics</h1>
Advising UK Higher Education on Computer Graphics, 
Visualization, Multimedia and Virtual Environments.

<P>The Advisory Group on Computer Graphics (AGOCG) 
is an initiative of the 
<a href="http://www.jisc.ac.uk/">JISC</a>
of the Higher Education Funding Councils and 
the Research Councils.</P>
The code fragment above shows a number of tags, including a heading <h1> and a paragraph <p>. All tags are enclosed within angle brackets, and most have a start tag and an end tag, e.g., <p> and </p>. The other tag shown in this fragment, <a href="...">...</a>, creates a link to another page.
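
Because a browser works from these tags rather than from the layout of the text, a program can pick them out in the same way. The minimal Python sketch below, built on the standard html.parser module, lists the start tags in the AGOCG fragment and the target of its link. The class name TagLister is hypothetical; note that the parser reports tag names in lower case, so <P> and <p> are treated alike.

```python
from html.parser import HTMLParser

# The AGOCG fragment from above, with the href attribute quoted.
FRAGMENT = """<h1>Advisory Group on Computer Graphics</h1>
Advising UK Higher Education on Computer Graphics,
Visualization, Multimedia and Virtual Environments.
<P>The Advisory Group on Computer Graphics (AGOCG)
is an initiative of the
<a href="http://www.jisc.ac.uk/">JISC</a>
of the Higher Education Funding Councils and
the Research Councils.</P>"""


class TagLister(HTMLParser):
    """Record each start tag and the target of each link."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.start_tags.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


parser = TagLister()
parser.feed(FRAGMENT)
parser.close()
print(parser.start_tags)  # ['h1', 'p', 'a']
print(parser.links)
```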

Since web pages are just text files, they can be created with even the most basic text editor. However, a number of programs have been written to help with markup and to convert documents automatically from other formats to HTML.

HTML Editors

The simplest are text-based editors, such as HTML Notepad for the PC, which is an extension of Windows Notepad. Text is created or imported as in a standard text editor, then marked up using the additional pull-down menus. The display shows the text together with the markup tags, so a separate browser is needed to view the finished HTML document. Increasingly, however, WYSIWYG HTML editors are becoming available, such as Microsoft's FrontPage and Adobe's PageMill. FrontPage also provides extensions that allow greater functionality to be incorporated into a web page, but only if the web server supports them. A number of other packages, e.g., Word 97, now allow you to save documents directly as HTML, providing an easy way to produce Web pages.

Conversion programs

There are many programs available to convert existing documents to HTML from various formats, including Word, WordPerfect, RTF (Rich Text Format) and FrameMaker. How successfully a document converts often depends on how it was originally written. For example, a word processor document in which headings were created using the heading styles feature is more likely to convert correctly than one in which the headings were produced by manually increasing the font size. The more 'structural' information of this kind the original document contains, the better the conversion will be. A structured HTML document is also easier to maintain and more accessible to all users.
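
Why heading styles help can be seen in a toy converter. The Python sketch below assumes the word processor document has already been read into a list of (style name, text) paragraphs; the style names and the STYLE_TO_TAG mapping are hypothetical and do not describe any particular conversion program. A paragraph with a heading style maps directly onto an HTML heading tag, whereas a heading made by enlarging the font still has the 'Normal' style, so the converter can only turn it into an ordinary paragraph.

```python
from html import escape

# Hypothetical mapping from word processor style names to HTML tags.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def to_html(paragraphs):
    """Convert (style, text) pairs to HTML, using <p> for any unknown style."""
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")
        # escape() turns characters such as & and < into HTML entities.
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


doc = [
    ("Heading 1", "Introduction"),
    ("Normal", "Hypertext allows non-linear navigation."),
    # A 'heading' made only by enlarging the font: its style is still Normal,
    # so the structural information is lost in conversion.
    ("Normal", "History"),
]
print(to_html(doc))
```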

It is always worth checking that your pages, however they are produced, display correctly in a range of browsers and contain only valid HTML. For a list of HTML editors and other tools, see the SIMA Report 'Software Tools for the World-Wide Web'.
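
Full validation means checking a document against the HTML standard, which is the job of a dedicated validator. As a much weaker illustration, the Python sketch below uses the standard html.parser module only to check that start and end tags are properly nested. The list of elements without end tags is an illustrative subset, and the checker is stricter than HTML itself, which allows some end tags (such as </p>) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (an illustrative, not exhaustive, list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class NestingChecker(HTMLParser):
    """Record end tags that do not match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []     # elements currently open
        self.problems = []  # human-readable error messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            expected = self.stack[-1] if self.stack else "nothing"
            self.problems.append(f"found </{tag}> but expected </{expected}>")


def check_nesting(html_text):
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    # Anything still open at the end of the document was never closed.
    unclosed = [f"<{tag}> is never closed" for tag in checker.stack]
    return checker.problems + unclosed


print(check_nesting("<p>Some <b>bold</b> text</p>"))  # []
print(check_nesting("<p>Some <b>bold</p></b>"))
```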

Clients and Servers

Introduction

The WWW is based on a client-server architecture. The client, often called a browser, is the software that runs on the local machine and allows the user to view documents. The server is the software that delivers the information to the client. 'Server' is also used to mean the actual machine the server software runs on.
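
The division of labour between client and server can be shown on a single machine. In the minimal Python sketch below, a server built with the standard http.server module delivers one small HTML page, and a client built with urllib.request requests it over HTTP, as a browser would before displaying it. The page content, the handler name PageHandler and the path /index.html are hypothetical; a real web server would look up the requested file rather than always returning the same page.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny hypothetical page that the server will deliver.
PAGE = b"<html><head><title>Demo</title></head><body><p>Hello from the server</p></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


# Port 0 lets the operating system choose any free port on this machine.
server = HTTPServer(("127.0.0.1", 0), PageHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: request the page, as a browser would before displaying it.
# An empty ProxyHandler sends the request straight to the local server.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{port}/index.html", timeout=5) as response:
    status = response.status
    content_type = response.headers.get("Content-Type")
    body = response.read()

server.shutdown()
server.server_close()
print(status, content_type)
print(body.decode("ascii"))
```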

Text browsers

Text-based browsers provide WWW access when a graphical interface is unavailable or impractical, for example for visually impaired users or on PDAs (Personal Digital Assistants) with small screens. However, they lack many of the functions of graphical browsers, and Web designers increasingly use graphics as an integral part of their pages. One of the best-known text-based browsers is Lynx, developed at the University of Kansas; versions are available for Unix systems and DOS.

Graphical Browsers

These make up the bulk of the clients currently in use. As well as text, they can display images and handle other file formats. The most popular browsers at present are Netscape and Internet Explorer, which together account for over 90% of the browsers in use; both are available on a range of platforms.

The WWW has evolved a great deal, and is continuing to develop. The latest browser versions provide many new features, including greater multimedia support and style sheets (for more details see the AGOCG briefing report 'Multimedia on the WWW'). However, it should be remembered that many users will still be using older browsers, and any HTML files should also be tested with these to make sure they are accessible to a wide audience.

Helper Applications and Plugins

Helper applications are external programs that display file types that the browser cannot handle. Browsers can be configured to automatically launch such applications when particular file types are encountered. Plugins are similar applications, but work within the browser, displaying the file within the browser window.
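
The configuration a browser uses to decide between displaying a file itself and launching a helper can be pictured as a table keyed by file type. The Python sketch below is a hypothetical model of that decision, not how any particular browser stores its settings: the BUILT_IN set, the helper program names in HELPERS and the function choose_handler are all assumptions. When the server does not say what type a file is, the sketch guesses it from the file extension using the standard mimetypes module's built-in table.

```python
import mimetypes

# Types this hypothetical browser can display itself.
BUILT_IN = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper configuration: file type -> external program to launch.
HELPERS = {
    "video/mpeg": "mpeg_player",
    "video/quicktime": "movie_player",
}

# A MimeTypes instance uses Python's built-in table only, not system files.
_types = mimetypes.MimeTypes()


def choose_handler(filename, content_type=None):
    """Decide whether to display a file, launch a helper, or ask the user."""
    # A server normally reports the type; otherwise guess it from the extension.
    if content_type is None:
        content_type, _encoding = _types.guess_type(filename)
    if content_type in BUILT_IN:
        return ("display", content_type)
    if content_type in HELPERS:
        return ("launch " + HELPERS[content_type], content_type)
    return ("ask user", content_type)


for name in ["index.html", "clip.mpg", "trailer.mov", "data.unknown"]:
    print(name, choose_handler(name))
```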

Increasingly, browsers support a much wider range of file types internally, and the need for helper applications for common file types is decreasing. For example, Microsoft's Internet Explorer supports several image and sound formats, MPEG, AVI and QuickTime movie files, and VRML (Virtual Reality Modeling Language). Modern browsers also support Java, a platform-independent programming language.

Accessing Information

As the WWW continues to grow rapidly and the amount of information available increases, finding that information becomes harder, because there is no central catalogue or cataloguing system. There are a number of general directories, such as Yahoo (http://www.yahoo.co.uk/), which list pages by topic, but the search engines are perhaps more useful.

Search engines

Search engines are databases that index the contents of large numbers of web pages and are queried over the WWW using a form. The usefulness of an engine depends on how many pages it has indexed, what information it has indexed from those pages and how sophisticated a query you can submit, e.g. whether it supports the operators 'NEAR', 'AND', 'OR' and 'NOT'. Some of the most widely used search engines include Lycos (http://www.lycos.com/) and AltaVista (http://www.altavista.digital.com/), but it is always worth trying more than one, as their databases differ. Meta search engines, such as Metacrawler (http://www.metacrawler.com), allow you to search more than one engine with a single query: the query is submitted to the meta engine, which automatically passes it on to several search engines and collates the results.
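
How such operators can be answered from an index is sketched below in Python. The sketch assumes a simple inverted index that records, for every word, which pages contain it and at which word positions; real search engines use far larger and more elaborate structures, and the page addresses and texts here are hypothetical. AND, OR and NOT become set intersection, union and difference, and NEAR compares word positions, with a distance of three words chosen arbitrarily.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {page address: [positions of the word in that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


def pages_with(index, word):
    """The set of pages containing a word (empty if the word is not indexed)."""
    return set(index.get(word.lower(), {}))


def near(index, a, b, distance=3):
    """Pages in which words a and b occur within `distance` words of each other."""
    hits = set()
    for url in pages_with(index, a) & pages_with(index, b):
        if any(abs(i - j) <= distance
               for i in index[a.lower()][url] for j in index[b.lower()][url]):
            hits.add(url)
    return hits


pages = {  # hypothetical pages
    "http://a.example/": "virtual reality on the web",
    "http://b.example/": "web graphics and visualization",
    "http://c.example/": "reality check for graphics hardware",
}
idx = build_index(pages)
all_pages = set(pages)

print(pages_with(idx, "web") & pages_with(idx, "graphics"))            # web AND graphics
print(pages_with(idx, "reality") | pages_with(idx, "visualization"))   # reality OR visualization
print(all_pages - pages_with(idx, "web"))                              # NOT web
print(near(idx, "reality", "web"))                                     # reality NEAR web
```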

Domain Names

Domain names are unique addresses on the Internet. Usually a company or organisation will have its own domain name, e.g. mcc.ac.uk (University of Manchester), and the addresses of specific machines within that organisation will end in the domain name. It is often possible to guess the correct URL from a limited amount of information: the first part of the domain name relates to the company or organisation name, and the last part refers to the type of company or organisation. For example, .ac.uk denotes a UK higher education (HE) site and .co.uk a UK commercial site. For a list of UK sub-domains see: http://www.nic.uk/domains/INDEX.HTMl
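
Reading a host name from right to left, as described above, can be sketched in a few lines of Python. The UK_SUBDOMAINS table contains only the two sub-domains mentioned in this section and is an illustrative assumption, not the official list; the function name classify is likewise hypothetical. It returns the organisation's own domain and the type of site implied by its suffix.

```python
# Only the two UK sub-domains mentioned above; the real list is longer.
UK_SUBDOMAINS = {
    "ac.uk": "UK higher education",
    "co.uk": "UK commercial",
}


def classify(hostname):
    """Return (organisation domain, type of site) for a host name."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest suffix first, always leaving at least one label
    # in front of it for the organisation's own name.
    for n in range(len(labels) - 1, 0, -1):
        suffix = ".".join(labels[-n:])
        if suffix in UK_SUBDOMAINS:
            org_domain = ".".join(labels[-(n + 1):])
            return org_domain, UK_SUBDOMAINS[suffix]
    return None, "unknown"


print(classify("www.mcc.ac.uk"))
print(classify("www.yahoo.co.uk"))
```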

Providing Information

Home Pages

Many departments now have their own web servers, and the departmental home page should name the webmaster, whom you will need to contact to set up your own pages on that server. Once you have set up your own home page, there are a number of points to bear in mind:
  • Follow any local codes of practice (see below).
  • The WWW is not static: users expect information to be up to date. Consider updating your pages regularly to include new information and links.
  • Periodically check that external links are still valid.
  • To reach a wide audience, ask authors of related web sites to include a link to your pages, announce them in relevant newsgroups and submit the URL to various search engines.
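
Checking external links by hand soon becomes tedious, and the job is easy to automate. The Python sketch below is a minimal link checker, assuming that a link counts as broken when the request fails or the server returns a status of 400 or above. It collects absolute http/https links with the standard html.parser module and checks each one with an HTTP HEAD request via urllib.request; the function names are hypothetical, and some servers do not answer HEAD requests, so a real checker might fall back to GET. The status-fetching function is a parameter, so the example at the end can use a made-up table of results instead of the network.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def external_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [u for u in collector.links if u.startswith(("http://", "https://"))]


def head_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the server is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # the server answered with an error status
        return err.code
    except (urllib.error.URLError, OSError):  # no answer at all
        return None


def broken_links(html_text, fetch_status=head_status):
    """List (url, status) for external links that fail or return 400 or above."""
    broken = []
    for url in external_links(html_text):
        status = fetch_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


# Offline example: a hypothetical table of results stands in for the network.
page = ('<p><a href="http://www.jisc.ac.uk/">JISC</a> <a href="local.html">local</a> '
        '<a href="http://old.example/">old</a></p>')
fake = {"http://www.jisc.ac.uk/": 200, "http://old.example/": 404}.get
print(broken_links(page, fetch_status=fake))
```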

Good practice for the WWW

The following list is based on the SIMA report by Margaret Isaacs - Guide to good practices for WWW authors (see Bibliography).
  • For maximum portability and usability, use standard HTML.
  • Validate your HTML markup. Programs are available which will check your HTML file to ensure it is correct and conforms to the standard.
  • Make sure your documents are usable on a range of browsers.
  • Give a consistent appearance to a collection of documents.
  • Use a template as a starting point for composing documents.
  • Provide navigational aids at all times.
  • Always give a concise, meaningful title in the HEAD part of the document.
  • Limit pages to a manageable size.
  • Don't use 'Click here' for links.
  • Make regular checks of the links in your documents.
  • Minimise the size of graphics files and do not include large graphics files as inline images. Large files take much longer to download, slowing the network and irritating the viewer.
  • Always provide an alternative to images; not everyone will be using a graphical browser.
  • Provide a text alternative to hyperlinked graphics.
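
Several of the points above can be checked automatically. The Python sketch below, built on the standard html.parser module, flags three of them: a missing or empty title, images without ALT text, and links whose text is 'click here'. It is a hypothetical helper and no substitute for a proper validator; for instance, it does not confirm that the title actually sits inside the HEAD part of the document, and it treats an empty alt="" as a deliberate choice rather than an omission.

```python
from html.parser import HTMLParser


class PracticeChecker(HTMLParser):
    """Flag a few of the good-practice points listed above."""

    def __init__(self):
        super().__init__()
        self.warnings = []
        self.title_text = ""
        self._in_title = False
        self._in_link = False
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and "alt" not in attributes:
            self.warnings.append(f"image {attributes.get('src', '?')} has no ALT text")
        elif tag == "a":
            self._in_link = True
            self._link_text = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_link:
            self._in_link = False
            if self._link_text.strip().lower() == "click here":
                self.warnings.append("link text 'click here' says nothing about the target")

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data
        if self._in_link:
            self._link_text += data


def check_page(html_text):
    checker = PracticeChecker()
    checker.feed(html_text)
    checker.close()
    if not checker.title_text.strip():
        checker.warnings.insert(0, "no non-empty <title> element")
    return checker.warnings


good = '<html><head><title>AGOCG</title></head><body><a href="r.html">Reports</a></body></html>'
bad = '<html><head></head><body><img src="logo.gif"><a href="r.html">Click here</a></body></html>'
print(check_page(good))  # []
print(check_page(bad))
```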

Codes of Practice

It is important to remember that intellectual property rights, defamation and data protection legislation all apply to the WWW as well as other forms of media. On top of this, most institutions will have their own codes of practice laying out acceptable practice, which you should read before setting up any web pages. Institutions are also bound by the JANET acceptable use policy, which governs the use of the links between the institutions. Usually, such codes of practice prohibit inclusion of (or direct links to):
  • official logos or similar material on personal, unofficial pages
  • sexist, racist, pornographic or other similarly offensive material
  • material to which a third party holds an intellectual property right, without the express written permission of the rightholder.
  • defamatory material
  • personal data about third parties, without their permission or registration with the Data Protection Registrar
For a sample code of practice and a full discussion of the legal aspects of the WWW, see Andrew Charlesworth's SIMA report, A Comprehensive Survey of the Legal Issues Relating to the Development and use of WWW Technology at Educational Sites.
