AGOCG

Multimedia on the WWW


The Web and Multimedia are perhaps the two most common 'buzz words' of the moment. Although the Web can be reasonably easily defined and delimited, multimedia is much harder to pin down. A common definition is the use of two or more different media. This would make a video tape or television multimedia, yet most people would agree they are not. What they lack is interactivity.

The World Wide Web was originally designed to allow physicists to share largely text-based information across the network. The first versions of HTML, the native markup language for documents on the Web, had little support for multimedia; in fact, the original proposal stated:

'The project will not aim... to do research into fancy multimedia facilities such as sound and video'.
However, as multimedia became more readily available on computers, so the demand to make it accessible over the Web increased.

One of the main problems with multimedia delivery over the Web, or any network, is bandwidth. While most people would consider a single-speed CD-ROM too slow for multimedia delivery, it can still deliver data about 40 times faster than a 28.8 kbit/s modem, or about 9 times faster than a dual-channel (128 kbit/s) ISDN connection. The second problem is synchronization of the various media, an issue which is now being addressed by the World Wide Web Consortium (W3C).
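The bandwidth comparison above can be checked with simple arithmetic. This is a minimal sketch assuming nominal rates only: a single-speed CD-ROM at 150 KB/s (taken here as 150 x 1024 bytes per second), a 28.8 kbit/s modem, and two 64 kbit/s ISDN channels. Real throughput is lower once protocol overhead is included.

```python
# Nominal link rates in bits per second (assumed figures, no protocol overhead)
CD_ROM_1X = 150 * 1024 * 8   # single-speed CD-ROM: 150 KB/s
MODEM_28_8 = 28_800          # 28.8 kbit/s modem
ISDN_DUAL = 2 * 64_000       # two 64 kbit/s ISDN B channels


def speed_ratio(fast_bps: int, slow_bps: int) -> float:
    """How many times faster the first link is than the second."""
    return fast_bps / slow_bps


print(f"CD-ROM vs modem: {speed_ratio(CD_ROM_1X, MODEM_28_8):.1f}x")
print(f"CD-ROM vs ISDN:  {speed_ratio(CD_ROM_1X, ISDN_DUAL):.1f}x")
```

This prints ratios of about 42.7 and 9.6, consistent with the rounded figures of "about 40" and "about 9" quoted in the text.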

Text

Text is often neglected when considering multimedia, but it is a very important component, as most information is still conveyed as some form of text. The best way to present simple text over the Web is with HTML, the native language of the Web. It should be remembered that HTML is a structural markup language: tags such as those for headings and paragraphs define the structure of the document, not its style. How an HTML document appears to the reader will depend on how their browser interprets these tags.

Cascading Style Sheets

To give authors more control over how their documents appear, without losing device independence or adding new tags, Cascading Style Sheets (CSS) were developed. These allow attributes such as text colour, margins, font styles and sizes to be specified. For example, different fonts can be specified for headings and paragraphs. They also allow exact positioning of content by specifying x and y coordinates, and support a z-index, allowing items to overlap. Style sheets can be embedded within the document or linked as an external file.

Page Description Languages

Where the actual layout of a document is essential, it may be more practical to use a page description language such as Adobe's Portable Document Format (PDF). These are not really text formats, as they also store graphics, fonts and layout information.

Although not designed with the Web in mind, Adobe's PDF and similar products, such as Common Ground's Digital Paper (DP), have been adapted for Web publishing. For example, they can contain hyperlinks, linking not only within the document, but also external links using standard URLs. Support is also provided for 'page at a time' downloading over the Web and files can be viewed using integrated viewers for Netscape and Internet Explorer.

Images

A survey of the most common file types delivered via the Web revealed that GIF files (including animated GIFs) were the most popular, with HTML files in second place and JPEG files in third. This shows how important images have become.

GIF stands for Graphics Interchange Format, and was developed by CompuServe as a device-independent format. It can only store 8 bits per pixel, i.e. 256 colours, and so works best on images with few colours. Although its compression technique is lossless, GIF is less suitable for photo-realistic images, where reducing the image to 256 colours may result in visible degradation.
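GIF's lossless compression is based on LZW. The sketch below is plain textbook LZW over bytes, not a GIF encoder: real GIF files also use variable-width codes, clear and end-of-information codes, and pack the output into sub-blocks, all omitted here. The round trip shows that no information is lost, and a row of palette indices with long runs of one colour (typical of simple graphics) shrinks to far fewer codes.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit a dictionary code for the longest known prefix."""
    table = {bytes([i]): i for i in range(256)}  # start with every single byte
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                   # keep extending the current match
        else:
            codes.append(table[w])   # emit the code for the longest match
            table[wc] = len(table)   # learn the new, longer string
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and invert the encoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):     # code defined by the step that uses it
            entry = w + w[:1]
        else:
            raise ValueError(f"bad LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


row = bytes([3] * 40 + [7] * 40)     # 80 pixels using two palette indices
codes = lzw_encode(row)
assert lzw_decode(codes) == row      # lossless round trip
print(len(row), "bytes ->", len(codes), "codes")
```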

Animated GIFs are simply a series of GIF images stored within a single file and played back sequentially creating an animation sequence.

The PNG (Portable Network Graphics) format is a newer, lossless format developed in the wake of patent problems with the LZW compression method used by GIF. It offers a number of advantages over GIF:

  • Alpha channels (variable transparency)
  • Gamma correction (cross-platform control of image brightness)
  • Progressive display
  • Better compression
  • Support for true colour images
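Several of these advantages are recorded in the PNG header itself. The following is a sketch that reads the IHDR chunk, which must come immediately after the 8-byte PNG signature: colour type 6 means truecolour with an alpha channel, and interlace method 1 (Adam7) is what enables progressive display. The helper `build_minimal_header` is hypothetical and writes only the signature and IHDR, so the example is self-contained; it is not a complete, displayable PNG.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOUR_TYPES = {0: "greyscale", 2: "truecolour", 3: "indexed (palette)",
                4: "greyscale + alpha", 6: "truecolour + alpha"}


def chunk(kind: bytes, payload: bytes) -> bytes:
    """PNG chunk layout: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def build_minimal_header(width, height, bit_depth=8, colour_type=6, interlace=1) -> bytes:
    """Hypothetical helper: signature + IHDR only (not a complete PNG)."""
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, colour_type, 0, 0, interlace)
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr)


def read_png_info(data: bytes) -> dict:
    """Decode the IHDR fields that describe size, colour and interlacing."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    length, kind = struct.unpack(">I4s", data[8:16])
    if kind != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk must come first")
    (stored_crc,) = struct.unpack(">I", data[29:33])
    if zlib.crc32(data[12:29]) & 0xFFFFFFFF != stored_crc:
        raise ValueError("IHDR CRC mismatch")
    width, height, depth, ctype, _comp, _filt, interlace = struct.unpack(">IIBBBBB", data[16:29])
    return {"width": width, "height": height, "bit_depth": depth,
            "colour_type": COLOUR_TYPES.get(ctype, "unknown"),
            "alpha": ctype in (4, 6),
            "progressive": interlace == 1}


info = read_png_info(build_minimal_header(160, 120))
print(info)
```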

Although the specification for PNG is a W3C recommendation, it is still relatively uncommon to find PNG files on the Web. One reason for this is that the major browser manufacturers were slow to incorporate it into their products. Support, either direct or through plug-ins, is now available for most browsers.

JPEG (Joint Photographic Experts Group) is an open standard designed for compressing photo-realistic images, and it supports up to 16 million colours (24 bits per pixel). It employs an efficient "lossy" compression method, resulting in much smaller files than GIF versions of the same photographs.

Audio

There are a large number of audio formats, but in all of them the file size (and quality) depends on:
  • Frequency
  • Bit depth
  • Number of channels (mono, stereo)
  • Lossiness of compression

The easiest way to reduce file size is to switch from stereo to mono. You immediately lose half the data, and for many audio files it will have only a small effect on perceived quality.

Bit depth is the amount of information stored for each sample, equivalent to the bits per pixel in an image file.

Frequency (the sampling rate) is the number of times per second the sound is sampled: the higher the frequency, the better the quality. In practice the frequency must be set to one of a number of standard values, most commonly 11.025 kHz, 22.05 kHz and 44.1 kHz (usually quoted as 11, 22 and 44 kHz).
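For uncompressed (PCM) audio the factors listed above multiply together, which is why dropping from stereo to mono halves the size. A minimal sketch, assuming bit depths that are whole bytes and ignoring file headers; compression, the fourth factor, reduces sizes further.

```python
def pcm_bytes_per_second(rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed data rate: samples/s x bytes per sample x channels."""
    return rate_hz * (bits // 8) * channels


def pcm_size(seconds: float, rate_hz: int, bits: int, channels: int) -> int:
    """Approximate size in bytes of an uncompressed clip (no header)."""
    return int(seconds * pcm_bytes_per_second(rate_hz, bits, channels))


cd_quality = pcm_bytes_per_second(44_100, 16, 2)  # 176,400 bytes/s
mono = pcm_bytes_per_second(44_100, 16, 1)        # half of that
low = pcm_bytes_per_second(11_025, 8, 1)          # 11,025 bytes/s
print(cd_quality, mono, low)
print("one minute at CD quality:", pcm_size(60, 44_100, 16, 2), "bytes")
```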

The most common sound formats found on the Web are WAV, a Microsoft format, and AU, primarily a UNIX-based format. RealAudio files are also becoming more popular (for more details see the section on Streaming).
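As an illustration of the WAV format, Python's standard `wave` module writes uncompressed PCM WAV files. This sketch writes a one-second tone to memory at 11,025 Hz with 8-bit samples, once in mono and once in stereo; the 440 Hz pitch and the amplitude are arbitrary choices. The header is the same size in both cases, while the stereo file carries twice as much sample data.

```python
import io
import math
import wave


def tone_wav(rate=11_025, seconds=1.0, freq=440.0, channels=1) -> bytes:
    """Return an in-memory 8-bit PCM WAV file containing a sine tone."""
    n = int(rate * seconds)
    # 8-bit WAV samples are unsigned and centred on 128
    mono = bytes(int(128 + 100 * math.sin(2 * math.pi * freq * i / rate)) for i in range(n))
    # For stereo, repeat each sample once per channel (interleaved frames)
    frames = mono if channels == 1 else bytes(b for s in mono for b in (s,) * channels)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(1)        # 1 byte = 8 bits per sample
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


mono_file = tone_wav(channels=1)
stereo_file = tone_wav(channels=2)
print(len(mono_file), len(stereo_file))
```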

MIDI (Musical Instrument Digital Interface) files are different from the audio formats described above. MIDI is a communications standard developed for electronic musical instruments and computers. In some ways it is the sound equivalent of vector graphics. It is not digitized sound, but a series of commands which a MIDI playback device interprets to reproduce the sound, for example the pressing of a piano key. Like vector graphics, MIDI files are very compact; however, the sounds produced from a MIDI file depend on the playback device, so it may sound different from one machine to the next. MIDI files are only suitable for music; they cannot be used to store dialogue. They are also more difficult to edit and manipulate than digitized sound files, though with the necessary skills every detail can be manipulated.
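To show how compact such commands are, here is a sketch of the two MIDI channel messages for pressing and releasing a piano key. Each is three bytes: a status byte (0x90 for note on, 0x80 for note off, combined with a channel number from 0 to 15), a note number (60 is middle C) and a velocity from 0 to 127. The velocity of 100 is an arbitrary choice, and a Standard MIDI File would add timing information and chunk headers that are omitted here.

```python
NOTE_ON = 0x90   # status byte for "note on" (low nibble holds the channel)
NOTE_OFF = 0x80  # status byte for "note off"
MIDDLE_C = 60


def midi_message(status: int, channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI channel message (channel 0-15, data bytes 0-127)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("MIDI channel or data byte out of range")
    return bytes([status | channel, note, velocity])


press = midi_message(NOTE_ON, 0, MIDDLE_C, 100)   # key pressed fairly firmly
release = midi_message(NOTE_OFF, 0, MIDDLE_C, 0)  # key released
print(press.hex(), release.hex())                 # 903c64 803c00
```

Six bytes describe the whole key press, whereas even one second of 44.1 kHz 16-bit stereo audio needs 176,400 bytes.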

Video

When we refer to video, we usually mean a format that will contain both video and audio. Most standard video clips on the Web will be either AVI (developed by Microsoft), QuickTime (developed by Apple) or MPEG. AVI and QuickTime differ from MPEG in that they are 'wrappers', which may contain video encoded in a number of different ways, including MPEG. Although AVI was developed with PCs in mind, and QuickTime with Macs, players are available to allow both formats to be played on the other machine.

MPEG (Moving Picture Experts Group) is a family of digital video compression standards. Currently there are two main MPEG standards, MPEG-1 and MPEG-2. MPEG-1 was optimized for delivery from CD-ROM at 1.15 Mbit/s, and MPEG-1 files are usually much smaller than equivalent AVI or QuickTime files. MPEG-2 provides better quality, with resolutions up to 1280x720 at 60 frames per second and multiple audio channels, but obviously at the cost of increased bandwidth; it typically works at around 4 Mbit/s.
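These bit rates translate directly into file sizes and download times. A rough sketch, assuming the nominal rates quoted above, a 28.8 kbit/s modem, 1 MB = 1,000,000 bytes, and no protocol overhead:

```python
MPEG1_BPS = 1_150_000   # MPEG-1 at 1.15 Mbit/s
MPEG2_BPS = 4_000_000   # typical MPEG-2 rate of 4 Mbit/s
MODEM_BPS = 28_800      # 28.8 kbit/s modem


def megabytes_per_minute(bps: int) -> float:
    """Storage needed for one minute at a constant bit rate."""
    return bps * 60 / 8 / 1_000_000


def download_minutes(clip_seconds: float, clip_bps: int, link_bps: int) -> float:
    """Minutes needed to fetch a whole clip over a link of the given speed."""
    return clip_seconds * clip_bps / link_bps / 60


print(f"MPEG-1: {megabytes_per_minute(MPEG1_BPS):.3f} MB per minute")
print(f"MPEG-2: {megabytes_per_minute(MPEG2_BPS):.1f} MB per minute")
print(f"1 min of MPEG-1 over a modem: {download_minutes(60, MPEG1_BPS, MODEM_BPS):.0f} min")
```

Under these assumptions, a single minute of MPEG-1 video takes roughly 40 minutes to download over a modem.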

When producing video for the Web, the main consideration relating to bandwidth is "What resolution?" 'Full screen' (640x480) is not practical, and the most popular size is 160x120.
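The reason resolution matters so much is that raw data grows with the number of pixels: halving both width and height cuts the data by a factor of four, so 160x120 carries one sixteenth of the pixels of 640x480. A sketch of uncompressed video bit rates, assuming 24-bit colour and 15 frames per second (both chosen for illustration):

```python
def raw_video_bps(width: int, height: int, fps: float, bits_per_pixel: int = 24) -> float:
    """Uncompressed bit rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * fps


MODEM_BPS = 28_800
for w, h in [(640, 480), (320, 240), (160, 120)]:
    bps = raw_video_bps(w, h, fps=15)
    print(f"{w}x{h}: {bps / 1e6:.1f} Mbit/s uncompressed, "
          f"{bps / MODEM_BPS:.0f}x a 28.8 kbit/s modem")
```

Even the smallest size needs heavy compression (and a low frame rate) before it fits a modem connection.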

Streaming

Until fairly recently, to listen to an audio file or play a video over the Web, the whole file first had to be downloaded. This is fine for very short clips, but means long delays for longer ones. This changed with the release of RealAudio from RealNetworks. RealAudio, and similar products for both audio and video that have followed it, allow streaming over the Internet. Streaming means that the audio or video is played in real time on the user's machine as it arrives, without first having to be stored as a local file.
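The difference in waiting time can be sketched with a simplified model: a constant link speed, and a player that starts once a short buffer has filled. The 20 kbit/s encoding rate, five-minute clip and five-second buffer are hypothetical values chosen for illustration; real players also adapt to varying network conditions.

```python
def startup_delay_download(clip_seconds: float, clip_bps: float, link_bps: float) -> float:
    """Seconds before playback can begin if the whole file must arrive first."""
    return clip_seconds * clip_bps / link_bps


def startup_delay_streaming(prebuffer_seconds: float, clip_bps: float, link_bps: float) -> float:
    """Seconds before playback if only a short buffer is filled first.

    Assumes link_bps >= clip_bps, so the buffer never runs dry during playback.
    """
    if link_bps < clip_bps:
        raise ValueError("link too slow to stream this clip without interruption")
    return prebuffer_seconds * clip_bps / link_bps


clip_bps = 20_000   # hypothetical audio encoded at 20 kbit/s
link_bps = 28_800   # 28.8 kbit/s modem
print(f"download first: {startup_delay_download(300, clip_bps, link_bps):.0f} s")
print(f"streaming:      {startup_delay_streaming(5, clip_bps, link_bps):.1f} s")
```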

Although video can be streamed over a modem, audio usually works better, as it is easier to compress and requires less bandwidth. Over a 28.8 kbit/s modem RealAudio can deliver stereo sound, whereas streamed video will deliver only a small video window (160x120) updated at around 3 or 4 frames per second.

Delivering streamed files usually requires a specially configured Web server, and this may entail upgrading server hardware. Products are available that support streaming of various audio and video formats, including MPEG, AVI and QuickTime, and some tools can stream from a standard Web server using the HTTP protocol.

Unlike most information sent over the Web, which uses the TCP transport protocol, streaming generally relies on the Real-time Transport Protocol (RTP).

TCP is a reliable protocol, which will retransmit information to ensure it is received correctly. This can cause delays, making it unsuitable for audio and video. RTP has been developed by the Internet Engineering Task Force as an alternative. RTP normally runs on top of UDP rather than TCP; it carries sequence numbers and timestamps so that the receiver can put packets back in order and synchronize multiple streams. Unlike TCP, RTP works on the basis that an occasional loss of packets matters less than delay, as the loss can be compensated for. Bandwidth requirements can also be reduced through the support of multicast. With multicast, rather than sending out a separate packet to each user, a single packet is sent to a group address, and the network delivers copies to all recipients who have joined the group.
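The sketch below packs and unpacks the 12-byte fixed RTP header (version, payload type, sequence number, timestamp and source identifier, as laid out in the RTP specification) and shows how a receiver can count lost packets from gaps in the sequence numbers instead of waiting for retransmission. Sending over UDP sockets is left out, the loss counter assumes packets arrive in order without duplicates, and the SSRC value is arbitrary. Payload type 0 is PCM mu-law audio in the standard audio profile, where 20 ms at 8 kHz is 160 samples per packet.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int,
                    marker: bool = False) -> bytes:
    """12-byte fixed RTP header (no padding, extension or CSRC list)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(packet: bytes) -> tuple[int, int, int]:
    """Return (sequence number, timestamp, SSRC) from a packet."""
    byte0, _byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    return seq, timestamp, ssrc


def count_lost(seqs: list[int]) -> int:
    """Count gaps in in-order 16-bit sequence numbers (wrap-around handled)."""
    return sum(((cur - prev) & 0xFFFF) - 1 for prev, cur in zip(seqs, seqs[1:]))


SSRC = 0x1234ABCD                                            # arbitrary stream identifier
sent = [((65534 + k) & 0xFFFF, k * 160) for k in range(5)]   # (seq, timestamp) pairs
arrived = [p for k, p in enumerate(sent) if k != 2]          # simulate losing one packet
packets = [pack_rtp_header(seq, ts, SSRC, payload_type=0) for seq, ts in arrived]
seqs = [unpack_rtp_header(p)[0] for p in packets]
print(seqs, "lost:", count_lost(seqs))
```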

The Real Time Streaming Protocol (RTSP), originally developed by RealNetworks and Netscape, is now being developed by the Internet Engineering Task Force (IETF). It builds on existing protocols such as RTP, TCP/IP and IP Multicast. While RTP is a transport protocol, RTSP is a control protocol: it provides control mechanisms and addresses higher-level issues, offering "VCR-style" functionality such as pause and fast forward.
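RTSP requests are plain text, much like HTTP: a request line naming a method such as PLAY or PAUSE, a `CSeq` sequence header, and other headers such as `Session` and `Range` (given in normal play time, `npt`). This sketch only formats requests and does not talk to a network; the server URL and session identifier are hypothetical, and in practice the session identifier comes from the server's reply to an earlier SETUP request. The media itself would flow separately, typically over RTP.

```python
from typing import Optional


def rtsp_request(method: str, url: str, cseq: int,
                 headers: Optional[dict] = None) -> str:
    """Format an RTSP/1.0 request: request line, CSeq, other headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


URL = "rtsp://media.example.com/lecture"   # hypothetical server and clip
SESSION = "12345678"                       # hypothetical session identifier

print(rtsp_request("PLAY", URL, 3, {"Session": SESSION, "Range": "npt=60-"}))
print(rtsp_request("PAUSE", URL, 4, {"Session": SESSION}))
```

Here the PLAY request asks for playback to start 60 seconds into the clip, which is how a client can jump forward without downloading what comes before.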

Virtual Reality

VRML

The Virtual Reality Modeling Language (VRML, often pronounced 'vermal') was designed to allow 3D 'worlds' to be delivered over the World Wide Web (WWW). VRML files are analogous to HTML (hypertext markup language) files in that they are standard text files that are interpreted by browsers. Using a VRML browser, the user can explore the VR world, zooming in and out, moving around and interacting with the virtual environment. This allows fairly complex 3D graphics to be transmitted across networks without the very high bandwidth that would be necessary if the files were transmitted as standard graphic files. VRML 2.0 provides a much greater level of interactivity, with support for audio and video clips within the world.

To produce simple worlds, a text editor and knowledge of the VRML specification is all that is required. However, as worlds become more complex, there are additional tools that can help. VRML modelers are 3-D drawing applications that can be used to create VRML worlds. Conversion programs are also available that take output from other packages and convert it to VRML.
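Because a VRML world is just text, a short script can generate one as easily as a text editor. This sketch writes VRML 2.0 syntax for a red box and a blue sphere side by side; the shapes, colours and positions are arbitrary. Saved with a `.wrl` extension, the output should open in a VRML 2.0 browser.

```python
def vrml_shape(geometry: str, rgb: tuple[float, float, float],
               translation=(0.0, 0.0, 0.0)) -> str:
    """One coloured shape wrapped in a Transform node (VRML 2.0 syntax)."""
    r, g, b = rgb
    x, y, z = translation
    return (f"Transform {{\n"
            f"  translation {x} {y} {z}\n"
            f"  children [ Shape {{\n"
            f"    appearance Appearance {{ material Material {{ diffuseColor {r} {g} {b} }} }}\n"
            f"    geometry {geometry}\n"
            f"  }} ]\n"
            f"}}\n")


world = ("#VRML V2.0 utf8\n"
         + vrml_shape("Box { size 2 2 2 }", (1.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
         + vrml_shape("Sphere { radius 1 }", (0.0, 0.0, 1.0), (2.0, 0.0, 0.0)))
print(world)
```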

Multi-user shared VR

There are an increasing number of multi-user shared VR worlds on the Web. In these, an avatar, e.g. a photo or cartoon, usually represents the user. You can move around the 3D world and chat to other users. Some may provide simple animations e.g. to show expressions or movement.

Panoramic Imaging

A limited VR is provided by a number of panoramic imaging formats, such as QuickTime VR and IBM's PanoramIX. QuickTime VR allows you to 'stitch' together a sequence of images into a 360-degree view, which the user can direct. Enhancements are likely to include stereo sound, animations and zoomable object movies.

Panoramic imaging and VRML are combined in RealSpace's RealVR browser. This supports a new node type, Vista, which is a scrollable dewarping background image. Scrollable 360-degree scenes are also supported in a number of other VRML browsers.

HTML Developments

Although previous versions of HTML have allowed images to be included through the IMG element, they have not provided a general solution to including media. This has been addressed in HTML 4.0 using the OBJECT element. The OBJECT element allows HTML authors to specify everything required by an object for its presentation by a user agent: source code, initial values, and run-time data.

Style sheets will be fully supported in HTML 4.0, and may be designed to be applicable to particular media - e.g. printed version, screen reader. The browser will be responsible for applying the appropriate style sheets in a given circumstance.

XML

Although HTML has been very successful, it is limited in what it can do. HTML is defined in SGML (Standard Generalised Markup Language), and it would be possible to use SGML to provide much greater functionality. However, SGML is quite complex, and contains many features that are not required. To bridge that gap, XML was designed. Extensible Markup Language (XML) is a restricted form of SGML, allowing new markup languages to be easily defined. This means documents could be encoded much more precisely than with HTML. It also provides better support for hyperlinking features such as bi-directional and location-independent links.

While additional functionality can be added using 'plug-ins' and Java, both approaches have limitations. Using plug-ins locks data into proprietary formats. Using Java requires a programmer, and content becomes embedded in specific programs. It is hoped that XML will provide an extensible, easy-to-use solution, allowing data to be more easily manipulated and exchanged over the Web. Related developments are already under way, notably SMIL, which is an XML application, and the Document Object Model, which underpins Dynamic HTML.

Synchronized Multimedia Integration Language (SMIL)

Where media synchronization is required on the Web, current solutions involve using a scripting language such as JavaScript or existing tools such as Macromedia Director. These present a number of problems in that they are not easy to use and usually produce high bandwidth content.

SMIL will allow sets of independent multimedia objects to be synchronized using a simple language. It has been designed to be easy to author with a simple text editor, making it accessible to anyone who can use HTML. According to Philip Hoschka of the W3C, SMIL will do for synchronized multimedia what HTML did for hypertext, and 90% of its power can be tapped using just two elements, par (parallel) and seq (sequential). It will support interactivity (allowing the user to move through the presentation), random access and embedded hyperlinks.
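The two key elements nest naturally: children of `seq` play one after another, while children of `par` play at the same time. This sketch uses Python's standard ElementTree to build a minimal SMIL 1.0-style document in which two slides are shown in sequence, each accompanied by its own narration. The file names are hypothetical, and a fuller presentation would normally add a `head` with layout regions.

```python
import xml.etree.ElementTree as ET


def build_presentation() -> str:
    """Slides shown in sequence, each played in parallel with narration."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    seq = ET.SubElement(body, "seq")             # children play one after another
    for n in (1, 2):
        par = ET.SubElement(seq, "par")          # children play at the same time
        ET.SubElement(par, "img", src=f"slide{n}.gif", dur="10s")
        ET.SubElement(par, "audio", src=f"narration{n}.au")
    return ET.tostring(smil, encoding="unicode")


doc = build_presentation()
print(doc)
```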

Document Object Model

The Document Object Model (DOM) was designed to provide a standard model of how objects in an XML or HTML document are put together and to provide a standard interface for working with them. The HTML application of DOM builds on functionality provided by Netscape Navigator 3.0 and Internet Explorer 3.0. It exposes elements of HTML pages as objects, allowing them to be manipulated by scripts.
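In a browser these DOM calls are made from JavaScript. To keep to one language, the sketch below uses Python's `xml.dom.minidom`, which implements the W3C DOM interfaces (`getElementsByTagName`, `setAttribute`, `firstChild`) for well-formed XML; it is not a browser DOM, so the page is written as well-formed XHTML-style markup. The page content and file names are hypothetical.

```python
from xml.dom import minidom

PAGE = """<html><body>
  <h1 id="title">Multimedia on the WWW</h1>
  <img id="logo" src="logo.gif" alt="logo"/>
</body></html>"""

doc = minidom.parseString(PAGE)

# Locate elements through the standard DOM interface...
heading = doc.getElementsByTagName("h1")[0]
image = doc.getElementsByTagName("img")[0]

# ...then change them, as a script in a Web page might
heading.firstChild.data = "Multimedia on the Web"
image.setAttribute("src", "logo.png")

print(doc.documentElement.toxml())
```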

Both Microsoft and Netscape use a document object model to support Dynamic HTML (DHTML) in their current (version 4) browsers. Dynamic HTML is a term used to describe the combination of HTML, style sheets and scripts, such as JavaScript, that allows documents to be animated and interactive without using external programs. It also allows exact positioning and layering of text and objects. Unfortunately, Microsoft and Netscape use different DOMs; Microsoft's implementation is based on the W3C DOM. Both browsers provide support for Cascading Style Sheets (CSS1) and partial support for HTML 4.0.

Conclusions

New methods of support for multimedia delivery over the World Wide Web continue to be developed. Advances in streaming and compression technologies particularly look set to change our view of the Web. Despite the amount of platform-dependent software available, cross-platform applications are receiving support not only from standards committees but, perhaps more importantly, from major vendors such as Microsoft and Netscape. While Java provides a relatively secure development environment for those with programming skills, new developments such as XML and Dynamic HTML will ensure that multimedia functionality is accessible to the bulk of Web authors.

However, despite all the formats and protocols described, the Web continues to be biased towards text. A standard search engine searches only text. For the Web to be a truly hypermedia system, standard audio and video indexing and searching facilities are required.
