Using Audio in Multimedia
Introduction
When considering multimedia applications, audio is often neglected. Traditionally computers have relied on visual interfaces, and audio facilities were very limited. Now, however, most personal computers have sound cards and speakers, and the hardware to upgrade those that do not is relatively cheap.
Audio can be used to enhance multimedia applications in a number of ways, for example to deliver lectures over the web, to add interest and emotion to a presentation through music, and to provide non-speech sounds as part of a general interface.
This introductory paper will look at some of the reasons for using audio, provide an overview of digital audio file formats and look at some novel audio interfaces.
One of the main uses of audio in a networked environment is in videoconferencing applications. Videoconferencing is beyond the scope of this paper, but for more information see the AGOCG briefing paper 'Introduction to Video Conferencing'.
Why Use Audio
Perhaps the most obvious advantage of using audio is that it can provide an interface for visually disabled users. However, using audio also offers a number of other advantages for all users.
File Formats
There are a large number of audio formats, but in all of them the file size (and quality) depends on the following:
Bit depth, or sample size, is the amount of information stored for each sample - equivalent to the bits per pixel in an image file. This is usually 8 or 16 bits.
Frequency is the number of times per second the sound was sampled - the higher the frequency, the better the quality. In practice the frequency is usually set at one of a number of predetermined figures, most commonly 11 kHz, 22 kHz and 44 kHz (more precisely 11.025, 22.05 and 44.1 kHz). 22 kHz is very common in computer sound file formats, while 44.1 kHz is the standard for audio compact discs.
The total size in bits of a mono, uncompressed sound file is the sample rate * bit depth * duration in seconds; dividing by 8 gives the size in bytes. Stereo sound will be twice this. For example, a CD quality sound file is 16 bit, 44.1 kHz stereo, and uncompressed will be about 10.6 MB per minute.
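The size formula can be checked with a few lines of Python. This is a minimal sketch: the function name `pcm_size_bytes` is made up for illustration, and the calculation ignores any file header, so real WAV or AIFF files will be slightly larger.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, seconds, channels=1):
    """Size in bytes of uncompressed audio, ignoring any file header."""
    total_bits = sample_rate_hz * bit_depth * seconds * channels
    return total_bits // 8  # 8 bits per byte


# One minute of CD quality audio: 44.1 kHz, 16 bit, stereo.
cd_minute = pcm_size_bytes(44_100, 16, 60, channels=2)
print(cd_minute)                   # 10584000
print(round(cd_minute / 1e6, 1))   # 10.6 (millions of bytes)

# One minute of 22.05 kHz, 8 bit, mono audio for comparison.
print(pcm_size_bytes(22_050, 8, 60))  # 1323000
```

The CD quality result is close to the per-minute figure quoted above, and the second call shows how quickly the size falls when the sample rate, bit depth and channel count are all reduced.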
The most common sound formats found on the Web are WAV, a Microsoft format; AU, primarily a UNIX-based format; AIFF (Audio Interchange File Format), mainly used on Macs and SGI machines; and streamed formats such as RealAudio (.ra).
Recently MP3 files have become more popular, particularly for storing CD quality audio. MP3 refers to the MPEG (Moving Picture Experts Group) Audio Layer 3 encoding scheme, which is defined within both the MPEG-1 and MPEG-2 standards. The audio encoding scheme in MPEG-2 differs from that in MPEG-1 only in that it was extended to support very low bitrate applications.
MP3 can provide about 12:1 compression from a 44.1 kHz 16-bit stereo WAV file without noticeable degradation of sound quality. Much higher compression rates can be obtained, but at the cost of poorer sound quality. However, MP3 is reasonably CPU intensive, encoding much more so than decoding. MP3 playback is not recommended on machines slower than a Pentium or equivalent.
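To put the 12:1 ratio into more familiar terms, the sketch below converts it into bit rates and storage per minute. The function name is hypothetical, and the code assumes exactly 12:1 compression of 44.1 kHz 16-bit stereo audio; real encoders work at fixed bit rates, so actual ratios will differ a little.

```python
def bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


cd_kbps = bitrate_kbps(44_100, 16, 2)    # uncompressed CD quality stereo
mp3_kbps = cd_kbps / 12                  # assumed 12:1 compression

# Bytes needed for one minute of audio at the compressed rate.
mp3_bytes_per_minute = mp3_kbps * 1000 * 60 / 8

print(round(cd_kbps, 1))                 # 1411.2
print(round(mp3_kbps, 1))                # 117.6
print(round(mp3_bytes_per_minute))       # 882000
```

Under these assumptions a minute of audio shrinks from over 10 million bytes to under 1 million, which is what makes MP3 practical for storing CD quality music.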
MIDI (Musical Instrument Digital Interface) files are different from the audio formats described above. MIDI is a communications standard developed for electronic musical instruments and computers. In some ways it is the sound equivalent of vector graphics. It is not digitized sound, but a series of commands which a MIDI playback device interprets to reproduce the sound, for example the pressing of a piano key. Like vector graphics, MIDI files are very compact. However, the sounds produced by a MIDI file depend on the playback device, so the same file may sound different from one machine to the next. MIDI files are only suitable for recording music; they cannot be used to store dialogue. They are also more difficult to edit and manipulate than digitized sound files, though if you have the necessary skills every detail can be manipulated.
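The idea of storing commands rather than sound can be illustrated with the two most basic MIDI messages, note on and note off. The sketch below builds the raw bytes for pressing and releasing one key. The helper names are invented for this example, and it shows only the messages themselves: a real MIDI file also stores timing information and header data, which are not modelled here.

```python
NOTE_ON = 0x90   # status byte for "note on" on channel 0
NOTE_OFF = 0x80  # status byte for "note off" on channel 0


def note_on(channel, note, velocity):
    """Build a 3-byte note-on message (channel 0-15, note and velocity 0-127)."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel, note):
    """Build a 3-byte note-off message with a release velocity of 0."""
    if not (0 <= channel <= 15 and 0 <= note <= 127):
        raise ValueError("channel must be 0-15, note 0-127")
    return bytes([NOTE_OFF | channel, note, 0])


MIDDLE_C = 60  # MIDI note number for middle C
messages = note_on(0, MIDDLE_C, 64) + note_off(0, MIDDLE_C)
print(messages.hex(" "))  # 90 3c 40 80 3c 00
print(len(messages))      # 6
```

Six bytes are enough to describe pressing and releasing a key, however long the note is held, whereas one second of 8-bit, 22 kHz digitized sound needs about 22,000 bytes. The bytes say nothing about how the note should sound, which is why playback varies between devices.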
Streaming
Until relatively recently, to listen to an audio file or play a video over the Web, the whole file first had to be downloaded. This changed with the release of RealAudio from Progressive Networks. RealAudio, and other similar products that have followed for both audio and video, allow streaming over the Internet. Streaming means that the audio or video file is played in real time on the user's machine, without the need to store it as a local file first.
To play a RealMedia file, the HTML document includes a link to a metafile, which contains the location of the media file held on a RealServer. When the link is selected, the RealMedia player is invoked on the client, and the player begins to stream the media file. Generally the web browser plug-ins to play the streamed media files are freely available, but the server to deliver the files must be purchased.
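The link and metafile arrangement can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the server name, the file names, the `.ram` extension for the metafile and the `pnm://` address scheme are not taken from this paper, and the helper functions are hypothetical.

```python
from html import escape


def make_metafile(media_url):
    """Return the text of a metafile: the media file's location on one line."""
    return media_url + "\n"


def make_link(metafile_href, label):
    """Return the HTML link that points the browser at the metafile."""
    return f'<a href="{escape(metafile_href, quote=True)}">{escape(label)}</a>'


# Hypothetical location of the media file on a streaming server.
media_url = "pnm://realserver.example.org/lectures/week1.ra"

print(make_metafile(media_url), end="")
print(make_link("week1.ram", "Listen to the week 1 lecture"))
```

The web page only ever refers to the small metafile. The player reads the media address from it and contacts the streaming server directly, so the web server never has to deliver the audio itself.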
There are now many products available which support streaming of various audio and video formats such as MPEG, AVI and QuickTime. These include Real Media (www.realmedia.com), Microsoft's Media Player (www.microsoft.com/Windows/MediaPlayer) and Xing's Streamworks (www.xingtech.com).
For an example of streaming, and more information about it, see the presentation 'Streaming Multimedia on the Web' by Les Howles of the Department of Learning Technology and Distance Education, University of Wisconsin (http://www.wisc.edu/learntech/HTMLStreamPres/StreamPres.html).
VRML
The Virtual Reality Modelling Language (VRML, often pronounced 'vermal') was designed to allow 3D 'worlds' to be delivered over the World Wide Web (WWW). Although it is usually thought of in the context of graphics only, VRML 97 supports the inclusion of spatialised, 3D audio, giving the listener a sense of the location of a virtual sound source in a virtual listening space.
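A VRML file consists of a collection of objects, called nodes, containing parameters or fields which modify the node. Audio is supported mainly through the Sound node, which places a sound source in the scene, and the AudioClip node, which supplies the audio data to be played.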
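As a rough illustration of how nodes and fields describe a sound source, the Python sketch below generates the text of a VRML 97 Sound node whose source is an AudioClip. The function name, the file name `bell.wav` and the chosen field values are assumptions for this example. Only a handful of the fields each node offers are shown, and a complete world would also contain geometry and a sensor or script to start the sound.

```python
def sound_node(url, location=(0.0, 0.0, 0.0), min_range=1.0,
               max_range=10.0, loop=False):
    """Return VRML 97 text for a spatialised Sound node playing one AudioClip.

    The sound is at full volume within min_range of its location and
    fades to silence at max_range, the same in front of and behind it.
    """
    x, y, z = (float(v) for v in location)
    loop_value = "TRUE" if loop else "FALSE"
    return (
        "Sound {\n"
        f"  location {x} {y} {z}\n"
        f"  minFront {float(min_range)}\n"
        f"  minBack {float(min_range)}\n"
        f"  maxFront {float(max_range)}\n"
        f"  maxBack {float(max_range)}\n"
        "  spatialize TRUE\n"
        "  source AudioClip {\n"
        f'    url "{url}"\n'
        f"    loop {loop_value}\n"
        "  }\n"
        "}\n"
    )


print("#VRML V2.0 utf8")  # header line that starts every VRML 97 file
print(sound_node("bell.wav", location=(2, 0, -5), loop=True))
```

The `location` and range fields are what give the listener the sense of a sound source positioned in the virtual space: as the viewer moves through the world, the browser adjusts volume and stereo position accordingly.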
Audio Interfaces
There are a number of scenarios in which an audio interface or combined audio/visual interface may be more useful than a standard visual-only interface. One example is the increasingly popular Personal Digital Assistant (PDA). PDAs are no longer restricted to simple address books and electronic diaries, but can now also act as Internet terminals, word processors, etc. Their main problem is their very small screen size, which means an audio interface may be more useful than a traditional GUI (graphical user interface).
Aural Style Sheets
Aural Cascading Style Sheets are currently being investigated by the World Wide Web Consortium (W3C). They are being designed to make WWW documents more accessible to visually impaired users. This group includes not only blind and partially sighted users, but anyone for whom visual presentation is not appropriate, for example where the eyes are engaged in another task such as driving. Properties proposed in the aural CSS include volume, pauses and audio cues before and after elements, the spatial position of the sound (azimuth and elevation), and voice characteristics such as speech rate and pitch.
Wearable Audio Computing
The need for a "hands-and-eyes free" interface was recognized in the development of Nomadic Radio, a distributed computing platform designed to be worn round the neck, giving access to a variety of functions through an auditory interface. This uses various auditory cues and speech input/output. It makes use of the "Cocktail Party Effect", which means humans can listen to several audio streams simultaneously, selectively focus on the one that is of interest, and tune the rest out into the background. This allows users to be aware of messages or events without the interface requiring their full attention.
Summary
Hardware and software support for audio has improved greatly in the last few years on most platforms, allowing audio to be used much more widely. Delivery of audio over networks has benefited from the development of streaming technologies, allowing existing audio files to be delivered in real time to many users, even those using slow connections such as modems. 3D-spatialised audio is now supported in a number of ways, including VRML and vendor-specific implementations such as Microsoft's DirectX.
The use of non-speech audio to provide new interfaces looks set to increase, providing additional methods of accessing existing applications, and through the development of innovative products such as 'wearable' computers.
Careful use of audio can add significantly to the ease of use, effectiveness and appeal of many applications. However, poor use of audio can detract, making interfaces harder to use and causing cognitive problems.