
Appendix 1: List of relevant standards and definitions


H.320

This is the ITU (formerly the CCITT) standard covering the technical requirements for narrow-band visual telephone services. Video processing and transmission is now governed by a group of three main standards (H.261, H.221 and H.242) which fall under the umbrella standard H.320. It defines the types of terminal and the transmission modes that can be used. H.320 stipulates that a terminal must be capable of supporting video communications using two 64 kbit/s B channels, either in parallel or each independently. The three 'satellite' standards cover the coding, decoding, control and synchronisation of the video elements of a multimedia transmission, with three further standards (G.728, G.722 and G.711) defining the audio path. It is anticipated that Recommendation H.320 will eventually be split into a number of Recommendations, each covering a single videoconferencing or videophone service (narrow-band, broadband, etc.). However, because the resulting standards would duplicate much of the same wording, and because the points of divergence have not yet been settled, for the time being it is convenient to keep all the text in a single Recommendation.

The service requirements for visual telephone services are presented in Recommendation H.200/AV.120-Series; video and audio coding systems and other technical aspects common to audiovisual services are covered in other Recommendations in the H.200/AV.200-Series.


H.261

This is the ITU recommendation for video coding for audiovisual services at p x 64 kbit/s. It describes the video coding and decoding methods for the moving picture component of audiovisual services at the rate of p x 64 kbit/s, where p is in the range 1 to 30. This is for two-way live transmission such as videoconferencing and videotelephony. For videoconferencing, a bit rate of 384 kbit/s (p = 6) has become the norm. The standard defines the compression algorithms to be used, the two picture formats that can be used as a source for the video encoder, and the ways in which the output from the encoder is formatted. The picture input format required by H.261 is known as the Common Intermediate Format (CIF).

This standard is intended for carrying video over ISDN - in particular for face-to-face videophone applications and for videoconferencing. Videotelephony is less demanding of image quality, and can be achieved with p = 1 or 2. For videoconferencing applications (where more than one person may be in the field of view) higher picture quality is required and p must be at least 6.

There are actually two picture formats defined within H.261: CIF (Common Intermediate Format) has 288 lines by 352 pixels/line of luminance information and 144 x 176 of chrominance information; QCIF (Quarter Common Intermediate Format) has 144 lines by 176 pixels/line of luminance and 72 x 88 of chrominance. The choice of CIF or QCIF depends on the available channel capacity - e.g. QCIF is normally used if p < 3.
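The raw data rates behind these formats show why compression is essential before transmission over p x 64 kbit/s channels. A back-of-the-envelope calculation in Python, assuming 8 bits per sample and the nominal 29.97 Hz CIF frame rate:

```python
# Uncompressed data rates for the two H.261 picture formats
# (one luminance plane plus two chrominance planes per frame).

FPS = 29.97  # nominal CIF frame rate

def raw_rate_mbps(luma_w, luma_h, chroma_w, chroma_h, fps=FPS, bits=8):
    """Raw bit rate in Mbit/s for luminance plus Cb and Cr chrominance."""
    samples = luma_w * luma_h + 2 * chroma_w * chroma_h
    return samples * bits * fps / 1e6

cif  = raw_rate_mbps(352, 288, 176, 144)   # ~36.5 Mbit/s uncompressed
qcif = raw_rate_mbps(176, 144, 88, 72)     # ~9.1 Mbit/s uncompressed

print(f"CIF:  {cif:.1f} Mbit/s raw")
print(f"QCIF: {qcif:.1f} Mbit/s raw")
```

Even QCIF, uncompressed, is around fifty times the capacity of a 2 x 64 kbit/s connection, which is why the H.261 compression algorithm is needed at all.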

The encoding algorithm is similar to, but incompatible with, that of MPEG; H.261 also needs substantially less CPU power for real-time encoding than MPEG. The algorithm includes a mechanism which optimises bandwidth usage by trading picture quality against motion, so that a quickly-changing picture will have a lower quality than a relatively static picture. H.261 used in this way is thus a constant-bit-rate encoding rather than a constant-quality, variable-bit-rate encoding.


H.221

The methods by which synchronisation of audio and video is achieved are specified in H.221. The standard defines how a transmission should be subdivided into subchannels, so defining the order in which packets of encoded H.261 video are interleaved or multiplexed with the audio. It is closely related to H.261 and H.242, and in fact supersedes H.220. Products complying with the standard include codecs from BT, GPT, PictureTel and VideoTel.


H.242

This specifies the protocols by which two terminals with differing capabilities signal those capabilities to each other, so that they can negotiate the highest common level of performance. A number of applications utilising narrow-band (3 kHz) and wideband (7 kHz) speech together with video and/or data have been identified, including high-quality telephony, audio and videoconferencing (with or without various kinds of telematic aids) and audiographic conferencing, with other applications emerging all the time.

To provide these services, a scheme is recommended in which a channel accommodates speech, and optionally video and/or data at several rates, in a number of different modes. Signalling procedures are required to establish a compatible mode upon call set-up, to switch between modes during a call and to allow for call transfer.

All audio and audiovisual terminals using G.722 audio coding and/or G.711 speech coding, or other standardised audio codings at lower bit rates, should be compatible to permit connection between any two terminals. This implies that a common mode of operation has to be established for the call. The initial mode might be the only one used during the call or, alternatively, switching to another mode can occur as needed depending on the capabilities of the terminals. Products complying with this recommendation include codecs from BT, GPT, PictureTel, VideoTel and others.
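The idea at the heart of this negotiation can be sketched in a few lines of Python. The mode names below are the real audio standards mentioned above, but the preference ranking and the function are invented for illustration; this is not the actual H.242 signalling procedure:

```python
# Toy illustration of capability exchange: each terminal advertises the
# audio modes it supports, and the call settles on the most preferred
# mode common to both.  The ranking here is illustrative only.

PREFERENCE = ["G.711", "G.728", "G.722"]  # least to most preferred (invented)

def negotiate(caps_a, caps_b):
    """Return the most preferred mode supported by both terminals,
    or None if no common mode exists."""
    common = set(caps_a) & set(caps_b)
    for mode in reversed(PREFERENCE):
        if mode in common:
            return mode
    return None

print(negotiate({"G.711", "G.722"}, {"G.711", "G.728"}))  # G.711
```

The real protocol is richer (it covers mode switching mid-call and call transfer), but the common-mode principle is the same.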


G.711

ITU standard for Pulse Code Modulation (PCM) of voice frequencies.


ITU standard for audio encoding.


ITU series of standards for Audiographic, Videotelephony and Videoconference services. The individual recommendations are as follows:


Audiographic Conference Teleservice for ISDN


Videotelephony Services General


Videotelephony Teleservices for ISDN




Videoconference Service General


Broadband Videoconference Services


Audiovisual Interactive Services (AVIS)


CIF

This specifies how the results of digitising video signals should be presented to the encoder. This is needed because of the different scan frequencies and framing rates used by the TV standards (525 lines at 30 frames/s, 60 fields/s, for NTSC and 625 lines at 25 frames/s, 50 fields/s, for PAL). The TV frame is converted from the linear raster scan of the input TV signal to a two-dimensional array of picture elements. The result is a compromise between the NTSC and PAL standards. The standard also defines a lower-resolution format, QCIF (quarter CIF), which has half the number of pixels per line and half the number of lines per frame. Support for QCIF in a video codec is mandatory according to H.261, while full CIF is optional.


JPEG

This is the compression standard for continuous-tone still images. JPEG stands for Joint Photographic Experts Group, the name of the committee that wrote the standard. The standard is designed for compressing either full-colour (24-bit) or grey-scale digital images of "natural" (real-world) scenes. JPEG does not handle black-and-white (one bit/pixel) images, nor does it handle motion picture compression.

JPEG is "lossy", meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small colour details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyse your images, the small errors introduced by JPEG may well be a problem for you, even if they are invisible to the eye.


MPEG

The Moving Picture Experts Group (MPEG) meets under the auspices of the International Organisation for Standardisation (ISO) to generate standards for digital video and audio compression. In particular, they define a compressed bit stream, which implicitly defines a decompressor. The MPEG group usually meets at the same time as the JPEG, JBIG and MHEG groups; however, they are different sets of people with few or no members in common, and they have different charters and requirements. JPEG, as already described, is for still image compression; JBIG is for binary image compression (e.g. faxes); and MHEG is for multimedia data standards (such as integrating stills, video, audio, text, etc.).

MPEG 1 is in three parts: video, audio and systems, where the last part provides for the integration of the audio and video streams with the proper time-stamping to allow synchronisation of the two. The MPEG 1 standard is available as ISO CD 11172. MPEG has the potential for higher compression than H.261 (because it supports interpolation between frames, not just extrapolation from previous frames) and it also provides a standard for higher-resolution images.

Supporters of the new Video on CD MPEG 1 standard include Philips (co-developer of the original audio CD); JVC (developer of the VHS video-tape standard); PC maker Commodore International; and Korean consumer electronics giants Samsung and Goldstar.

MPEG 2 is similar to MPEG 1, but includes extensions to cover a wider range of applications. It is designed to handle applications (e.g. digital broadcast television) that need higher-quality playback. MPEG 2 supports CCIR 601 resolutions, but it requires transmission rates above 4 Mbit/s, as opposed to the 1 to 3 Mbit/s that MPEG 1 requires. It also requires considerably more processing time to encode the source video signal.
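The compression ratio these figures imply is easy to check. Using the active-picture dimensions of 625-line CCIR 601 video (720 x 576 luminance, 360 x 576 for each chrominance component, 25 frames/s, 8 bits/sample) against a 4 Mbit/s coded rate:

```python
# Back-of-the-envelope compression ratio implied by the figures above.

def raw_mbps(luma_w, luma_h, chroma_w, chroma_h, fps, bits=8):
    """Raw bit rate in Mbit/s for luminance plus two chrominance planes."""
    return (luma_w * luma_h + 2 * chroma_w * chroma_h) * bits * fps / 1e6

ccir601 = raw_mbps(720, 576, 360, 576, 25)   # ~166 Mbit/s of active picture
print(f"CCIR 601 raw:      {ccir601:.0f} Mbit/s")
print(f"Ratio at 4 Mbit/s: {ccir601 / 4:.0f}:1")
```

A roughly 40:1 reduction, which is why the extra encoding effort (motion compensation, interpolated frames) is unavoidable at these rates.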

The primary application targeted during the MPEG 2 definition process was the all-digital transmission of broadcast-TV-quality video at coded bit-rates between 4 and 9 Mbit/s. However, the MPEG 2 syntax has been found to be efficient for other applications, such as those at higher bit rates and sample rates (e.g. HDTV). The most significant enhancement over MPEG 1 is the addition of syntax for efficient coding of interlaced video (e.g. 16x8 block size motion compensation, Dual Prime, etc.).

The MPEG 2 Audio Standard supports low bit-rate coding of multichannel audio, supplying up to five full-bandwidth channels (left, right, centre, and two surround channels), plus an additional low-frequency enhancement channel, and/or up to seven commentary/multilingual channels. It extends the stereo and mono coding of the MPEG 1 Audio Standard to half sampling-rates (16 kHz, 22.05 kHz, and 24 kHz), for improved quality at bit-rates at or below 64 kbit/s per channel. The MPEG 2 Systems Standard specifies coding formats for multiplexing audio, video and other data into a form suitable for transmission or storage.

MPEG 3 targeted HDTV applications, with sampling dimensions up to 1920 x 1080 at 30 Hz and coded bit-rates between 20 and 40 Mbit/s. It was later discovered that, with some (compatible) fine tuning, the MPEG 2 and MPEG 1 syntax worked very well for HDTV-rate video.

MPEG 4: work on a new initiative for very low bit-rate coding of audio-visual programmes has been approved by unanimous ballot of all national bodies of ISO/IEC JTC1. This work began in September 1993 and is now in the application identification phase; it is scheduled to result in a draft specification in 1997. When completed, the MPEG 4 standard will enable a whole spectrum of new applications, including interactive mobile multimedia communications, videophone, mobile audio-visual communication, multimedia electronic mail, remote sensing, electronic newspapers, interactive multimedia databases, multimedia videotex, games, interactive computer imagery and sign-language captioning. Since the primary target for these applications is a bit-rate of up to 64 kbit/s at good quality, it is anticipated that new coding techniques allowing higher compression than traditional techniques may be necessary.


X.400

This is the ITU standard for the exchange of multimedia messages by store-and-forward transfer. The aim of the X.400 standards is to provide an international service for the exchange of electronic messages without restriction on the types of encoded information conveyed. Work on X.400 began in 1980 within ITU and resulted in the publication of the 1984 Recommendations, which still form the basis of many of the products available today. Since then, ITU has formed a collaborative partnership with ISO for the further development of the technology, and published technically aligned text in 1988 (1990 in ISO) for the first major revision of X.400.

Message handling technology is complex; as well as the sheer technical difficulties involved, as a global service it has had to take account of political, commercial, legal, and historical realities. Some issues which are dependent on national telecommunications regulation are not covered by the International Standards and are addressed by national standards.

The relatively poor penetration of X.400 messaging has been caused by a variety of factors. The heavy investment in developing 1984 products has led to considerable resistance to change, despite the fact that global interconnectivity is severely constrained in 1984 products, and that 1984-1988 interworking degrades the quality of service offered. Paradoxically, it is the attempt to recoup the investment in 1984 products which is impeding the introduction of the 1988 products that are essential for a highly functional global messaging service.

IP Multicast

IP multicasting is the transmission of an IP datagram to a host group, which is a set of zero or more hosts identified by a single IP destination address. A multicast datagram is delivered to all members of a destination host group. The membership of the host group is dynamic. A host group may be transient or permanent. Multicasting of this nature is essential to optimise bandwidth usage for multiparty conferencing applications.
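In IPv4, host group addresses occupy the old class D range, 224.0.0.0 to 239.255.255.255. The standard-library sketch below shows that address check and the 8-byte ip_mreq structure a host passes to the kernel when it joins a group; this is the ordinary Berkeley sockets interface, not specific to any conferencing tool:

```python
import socket
import struct

def is_multicast(addr):
    """True if addr is an IPv4 multicast (class D) address."""
    first_octet = int(addr.split(".")[0])
    return 224 <= first_octet <= 239

def join_request(group, interface="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure used with IP_ADD_MEMBERSHIP:
    the group address followed by the local interface address."""
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton(interface))

print(is_multicast("224.0.0.1"))           # True  (the all-hosts group)
print(is_multicast("128.16.64.1"))         # False (ordinary unicast)
print(len(join_request("224.0.0.1")))      # 8
```

To actually join a group, a receiver passes this structure to its socket: sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, join_request(group)); leaving is the symmetric IP_DROP_MEMBERSHIP call, which is what makes group membership dynamic.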


RTP

RTP - A Transport Protocol for Audio and Video Conferences and other Multiparticipant Real-Time Applications

Services typically required by multimedia conferences are playout synchronisation, demultiplexing, media identification and active-party identification. RTP is not restricted to multimedia conferences, however, and other real-time services such as data acquisition and control may use its services.
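These services map directly onto the fixed 12-byte header RTP prepends to every packet: a payload type for media identification, a sequence number and timestamp for playout synchronisation, and an SSRC identifying the active party. A sketch of packing it with Python's struct module (payload-type value 0 denotes G.711 mu-law audio):

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    """Pack the fixed RTP header (no CSRC list, no header extension)."""
    byte0 = RTP_VERSION << 6              # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type  # marker bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

hdr = pack_rtp_header(payload_type=0,     # 0 = PCMU (G.711 mu-law) audio
                      seq=1, timestamp=160, ssrc=0x12345678)
print(len(hdr))      # 12
print(hdr[0] >> 6)   # 2  (the RTP version field)
```

A receiver demultiplexes on the payload type, reorders on the sequence number, and schedules playout from the timestamp, which is exactly the service list given above.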


QuickTime

This is Apple Computer's file format for the storage and interchange of sequenced data, with cross-platform support.

A QuickTime movie contains time-based data which may represent sound, video or other time-sequenced information such as financial data or lab results. A movie is constructed of one or more tracks, each track being a single data stream.

A QuickTime movie file on an Apple Macintosh consists of a "resource fork" containing the movie resources and a "data fork" containing the actual movie data or references to external data sources such as video tape. To facilitate the exchange of data with systems which use single-fork files, it is possible to combine these into a file which uses only the data fork. It is possible that QuickTime could become a computer-industry standard for the interchange of video/audio sequences.
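The movie data itself is organised as a sequence of "atoms", each a 32-bit big-endian length (which includes the 8-byte header itself) followed by a four-character type code. The sketch below walks the top-level atoms of some in-memory data; it ignores the format's extended-size and size-zero atom forms, and the sample stream is synthetic, not a playable movie:

```python
import struct

def atoms(data):
    """Yield (type, payload) pairs for each top-level atom in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, offset)
        yield kind.decode("ascii"), data[offset + 8 : offset + size]
        offset += size

# A synthetic two-atom stream: a 12-byte 'mdat' atom and an 8-byte
# (empty) 'free' atom.
sample = (struct.pack(">I4s", 12, b"mdat") + b"ABCD"
          + struct.pack(">I4s", 8, b"free"))
print(list(atoms(sample)))
# [('mdat', b'ABCD'), ('free', b'')]
```

Because every atom carries its own length, a reader can skip atom types it does not understand, which is what makes the format extensible across platforms.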


RIFF

RIFF (Resource Interchange File Format) is a family of file structures rather than a single format. The RIFF file architecture is suitable for the following multimedia tasks:

- playing back multimedia data
- recording multimedia data
- exchanging multimedia data between applications and across platforms
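Structurally, a RIFF file is a 'RIFF' chunk whose payload is a four-character form type (e.g. 'WAVE' or 'AVI ') followed by sub-chunks; sizes are little-endian, and chunk data is padded to an even byte boundary. A sketch of walking that structure over a synthetic (non-playable) in-memory file:

```python
import struct

def riff_form(data):
    """Return the form type (e.g. 'WAVE') of a RIFF file image."""
    riff, size, form = struct.unpack_from("<4sI4s", data, 0)
    assert riff == b"RIFF"
    return form.decode("ascii")

def riff_chunks(data):
    """Yield (id, payload) for each sub-chunk of a RIFF file image."""
    riff, size, form = struct.unpack_from("<4sI4s", data, 0)
    assert riff == b"RIFF"
    offset, end = 12, 8 + size           # size counts bytes after itself
    while offset + 8 <= end:
        cid, csize = struct.unpack_from("<4sI", data, offset)
        yield cid.decode("ascii"), data[offset + 8 : offset + 8 + csize]
        offset += 8 + csize + (csize & 1)  # skip pad byte on odd sizes

# Synthetic file: a WAVE form containing one invented 'toy ' chunk.
sample = (struct.pack("<4sI4s", b"RIFF", 16, b"WAVE")
          + struct.pack("<4sI", b"toy ", 4) + b"DATA")
print(riff_form(sample))        # WAVE
print(list(riff_chunks(sample)))
# [('toy ', b'DATA')]
```

As with QuickTime atoms, the self-describing chunk headers let an application skip chunk types it does not recognise, which is what makes one container family serve playback, recording and cross-platform exchange.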