


Multimedia in the Teaching Space

APPENDIX-1 Video Coding Algorithm

Most image or video applications involving transmission or storage require some form of data compression to reduce the otherwise inordinate demand on bandwidth and storage. The principle of data compression is quite straightforward. Virtually all forms of data contain redundant elements, and the data can be compressed by eliminating those redundant elements with various compression methods. However, when compressed data are received over a communications link, it must be possible to expand the data back to their original form. As long as the coding scheme is such that the code is shorter than the eliminated data, compression will still occur.

1. Video compression

Video compression is a process whereby a collection of algorithms and techniques replace the original pixel-related information with more compact mathematical descriptions. Decompression is the reverse process of decoding the mathematical descriptions back to pixels for display. At its best, video compression is transparent to the end user.
There are two types of compression techniques:
Lossless Compression
A compression technique that creates compressed files that decompress into exactly the same file as the original. Lossless compression is typically used for executable files, applications, and data files, for which any change in digital make-up renders the file useless. Lossless compression typically yields only about 2:1 compression, which barely dents high-resolution uncompressed video files.
Lossy Compression
A compression technique, used primarily on still-image and video files, that creates compressed files that decompress into images that look similar to the original but differ in digital make-up. This "loss" allows lossy compression to deliver from 2:1 to 300:1 compression. A wide range of lossy compression techniques is available for digital video.
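The lossless guarantee described above can be sketched in a few lines of Python; zlib here is just a stand-in for a generic lossless coder, not one of the video techniques discussed below:

```python
import zlib

# Lossless compression must restore the original bit-for-bit.
original = b"AB" * 5000              # highly redundant input
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original          # exactly the same file as the original
assert len(compressed) < len(original)
```

On redundant data such as this the compressed size is a small fraction of the original; on already-dense data, lossless methods struggle to reach even the modest 2:1 figure quoted above.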
In addition to lossy or lossless compression techniques, video compression involves the use of two other compression techniques:
Interframe Compression
Compression between frames (also known as temporal compression because the compression is applied along the time dimension).
Intraframe Compression
Compression within individual frames (also known as spatial compression).
Some video compression algorithms use both interframe and intraframe compression. For example, MPEG uses JPEG, which is an intraframe technique, and a separate interframe algorithm. Motion-JPEG uses only intraframe compression.

1.1. Interframe compression.

Interframe compression uses a system of key and delta frames to eliminate redundant information between frames. Key frames store an entire frame, and delta frames record only changes. Some implementations compress the key frames, and others don't. Either way, the key frames serve as a reference source for delta frames. Delta frames contain only pixels that are different from the key frame or from the immediately preceding delta frame. During decompression, delta frames look back to their respective reference frames to fill in missing information.
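The key/delta scheme described above can be illustrated with a small Python sketch, treating frames as flat lists of pixel values (a simplification of real block-based CODECs):

```python
def make_delta(reference, frame):
    """Record only the pixels that differ from the reference frame."""
    return [(i, p) for i, (r, p) in enumerate(zip(reference, frame)) if p != r]

def apply_delta(reference, delta):
    """During decompression, fill in the changes over the reference frame."""
    frame = list(reference)
    for i, p in delta:
        frame[i] = p
    return frame

key_frame  = [10, 10, 10, 10, 10, 10]
next_frame = [10, 10, 99, 10, 10, 42]      # only two pixels changed

delta = make_delta(key_frame, next_frame)  # [(2, 99), (5, 42)]
assert apply_delta(key_frame, delta) == next_frame
```

A low-motion sequence yields short delta lists, which is exactly the interframe redundancy discussed below.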
All interframe compression techniques derive their effectiveness from interframe redundancy. Low-motion video sequences, such as the head and shoulders of a person, have a high degree of redundancy, which limits the amount of compression required to reduce the video to the target bandwidth. Until recently, interframe compression has addressed only pixel blocks that remained static between the delta frame and the key frame. Some new CODECs increase compression by tracking blocks of pixels that move from frame to frame, a technique called motion compensation; the block data carried forward from key frames in this way is described as a dynamic carry forward.
Although dynamic carry forwards are helpful, they cannot always be implemented. In many cases, the capture board cannot scale resolution and frame rate, digitise, and hunt for dynamic carry forwards at the same time. Dynamic carry forwards typically mark the dividing line between hardware and software CODECs.
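Motion compensation can be hinted at with a toy exhaustive search (real CODECs search two-dimensional pixel blocks; this one-dimensional version only shows the principle of storing a motion vector instead of raw pixels):

```python
def find_offset(key_row, new_row, start, size, search=4):
    """Find the shift that best matches a block from the key frame
    in the new frame, by minimising the sum of absolute differences."""
    block = key_row[start:start + size]
    best_off, best_err = 0, float("inf")
    for off in range(-search, search + 1):
        s = start + off
        if s < 0 or s + size > len(new_row):
            continue
        err = sum(abs(a - b) for a, b in zip(block, new_row[s:s + size]))
        if err < best_err:
            best_off, best_err = off, err
    return best_off

key_row = [0, 0, 5, 6, 7, 0, 0, 0]
new_row = [0, 0, 0, 5, 6, 7, 0, 0]   # the block moved one pixel right
assert find_offset(key_row, new_row, start=2, size=3) == 1
```

The exhaustive search is what makes motion estimation computationally expensive, which is one reason it tends to mark the hardware/software dividing line mentioned above.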

1.2. Intraframe compression.

Intraframe compression is performed solely with reference to information within a particular frame. It is performed on pixels in delta frames that remain after interframe compression and on key frames. Although intraframe techniques are often given the most attention, overall CODEC performance relates more to interframe efficiency than intraframe efficiency. The following are the principal intraframe compression techniques.
Null suppression
One of the oldest, and simplest, data compression techniques. A common occurrence in text is a long string of blanks in the character stream. The transmitter scans the data for strings of blanks and substitutes a two-character code for any string that is encountered. While null suppression is a very primitive form of data compression, it has the advantage of being simple to implement. Furthermore, the payoff, even from this simple technique, can be substantial (gains of between 30 and 50 percent).
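The two-character substitution can be sketched as follows; the flag byte and minimum run length are illustrative choices, not part of any particular standard:

```python
FLAG = "\x00"   # assumed compression-indicator character

def suppress_blanks(text, min_run=3):
    """Replace each run of blanks with a two-character code: FLAG + count."""
    out, i = [], 0
    while i < len(text):
        if text[i] == " ":
            j = i
            while j < len(text) and text[j] == " ":
                j += 1
            run = j - i
            out.append(FLAG + chr(run) if run >= min_run else " " * run)
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def expand_blanks(coded):
    """Reverse the substitution so the original text is recovered exactly."""
    out, i = [], 0
    while i < len(coded):
        if coded[i] == FLAG:
            out.append(" " * ord(coded[i + 1]))
            i += 2
        else:
            out.append(coded[i])
            i += 1
    return "".join(out)

line = "TOTAL" + " " * 20 + "42"
coded = suppress_blanks(line)        # 27 characters shrink to 9
assert expand_blanks(coded) == line
```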
Run Length Encoding (RLE)
A simple lossless technique originally designed for data compression and later modified for facsimile. RLE compresses an image based on "runs" of pixels. Although it works well on black-and-white facsimiles, RLE is not very efficient for colour video, which has few long runs of identically coloured pixels.
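A minimal RLE round trip in Python makes the point concrete: the encoding only pays off when runs are long.

```python
def rle_encode(pixels):
    """Encode a row of pixels as (value, run-length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand the (value, run-length) pairs back into pixels."""
    return [p for p, n in runs for _ in range(n)]

row = [0] * 12 + [255] * 3 + [0] * 9   # long monochrome runs, as in a fax
runs = rle_encode(row)                  # [[0, 12], [255, 3], [0, 9]]
assert rle_decode(runs) == row
```

A row of 24 pixels collapses to three pairs here; a row of 24 different colours would expand to 24 pairs, which is why RLE alone suits facsimile better than colour video.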


2. JPEG

A standard that has been adopted by two international standards organisations: the ITU (formerly CCITT) and the ISO. JPEG is most often used to compress still images using discrete cosine transform (DCT) analysis. First, the DCT divides the image into 8x8 blocks and converts the colours and pixels into frequency space, describing each block in terms of the number of colour shifts (frequency) and the extent of the change (amplitude). Because most natural images are relatively smooth, the changes that occur most often have low amplitude values, so the change is minor. In other words, images have many subtle shifts among similar colours but few dramatic shifts between very different colours. Next, during quantisation, the amplitude values are categorised by frequency and averaged. This is the lossy stage, because the original values are permanently discarded. However, because most of the picture is categorised in the high-frequency/low-amplitude range, most of the loss occurs among subtle shifts that are largely indistinguishable to the human eye. After quantisation, the values are further compressed through RLE using a special zigzag pattern designed to optimise compression of like regions within the image. At extremely high compression ratios, more high-frequency/low-amplitude changes are averaged, which can cause an entire pixel block to adopt the same colour. This causes the blockiness artefact that is characteristic of JPEG-compressed images. JPEG is used as the intraframe technique for MPEG.
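The zigzag pattern mentioned above can be generated directly; this sketch omits the DCT and the quantisation tables, and only shows why the scan order sets RLE up to succeed:

```python
def zigzag_order(n=8):
    """Scan order over an n x n block: anti-diagonals from the top-left,
    alternating direction, so low-frequency coefficients come first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                       # which diagonal
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

order = zigzag_order()
assert order[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

# A quantised block where only low-frequency coefficients survived:
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 80, -3, 2
scanned = [block[r][c] for r, c in order]
assert scanned[:3] == [80, -3, 2]   # the trailing 61 zeros form one long run
```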

3. Vector quantization (VQ)

A technique similar to JPEG in that it divides the image into 8x8 blocks. The difference between VQ and JPEG lies in the quantisation process. VQ is a recursive, or multi-step, algorithm with inherently self-correcting features. With VQ, similar blocks are categorised and a reference block is constructed for each category. The original blocks are then discarded. During decompression, the single reference block replaces all of the original blocks in the category. After the first set of reference blocks is selected, the image is decompressed. Comparing the decompressed image with the original reveals many differences. To address the differences, an additional set of reference blocks is created that fills in the gaps created during the first estimation. This is the self-correcting part of the algorithm. The process is repeated to find a third set of reference blocks to fill in the remaining gaps. These reference blocks are posted in a lookup table to be used during decompression. The final step is to use lossless techniques, such as RLE, to further compress the remaining information. VQ compression is by its nature computationally intensive. However, decompression, which simply involves pulling values from the lookup table, is simple and fast.
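A single-pass sketch of the idea in Python (the real algorithm is the multi-step, self-correcting process described above; the similarity threshold and the tiny 1x2 block size are arbitrary illustrative choices):

```python
def build_codebook(blocks, threshold=10):
    """Group similar blocks and average each group into a reference block."""
    categories = []
    for b in blocks:
        for cat in categories:
            if sum(abs(x - y) for x, y in zip(cat[0], b)) <= threshold:
                cat.append(b)
                break
        else:
            categories.append([b])
    return [tuple(sum(vals) // len(cat) for vals in zip(*cat))
            for cat in categories]

def encode(blocks, codebook):
    """Replace each block by the index of its nearest reference block."""
    return [min(range(len(codebook)),
                key=lambda i: sum(abs(x - y) for x, y in zip(codebook[i], b)))
            for b in blocks]

blocks = [(10, 12), (11, 11), (200, 198), (199, 201)]   # two dark, two bright
codebook = build_codebook(blocks)         # two reference blocks
indices = encode(blocks, codebook)        # [0, 0, 1, 1]
decoded = [codebook[i] for i in indices]  # decompression is a table lookup
```

Note the asymmetry the text points out: building the codebook requires comparing every block against every category, while decoding is a single lookup per block.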


4. MPEG

MPEG addresses the compression, decompression and synchronisation of video and audio signals. In its most general form, an MPEG system stream is made up of two layers:
System layer
The system layer contains timing and other information needed to de-multiplex the audio and video streams and to synchronise audio and video during playback.
Compression Layer
The compression layer includes the audio and video streams.
The system decoder extracts the timing information from the MPEG system stream and sends it to the other system components. The system decoder also de-multiplexes the video and audio streams from the system stream and sends each to the appropriate decoder. The video decoder decompresses the video stream as specified in Part 2 of the MPEG standard. The audio decoder decompresses the audio stream as specified in Part 3 of the MPEG standard.
The MPEG standard defines a hierarchy of data structures in the video stream.
Video Sequence
Begins with a sequence header (may contain additional sequence headers), includes one or more groups of pictures, and ends with an end-of-sequence code.
Group of Pictures (GOP)
A header and a series of one or more pictures intended to allow random access into the sequence.
Picture
The primary coding unit of a video sequence. A picture consists of three rectangular matrices representing luminance (Y) and two chrominance (Cb and Cr) values. The Y matrix has an even number of rows and columns. The Cb and Cr matrices are one-half the size of the Y matrix in each direction (horizontal and vertical).
Slice
One or more contiguous macroblocks. The order of the macroblocks within a slice is from left to right and top to bottom. Slices are important in the handling of errors. If the bitstream contains an error, the decoder can skip to the start of the next slice. Having more slices in the bitstream allows better error concealment, but uses bits that could otherwise be used to improve picture quality.
Macroblock
A 16-pixel by 16-line section of luminance components and the corresponding 8-pixel by 8-line sections of the two chrominance components.
Block
An 8-pixel by 8-line set of values of a luminance or a chrominance component. Note that a luminance block corresponds to one-fourth as large a portion of the displayed image as does a chrominance block.
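The sizes implied by this hierarchy can be checked with simple arithmetic; the 352x288 picture size is an assumed example (the CIF size commonly used with MPEG-1), not something the layout above fixes:

```python
width, height = 352, 288                 # assumed example picture size

y_size = (width, height)                 # luminance matrix
c_size = (width // 2, height // 2)       # each chrominance matrix is half-size
assert c_size == (176, 144)

mb_cols, mb_rows = width // 16, height // 16
assert (mb_cols, mb_rows) == (22, 18)    # macroblocks per row and per column

# Each macroblock holds four 8x8 luminance blocks plus one 8x8 block
# from each chrominance component:
blocks_per_macroblock = (16 // 8) ** 2 + 2
assert blocks_per_macroblock == 6
```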
The MPEG audio stream consists of a series of packets. Each audio packet contains an audio packet header and one or more audio frames.
Each audio packet header contains the following information:

  • Packet start code Identifies the packet as being an audio packet.
  • Packet length Indicates the number of bytes in the audio packet.

An audio frame contains the following information:

  • Audio frame header Contains synchronisation, ID, bit rate, and sampling frequency information.
  • Error-checking code Contains error-checking information.
  • Audio data Contains information used to reconstruct the sampled audio data.
  • Ancillary data Contains user-defined data.
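The two packet-header fields listed above can be parsed with a short sketch; the byte layout assumed here (a 0x000001 start-code prefix, a stream ID in the audio range 0xC0 to 0xDF, then a 16-bit packet length) follows the MPEG-1 system stream, but the example bytes are fabricated for illustration:

```python
import struct

def parse_audio_packet_header(data):
    """Read the packet start code and packet length described above."""
    prefix, stream_id = data[:3], data[3]
    if prefix != b"\x00\x00\x01" or not 0xC0 <= stream_id <= 0xDF:
        raise ValueError("not an audio packet start code")
    (length,) = struct.unpack(">H", data[4:6])   # bytes in the packet
    return stream_id, length

header = bytes([0x00, 0x00, 0x01, 0xC0, 0x00, 0x30])  # audio stream 0, 48 bytes
assert parse_audio_packet_header(header) == (0xC0, 48)
```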
