APPENDIX-1 Video Coding Algorithm
Most image or video applications involving transmission or storage require
some form of data compression to reduce the otherwise inordinate demand on
bandwidth and storage. The principle of data compression is quite
straightforward. Virtually all forms of data contain redundant elements. The
data can be compressed by eliminating those redundant elements with various
compression methods. However, when compressed data are received over a
communications link, it must be possible to expand the data back to the
original form. As long as the coding scheme is such that the code is shorter
than the eliminated data, compression will still occur.
1. Video compression
Video compression is a process whereby a collection of algorithms and
techniques replace the original pixel-related information with more compact
mathematical descriptions. Decompression is the reverse process of decoding the
mathematical descriptions back to pixels for display. At its best, video
compression is transparent to the end user.
There are two types of compression techniques:
Lossless compression creates compressed files that decompress into
exactly the same file as the original. It is typically used
for executable applications and data files for which any change in digital
make-up renders the file useless. Lossless compression typically yields only
about 2:1 compression, which barely dents high-resolution uncompressed video.
Lossy compression, used primarily on still image and video image files, creates
compressed files that decompress into images that look similar to the original
but are different in digital make-up. This "loss" allows lossy compression to
deliver from 2:1 to 300:1 compression. A wide range of lossy compression
techniques is available for digital video.
In addition to lossy or lossless compression techniques, video compression
involves the use of two other compression techniques:
Compression between frames (also known as temporal compression because the
compression is applied along the time dimension).
Compression within individual frames (also known as spatial compression).
Some video compression algorithms use both interframe and intraframe
compression. For example, MPEG uses JPEG, which is an intraframe technique, and
a separate interframe algorithm. Motion-JPEG uses only intraframe compression.
1.1. Interframe compression.
Interframe compression uses a system of key and delta frames to eliminate
redundant information between frames. Key frames store an entire frame, and
delta frames record only changes. Some implementations compress the key frames,
and others don't. Either way, the key frames serve as a reference source for
delta frames. Delta frames contain only pixels that are different from the key
frame or from the immediately preceding delta frame. During decompression,
delta frames look back to their respective reference frames to fill in missing
information.
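The key/delta mechanism can be sketched in a few lines. This is a minimal illustration, not a real CODEC: frames are flattened pixel lists, and the function names are hypothetical.

```python
# Sketch of interframe compression with key and delta frames: a delta
# frame records only the pixels that differ from its reference frame.

def encode_delta(reference, frame):
    """Record (index, value) pairs for pixels that changed."""
    return [(i, v) for i, (r, v) in enumerate(zip(reference, frame)) if r != v]

def decode_delta(reference, delta):
    """Rebuild a frame by applying the recorded changes to the reference."""
    frame = list(reference)
    for i, v in delta:
        frame[i] = v
    return frame

key = [10, 10, 10, 20, 20, 20]   # key frame (flattened pixels)
nxt = [10, 10, 10, 20, 30, 30]   # next frame: only two pixels changed
delta = encode_delta(key, nxt)   # [(4, 30), (5, 30)]
assert decode_delta(key, delta) == nxt
```

In a low-motion sequence most pixels match the reference, so the delta list stays far smaller than the full frame.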
All interframe compression techniques derive their effectiveness from
interframe redundancy. Low-motion video sequences, such as the head and
shoulders of a person, have a high degree of redundancy, which limits the
amount of compression required to reduce the video to the target bandwidth.
Until recently, interframe compression has addressed only pixel blocks that
remained static between the delta and the key frame. Some new CODECs increase
compression by tracking moving blocks of pixels from frame to frame. This
technique is called motion compensation. Data carried forward from key
frames in this way is known as a dynamic carry forward.
Although dynamic carry forwards are helpful, they cannot always be implemented.
In many cases, the capture board cannot scale resolution and frame rate,
digitise, and hunt for dynamic carry forwards at the same time. Dynamic carry
forwards typically mark the dividing line between hardware and software CODECs.
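Motion compensation rests on block matching: find where a block of pixels moved between frames. The sketch below does an exhaustive search over a small window; the function name, frame sizes, and search range are all illustrative, and real CODECs use much faster search strategies.

```python
# Block-matching sketch for motion compensation: locate the displacement
# (dx, dy) at which a block from the previous frame best matches the
# current frame, by minimising the sum of absolute differences (SAD).

def find_motion_vector(prev, curr, bx, by, size, search):
    """Return the (dx, dy) with the smallest SAD inside the search window."""
    def sad(dx, dy):
        return sum(abs(prev[by + y][bx + x] - curr[by + dy + y][bx + dx + x])
                   for y in range(size) for x in range(size))
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)
                  if 0 <= bx + dx and bx + dx + size <= len(curr[0])
                  and 0 <= by + dy and by + dy + size <= len(curr)]
    return min(candidates, key=lambda v: sad(*v))

prev = [[0] * 6 for _ in range(6)]
curr = [[0] * 6 for _ in range(6)]
prev[1][1] = prev[1][2] = prev[2][1] = prev[2][2] = 9   # 2x2 bright block
curr[2][3] = curr[2][4] = curr[3][3] = curr[3][4] = 9   # same block, shifted
print(find_motion_vector(prev, curr, 1, 1, 2, 2))       # (2, 1)
```

Instead of re-sending the block, the encoder can then send just the motion vector plus any residual differences.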
1.2. Intraframe compression.
Intraframe compression is performed solely with reference to information
within a particular frame. It is performed on pixels in delta frames that
remain after interframe compression and on key frames. Although intraframe
techniques are often given the most attention, overall CODEC performance
relates more to interframe efficiency than intraframe efficiency. The following
are the principal intraframe compression techniques.
Null suppression
One of the oldest, and simplest, data compression techniques. A common
occurrence in text is the presence of a long string of blanks in the character
stream. The transmitter scans the data for strings of blanks and substitutes a
two-character code for any string that is encountered. While null suppression
is a very primitive form of data compression, it has the advantage of being
simple to implement. Furthermore, the payoff, even from this simple technique,
can be substantial (gains of between 30 and 50 percent).
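A minimal null-suppression round trip might look like the following. The two-character code format (a sentinel byte followed by the run length) is one possible choice, not a standard layout.

```python
# Null suppression sketch: each run of blanks longer than the code itself
# is replaced by a two-character code: a sentinel plus the run length.

SENTINEL = "\x00"   # assumed never to occur in the input text

def suppress_blanks(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == " ":
            j = i
            while j < len(text) and text[j] == " ":
                j += 1
            run = j - i
            # only worthwhile when the run is longer than the 2-char code
            out.append(SENTINEL + chr(run) if run > 2 else " " * run)
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def expand_blanks(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == SENTINEL:
            out.append(" " * ord(text[i + 1]))
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

record = "NAME      AGE   CITY"
packed = suppress_blanks(record)
assert expand_blanks(packed) == record
assert len(packed) < len(record)   # 15 vs 20 characters here
```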
Run Length Encoding (RLE)
A simple lossless technique originally designed for data compression and later
modified for facsimile. RLE compresses an image based on "runs" of pixels.
Although it works well on black and white facsimiles, RLE is not very efficient
for colour video, which has few long runs of identically coloured pixels.
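RLE itself is simple enough to show in full. This sketch works on a list of pixel values; real formats pack the (count, value) pairs into bytes.

```python
# Run-length encoding sketch: each run of identical pixels becomes a
# (count, value) pair, and decoding expands the pairs back out.

def rle_encode(pixels):
    runs, i = [], 0
    while i < len(pixels):
        j = i
        while j < len(pixels) and pixels[j] == pixels[i]:
            j += 1
        runs.append((j - i, pixels[i]))
        i = j
    return runs

def rle_decode(runs):
    return [value for count, value in runs for _ in range(count)]

scanline = [0, 0, 0, 0, 255, 255, 0, 0, 0]   # black/white facsimile line
runs = rle_encode(scanline)                  # [(4, 0), (2, 255), (3, 0)]
assert rle_decode(runs) == scanline
```

The facsimile line compresses from nine values to three pairs; a colour video scanline with no repeated neighbours would instead grow, which is why RLE alone is a poor fit for colour video.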
2. JPEG
A standard that has been adopted by two international standards
organisations: the ITU (formerly CCITT) and the ISO. JPEG is most often used to
compress still images using discrete cosine transform (DCT) analysis. First,
DCT divides the image into 8x8 blocks and then converts the colours and pixels
into frequency space by describing each block in terms of the number of colour
shifts (frequency) and the extent of the change (amplitude). Because most
natural images are relatively smooth, the changes that occur most often have
low amplitude values, so the change is minor. In other words, images have many
subtle shifts among similar colours but few dramatic shifts between very
different colours. Next, during quantisation, the amplitude values are categorised by
frequency and averaged. This is the lossy stage because the original values are
permanently discarded. However, because most of the picture is categorised in
the high-frequency/low-amplitude range, most of the loss occurs among subtle
shifts that are largely indistinguishable to the human eye. After
quantization, the values are further compressed through RLE using a special
zigzag pattern designed to optimise compression of like regions within the
image. At extremely high compression ratios, more high-frequency/low-amplitude
changes are averaged, which can cause an entire pixel block to adopt the same
colour. This causes a blockiness artefact that is characteristic of
JPEG-compressed images. JPEG is used as the intraframe technique for MPEG.
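The zigzag scan mentioned above can be made concrete. This sketch shows only the scan stage, not the DCT or quantisation; the quantised block below is invented for illustration.

```python
# JPEG zigzag scan sketch: visit an 8x8 block along anti-diagonals so that
# the high-frequency coefficients (which quantisation tends to zero out)
# end up grouped in one long run, ready for run-length encoding.

def zigzag_order(n=8):
    """Indices of an n x n block in zigzag scan order."""
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

# A typical quantised block: a few low-frequency coefficients survive
# in the top-left corner, everything else has been averaged to zero.
block = [[8, 3, 0, 0, 0, 0, 0, 0],
         [2, 1, 0, 0, 0, 0, 0, 0]] + [[0] * 8 for _ in range(6)]

scanned = [block[y][x] for y, x in zigzag_order()]
assert scanned[:5] == [8, 3, 2, 0, 1]        # nonzero values come first
assert all(v == 0 for v in scanned[5:])      # then one run of 59 zeros
```

In row-major order the zeros would be broken into eight separate runs; the zigzag scan merges them so RLE compresses the tail of the block to almost nothing.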
3. Vector quantization (VQ)
A technique that is similar to JPEG in that it divides the image into 8x8
blocks. The difference between VQ and JPEG has to do with the quantization
process. VQ is a recursive, or multi-step algorithm with inherently
self-correcting features. With VQ, similar blocks are categorised and a
reference block is constructed for each category. The original blocks are then
discarded. During decompression, the single reference block replaces all of the
original blocks in the category. After the first set of reference blocks is
selected, the image is decompressed. Comparing the decompressed image to the
original reveals many differences. To address the differences, an additional
set of reference blocks is created that fills in the gaps created during the
first estimation. This is the self-correcting part of the algorithm. The
process is repeated to find a third set of reference blocks to fill in the
remaining gaps. These reference blocks are posted in a lookup table to be used
during decompression. The final step is to use lossless techniques, such as
RLE, to further compress the remaining information. VQ compression is by its
nature computationally intensive. However, decompression, which simply
involves pulling values from the lookup table, is simple and fast.
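The core VQ idea, many blocks sharing one reference block via a lookup table, can be sketched as follows. This shows only the first categorisation pass, not the self-correcting refinement steps; the tolerance threshold and flattened blocks are illustrative.

```python
# Vector quantisation sketch: similar blocks are grouped, one reference
# block stands in for each group, and each block is stored as just an
# index into the lookup table (the codebook).

def vq_compress(blocks, tolerance=10):
    codebook, indices = [], []
    for block in blocks:
        for k, ref in enumerate(codebook):
            if max(abs(a - b) for a, b in zip(block, ref)) <= tolerance:
                indices.append(k)          # close enough: reuse the reference
                break
        else:
            codebook.append(block)         # no close reference: new category
            indices.append(len(codebook) - 1)
    return codebook, indices

def vq_decompress(codebook, indices):
    # decompression is just lookup-table reads, hence simple and fast
    return [codebook[k] for k in indices]

blocks = [(10, 10, 12, 11), (100, 98, 99, 101),
          (11, 9, 12, 10), (10, 10, 12, 11)]
codebook, indices = vq_compress(blocks)
assert len(codebook) == 2                  # two categories found
assert indices == [0, 1, 0, 0]
approx = vq_decompress(codebook, indices)
assert approx[2] == codebook[0]            # lossy: replaced by its reference
```

The asymmetry is visible even here: compression compares every block against the codebook, while decompression is pure table lookup.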
4. MPEG
MPEG addresses the compression, decompression and synchronisation of video
and audio signals. In its most general form, an MPEG system stream is made up of
two layers:
The system layer contains timing and other information needed to de-multiplex
the audio and video streams and to synchronise audio and video during playback.
The compression layer includes the audio and video streams.
The system decoder extracts the timing from the MPEG system stream and sends it
to the other system components. The system decoder also de-multiplexes the
video and audio streams from the system stream; then sends each to the
appropriate decoder. The video decoder decompresses the video stream as
specified in part 2 of the MPEG standard. The audio decoder decompresses the
audio stream as specified in part 3 of the MPEG standard.
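The system decoder's de-multiplexing role can be sketched as routing packets to per-stream queues. The packet representation here is illustrative, not the actual MPEG-1 system stream syntax.

```python
# System-layer demultiplexing sketch: packets tagged as audio or video
# are routed to the matching decoder's queue, timestamps kept alongside
# so the decoders can stay synchronised.

def demultiplex(system_stream):
    streams = {"audio": [], "video": []}
    for packet_type, timestamp, payload in system_stream:
        streams[packet_type].append((timestamp, payload))
    return streams

stream = [("video", 0, b"I-frame"), ("audio", 0, b"frame0"),
          ("video", 40, b"P-frame"), ("audio", 24, b"frame1")]
demuxed = demultiplex(stream)
assert [p for _, p in demuxed["video"]] == [b"I-frame", b"P-frame"]
assert [p for _, p in demuxed["audio"]] == [b"frame0", b"frame1"]
```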
The MPEG standard defines a hierarchy of data structures in the video stream:
Video Sequence
Begins with a sequence header (may contain additional sequence headers),
includes one or more groups of pictures, and ends with an end-of-sequence code.
Group of Pictures (GOP)
A header and a series of one or more pictures intended to allow random access
into the sequence.
Picture
The primary coding unit of a video sequence. A picture consists of three
rectangular matrices representing luminance (Y) and two chrominance (Cb and Cr)
values. The Y matrix has an even number of rows and columns. The Cb and Cr
matrices are one-half the size of the Y matrix in each direction (horizontal
and vertical).
Slice
One or more contiguous macroblocks. The order of the macroblocks
within a slice is from left-to-right and top-to-bottom. Slices are important in
the handling of errors. If the bitstream contains an error, the decoder can
skip to the start of the next slice. Having more slices in the bitstream allows
better error concealment, but uses bits that could otherwise be used to improve
picture quality.
Macroblock
A 16-pixel by 16-line section of luminance components and the corresponding
8-pixel by 8-line section of the two chrominance components.
Block
A block is an 8-pixel by 8-line set of values of a luminance or a chrominance
component. Note that a luminance block corresponds to one-fourth as large a
portion of the displayed image as does a chrominance block.
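The sizes implied by this hierarchy can be checked with a little arithmetic. The 352x288 picture size is just an example; the structure (half-size Cb and Cr matrices, 16x16 macroblocks of 8x8 blocks) comes from the definitions above.

```python
# Arithmetic sketch of the structures above: a 16x16 macroblock holds
# four 8x8 luminance blocks plus one 8x8 block each for Cb and Cr,
# since the chrominance matrices are half-size in each direction.

def macroblock_blocks():
    luma = (16 // 8) * (16 // 8)       # four 8x8 Y blocks
    chroma = 2 * (8 // 8) * (8 // 8)   # one 8x8 block each for Cb and Cr
    return luma + chroma

def picture_matrix_sizes(width, height):
    """Matrix dimensions (rows, cols) for one picture's Y, Cb, Cr."""
    return {"Y": (height, width),
            "Cb": (height // 2, width // 2),
            "Cr": (height // 2, width // 2)}

assert macroblock_blocks() == 6
assert picture_matrix_sizes(352, 288) == {
    "Y": (288, 352), "Cb": (144, 176), "Cr": (144, 176)}
```

This also makes the final note concrete: each 8x8 chrominance block covers a 16x16-pixel area of the display, four times the area of an 8x8 luminance block.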
The MPEG audio stream consists of a series of packets. Each audio packet
contains an audio packet header and one or more audio frames.
Each audio packet header contains the following information:
- Packet start code Identifies the packet as being an audio packet.
- Packet length Indicates the number of bytes in the audio packet.
An audio frame contains the following information:
- Audio frame header Contains synchronisation, ID, bit rate, and sampling
frequency information.
- Error-checking code Contains error-checking information.
- Audio data Contains information used to reconstruct the sampled audio data.
- Ancillary data Contains user-defined data.
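Walking this packet structure might look like the sketch below. The byte layout is illustrative only: real MPEG audio headers are bit-packed, and the start-code value here is just an example constant.

```python
# Sketch of parsing the audio packet layout described above: a start
# code identifying the packet as audio, a length field, then the frames.

import struct

AUDIO_START_CODE = 0x000001C0   # example start-code value (assumption)

def parse_audio_packet(data):
    """Check the start code and return the audio frame bytes."""
    start_code, length = struct.unpack(">IH", data[:6])
    if start_code != AUDIO_START_CODE:
        raise ValueError("not an audio packet")
    return data[6:6 + length]   # payload: one or more audio frames

packet = struct.pack(">IH", AUDIO_START_CODE, 4) + b"\xff\xf0\x12\x34"
assert parse_audio_packet(packet) == b"\xff\xf0\x12\x34"
```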