Multimedia in the Teaching Space
APPENDIX-1 Video Coding Algorithm
5. Inter-Picture Coding
Much of the information in a picture within a video sequence is similar to
information in a previous or subsequent picture. The MPEG standard takes
advantage of this temporal redundancy by representing some pictures in terms of
their differences from other (reference) pictures, or what is known as
inter-picture coding. This section describes the types of coded pictures and
explains the techniques used in this process.
The MPEG standard specifically defines three types of pictures: intra,
predicted, and bi-directional.
Intra pictures, or I-pictures, are coded using only information present in the
picture itself. I-pictures provide potential random access points into the
compressed video data. I-pictures use only transform coding (as explained in
the Intra-picture (Transform) Coding section) and provide moderate compression.
I-pictures typically use about two bits per coded pixel.
Predicted pictures, or P-pictures, are coded with respect to the nearest
previous I- or P-picture. This technique is called forward prediction. Like
I-pictures, P-pictures serve as a prediction reference for B-pictures and
future P-pictures. However, P-pictures use motion compensation (see the Motion
Compensation section) to provide more compression than is possible with
I-pictures. Unlike I-pictures, P-pictures can propagate coding errors because
P-pictures are predicted from previous reference (I- or P-) pictures.
Bi-directional pictures, or B-pictures, are pictures that use both a past and
future picture as a reference. This technique is called bi-directional
prediction. B-pictures provide the most compression and do not propagate errors
because they are never used as a reference. Bi-directional prediction also
decreases the effect of noise by averaging two pictures.
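The reference rules above can be summarised in a short sketch. It takes a run of picture types in display order (for example "IBBPBBP") and lists, for each picture, the display positions it is predicted from. The function name and the one-letter string encoding are illustrative assumptions. The sketch assumes the run starts with an I-picture and that every B-picture has a later I- or P-picture.

```python
def _prev_anchor(anchors, i):
    """Nearest I- or P-picture before display position i."""
    earlier = [a for a in anchors if a < i]
    if not earlier:
        raise ValueError(f"picture {i} has no earlier I- or P-picture")
    return earlier[-1]


def _next_anchor(anchors, i):
    """Nearest I- or P-picture after display position i."""
    later = [a for a in anchors if a > i]
    if not later:
        raise ValueError(f"picture {i} has no later I- or P-picture")
    return later[0]


def reference_pictures(pattern):
    """For each picture in display order, return the positions it is predicted from.

    I-pictures use no reference, P-pictures use the nearest previous I- or
    P-picture (forward prediction), and B-pictures use the nearest previous and
    the nearest following I- or P-picture (bi-directional prediction).
    """
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = []
    for i, kind in enumerate(pattern):
        if kind == "I":
            refs.append(())
        elif kind == "P":
            refs.append((_prev_anchor(anchors, i),))
        elif kind == "B":
            refs.append((_prev_anchor(anchors, i), _next_anchor(anchors, i)))
        else:
            raise ValueError(f"unknown picture type {kind!r}")
    return refs
```

For "IBBPBBP" the result is [(), (0, 3), (0, 3), (0,), (3, 6), (3, 6), (3,)]. Every position that appears as a reference holds an I- or P-picture, which is why an error in a B-picture does not spread to other pictures.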
6. Video Stream Composition
The MPEG algorithm allows the encoder to choose the frequency and location
of I-pictures. This choice is based on the application's need for random
accessibility and the location of scene cuts in the video sequence. In
applications where random access is important, I-pictures are typically inserted
about twice a second.
The encoder also chooses the number of B-pictures between any pair of reference
(I- or P-) pictures. This choice is based on factors such as the amount of
memory in the encoder and the characteristics of the material being coded. For
example, a large class of scenes has two bi-directional pictures separating
successive reference pictures.
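The two encoder choices above (how often to insert I-pictures and how many B-pictures to place between reference pictures) together fix the pattern of picture types. A hypothetical helper that builds one such pattern in display order, assuming a whole number of pictures from one I-picture to the next:

```python
def gop_pattern(frame_rate, i_per_second, b_between):
    """Return picture types in display order from one I-picture up to the next.

    Hypothetical helper: frame_rate is in pictures per second, i_per_second is
    how often an I-picture is inserted, and b_between is the number of
    B-pictures placed between successive reference (I- or P-) pictures.
    """
    length = int(frame_rate // i_per_second)  # pictures from one I-picture to the next
    if length < 1:
        raise ValueError("I-pictures cannot be more frequent than pictures")
    types = []
    for n in range(length):
        if n == 0:
            types.append("I")
        elif n % (b_between + 1) == 0:
            types.append("P")  # every (b_between + 1)-th picture is a reference
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern(25, 2, 2))  # IBBPBBPBBPBB
print(gop_pattern(30, 2, 2))  # IBBPBBPBBPBBPBB
```

The B-pictures at the end of each pattern are predicted partly from the I-picture that begins the next pattern.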
The MPEG encoder reorders pictures in the video stream to present the pictures
to the decoder in the most efficient sequence. In particular, the reference
pictures needed to reconstruct B-pictures are sent before the associated
B-pictures.
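The reordering can be illustrated with a small sketch that converts display order into the order in which pictures are sent. Each picture is labelled with its type and display position (a labelling assumed here for illustration). Each B-picture is held back until the later reference picture it depends on has been sent.

```python
def coded_order(display):
    """Reorder picture labels such as "I0", "B1", "P3" from display order to coded order."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # wait until its future reference has been sent
        else:
            out.append(pic)         # I- or P-picture: send it first...
            out.extend(pending_b)   # ...then the B-pictures that were waiting for it
            pending_b.clear()
    if pending_b:
        raise ValueError("sequence ends with B-pictures that lack a future reference")
    return out


print(coded_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```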
7. Motion Compensation
Motion compensation is a technique for enhancing the compression of P- and
B-pictures by eliminating temporal redundancy. Motion compensation typically
improves compression by about a factor of three compared to intra-picture
coding. Motion compensation algorithms work at the macroblock level, where a
macroblock is a 16 x 16 pixel region of the picture.
When a macroblock is compressed by motion compensation, the compressed file
contains this information:
- The spatial vector between the reference macroblock(s) and the macroblock
being coded (motion vectors)
- The content differences between the reference macroblock(s) and the
macroblock being coded (error terms)
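The MPEG standard defines how motion vectors and error terms are decoded but leaves the search for them to the encoder. A common approach, assumed here, is exhaustive block matching: try every displacement within a search window and keep the one with the smallest sum of absolute differences (SAD). The sketch works on pictures stored as lists of rows of luminance values and uses whole-pixel displacements only. The names `motion_search` and `displaced_sad`, and the default window, are illustrative.

```python
def displaced_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at top-left (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def motion_search(cur, ref, bx, by, n=16, radius=7):
    """Exhaustive block matching for one macroblock.

    Returns ((dx, dy), error_terms): the displacement into the reference picture
    with the smallest SAD (the motion vector) and the per-pixel differences that
    remain after prediction (the error terms). Assumes the block itself lies
    inside both pictures, so the (0, 0) candidate is always valid.
    """
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would fall outside the reference picture.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = displaced_sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    errors = [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
              for y in range(n)]
    return best_mv, errors
```

If the current picture is the reference shifted by two pixels horizontally and one vertically, searching a 4 x 4 block at (2, 2) with a radius of 2 returns the vector (2, 1) and all-zero error terms.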
Not all information in a picture can be predicted from a previous picture.
Consider a scene in which a door opens: the visual details of the room behind
the door cannot be predicted from a previous frame in which the door was
closed. When a case such as this arises, that is, when a macroblock in a
P-picture cannot be efficiently represented by motion compensation, it is
coded in the same way as a macroblock in an I-picture, using transform coding
techniques (see the Intra-picture (Transform) Coding section).
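How an encoder decides that motion compensation is not efficient is also left to the encoder. One simple heuristic, assumed here for illustration, compares the error of the best motion-compensated match with the block's own variation about its mean. If prediction does not beat that variation, the macroblock is coded as intra. The function name is hypothetical, and `best_sad` stands for the sum of absolute differences of the best match found by the motion search.

```python
def choose_p_macroblock_mode(block, best_sad):
    """Return "intra" or "inter" for one P-picture macroblock (hypothetical heuristic).

    block:    the macroblock's pixel values as a list of rows
    best_sad: SAD of the best motion-compensated match for this block
    """
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    # Variation about the mean: a rough proxy for the cost of coding the block on its own.
    activity = sum(abs(p - mean) for p in pixels)
    return "intra" if best_sad >= activity else "inter"
```

In the door example, a macroblock showing the newly revealed room has no good match in the previous picture, so its `best_sad` is large and the heuristic selects intra coding.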
The difference between B- and P-picture motion compensation is that macroblocks
in a P-picture use the previous reference (I- or P-picture) only, while
macroblocks in a B-picture are coded using any combination of a previous or
future reference picture.
Four coding modes are therefore possible for each macroblock in a B-picture:
- Intra coding: no motion compensation
- Forward prediction: the previous reference picture is used as a reference
- Backward prediction: the next reference picture is used as a reference
- Bi-directional prediction: two reference pictures are used, the previous
reference picture and the next reference picture.
Backward prediction can be used to predict uncovered areas that do not appear
in previous pictures.
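A sketch of the choice among the three predicted modes for a B-picture macroblock. The bi-directional prediction is taken as a per-pixel average of the forward and backward predictions, rounded up at one half. Selecting the mode with the smallest SAD is an assumed encoder strategy, not something the standard requires. Averaging can cancel noise that differs between the two reference pictures, which is the noise-reduction effect noted for B-pictures.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def average_prediction(fwd, bwd):
    """Bi-directional prediction: per-pixel average, rounded up at .5."""
    return [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(fwd, bwd)]


def choose_b_prediction(block, fwd, bwd):
    """Pick forward, backward or bi-directional prediction by smallest SAD.

    fwd and bwd are the best-matching blocks already found in the previous and
    next reference pictures. Intra coding, the fourth mode, is left out here.
    """
    candidates = {
        "forward": fwd,
        "backward": bwd,
        "bidirectional": average_prediction(fwd, bwd),
    }
    mode = min(candidates, key=lambda m: sad(block, candidates[m]))
    return mode, candidates[mode]
```

When the two references err in opposite directions, their average can match the block exactly. When an area is uncovered, only the next reference contains it, so backward prediction wins.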
8. Intra-picture (Transform) Coding
The MPEG transform coding algorithm includes these steps:
- Discrete cosine transform (DCT)
- Quantization
- Run-length encoding
Both image blocks and prediction-error blocks have high spatial redundancy. To
reduce this redundancy, the MPEG algorithm transforms 8 x 8 blocks of pixels
or 8 x 8 blocks of error terms from the spatial domain to the frequency domain
with the Discrete Cosine Transform (DCT).
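The 8 x 8 DCT used in this kind of coding is the two-dimensional DCT-II: F(u,v) = (1/4) C(u) C(v) Σx Σy f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16), with C(0) = 1/√2 and C(k) = 1 otherwise. The sketch evaluates this formula directly for clarity, whereas practical encoders use fast factorised versions. A flat block puts all of its energy into the single DC coefficient F(0,0).

```python
import math

N = 8  # block size


def dct_8x8(block):
    """Forward 2-D DCT-II of an 8 x 8 block, computed directly from the formula.

    block[y][x] holds pixel (or error-term) values; the result out[v][u] holds
    the coefficient for horizontal frequency u and vertical frequency v.
    """
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out


flat = [[100] * N for _ in range(N)]
F = dct_8x8(flat)
print(round(F[0][0], 6))  # 800.0; every other coefficient is (numerically) zero
```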
Next, the algorithm quantizes the frequency coefficients. Quantization is the
process of approximating each frequency coefficient as one of a limited number
of allowed values. The encoder chooses a quantization matrix that determines
how each frequency coefficient in the 8 x 8 block is quantized. Human
perception of quantization error is lower for high spatial frequencies, so high
frequencies are typically quantized more coarsely (i.e., with fewer allowed
values) than low frequencies.
The combination of DCT and quantization results in many of the frequency
coefficients being zero, especially the coefficients for high spatial
frequencies. To take maximum advantage of this, the coefficients are organised
in a zigzag order to produce long runs of zeros. The coefficients are then
converted to a series of run-amplitude pairs, each pair indicating a number of
zero coefficients and the amplitude of a non-zero coefficient. These
run-amplitude pairs are then coded with a variable-length code, which uses
shorter codes for commonly occurring pairs and longer codes for less common
pairs.
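The zigzag scan and the conversion to run-amplitude pairs can be sketched as follows. The scan visits the anti-diagonals of the block in alternating directions, starting at the DC coefficient. Variable-length coding of the pairs is not shown. The string "EOB" is a stand-in for the end-of-block code that replaces the final run of zeros.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Sort by anti-diagonal; odd diagonals run down-left, even ones up-right.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_amplitude_pairs(block):
    """Scan a quantized block in zigzag order and emit (run, amplitude) pairs.

    run counts the zero coefficients that precede each non-zero amplitude.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # the trailing zeros are implied by the end-of-block marker
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 100, 3, -1
print(run_amplitude_pairs(block))  # [(0, 100), (0, 3), (1, -1), 'EOB']
```

The 61 zero coefficients after the last non-zero value cost nothing beyond the end-of-block code, which is why grouping zeros at the end of the scan pays off.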
Some blocks of pixels need to be coded more accurately than others. For
example, blocks with smooth intensity gradients need accurate coding to avoid
visible block boundaries. To deal with this inequality between blocks, the
MPEG algorithm allows the amount of quantization to be modified for each
macroblock of pixels. This mechanism can also be used to provide smooth
adaptation to a particular bit rate.
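A hypothetical per-macroblock rule that combines both uses of this mechanism. The quantizer scale is raised when the coded bits run over budget and lowered when they run under. It is also halved for smooth macroblocks, whose gradients would otherwise show block boundaries. The threshold `SMOOTH_ACTIVITY` and the step sizes are invented for illustration. Only the 1 to 31 range comes from MPEG-1, where the quantizer scale is a 5-bit value.

```python
SMOOTH_ACTIVITY = 500  # hypothetical threshold: below this a macroblock counts as smooth


def macroblock_quantizer_scale(base_scale, activity, bits_used, bits_budget):
    """Choose a quantizer scale for the next macroblock (hypothetical rule).

    activity:    a measure of detail in the macroblock (e.g. variation about its mean)
    bits_used:   bits spent so far; bits_budget: bits allowed so far
    """
    scale = base_scale
    # Rate adaptation: quantize more coarsely when over budget, more finely when under.
    if bits_used > bits_budget:
        scale += 1
    elif bits_used < bits_budget:
        scale -= 1
    # Perceptual adaptation: smooth gradients need finer quantization.
    if activity < SMOOTH_ACTIVITY:
        scale //= 2
    # Keep within the MPEG-1 quantizer_scale range of 1..31.
    return max(1, min(31, scale))
```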
The MPEG standard provides a timing mechanism that ensures synchronisation
of audio and video. The standard includes two parameters: the system clock
reference (SCR) and the presentation time-stamp (PTS).
The MPEG-specified "system clock" runs at 90 kHz. System clock
reference and presentation time-stamp values are coded in MPEG bitstreams using
33 bits, which can represent any clock cycle in a 24-hour period.
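The 24-hour figure can be checked directly. 2^33 distinct values at 90,000 ticks per second cover about 95,444 seconds, roughly 26.5 hours, before the count wraps around.

```python
CLOCK_HZ = 90_000    # MPEG system clock rate used for SCR and PTS values
TIMESTAMP_BITS = 33  # width of the SCR and PTS fields

span_ticks = 2 ** TIMESTAMP_BITS           # number of distinct timestamp values
span_hours = span_ticks / CLOCK_HZ / 3600  # time covered before the counter wraps

print(f"{span_ticks} ticks = {span_hours:.1f} hours")
# prints: 8589934592 ticks = 26.5 hours
```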
10. System Clock References
An SCR is a snapshot of the encoder system clock which is placed into the
system layer of the bitstream. During decoding, these values are used to
update the system clock counter in the CL480 decoder.
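A minimal sketch of the decoder-side counter, assuming it is a plain 33-bit tick count. The counter advances at 90 kHz between SCRs and is overwritten by each SCR read from the system layer. The class and method names are hypothetical, and a real decoder may handle the update differently.

```python
class SystemClock:
    """Hypothetical model of a decoder's 90 kHz system clock counter."""

    WRAP = 2 ** 33  # SCR and PTS values are 33-bit numbers

    def __init__(self):
        self.ticks = 0

    def advance(self, ticks):
        """Let the counter run for the given number of 90 kHz clock periods."""
        self.ticks = (self.ticks + ticks) % self.WRAP

    def load_scr(self, scr):
        """Resynchronize to the encoder clock using an SCR from the bitstream."""
        self.ticks = scr % self.WRAP


clock = SystemClock()
clock.load_scr(1000)
clock.advance(45_000)  # half a second at 90 kHz
print(clock.ticks)     # 46000
```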
11. Presentation Time-stamps
Presentation time-stamps are samples of the encoder system clock that are
associated with video or audio presentation units. A presentation unit is a
decoded video picture or a decoded audio time sequence. The PTS represents the
time at which the video picture is to be displayed or the starting playback
time for the audio time sequence.
The decoder either skips or repeats picture displays to ensure that the PTS is
within one picture's worth of 90 kHz clock ticks of the SCR when a picture is
displayed. If the PTS is earlier (has a smaller value) than the current SCR,
the decoder discards the picture. If the PTS is later (has a larger value) than
the current SCR, the decoder repeats the display of the picture.
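The rule above can be written as a small decision function. The sketch treats "within one picture's worth of ticks" as a tolerance on either side of the SCR. `picture_ticks` is 90,000 divided by the picture rate: 3600 at 25 pictures per second, or 3003 at the 30000/1001 (about 29.97) rate. The function name and return values are illustrative, and wrap-around of the 33-bit counters is ignored.

```python
def display_action(pts, scr, picture_ticks):
    """Decide what to do with the next decoded picture (sketch).

    Returns "skip" when the picture is late (PTS earlier than the SCR by more
    than one picture period), "repeat" when it is early (the picture already on
    screen is shown again), and "display" otherwise.
    """
    if scr - pts > picture_ticks:
        return "skip"
    if pts - scr > picture_ticks:
        return "repeat"
    return "display"


print(display_action(pts=90_000, scr=95_000, picture_ticks=3600))  # skip
```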
Transmission costs make up the largest portion of most data and voice
communication budgets, so compression techniques such as these can reduce
communication costs considerably.