


Multimedia in the Teaching Space

APPENDIX-1 Video Coding Algorithm

5. Inter-Picture Coding

Much of the information in a picture within a video sequence is similar to information in a previous or subsequent picture. The MPEG standard takes advantage of this temporal redundancy by representing some pictures in terms of their differences from other (reference) pictures, or what is known as inter-picture coding. This section describes the types of coded pictures and explains the techniques used in this process.
Picture Types
The MPEG standard specifically defines three types of pictures: intra, predicted, and bi-directional.
Intra Pictures
Intra pictures, or I-pictures, are coded using only information present in the picture itself. I-pictures provide potential random access points into the compressed video data. I-pictures use only transform coding (as explained in the Intra-picture (Transform) Coding section) and provide moderate compression. I-pictures typically use about two bits per coded pixel.
Predicted Pictures
Predicted pictures, or P-pictures, are coded with respect to the nearest previous I- or P-picture. This technique is called forward prediction. Like I-pictures, P-pictures serve as a prediction reference for B-pictures and future P-pictures. However, P-pictures use motion compensation (see the Motion Compensation section) to provide more compression than is possible with I-pictures. Unlike I-pictures, P-pictures can propagate coding errors because P-pictures are predicted from previous reference (I- or P-) pictures.
Bi-directional Pictures
Bi-directional pictures, or B-pictures, are pictures that use both a past and future picture as a reference. This technique is called bi-directional prediction. B-pictures provide the most compression and do not propagate errors because they are never used as a reference. Bi-directional prediction also decreases the effect of noise by averaging two pictures.

6. Video Stream Composition

The MPEG algorithm allows the encoder to choose the frequency and location of I-pictures. This choice is based on the application's need for random accessibility and the location of scene cuts in the video sequence. In applications where random access is important, I-pictures are typically used two times a second.
The encoder also chooses the number of B-pictures between any pair of reference (I- or P-) pictures. This choice is based on factors such as the amount of memory in the encoder and the characteristics of the material being coded. For example, a large class of scenes is coded with two B-pictures between successive reference pictures.
The MPEG encoder reorders pictures in the video stream to present the pictures to the decoder in the most efficient sequence. In particular, the reference pictures needed to reconstruct B-pictures are sent before the associated B-pictures.
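The reordering described above can be sketched in a few lines of Python. This is an illustration only, not the encoder's actual procedure: the function name `display_to_coded_order` and the representation of pictures as type letters are assumptions made for the example.

```python
def display_to_coded_order(display_types):
    """Reorder display-order picture types so that each B-picture
    follows both of the reference pictures it depends on."""
    coded = []
    pending_b = []  # B-pictures still waiting for their future reference
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        else:
            # An I- or P-picture is a reference: send it first, then the
            # B-pictures that precede it in display order.
            coded.append((index, ptype))
            coded.extend(pending_b)
            pending_b = []
    # Any trailing B-pictures (no future reference in this list) go last.
    coded.extend(pending_b)
    return coded


display = list("IBBPBBP")  # two B-pictures between references
coded = display_to_coded_order(display)
print(" ".join(f"{ptype}{index}" for index, ptype in coded))
# prints: I0 P3 B1 B2 P6 B4 B5
```

The numbers are display positions, so the decoder receives P3 before B1 and B2, which both need it as their future reference.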

7. Motion Compensation

Motion compensation is a technique for enhancing the compression of P- and B-pictures by eliminating temporal redundancy. Motion compensation typically improves compression by about a factor of three compared to intra-picture coding. Motion compensation algorithms work at the macroblock level.
When a macroblock is compressed by motion compensation, the compressed file contains this information:

  • The spatial vector between the reference macroblock(s) and the macroblock being coded (motion vectors)
  • The content differences between the reference macroblock(s) and the macroblock being coded (error terms)
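As a rough illustration of how these two items could be produced, the sketch below does an exhaustive block search using the sum of absolute differences (SAD) as the matching criterion. The MPEG standard does not prescribe a search method, so the full search, the SAD criterion, the 4 x 4 block size, and all function names here are assumptions chosen to keep the example small.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def motion_search(reference, current, top, left, size, search_range):
    """Find the best-matching reference block for one block of `current`.

    Returns (motion_vector, error_terms). The vector (dy, dx) points from
    the block's position in the current picture to the matching block in
    the reference picture.
    """
    target = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r_top, r_left = top + dy, left + dx
            if (r_top < 0 or r_left < 0
                    or r_top + size > height or r_left + size > width):
                continue  # candidate block falls outside the picture
            candidate = get_block(reference, r_top, r_left, size)
            cost = sad(target, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, vector, predictor = best
    errors = [[t - p for t, p in zip(t_row, p_row)]
              for t_row, p_row in zip(target, predictor)]
    return vector, errors


# A small object moves one row down and two columns right, and one of
# its pixels brightens slightly (80 -> 83).
reference = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
reference[2][2:4] = [50, 60]
reference[3][2:4] = [70, 80]
current[3][4:6] = [50, 60]
current[4][4:6] = [70, 83]

vector, errors = motion_search(reference, current, top=2, left=4,
                               size=4, search_range=2)
print(vector)  # (-1, -2)
print(errors)  # all zero except a single 3
```

Only the vector and the nearly empty error block need to be coded, which is where the compression gain comes from.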

Not all information in a picture can be predicted from a previous picture. Consider a scene in which a door opens: the visual details of the room behind the door cannot be predicted from a previous frame in which the door was closed. When such a case arises, that is, when a macroblock in a P-picture cannot be efficiently represented by motion compensation, it is coded in the same way as a macroblock in an I-picture, using transform coding techniques (see the Intra-picture (Transform) Coding section).
The difference between B- and P-picture motion compensation is that macroblocks in a P-picture use the previous reference (I- or P-) picture only, while macroblocks in a B-picture can be coded using the previous reference picture, the next reference picture, or both.
Four codings are therefore possible for each macroblock in a B-picture:

  • Intra coding: no motion compensation
  • Forward prediction: the previous reference picture is used as a reference
  • Backward prediction: the next reference picture is used as a reference
  • Bi-directional prediction: two reference pictures are used, the previous reference picture and the next reference picture.

Backward prediction can be used to predict uncovered areas that do not appear in previous pictures.
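The choice among these codings can be sketched as a cost comparison. The example below is an assumption-laden illustration, not the standard's decision rule: it assumes the forward and backward predictor blocks have already been found, uses the sum of absolute differences as the cost, and introduces a hypothetical `intra_threshold` parameter to stand in for the decision to fall back to intra coding.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def average(block_a, block_b):
    """Pixel-wise rounded average of two predictor blocks."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(block_a, block_b)]


def choose_b_mode(target, forward_pred, backward_pred, intra_threshold=None):
    """Pick the cheapest prediction mode for one B-picture macroblock."""
    predictors = {
        "forward": forward_pred,
        "backward": backward_pred,
        "bidirectional": average(forward_pred, backward_pred),
    }
    costs = {mode: sad(target, pred) for mode, pred in predictors.items()}
    best_mode = min(costs, key=costs.get)
    # Hypothetical rule: if even the best prediction is poor, code as intra.
    if intra_threshold is not None and costs[best_mode] > intra_threshold:
        return "intra", costs
    return best_mode, costs


target = [[100, 100], [100, 100]]

# Noisy references on either side of the true value: averaging wins.
mode_a, _ = choose_b_mode(target, [[96, 96], [96, 96]], [[104, 104], [104, 104]])

# Uncovered area: visible only in the next reference picture.
mode_b, _ = choose_b_mode(target, [[0, 0], [0, 0]], [[100, 100], [100, 100]])

print(mode_a, mode_b)  # bidirectional backward
```

The first case shows the noise-averaging benefit of bi-directional prediction; the second shows backward prediction handling an uncovered area.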

8. Intra-picture (Transform) Coding

The MPEG transform coding algorithm includes these steps:

  • Discrete cosine transform (DCT)
  • Quantization
  • Run-length encoding

Both image blocks and prediction-error blocks have high spatial redundancy. To reduce this redundancy, the MPEG algorithm transforms 8 x 8 blocks of pixels or 8 x 8 blocks of error terms from the spatial domain to the frequency domain with the Discrete Cosine Transform (DCT).
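For reference, a direct (unoptimised) 8 x 8 forward DCT can be written as below. This is a sketch of the textbook DCT-II definition with the usual 1/4 scaling; real encoders use fast factorised versions, and the function name `dct_2d` is simply chosen for this example.

```python
import math

N = 8  # block size used by the MPEG transform


def dct_2d(block):
    """Direct 8 x 8 DCT-II: spatial samples in, frequency coefficients out.

    coeffs[u][v] holds vertical frequency u and horizontal frequency v;
    coeffs[0][0] is the DC (average-level) term.
    """
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = 0.25 * scale(u) * scale(v) * total
    return coeffs


# A perfectly flat block has all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0
```

A flat block is the extreme case of spatial redundancy: 64 identical samples collapse to one non-zero coefficient.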
Next, the algorithm quantizes the frequency coefficients. Quantization is the process of approximating each frequency coefficient as one of a limited number of allowed values. The encoder chooses a quantization matrix that determines how each frequency coefficient in the 8 x 8 block is quantized. Human perception of quantization error is lower for high spatial frequencies, so high frequencies are typically quantized more coarsely (i.e., with fewer allowed values) than low frequencies.
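A minimal sketch of matrix quantization follows. The matrix used here (step size 8 + 4(u + v)) is invented purely for illustration and is not the MPEG default matrix; it only reproduces the property that step sizes grow with spatial frequency.

```python
def quantize(coeffs, matrix):
    """Divide each coefficient by its step size and round to an integer level."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, matrix)]


def dequantize(levels, matrix):
    """Decoder side: multiply each level by its step size."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, matrix)]


# Illustrative matrix (an assumption, not the standard's): coarser steps
# for higher frequencies.
matrix = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]

# Give every frequency the same amplitude to isolate the matrix's effect.
coeffs = [[21.0] * 8 for _ in range(8)]
levels = quantize(coeffs, matrix)

zeros = sum(row.count(0) for row in levels)
print(levels[0][0], levels[7][7], zeros)  # 3 0 21
```

With equal input amplitudes, the low-frequency corner keeps a non-zero level while the 21 highest-frequency positions are rounded to zero.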
The combination of DCT and quantization results in many of the frequency coefficients being zero, especially the coefficients for high spatial frequencies. To take maximum advantage of this, the coefficients are organised in a zigzag order to produce long runs of zeros. The coefficients are then converted to a series of run-amplitude pairs, each pair indicating a number of zero coefficients and the amplitude of a non-zero coefficient. These run-amplitude pairs are then coded with a variable-length code, which uses shorter codes for commonly occurring pairs and longer codes for less common pairs.
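The zigzag scan and the conversion to run-amplitude pairs can be sketched as follows. The example is simplified: it treats all 64 coefficients alike (MPEG codes the DC term of intra blocks separately), it stops before the variable-length coding step, and the function names are chosen for this illustration.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the reverse.
        return (diagonal, row if diagonal % 2 else col)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_amplitude_pairs(levels):
    """Convert quantized levels to (zero_run, amplitude) pairs.

    Zeros after the last non-zero level produce no pair; in a bitstream
    they are implied by an end-of-block code.
    """
    pairs = []
    run = 0
    for row, col in zigzag_order(len(levels)):
        value = levels[row][col]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


levels = [[0] * 8 for _ in range(8)]
levels[0][0] = 12
levels[0][1] = 5
levels[2][0] = -3
levels[1][2] = 1

print(zigzag_order()[:6])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(run_amplitude_pairs(levels))
# [(0, 12), (0, 5), (1, -3), (3, 1)]
```

Sixty-four levels reduce to four pairs here; a variable-length code would then assign the shortest codewords to the most common pairs.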
Some blocks of pixels need to be coded more accurately than others. For example, blocks with smooth intensity gradients need accurate coding to avoid visible block boundaries. To deal with this inequality between blocks, the MPEG algorithm allows the amount of quantization to be modified for each macroblock of pixels. This mechanism can also be used to provide smooth adaptation to a particular bit rate.

9. Synchronisation

The MPEG standard provides a timing mechanism that ensures synchronisation of audio and video. The standard includes two parameters: the system clock reference (SCR) and the presentation time-stamp (PTS).
The MPEG-specified "system clock" runs at 90 kHz. System clock reference and presentation time-stamp values are coded in MPEG bitstreams using 33 bits, which can represent any clock cycle in a 24-hour period.
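The 24-hour claim can be checked with a short calculation. The constant names below are chosen for this example.

```python
CLOCK_HZ = 90_000        # system clock frequency
TIMESTAMP_BITS = 33      # width of SCR and PTS values

ticks_per_day = 24 * 3600 * CLOCK_HZ
wrap_ticks = 2 ** TIMESTAMP_BITS
wrap_hours = wrap_ticks / CLOCK_HZ / 3600

print(ticks_per_day)         # 7776000000
print(wrap_ticks)            # 8589934592
print(round(wrap_hours, 2))  # 26.51

# 32 bits would fall short of a full day; 33 bits is the smallest width
# that covers it.
print(2 ** 32 < ticks_per_day < 2 ** 33)  # True
```

A 33-bit counter therefore wraps after roughly 26.5 hours, comfortably more than one day.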

10. System Clock References

An SCR is a snapshot of the encoder system clock which is placed into the system layer of the bitstream. During decoding, these values are used to update the system clock counter in the decoder (the CL480, in this description).

11. Presentation Time-stamps

Presentation time-stamps are samples of the encoder system clock that are associated with video or audio presentation units. A presentation unit is a decoded video picture or a decoded audio time sequence. The PTS represents the time at which the video picture is to be displayed or the starting playback time for the audio time sequence.
The decoder either skips or repeats picture displays to ensure that the PTS is within one picture's worth of 90 kHz clock ticks of the SCR when a picture is displayed. If the PTS is earlier (has a smaller value) than the current SCR, the decoder discards the picture. If the PTS is later (has a larger value) than the current SCR, the decoder repeats the display of the picture.
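One way to read this rule is as a three-way decision with a tolerance of one picture period. The sketch below follows that reading. The function name, the 25 pictures-per-second default, and the decision to ignore 33-bit wraparound are all assumptions made for illustration, not details of any particular decoder.

```python
CLOCK_HZ = 90_000  # system clock frequency


def sync_action(pts, scr, picture_rate=25.0):
    """Decide what to do with a picture given its PTS and the current SCR.

    Both values are in 90 kHz ticks. Counter wraparound is ignored here.
    """
    ticks_per_picture = CLOCK_HZ / picture_rate  # 3600 ticks at 25 Hz
    if pts < scr - ticks_per_picture:
        return "skip"     # picture is late: discard it to catch up
    if pts > scr + ticks_per_picture:
        return "repeat"   # picture is early: repeat the display and wait
    return "display"      # within one picture period: show it now


print(sync_action(pts=10_000, scr=10_000))  # display
print(sync_action(pts=5_000, scr=10_000))   # skip
print(sync_action(pts=20_000, scr=10_000))  # repeat
```

Skipping and repeating both move the PTS back toward the SCR, which keeps video in step with audio that is timed against the same clock.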

12. Conclusion

Transmission costs make up the largest portion of most data and voice communication budgets. Compression techniques such as those described here reduce these costs considerably.
