This report is also available as an Acrobat file.
Digital Video for Multimedia: Considerations for Capture, Use and Delivery
Section 2: Digital video: issues and choices
Other considerations for reducing file size
There are other techniques for reducing file size. So far we have considered those that involve compression: window size, frame rates and quality settings. Others include:
- Key frames. Many codecs work by saving a whole frame of information, with only the redundant information within that frame removed (e.g. I frames), followed by only the differences between successive frames. These whole frames are referred to as key frames. The less frequent the key frames, the less information is kept and the smaller the resultant file. The number of key frames per second of video is one of the parameters set prior to compression.
- Palettes. Reduce the number of colours to 8 bit (256). Although many of the codecs handle the palette in such a way that 24 bit video can be shown on 8 bit displays with no problems, reducing the video to 256 colours may reduce the file size by two thirds, since each pixel then needs 8 bits rather than 24.
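The two items above can be put into rough numbers. The following is a minimal sketch, not a model of any particular codec: the frame sizes, clip length and 320x240 window are hypothetical values chosen only for illustration, and the function names are our own.

```python
def clip_bytes(total_frames, key_interval, key_bytes, delta_bytes):
    """Approximate clip size with one key frame every key_interval frames.

    key_bytes and delta_bytes are hypothetical average sizes of a key frame
    and a difference frame; real codecs vary from frame to frame.
    """
    key_frames = -(-total_frames // key_interval)  # ceiling division
    delta_frames = total_frames - key_frames
    return key_frames * key_bytes + delta_frames * delta_bytes


def raw_frame_bytes(width, height, bits_per_pixel):
    """Uncompressed size of a single frame in bytes."""
    return width * height * bits_per_pixel // 8


# Hypothetical clip: 150 frames, 12000-byte key frames, 3000-byte difference frames.
frequent_keys = clip_bytes(150, 5, 12000, 3000)   # key frame every 5 frames
sparse_keys = clip_bytes(150, 15, 12000, 3000)    # key frame every 15 frames
print(frequent_keys, sparse_keys)

# Hypothetical 320x240 window at 24 bit and at 8 bit colour.
full_colour = raw_frame_bytes(320, 240, 24)
palettised = raw_frame_bytes(320, 240, 8)
print(full_colour, palettised, 1 - palettised / full_colour)
```

The palette figure applies to the uncompressed frame data; the saving after compression depends on the codec, which is why the text says the file size "may" fall by two thirds.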
Tip: The considerations of window size, codec, frame rates, number of key frames and quality settings are not independent of each other and there are always tradeoffs and compromises to be made. Our advice is to experiment with combinations of the above at the start of any project involving digital video, having first established your audience, and the delivery machines available.
Delivery from CD-ROM
The table below shows the data transfer rates for different speed CD-ROM drives:
|CD-ROM drive||Data Transfer Rate||Notes|
|Single speed||150 kB/s||original based on CD audio|
|Dual speed||300 kB/s||relative to single speed|
|Quad speed||600 kB/s||relative to single speed|
The above transfer rates, however, are only available if the central processing unit (CPU) is not occupied with other tasks. Playing video files from CD-ROM (that is, loading, decompressing and transferring to the video card) requires a lot of processing power and reduces the bandwidth available.
But by how much?
Unfortunately there are no exact answers, because a number of factors are involved:
- the amount of CPU attention required by a particular CD-ROM drive
- CPU type/speed, caching
- amount of free RAM and type of RAM (EDO, etc.)
- presence and settings of any caching software
- operating system
When evaluating specifications of CD-ROM drives, the important parameters are the transfer rates at 60% and 40% CPU usage, not the 100% figures. These give an indication of how the drive will perform when the CPU is busy doing something else, such as decompressing video.
How does this translate into actual figures?
|CD-ROM drive||Data Transfer Rate||Likely Data Transfer Rate at 40-60% CPU Usage|
|Single speed||150 kB/s||120-150 kB/s|
|Dual speed||300 kB/s||210-230 kB/s|
|Quad speed||600 kB/s||300-350 kB/s|
Note: in the case of single speed CD-ROM, 150 kB/s should be achievable by nearly all CPUs.
For dual speed drives, compression settings with data rates of 210-230 kB/s should be reasonable. Moving to a quad speed drive (600 kB/s raw transfer rate) on a computer of the same specification as for a dual speed CD-ROM, the processor has to decompress more data in the same time. Therefore 300-350 kB/s becomes the guideline range.
When compressing for delivery from CD-ROM and if not using MPEG-1, turn on the padding option, if available, in compression settings. Data is normally written to CD-ROM with a sector size of 2 kB. Padding adds dummy data to the frames to the nearest multiple of 2 kB thus ensuring that frames always start and stop at a sector boundary. Editing software provides default settings for each codec for key frame interval, compression quality, etc, and it is probably best to accept these until you are familiar enough with them to make changes.
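The sector padding described above amounts to rounding each frame up to the next multiple of 2 kB. A minimal sketch follows, assuming a 2048-byte sector and zero bytes as the dummy data; the function names are hypothetical and do not correspond to any particular editing package.

```python
SECTOR_BYTES = 2048  # 2 kB sector size, as stated in the text


def padded_size(frame_bytes, sector=SECTOR_BYTES):
    """Round a frame size up to the next multiple of the sector size."""
    return -(-frame_bytes // sector) * sector  # ceiling division


def pad_frame(frame, sector=SECTOR_BYTES):
    """Append zero bytes (assumed dummy data) so the frame ends on a sector boundary."""
    return frame + bytes(padded_size(len(frame), sector) - len(frame))


frame = b"x" * 5000           # hypothetical 5000-byte compressed frame
padded = pad_frame(frame)
print(len(frame), len(padded), len(padded) % SECTOR_BYTES)
```

Padding increases the file size slightly, so it trades a little bandwidth for frames that always begin and end at a sector boundary.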
When discussing digital video there is, almost always, the assumption that audio is included. Of course this is not always the case: there might not be any associated sound, or it might not be important. Presented here are some of the issues involved with audio.
Any audio information is going to use some of the bandwidth available. The following table gives an indication of the data rates associated with some standard settings:
|Sampling frequency (kHz)||Mono, 8 bit||Mono, 16 bit||Stereo, 8 bit||Stereo, 16 bit|
|11.025||11 kB/s||22 kB/s||22 kB/s||44 kB/s|
|22.05||22 kB/s||44 kB/s||44 kB/s||88 kB/s|
|44.1||44 kB/s||88 kB/s||88 kB/s||176 kB/s|
- The figures are for sampling rate, not frequency response. Put simply, the sampling rate should be at least twice the maximum frequency of interest. For example, if frequencies of approximately 9 kHz are important, sample at 22 kHz, not 11 kHz.
- The rates are for uncompressed data.
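The table figures and the "twice the maximum frequency" rule can be reproduced with a short calculation. This is a sketch under the assumption that 1 kB is 1000 bytes, which is what the rounded table values imply; the function names are our own.

```python
STANDARD_RATES_KHZ = (11.025, 22.05, 44.1)  # sampling rates from the table


def audio_rate_kBps(sample_rate_khz, bits, channels):
    """Uncompressed audio data rate in kB/s (assuming 1 kB = 1000 bytes)."""
    return sample_rate_khz * (bits / 8) * channels


def lowest_adequate_rate(max_freq_khz):
    """Lowest standard sampling rate that is at least twice max_freq_khz.

    Returns None if none of the standard rates is high enough.
    """
    for rate in STANDARD_RATES_KHZ:
        if rate >= 2 * max_freq_khz:
            return rate
    return None


for rate in STANDARD_RATES_KHZ:
    row = [audio_rate_kBps(rate, bits, ch) for ch in (1, 2) for bits in (8, 16)]
    print(rate, [round(r) for r in row])

print(lowest_adequate_rate(9))  # 9 kHz content needs the 22.05 kHz setting
```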
As can be seen from the table, the audio data alone (e.g. 16 bit stereo at 44.1 kHz) can take more bandwidth than is available from a single speed CD-ROM.
What are the options available to reduce data rates?
This will depend very much on the importance of the audio information to your video. Taking the highest data rate (176 kB/s) as a starting point, consider:
- mono instead of stereo (reduces the rate to 88 kB/s)
- 22 kHz instead of 44 kHz sampling (reduces the rate to 44 kB/s)
- reducing to 22 kB/s by choosing either:
- a) 11 kHz sampling
- b) 8 bit resolution
- or both a) and b) to reduce to 11 kB/s
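The successive reductions listed above can be set against the CD-ROM guideline rates given earlier to see how much bandwidth each leaves for video. The sketch below takes the lower end of each guideline range as the available rate, which is our own conservative assumption, and uses hypothetical function names.

```python
# Lower end of the guideline ranges (kB/s) from the CD-ROM table; using the
# lower bound is an assumption made for this illustration.
GUIDELINE_KBPS = {"single": 120, "dual": 210, "quad": 300}


def audio_rate_kBps(sample_rate_khz, bits, channels):
    """Uncompressed audio data rate in kB/s (assuming 1 kB = 1000 bytes)."""
    return sample_rate_khz * bits / 8 * channels


def video_budget_kBps(drive, sample_rate_khz, bits, channels):
    """Bandwidth left for video after uncompressed audio.

    A negative result means the audio alone exceeds the guideline rate.
    """
    return GUIDELINE_KBPS[drive] - audio_rate_kBps(sample_rate_khz, bits, channels)


steps = [
    ("44.1 kHz, 16 bit, stereo", 44.1, 16, 2),
    ("44.1 kHz, 16 bit, mono", 44.1, 16, 1),
    ("22.05 kHz, 16 bit, mono", 22.05, 16, 1),
    ("11.025 kHz, 16 bit, mono", 11.025, 16, 1),
    ("11.025 kHz, 8 bit, mono", 11.025, 8, 1),
]
for label, khz, bits, channels in steps:
    audio = audio_rate_kBps(khz, bits, channels)
    budget = video_budget_kBps("dual", khz, bits, channels)
    print(f"{label}: audio {audio:.1f} kB/s, video budget on dual speed {budget:.1f} kB/s")
```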
As usual there are other ways of tackling the problem. Again, one of the most useful tools at our disposal is compression. Depending on the nature of the audio signal, there are a number of ways of compressing the data. These algorithms attempt to take account of such factors as the way we hear and the large amounts of silence that can occur in speech (detected as an absence of signal). Fortunately there are industry standards and organisations, such as the Interactive Multimedia Association (IMA), that tackle such issues as:
- cross-platform compatibility
- compression algorithms
- mismatches in recorded bit depth and playback capability
- sample rate conversion
For example, using the DVI® (Digital Video Interactive) algorithm, which consists of 4 bit ADPCM (Adaptive Differential Pulse Code Modulation) samples, we can achieve 44 kHz, 16 bit (equivalent), stereo at a data rate of 44 kB/s (25% of the uncompressed rate).
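The 25% figure follows directly from storing 4 bits per sample instead of 16. The arithmetic can be checked as below; this sketch ignores any per-block header or state information that a real ADPCM stream may carry, which is an assumption, and it is not an ADPCM encoder.

```python
def pcm_rate_kBps(sample_rate_khz, bits, channels):
    """Uncompressed PCM data rate in kB/s (assuming 1 kB = 1000 bytes)."""
    return sample_rate_khz * bits / 8 * channels


def adpcm_rate_kBps(sample_rate_khz, channels, coded_bits=4):
    """Data rate when each sample is coded in coded_bits bits.

    Per-block overhead is ignored (assumption for this illustration).
    """
    return sample_rate_khz * coded_bits / 8 * channels


pcm = pcm_rate_kBps(44.1, 16, 2)
adpcm = adpcm_rate_kBps(44.1, 2)
print(pcm, adpcm, adpcm / pcm)
```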
Other tips and advice
- Cross-platform compatibility. As is often the case, even with 'standards', there are cross-platform compatibility problems, e.g. the AVI and QT IMA implementations are not 100% compatible, and AVI does not support Apple's MACE (Macintosh Audio Compression and Expansion) format. Therefore, if authoring for more than one platform, check which formats provide the necessary compatibility and/or be prepared to undertake some audio format conversion.
- Compression. Some of the audio formats use lossy compression techniques. Therefore you should appraise the format in relation to the importance of sound quality to your material. Audio compression can enable very useful reductions in data rates while maintaining good sound quality. This is an especially important consideration for delivery from CD-ROM.
- CD quality audio. It is worth noting here that the 44 kHz, 16 bit stereo setting is often referred to as 'CD quality'. However, do not assume that this is what you will get; it will depend very much on the quality of the original material, recorded or live, and the equipment used for digitisation. Most standard sound cards generate an audible level of noise during recording which will be incorporated into the sound file; the inside of a computer is a very noisy environment in electronic terms. If high quality recording is important to you, consider evaluating some of the higher quality sound cards.
- Sample at the highest quality possible. It is worth sampling at the highest quality you can during capture and deferring any decisions regarding delivered quality until you come to perform the final compression. This allows you to go back and try alternative combinations without having to recapture the sound.
- Assessing audio quality. For assessing the sound it is sensible to have audio playback equipment of a reasonably high standard. Audio that sounded fine through the 'multimedia' speakers supplied with your system can actually be very noisy and quite distorted when played over a high quality sound system. Sound quality is subjective; test the quality with potential users.
- 'Lip sync'. During assessments you may find that some subjects report that speech and lip movement seem to be badly synchronised; some people are more sensitive to this than others. This 'lip sync' problem can occur during either capture or editing. Its occurrence during capture arises out of timing skews between the sound and video capture drivers. The higher quality or all-in-one capture cards are less likely to suffer from this problem. During editing, when adding or deleting material, be aware that you may be altering the synchronisation between the audio and video tracks. In either case re-synchronisation can be performed in the editor before compression. Read your editor's help file or manual for application-specific techniques and settings.