Multimedia in the Teaching Space
A compression technique that encodes the prediction residual instead of the original waveform signal, so that compression efficiency is improved by the prediction gain. Rather than transmitting PCM samples directly, the difference between the predicted next sample and the actual sample is transmitted. This difference is usually small and can thus be encoded in fewer bits than the sample itself.
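A minimal sketch of the differencing idea, assuming the simplest possible predictor (the previous sample); real ADPCM also adapts the quantiser step size, which is omitted here:

```python
def dpcm_encode(samples):
    """Encode PCM samples as a first value followed by residuals."""
    residuals = []
    prediction = 0  # initial predictor value
    for s in samples:
        residuals.append(s - prediction)  # transmit only the difference
        prediction = s                    # next prediction = current sample
    return residuals

def dpcm_decode(residuals):
    """Reverse the encoding by accumulating residuals."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

pcm = [100, 102, 105, 104, 101]
res = dpcm_encode(pcm)   # [100, 2, 3, -1, -3]: small values after the first
assert dpcm_decode(res) == pcm
```

After the first sample, the residuals stay close to zero for smooth waveforms, which is what makes the shorter code words possible.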
A technique for encoding an audio sample into an 8-bit word, used in G.711 encoding.
ANSI works with various organisations and manufacturers of telecommunications equipment to determine domestic standards not covered by ITU.
Software from which user interfaces (e.g., pull-down menus) can be created.
Perhaps the major drawback to each of the Huffman encoding techniques is its poor performance when processing texts where one symbol has a probability of occurrence approaching unity. Although the entropy associated with such symbols is extremely low, each symbol must still be encoded as a discrete value.
Arithmetic coding removes this restriction by representing messages as intervals of the real numbers between 0 and 1. Initially, the range of values for coding a text is the entire interval [0, 1]. As encoding proceeds, this range narrows while the number of bits required to represent it expands. Frequently occurring characters reduce the range less than characters occurring infrequently, and thus add fewer bits to the length of an encoded message.
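The interval-narrowing step can be sketched as follows; the three-symbol model and its probabilities are invented for illustration, and the renormalisation and incremental bit-output machinery of a real coder is omitted:

```python
probs = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # assumed source model

def cumulative(probs):
    """Assign each symbol a sub-interval of [0, 1) sized by its probability."""
    cum, lo = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = (lo, lo + p)
        lo += p
    return cum

def encode_interval(message, probs):
    """Return the final [low, high) interval identifying the message."""
    cum = cumulative(probs)
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_lo, s_hi = cum[sym]
        # frequent symbols (wide sub-intervals) shrink the range less
        low, high = low + span * s_lo, low + span * s_hi
    return low, high

low, high = encode_interval("aab", probs)
# any number inside [low, high) decodes back to "aab" under this model
```

The final interval width is the product of the symbol probabilities (here 0.5 × 0.5 × 0.3 = 0.075), so the number of bits needed to pick a point inside it tracks the message's information content.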
ATM is a switching/transmission technique where data is transmitted in small, fixed-size cells (5-byte header, 48-byte payload). The cells lend themselves both to the time-division-multiplexing characteristics of the transmission media, and the packet-switching characteristics desired of data networks. At each switching node, the ATM header identifies a virtual path or virtual circuit that the cell contains data for, enabling the switch to forward the cell to the correct next-hop trunk. The virtual path is set up through the involved switches when two endpoints wish to communicate. This type of switching can be implemented in hardware, which is almost essential when trunk speeds range from 45Mbps to 1Gbps.
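A sketch of the fixed 53-byte cell layout, packing the UNI header fields (GFC, VPI, VCI, PT, CLP, HEC); the HEC byte is left as a placeholder rather than the real CRC-8 computation:

```python
def make_cell(vpi, vci, payload, gfc=0, pt=0, clp=0):
    """Build one 53-byte ATM cell: 5-byte header + 48-byte payload."""
    assert len(payload) == 48, "ATM payload is always 48 bytes"
    # UNI header: GFC (4 bits), VPI (8), VCI (16), PT (3), CLP (1), HEC (8)
    b0 = (gfc << 4) | (vpi >> 4)
    b1 = ((vpi & 0xF) << 4) | (vci >> 12)
    b2 = (vci >> 4) & 0xFF
    b3 = ((vci & 0xF) << 4) | (pt << 1) | clp
    hec = 0  # placeholder; real cells carry a CRC-8 over the first 4 bytes
    return bytes([b0, b1, b2, b3, hec]) + payload

cell = make_cell(vpi=1, vci=42, payload=bytes(48))
assert len(cell) == 53
```

The fixed size is what allows the header parse and forwarding decision to be done in hardware at trunk rates.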
The ATM Forum, a worldwide organisation aimed at promoting ATM within the industry and the end-user community, was formed in October 1991 and currently includes more than 500 companies representing all sectors of the communications and computer industries, as well as a number of government agencies and research laboratories.
These refer to the audio standards for the compression/decompression and transmission of P*64 audio signals. These standards are given G.xxx classifications.
A measurement, expressed in bits per second (bps), of the amount of information that can flow through a channel.
Bearer Channels. A 64Kbps channel in an ISDN line.
The rate at which the compressed bit-stream is delivered from the storage medium to the input of a decoder.
An 8-row by 8-column matrix of pels, or 64 DCT coefficients (source, quantised or dequantised).
A data rate standard for ISDN. It provides a customer with 144Kbps divided into three channels (2B channels carrying 64Kbps each and one D-channel assigned to 16Kb of signalling information).
A motion vector that is used for motion compensation from a reference picture at a later time in display order.
Within the H.242 standard, capability set is used to define the set of functions which the audio-visual end point supports. At the initiation of an audio-visual call, the end points exchange their respective sets of information and establish a call within the bounds of their mutual capability sets.
Comité Consultatif International Télégraphique et Téléphonique. A committee of the International Telecommunications Union responsible for making technical recommendations about telephone and data communication systems for PTTs and suppliers. Plenary sessions are held every four years to adopt new standards.
A format for displaying an image on a screen. CIF has an image resolution of 352 by 288 pixels at 30 frames per second. This format is optional within the H.261 standard.
The standardisation of the structure of the samples that represent the picture information of a single frame in digital HDTV, independent of frame rate and sync/blank structure. The uncompressed bit rate for transmitting CIF at 29.97 frames/sec is 36.45 Mbps.
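The 36.45 Mbps figure can be reproduced from the CIF geometry, assuming 4:2:0 sampling (two chroma planes at half resolution in each direction) and 8 bits per sample:

```python
luma = 352 * 288                           # luminance samples per frame
chroma = 2 * (176 * 144)                   # two chroma planes at half resolution
bits_per_frame = (luma + chroma) * 8       # 8 bits per sample
rate_mbps = bits_per_frame * 29.97 / 1e6   # 29.97 frames per second
# rate_mbps comes out at about 36.46, matching the quoted ~36.45 Mbps
```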
Programmable time division multiplex communication bus used by Lucent Technologies for interconnecting telecommunication chips. Can be programmed to act as one channel of an MVIP.
A board which can connect to one or two MVIP or concentration highway connectors and emulates a common interface by supplying the clock and by connecting the CHI or MVIP back to itself (loop) or to another CHI or MVIP (cross connect). Useful for testing where actual communication lines are not available.
A matrix, block or single pel representing one of the two colour difference signals related to the primary colours in the manner defined in the bit-stream. The symbols used for the colour difference signals are Cr and Cb.
Data Signalling Channel. A 16Kbps channel in an ISDN line.
Discrete Cosine Transform, a transform related to the Fourier transform and widely used in image and video compression.
A digital storage or transmission device or system.
The technique of processing data as numbers instead of voltages.
A technique used to prevent a speaker from hearing a delayed version of his own speech. Echo cancellation is required in video telephony due to the delays required by the video.
Entropy, the average amount of information represented by a symbol in a message, is a function of the model used to produce that message and can be reduced by increasing the complexity of the model so that it better reflects the actual distribution of source symbols in the original message.
Entropy is a measure of the information contained in a message; it is the lower bound for compression.
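The entropy bound can be computed directly from symbol frequencies; a short sketch using a first-order (single-symbol) model:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Average information per symbol, in bits, under a first-order model."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

h = entropy_bits_per_symbol("aaab")
# p(a)=0.75, p(b)=0.25 -> H = 0.75*log2(4/3) + 0.25*log2(4), about 0.811 bits
```

No lossless code under this model can average fewer than h bits per symbol, which is the sense in which entropy bounds compression.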
Fast Fourier Transform
For an interlaced video signal, a field is the assembly of alternate lines of a frame. Therefore an interlaced frame is composed of two fields: a top field and a bottom field.
A motion vector that is used for motion compensation from a reference picture at an earlier time in display order.
The reciprocal of the frame rate.
The rate at which frames are output from the decoding process.
A future reference picture is a reference picture that occurs at a later time than the current picture in display order.
A standard for compressing and decompressing audio (50 - 3000 Hz) into a 48, 56, or 64Kbps stream.
A standard for compressing and decompressing audio (50 - 7000 Hz) into a 48, 56, or 64Kbps stream.
A standard for compressing and decompressing audio (50 - 3000 Hz) into a 16Kbps stream.
The family of audio-related ITU standards. It includes G.711, G.722, and G.728.
The family of ITU standards for use of video equipment (over 64 to 1920Kbps channels) during conferencing. Frequently referred to as P*64.
The ITU-T standard for far end camera control in an H.320 conference.
The ITU recommended standard for narrow-band visual telephone systems and terminal equipment.
For a given character distribution, by assigning short codes to frequently occurring characters and longer codes to infrequently occurring characters, Huffman's minimum redundancy encoding minimises the average number of bytes required to represent the characters in a text.
Static Huffman encoding uses a fixed set of codes, based on a representative sample of data, for processing texts. Although encoding is achieved in a single pass, the data on which the compression is based may bear little resemblance to the actual text being compressed.
Dynamic Huffman encoding, on the other hand, reads each text twice; once to determine the frequency distribution of the characters in the text and once to encode the data. The codes used for compression are computed on the basis of the statistics gathered during the first pass with compressed texts being prefixed by a copy of the Huffman encoding table for use with the decoding process.
By using a single-pass technique, where each character is encoded on the basis of the preceding characters in a text, Gallager's adaptive Huffman encoding avoids many of the problems associated with either the static or dynamic method.
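A minimal construction of the static variant using a heap, assuming the symbol frequencies are known up front; the frequencies below are invented for illustration:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two least-frequent nodes."""
    # each heap entry: [frequency, tie-break index, {symbol: partial code}]
    heap = [[f, i, {sym: ""}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {next(iter(freqs)): "0"}  # degenerate single-symbol alphabet
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, i2, merged])
    return heap[0][2]

codes = huffman_codes({"e": 45, "t": 20, "a": 15, "o": 10, "z": 2})
# frequent symbols receive shorter codes: len(codes["e"]) < len(codes["z"])
```

Because each merge prepends one bit, a symbol's code length equals its depth in the merge tree, so rare symbols end up with the longest codes.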
The property of conventional television frames where alternating lines of the frame represent different instances in time.
Coding of a macroblock or picture that uses information only from that macroblock or picture.
Input/Output Space. The address at which the computer communicates with an add-in card.
A picture coded using information only from itself.
ISDN is a CCITT term for a relatively new telecommunications service package. ISDN is basically the telephone network turned all-digital end to end, using existing switches and wiring (for the most part) upgraded so that the basic call is a 64Kbps end-to-end channel, with bit-diddling as needed. Packet and maybe frame modes are thrown in for good measure, too, in some places. It's offered by local telephone companies, but most readily in Australia, France, Japan, and Singapore, with the UK and Germany somewhat behind, and USA availability rather spotty.
A Basic Rate Interface (BRI) is two 64K bearer (B) channels and a single delta (D) channel. The B channels are used for voice or data, and the D channel is used for signalling and/or X.25 packet networking. This is the variety most likely to be found in residential service. Another flavour of ISDN is Primary Rate Interface (PRI). Inside the US, this consists of 24 channels, usually divided into 23 B channels and 1 D channel, and runs over the same physical interface as T1. Outside the US, PRI has 31 user channels, usually divided into 30 B channels and 1 D channel. It is typically used for connections such as one between a PBX and a CO or IXC.
International Telecommunications Union, formerly CCITT, a body of the United Nations.
A matrix, block or single pel representing a monochrome representation of the signal and related to the primary colours in the manner defined in the bitstream. The symbol used for luminance is Y.
The four 8 by 8 blocks of luminance data and the two (for 4:2:0 chroma format), four (for 4:2:2 chroma format) or eight (for 4:4:4 chroma format) corresponding 8 by 8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture. Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel values and other data elements defined in the macroblock header. The usage should be clear from the context.
The use of motion vectors to improve the efficiency of the prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded pel values that are used to form the prediction error signal.
The book "Motion Analysis for Image Sequence Coding" by G. Tziritas and C. Labit documents the technical advances made through the years in dealing with motion in image sequences.
The process of estimating motion vectors during the encoding process.
A two-dimensional vector used for motion compensation that provides an offset from the coordinate position in the current picture to the coordinates in a reference picture.
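One common (though not the only) way such vectors are estimated is exhaustive block matching; a sketch using the sum of absolute differences (SAD) as the matching criterion, with frames as plain lists of lists for simplicity:

```python
def estimate_motion(cur, ref, bx, by, bsize=8, search=4):
    """Return the (dx, dy) offset into `ref` that best matches the block of
    `cur` whose top-left corner is (bx, by)."""
    h, w = len(cur), len(cur[0])
    best = (0, 0)
    best_sad = float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue  # candidate block falls outside the reference
            sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                      for j in range(bsize) for i in range(bsize))
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```

Real encoders use faster search patterns than this full search, but the criterion is the same: the winning offset becomes the transmitted motion vector, and only the (small) prediction error is coded.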
Multi-Vendor Integration Protocol. An 8 channel time division multiplex communication bus which can be used to connect various digital communication boards in a PC.
Coding of a macroblock or picture that uses information both from itself and from macroblocks and pictures occurring at other times.
Network Terminating Device. The ISDN telephone line from your local exchange carrier connects to your system through an NT1. An NT1 performs network performance and integrity checks. It enables loopback testing which verifies your digital line is connected and working properly. Outside the US the NT1 is considered part of the network and is installed by the telephone company. In the US, it is considered Customer Premises Equipment (CPE). It is common to change the NT1 when trying to diagnose a dead connection if you previously had a working connection. Some hardware manufacturers build the NT1 into their device; this may limit your flexibility in adding services to your ISDN line. The NT1 is microprocessor controlled and requires its own power source.
USA video standard with image format 4:3, 525 lines, 60 Hz and 4 MHz video bandwidth within a total 6 MHz of video channel width. NTSC uses YIQ. NTSC-1 was set in 1948; it increased the number of scanning lines from 441 to 525, and replaced AM-modulated sound with FM.
Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. For progressive video, a picture is identical to a frame, while for interlaced video, a picture can refer to a frame, the top field or the bottom field of the frame depending on the context.
The use of a predictor to provide an estimate of the pel value or data element currently being decoded.
A picture that is coded using motion compensated prediction from past reference pictures.
The difference between the actual value of a pel or data element and its predictor.
A linear combination of previously decoded pel values or data elements.
A defined sub-set of the syntax of a specification.
Quarter Common Intermediate Format (1/4 CIF): luminance information is coded at 144 lines and 176 pixels per line at 30 frames per second. The uncompressed bit rate for transmitting QCIF at 29.97 frames/sec is 9.115 Mbit/s. This format is required by the H.261 standard.
A set of sixty-four 8-bit values used by the dequantiser.
DCT coefficients before dequantisation. A variable-length coded representation of quantised DCT coefficients is stored as part of the compressed video bitstream.
A scale factor coded in the bitstream and used by the decoding process to scale the dequantisation.
Reference pictures are the nearest adjacent I or P pictures to the current picture in display order.
Scalability is the ability of a decoder to decode an ordered set of bitstreams to produce a reconstructed sequence. Moreover, useful video is output when subsets are decoded. The minimum subset that can thus be decoded is the first bitstream in the set, which is called the base layer. Each of the other bitstreams in the set is called an enhancement layer. When addressing a specific enhancement layer, the lower layer refers to the bitstream which precedes that enhancement layer.
Service Profile Identifiers which are used to identify what sort of services and features the switch provides to the ISDN device. When a new subscriber is added, the service representative will allocate a SPID just as they allocate a directory number. The subscriber needs to input the SPIDs into their terminal device before they will be able to connect to the central office switch (this is referred to as initialising the device).
Sub-band coding for images has roots in work done in the 1950s by Bedford and on Mixed Highs image compression done by Kretzmer in 1954. Schreiber and Buckley explored general two channel coding of still pictures where the low spatial frequency channel was coarsely sampled and finely quantized and the high spatial frequency channel was finely sampled and coarsely quantized. More recently, Karlsson and Vetterli have extended this to multiple subbands. Adelson et al. have shown how a recursive subdivision called a pyramid decomposition can be used both for compression and other useful image processing tasks.
A pure sub-band coder performs a set of filtering operations on an image to divide it into spectral components. Usually, the result of the analysis phase is a set of sub-images, each of which represents some region in spatial or spatio-temporal frequency space. For example, in a still image, there might be a small sub-image that represents the low-frequency components of the input picture that is directly viewable as either a minified or blurred copy of the original. To this are added successively higher spectral bands that contain the edge information necessary to restore the sharpness of the original at successively larger scales. As with the DCT coder, to which it is related, much of the image energy is concentrated in the lowest frequency band.
For equal visual quality, each band need not be represented with the same signal-to-noise ratio; this is the basis for sub-band coder compression. In many coders, some bands are eliminated entirely, and others are often compressed with a vector or lattice quantizer. Successively higher frequency bands are more coarsely quantized, analogous to the truncation of the high frequency coefficients of the DCT. A sub-band decomposition can be the intraframe coder in a predictive loop, thus minimizing the basic distinctions between DCT-based hybrid coders and their alternatives.
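The analysis/synthesis split can be sketched in one dimension with the Haar pair, the simplest possible filter bank; practical sub-band image coders use longer filters applied separably in 2-D:

```python
def analyse(signal):
    """Split into a low band (pair averages) and a high band (pair differences)."""
    low = [(signal[2*i] + signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    high = [(signal[2*i] - signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    return low, high

def synthesise(low, high):
    """Perfectly reconstruct the signal from the two bands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

x = [10, 12, 9, 7, 4, 4, 8, 10]
low, high = analyse(x)
assert synthesise(low, high) == x
# most of the energy sits in `low`; `high` holds small edge detail
```

Quantising `high` more coarsely than `low` (or discarding it) is the 1-D analogue of the per-band quantisation described above.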
Defines a series of communication protocols for data conferencing. The protocol closest to the hardware is T.123, which provides reliable transport of data between end points over various types of media including modems, ISDN, and LANs. T.125, the Multipoint Communication Service, co-ordinates and synchronises the various participants in a multipoint call. T.124, Generic Conference Control, provides setup and control of a conference. T.126 provides whiteboarding and graphic image annotation. T.127 provides file transfer.
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer using motion vectors. The layers have identical frame sizes and chroma formats, but can have different frame rates.
One of two fields that comprise a frame of interlaced video. Each line of a top field is spatially located immediately above the corresponding line of the bottom field.
Operation where the bitrate varies with time during the decoding of a compressed bitstream.
Although variable bit rate is acceptable for plain linear playback, one important reason not to use variable bit rate is that reasonably quick random access becomes nearly impossible. There is no table of contents or index in MPEG. The only tool the playback system has for approximating the correct byte position is the requested playback time stamp and the bit rate of the MPEG stream. MPEG streams do not encode their playback time.
To approximate an intermediate position in a variable bit rate stream, the playback system must grope around near the end of the stream to calculate the total playback time, and assume the stream is approximately constant bit rate. This groping around for the correct position can take several seconds.
This is not appropriate for an interactive presentation or game. The groping around is at least annoying when trying to view a portion of a movie, and it is not even possible for video streams because there are no time stamps (the SMPTE time codes in video streams need not be continuous or unique).
Audio streams are always fixed bit rate.
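The constant-rate position guess described above can be sketched as follows; `stream_bytes` and `duration_s` are illustrative names for values a player would obtain by probing the file:

```python
def guess_offset(target_s, stream_bytes, duration_s):
    """Approximate the byte offset for a requested playback time, assuming
    the stream's average bit rate holds throughout (only true for CBR)."""
    avg_bytes_per_s = stream_bytes / duration_s
    return int(target_s * avg_bytes_per_s)

# seeking to 60 s in a 10-minute, 90 MB stream lands at byte 9,000,000
offset = guess_offset(60, 90_000_000, 600)
```

For a genuinely variable-rate stream the frame found at that offset may be far from the requested time, which is why the player must then search around the guess.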
A video image is compressed to minimise the amount of space or data needed to store or transmit the image.
To take a compressed video image and restore it to the size and format needed to view the video image.
This is a logical device which overlays analogue video into a window on a VIA display. It may also perform frame grabbing, including saving and loading frames to and from a disk.
The international standard CCIR-601-1 specifies eight-bit digital coding for component video, with black at luma code 16 and white at luma code 235, and chroma in eight-bit two's complement form centred on 128 with a peak at code 224. This coding has a slightly smaller excursion for luma than for chroma: luma has 219 risers compared to 224 for Cb and Cr. The notation CbCr distinguishes this set from PbPr where the luma and chroma excursions are identical.
For Rec. 601-1 coding in eight bits per component,
Y_8b = 16 + 219 * Y
Cb_8b = 128 + 112 * (0.5/0.886) * (Bgamma - Y)
Cr_8b = 128 + 112 * (0.5/0.701) * (Rgamma - Y)
Some computer applications place black at luma code 0 and white at luma code 255. In this case, the scaling and offsets above can be changed accordingly, although broadcast-quality video requires the accommodation for headroom and footroom provided in the CCIR-601-1 equations.
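The Rec. 601-1 scalings above, written out directly; Y, Bgamma and Rgamma are the gamma-corrected components on a 0..1 scale, as in the equations:

```python
def rec601_8bit(Y, Bgamma, Rgamma):
    """Quantise luma and colour differences to Rec. 601-1 eight-bit codes."""
    y8 = 16 + 219 * Y                              # black 16, white 235
    cb8 = 128 + 112 * (0.5 / 0.886) * (Bgamma - Y)  # centred on 128
    cr8 = 128 + 112 * (0.5 / 0.701) * (Rgamma - Y)
    return round(y8), round(cb8), round(cr8)

rec601_8bit(0.0, 0.0, 0.0)   # black -> (16, 128, 128)
rec601_8bit(1.0, 1.0, 1.0)   # white -> (235, 128, 128)
```

The headroom above 235 and footroom below 16 are what the full-range (0..255) computer convention discards.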
Rec. CCIR-601-1 calls for two-to-one horizontal subsampling of Cb and Cr, to achieve 2/3 the data rate of RGB with virtually no perceptible penalty. This is denoted 4:2:2. A few digital video systems have utilized horizontal subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG normally subsample Cb and Cr two-to-one horizontally and also two-to-one vertically, to get 1/2 the data rate of RGB. No standard nomenclature has been adopted to describe vertical subsampling. To get good results from subsampling you should not just drop and replicate pixels, but implement proper decimation and interpolation filters.
YCbCr coding is employed by D-1 component digital video equipment.
If three components are to be conveyed in three separate channels with identical unity excursions, then the Pb and Pr colour difference components are used:
Pb = (0.5/0.886) * (Bgamma - Y)
Pr = (0.5/0.701) * (Rgamma - Y)
These scale factors limit the excursion of EACH colour difference component to -0.5 .. +0.5 with respect to unity Y excursion: 0.886 is just unity less the luma coefficient of blue. In the analog domain Y is usually 0 mV (black) to 700 mV (white), and Pb and Pr are usually +- 350 mV.
YPbPr is part of the CCIR Rec. 709 HDTV standard, although different luma coefficients are used, and it is denoted E'Pb and E'Pr with subscript arrangement too complicated to be written here.
YPbPr is employed by component analog video equipment such as M-II and BetaCam; Pb and Pr bandwidth is half that of luma.
The U and V signals above must be carried with equal bandwidth, albeit less than that of luma. However, the human visual system has less spatial acuity for magenta-green transitions than it does for red-cyan. Thus, if signals I and Q are formed from a 123 degree rotation of U and V respectively [sic], the Q signal can be more severely filtered than I (to about 600 kHz, compared to about 1.3 MHz) without being perceptible to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with a 33 degree rotation and an axis flip in the UV plane. The first edition of W.K. Pratt "Digital Image Processing", and presumably other authors that follow that bible, has a matrix that erroneously omits the axis flip; the second edition corrects the error.
Since an analog NTSC decoder has no way of knowing whether the encoder was encoding YUV or YIQ, it cannot detect whether the encoder was running at 0 degree or 33 degree phase. In analog usage the terms YUV and YIQ are often used somewhat interchangeably. YIQ was important in the early days of NTSC but most broadcasting equipment now encodes equiband U and V.
The D-2 composite digital DVTR (and the associated interface standard) conveys NTSC modulated on the YIQ axes in the 525-line version and PAL modulated on the YUV axes in the 625-line version.
In composite NTSC, PAL or S-Video, it is necessary to scale (B-Y) and (R-Y) so that the composite NTSC or PAL signal (luma plus modulated chroma) is contained within the range -1/3 to +4/3. These limits reflect the capability of the composite signal recording or transmission channel. The scale factors are obtained by two simultaneous equations involving both B-Y and R-Y, because the limits of the composite excursion are reached at combinations of B-Y and R-Y that are intermediate to primary colours. The scale factors are as follows:
U = 0.493 * (B - Y)
V = 0.877 * (R - Y)
U and V components are typically modulated into a chroma component:
C = U*cos(t) + V*sin(t)
where t represents the ~3.58 MHz NTSC colour sub-carrier. PAL coding is similar, except that the V component switches Phase on Alternate Lines (+-1), and the sub-carrier is at a different frequency, about 4.43 MHz.
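A numeric sketch of the scaling and quadrature modulation above; the colour values are an example (a saturated red, with Y taken as the NTSC red luma weight 0.299, i.e. unity less the 0.701 factor quoted earlier), and a single subcarrier phase instant is evaluated:

```python
import math

def uv(B, R, Y):
    """Scale the colour differences for composite transmission."""
    U = 0.493 * (B - Y)
    V = 0.877 * (R - Y)
    return U, V

def chroma(U, V, t):
    """Quadrature-modulate U and V onto subcarrier phase t (radians)."""
    return U * math.cos(t) + V * math.sin(t)

U, V = uv(B=0.0, R=1.0, Y=0.299)   # a saturated red
c = chroma(U, V, t=math.pi / 2)    # at this phase only V contributes
```

For PAL, the V term's sign would alternate line by line, and the subcarrier frequency differs; the modulation itself is the same.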
It is conventional for an NTSC luma signal in a composite environment (NTSC or S-Video) to have 7.5% setup:
Y_setup = (3/40) + (37/40) * Y
A PAL signal has zero setup. The two signals Y (or Y_setup) and C can be conveyed separately across an S-Video interface, or Y and C can be combined (encoded) into composite NTSC or PAL:
NTSC = Y_setup + C
PAL = Y + C
U and V are only appropriate for composite transmission as 1-wire NTSC or PAL, or 2-wire S-Video. The UV scaling (or the IQ set, described below) is incorrect when the signal is conveyed as three separate components. Certain component video equipment has connectors labelled YUV that in fact convey YPbPr signals.
2B+D Data Rate. The 144Kbps Basic Rate Interface data rate: two B channels at 64Kbps each plus one 16Kbps D channel.