Multimedia in the Teaching Space
A compression technique that encodes the prediction residual instead of the original waveform signal, so that compression efficiency is improved by the prediction gain. Rather than transmitting PCM samples directly, the difference between the predicted next sample and the actual sample is transmitted. This difference is usually small and can thus be encoded in fewer bits than the sample itself.
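A minimal sketch of the differencing idea, assuming the simplest possible predictor (the previous sample); real ADPCM also adapts the quantiser step size, which is omitted here:

```python
def dpcm_encode(samples):
    """Encode PCM samples as a first value followed by residuals."""
    residuals = []
    prediction = 0  # initial predictor value
    for s in samples:
        residuals.append(s - prediction)  # transmit only the difference
        prediction = s                    # next prediction = current sample
    return residuals

def dpcm_decode(residuals):
    """Reverse the encoding by accumulating residuals."""
    samples = []
    prediction = 0
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

pcm = [100, 102, 105, 104, 101]
res = dpcm_encode(pcm)   # [100, 2, 3, -1, -3]: small values after the first
assert dpcm_decode(res) == pcm
```

After the first sample, the residuals stay close to zero for smooth waveforms, which is what makes the shorter code words possible.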
A technique for encoding an audio sample into an 8-bit word, used in G.711 encoding.
ANSI works with various organisations and manufacturers of telecommunications equipment to determine domestic standards not covered by ITU.
Software from which user interfaces (e.g., pull-down menus) can be created.
Perhaps the major drawback to each of the Huffman encoding techniques is its poor performance when processing texts where one symbol has a probability of occurrence approaching unity. Although the entropy associated with such symbols is extremely low, each symbol must still be encoded as a discrete value.
Arithmetic coding removes this restriction by representing messages as intervals of the real numbers between 0 and 1. Initially, the range of values for coding a text is the entire interval [0, 1]. As encoding proceeds, this range narrows while the number of bits required to represent it expands. Frequently occurring characters reduce the range less than characters occurring infrequently, and thus add fewer bits to the length of an encoded message.
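The interval-narrowing step can be sketched as follows; the three-symbol model and its probabilities are invented for illustration, and the renormalisation and incremental bit-output machinery of a real coder is omitted:

```python
probs = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # assumed source model

def cumulative(probs):
    """Assign each symbol a sub-interval of [0, 1) sized by its probability."""
    cum, lo = {}, 0.0
    for sym, p in probs.items():
        cum[sym] = (lo, lo + p)
        lo += p
    return cum

def encode_interval(message, probs):
    """Return the final [low, high) interval identifying the message."""
    cum = cumulative(probs)
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        s_lo, s_hi = cum[sym]
        # frequent symbols (wide sub-intervals) shrink the range less
        low, high = low + span * s_lo, low + span * s_hi
    return low, high

low, high = encode_interval("aab", probs)
# any number inside [low, high) decodes back to "aab" under this model
```

The final interval width is the product of the symbol probabilities (here 0.5 × 0.5 × 0.3 = 0.075), so the number of bits needed to pick a point inside it tracks the message's information content.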
ATM is a switching/transmission technique where data is transmitted in small, fixed-size cells (5-byte header, 48-byte payload). The cells lend themselves both to the time-division-multiplexing characteristics of the transmission media, and the packet-switching characteristics desired of data networks. At each switching node, the ATM header identifies a virtual path or virtual circuit that the cell contains data for, enabling the switch to forward the cell to the correct next-hop trunk. The virtual path is set up through the involved switches when two endpoints wish to communicate. This type of switching can be implemented in hardware, which is almost essential when trunk speeds range from 45Mbps to 1Gbps.
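A sketch of the fixed 53-byte cell layout, packing the UNI header fields (GFC, VPI, VCI, PT, CLP, HEC); the HEC byte is left as a placeholder rather than the real CRC-8 computation:

```python
def make_cell(vpi, vci, payload, gfc=0, pt=0, clp=0):
    """Build one 53-byte ATM cell: 5-byte header + 48-byte payload."""
    assert len(payload) == 48, "ATM payload is always 48 bytes"
    # UNI header: GFC (4 bits), VPI (8), VCI (16), PT (3), CLP (1), HEC (8)
    b0 = (gfc << 4) | (vpi >> 4)
    b1 = ((vpi & 0xF) << 4) | (vci >> 12)
    b2 = (vci >> 4) & 0xFF
    b3 = ((vci & 0xF) << 4) | (pt << 1) | clp
    hec = 0  # placeholder; real cells carry a CRC-8 over the first 4 bytes
    return bytes([b0, b1, b2, b3, hec]) + payload

cell = make_cell(vpi=1, vci=42, payload=bytes(48))
assert len(cell) == 53
```

The fixed size is what allows the header parse and forwarding decision to be done in hardware at trunk rates.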
The ATM Forum, a worldwide organisation aimed at promoting ATM within the industry and the end-user community, was formed in October 1991 and currently includes more than 500 companies representing all sectors of the communications and computer industries, as well as a number of government agencies and research laboratories.
These refer to the audio standards for the compression/decompression and transmission of P*64 audio signals. These standards are given G.xxx classifications.
A measurement, expressed in bits per second (bps), of the amount of information that can flow through a channel.
Bearer Channels. A 64Kbps channel in an ISDN line.
The rate at which the compressed bit-stream is delivered from the storage medium to the input of a decoder.
An 8-row by 8-column matrix of pels, or 64 DCT coefficients (source, quantised or dequantised).
A data rate standard for ISDN. It provides a customer with 144Kbps divided into three channels (2B channels carrying 64Kbps each and one D-channel assigned to 16Kb of signalling information).
A motion vector that is used for motion compensation from a reference picture at a later time in display order.
Within the H.242 standard, capability set is used to define the set of functions which the audio-visual end point supports. At the initiation of an audio-visual call, the end points exchange their respective sets of information and establish a call within the bounds of their mutual capability sets.
Comité Consultatif International Télégraphique et Téléphonique. A committee of the International Telecommunications Union responsible for making technical recommendations about telephone and data communication systems for PTTs and suppliers. Plenary sessions are held every four years to adopt new standards.
A format for displaying an image on a screen. CIF has an image resolution of 352 by 288 pixels at 30 frames per second. This format is optional within the H.261 standard.
The standardisation of the structure of the samples that represent the picture information of a single frame in digital HDTV, independent of frame rate and sync/blank structure. The uncompressed bit rate for transmitting CIF at 29.97 frames/sec is 36.45 Mbps.
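The 36.45 Mbps figure can be reproduced from the CIF geometry, assuming 4:2:0 sampling (two chroma planes at half resolution in each direction) and 8 bits per sample:

```python
luma = 352 * 288                           # luminance samples per frame
chroma = 2 * (176 * 144)                   # two chroma planes at half resolution
bits_per_frame = (luma + chroma) * 8       # 8 bits per sample
rate_mbps = bits_per_frame * 29.97 / 1e6   # 29.97 frames per second
# rate_mbps comes out at about 36.46, matching the quoted ~36.45 Mbps
```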
Programmable time division multiplex communication bus used by Lucent Technologies for interconnecting telecommunication chips. Can be programmed to act as one channel of an MVIP.
A board which can connect to one or two MVIP or concentration highway connectors and emulates a common interface by supplying the clock and by connecting the CHI or MVIP back to itself (loop) or to another CHI or MVIP (cross connect). Useful for testing where actual communication lines are not available.
A matrix, block or single pel representing one of the two colour difference signals related to the primary colours in the manner defined in the bit-stream. The symbols used for the colour difference signals are Cr and Cb.
Data Signalling Channel. A 16Kbps channel in an ISDN line.
Discrete Cosine Transform, a transform related to the Fourier transform and widely used in image and video compression.
A digital storage or transmission device or system.
The technique of processing data as numbers instead of voltages.
A technique used to prevent a speaker from hearing a delayed version of his own speech. Echo cancellation is required in video telephony due to the delays required by the video.
Entropy, the average amount of information represented by a symbol in a message, is a function of the model used to produce that message and can be reduced by increasing the complexity of the model so that it better reflects the actual distribution of source symbols in the original message.
Entropy is a measure of the information contained in a message; it is the lower bound for compression.
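The entropy bound can be computed directly from symbol frequencies; a short sketch using a first-order (single-symbol) model:

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Average information per symbol, in bits, under a first-order model."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

h = entropy_bits_per_symbol("aaab")
# p(a)=0.75, p(b)=0.25 -> H = 0.75*log2(4/3) + 0.25*log2(4), about 0.811 bits
```

No lossless code under this model can average fewer than h bits per symbol, which is the sense in which entropy bounds compression.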
Fast Fourier Transform
For an interlaced video signal, a field is the assembly of alternate lines of a frame. Therefore an interlaced frame is composed of two fields: a top field and a bottom field.
A motion vector that is used for motion compensation from a reference picture at an earlier time in display order.
The reciprocal of the frame rate.
The rate at which frames are output from the decoding process.
A future reference picture is a reference picture that occurs at a later time than the current picture in display order.
A standard for compressing and decompressing audio (50 - 3000 Hz) into a 48, 56, or 64Kbps stream.
A standard for compressing and decompressing audio (50 - 7000 Hz) into a 48, 56, or 64Kbps stream.
A standard for compressing and decompressing audio (50 - 3000 Hz) into a 16Kbps stream.
The family of audio-related ITU standards. It includes G.711, G.722, and G.728.
The family of ITU standards for use of video equipment (over 64 to 1920Kbps channels) during conferencing. Frequently referred to as P*64.
The ITU-T standard for far end camera control in an H.320 conference.
The ITU recommended standard for narrow-band visual telephone systems and terminal equipment.
For a given character distribution, by assigning short codes to frequently occurring characters and longer codes to infrequently occurring characters, Huffman's minimum redundancy encoding minimises the average number of bytes required to represent the characters in a text.
Static Huffman encoding uses a fixed set of codes, based on a representative sample of data, for processing texts. Although encoding is achieved in a single pass, the data on which the compression is based may bear little resemblance to the actual text being compressed.
Dynamic Huffman encoding, on the other hand, reads each text twice; once to determine the frequency distribution of the characters in the text and once to encode the data. The codes used for compression are computed on the basis of the statistics gathered during the first pass with compressed texts being prefixed by a copy of the Huffman encoding table for use with the decoding process.
By using a single-pass technique, where each character is encoded on the basis of the preceding characters in a text, Gallager's adaptive Huffman encoding avoids many of the problems associated with either the static or dynamic method.
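A minimal construction of the static variant using a heap, assuming the symbol frequencies are known up front; the frequencies below are invented for illustration:

```python
import heapq

def huffman_codes(freqs):
    """Build a prefix code by repeatedly merging the two least-frequent nodes."""
    # each heap entry: [frequency, tie-break index, {symbol: partial code}]
    heap = [[f, i, {sym: ""}] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {next(iter(freqs)): "0"}  # degenerate single-symbol alphabet
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [f1 + f2, i2, merged])
    return heap[0][2]

codes = huffman_codes({"e": 45, "t": 20, "a": 15, "o": 10, "z": 2})
# frequent symbols receive shorter codes: len(codes["e"]) < len(codes["z"])
```

Because each merge prepends one bit, a symbol's code length equals its depth in the merge tree, so rare symbols end up with the longest codes.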
The property of conventional television frames where alternating lines of the frame represent different instances in time.
Coding of a macroblock or picture that uses information only from that macroblock or picture.
Input/Output Space. The address at which the computer communicates with an add-in card.
A picture coded using information only from itself.
ISDN is a CCITT term for a relatively new telecommunications service package. ISDN is basically the telephone network turned all-digital end to end, using existing switches and wiring (for the most part) upgraded so that the basic call is a 64Kbps end-to-end channel, with bit-diddling as needed. Packet and maybe frame modes are thrown in for good measure, too, in some places. It's offered by local telephone companies, but most readily in Australia, France, Japan, and Singapore, with the UK and Germany somewhat behind, and USA availability rather spotty.
A Basic Rate Interface (BRI) is two 64K bearer (B) channels and a single delta (D) channel. The B channels are used for voice or data, and the D channel is used for signalling and/or X.25 packet networking. This is the variety most likely to be found in residential service. Another flavour of ISDN is Primary Rate Interface (PRI). Inside the US, this consists of 24 channels, usually divided into 23 B channels and 1 D channel, and runs over the same physical interface as T1. Outside the US, PRI has 31 user channels, usually divided into 30 B channels and 1 D channel. It is typically used for connections such as one between a PBX and a CO or IXC.
International Telecommunications Union, formerly CCITT, a body of the United Nations.
A matrix, block or single pel representing a monochrome representation of the signal and related to the primary colours in the manner defined in the bitstream. The symbol used for luminance is Y.
The four 8 by 8 blocks of luminance data and the two (for 4:2:0 chroma format), four (for 4:2:2 chroma format) or eight (for 4:4:4 chroma format) corresponding 8 by 8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture. Macroblock is sometimes used to refer to the pel data and sometimes to the coded representation of the pel values and other data elements defined in the macroblock header. The usage should be clear from the context.
The use of motion vectors to improve the efficiency of the prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded pel values that are used to form the prediction error signal.
The book "Motion Analysis for Image Sequence Coding" by G. Tziritas and C. Labit documents the technical advances made through the years in dealing with motion in image sequences.
The process of estimating motion vectors during the encoding process.
A two-dimensional vector used for motion compensation that provides an offset from the coordinate position in the current picture to the coordinates in a reference picture.
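One common (though not the only) way such vectors are estimated is exhaustive block matching; a sketch using the sum of absolute differences (SAD) as the matching criterion, with frames as plain lists of lists for simplicity:

```python
def estimate_motion(cur, ref, bx, by, bsize=8, search=4):
    """Return the (dx, dy) offset into `ref` that best matches the block of
    `cur` whose top-left corner is (bx, by)."""
    h, w = len(cur), len(cur[0])
    best = (0, 0)
    best_sad = float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue  # candidate block falls outside the reference
            sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                      for j in range(bsize) for i in range(bsize))
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best
```

Real encoders use faster search patterns than this full search, but the criterion is the same: the winning offset becomes the transmitted motion vector, and only the (small) prediction error is coded.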
Multi-Vendor Integration Protocol. An 8 channel time division multiplex communication bus which can be used to connect various digital communication boards in a PC.
Coding of a macroblock or picture that uses information both from itself and from macroblocks and pictures occurring at other times.
Network Terminating Device. The ISDN telephone line from your local exchange carrier connects to your system through an NT1. An NT1 performs network performance and integrity checks. It enables loopback testing which verifies your digital line is connected and working properly. Outside the US the NT1 is considered part of the network and is installed by the telephone company. In the US, it is considered Customer Premises Equipment (CPE). It is common to change the NT1 when trying to diagnose a dead connection if you previously had a working connection. Some hardware manufacturers build the NT1 into their device; this may limit your flexibility in adding services to your ISDN line. The NT1 is microprocessor controlled and requires its own power source.
USA video standard with image format 4:3, 525 lines, 60 Hz and 4 MHz video bandwidth within a total 6 MHz of video channel width. NTSC uses YIQ. NTSC-1 was set in 1948; it increased the number of scanning lines from 441 to 525, and replaced AM-modulated sound with FM.
Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. For progressive video, a picture is identical to a frame, while for interlaced video, a picture can refer to a frame, the top field or the bottom field of the frame depending on the context.
The use of a predictor to provide an estimate of the pel value or data element currently being decoded.
A picture that is coded using motion compensated prediction from past reference pictures.
The difference between the actual value of a pel or data element and its predictor.
A linear combination of previously decoded pel values or data elements.
A defined sub-set of the syntax of a specification.
Quarter Common Intermediate Format (1/4 CIF): luminance information is coded at 144 lines and 176 pixels per line at 30 frames per second. The uncompressed bit rate for transmitting QCIF at 29.97 frames/sec is 9.115 Mbit/s. This format is required by the H.261 standard.
A set of sixty-four 8-bit values used by the dequantiser.
DCT coefficients before dequantisation. A variable-length coded representation of quantised DCT coefficients is stored as part of the compressed video bitstream.
A scale factor coded in the bitstream and used by the decoding process to scale the dequantisation.
Reference pictures are the nearest adjacent I or P pictures to the current picture in display order.
Scalability is the ability of a decoder to decode an ordered set of bitstreams to produce a reconstructed sequence. Moreover, useful video is output when subsets are decoded. The minimum subset that can thus be decoded is the first bitstream in the set, which is called the base layer. Each of the other bitstreams in the set is called an enhancement layer. When addressing a specific enhancement layer, the lower layer refers to the bitstream which precedes that enhancement layer.
Service Profile Identifiers which are used to identify what sort of services and features the switch provides to the ISDN device. When a new subscriber is added, the service representative will allocate a SPID just as they allocate a directory number. The subscriber needs to input the SPIDs into their terminal device before they will be able to connect to the central office switch (this is referred to as initialising the device).
Sub-band coding for images has roots in work done in the 1950s by Bedford and on Mixed Highs image compression done by Kretzmer in 1954. Schreiber and Buckley explored general two channel coding of still pictures where the low spatial frequency channel was coarsely sampled and finely quantized and the high spatial frequency channel was finely sampled and coarsely quantized. More recently, Karlsson and Vetterli have extended this to multiple subbands. Adelson et al. have shown how a recursive subdivision called a pyramid decomposition can be used both for compression and other useful image processing tasks.
A pure sub-band coder performs a set of filtering operations on an image to divide it into spectral components. Usually, the result of the analysis phase is a set of sub-images, each of which represents some region in spatial or spatio-temporal frequency space. For example, in a still image, there might be a small sub-image that represents the low-frequency components of the input picture that is directly viewable as either a minified or blurred copy of the original. To this are added successively higher spectral bands that contain the edge information necessary to restore the sharpness of the original at successively larger scales. As with the DCT coder, to which it is related, much of the image energy is concentrated in the lowest frequency band.
For equal visual quality, each band need not be represented with the same signal-to-noise ratio; this is the basis for sub-band coder compression. In many coders, some bands are eliminated entirely, and others are often compressed with a vector or lattice quantizer. Successively higher frequency bands are more coarsely quantized, analogous to the truncation of the high frequency coefficients of the DCT. A sub-band decomposition can be the intraframe coder in a predictive loop, thus minimizing the basic distinctions between DCT-based hybrid coders and their alternatives.
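The analysis/synthesis split can be sketched in one dimension with the Haar pair, the simplest possible filter bank; practical sub-band image coders use longer filters applied separably in 2-D:

```python
def analyse(signal):
    """Split into a low band (pair averages) and a high band (pair differences)."""
    low = [(signal[2*i] + signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    high = [(signal[2*i] - signal[2*i + 1]) / 2 for i in range(len(signal) // 2)]
    return low, high

def synthesise(low, high):
    """Perfectly reconstruct the signal from the two bands."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out

x = [10, 12, 9, 7, 4, 4, 8, 10]
low, high = analyse(x)
assert synthesise(low, high) == x
# most of the energy sits in `low`; `high` holds small edge detail
```

Quantising `high` more coarsely than `low` (or discarding it) is the 1-D analogue of the per-band quantisation described above.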
Defines a series of communication protocols for data conferencing. The protocol closest to the hardware is T.123, which provides reliable transport of data between end points over various types of media including modems, ISDN, and LANs. T.125, the Multipoint Communication Service, co-ordinates and synchronises the various participants in a multipoint call. T.124, Generic Conference Control, provides setup and control of a conference. T.126 provides whiteboarding and graphic image annotation. T.127 provides file transfer.
A type of scalability where an enhancement layer also uses predictions from pel data derived from a lower layer using motion vectors. The layers have identical frame sizes and chroma formats, but can have different frame rates.
One of two fields that comprise a frame of interlaced video. Each line of a top field is spatially located immediately above the corresponding line of the bottom field.
Operation where the bitrate varies with time during the decoding of a compressed bitstream.
Although variable bit rate is acceptable for plain linear playback, one important reason not to use variable bit rate is that reasonably quick random access becomes nearly impossible. There is no table of contents or index in MPEG. The only tool the playback system has for approximating the correct byte position is the requested playback time stamp and the bit rate of the MPEG stream. MPEG streams do not encode their playback time.
To approximate an intermediate position in a variable bit rate stream, the playback system must grope around near the end of the stream to calculate the total playback time, and assume the stream is approximately constant bit rate. This groping around for the correct position can take several seconds.
This is not appropriate for an interactive presentation or game. The groping around is at least annoying when trying to view a portion of a movie, and it is not even possible for video streams because there are no time stamps (the SMPTE time codes in video streams need not be continuous or unique).
Audio streams are always fixed bit rate.
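The constant-rate position guess described above can be sketched as follows; `stream_bytes` and `duration_s` are illustrative names for values a player would obtain by probing the file:

```python
def guess_offset(target_s, stream_bytes, duration_s):
    """Approximate the byte offset for a requested playback time, assuming
    the stream's average bit rate holds throughout (only true for CBR)."""
    avg_bytes_per_s = stream_bytes / duration_s
    return int(target_s * avg_bytes_per_s)

# seeking to 60 s in a 10-minute, 90 MB stream lands at byte 9,000,000
offset = guess_offset(60, 90_000_000, 600)
```

For a genuinely variable-rate stream the frame found at that offset may be far from the requested time, which is why the player must then search around the guess.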
A video image is compressed to minimise the amount of space or data needed to store or transmit the image.
To take a compressed video image and restore it to the size and format needed to view the video image.
This is a logical device which overlays analogue video into a window on a VIA display. It may also perform frame grabbing, including saving and loading frames to and from a disk.
The international standard CCIR-601-1 specifies eight-bit digital coding for component video, with black at luma code 16 and white at luma code 235, and chroma in eight-bit two's complement form centred on 128 with a peak at code 224. This coding has a slightly smaller excursion for luma than for chroma: luma has 219 risers compared to 224 for Cb and Cr. The notation CbCr distinguishes this set from PbPr where the luma and chroma excursions are identical.
For Rec. 601-1 coding in eight bits per component,
Y_8b = 16 + 219 * Y
Cb_8b = 128 + 112 * (0.5/0.886) * (Bgamma - Y)
Cr_8b = 128 + 112 * (0.5/0.701) * (Rgamma - Y)
Some computer applications place black at luma code 0 and white at luma code 255. In this case, the scaling and offsets above can be changed accordingly, although broadcast-quality video requires the accommodation for headroom and footroom provided in the CCIR-601-1 equations.
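The Rec. 601-1 scalings above, written out directly; Y, Bgamma and Rgamma are the gamma-corrected components on a 0..1 scale, as in the equations:

```python
def rec601_8bit(Y, Bgamma, Rgamma):
    """Quantise luma and colour differences to Rec. 601-1 eight-bit codes."""
    y8 = 16 + 219 * Y                              # black 16, white 235
    cb8 = 128 + 112 * (0.5 / 0.886) * (Bgamma - Y)  # centred on 128
    cr8 = 128 + 112 * (0.5 / 0.701) * (Rgamma - Y)
    return round(y8), round(cb8), round(cr8)

rec601_8bit(0.0, 0.0, 0.0)   # black -> (16, 128, 128)
rec601_8bit(1.0, 1.0, 1.0)   # white -> (235, 128, 128)
```

The headroom above 235 and footroom below 16 are what the full-range (0..255) computer convention discards.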
Rec. CCIR-601-1 calls for two-to-one horizontal subsampling of Cb and Cr, to achieve 2/3 the data rate of RGB with virtually no perceptible penalty. This is denoted 4:2:2. A few digital video systems have utilized horizontal subsampling by a factor of four, denoted 4:1:1. JPEG and MPEG normally subsample Cb and Cr two-to-one horizontally and also two-to-one vertically, to get 1/2 the data rate of RGB. No standard nomenclature has been adopted to describe vertical subsampling. To get good results from subsampling you should not just drop and replicate pixels, but implement proper decimation and interpolation filters.
YCbCr coding is employed by D-1 component digital video equipment.
If three components are to be conveyed in three separate channels with identical unity excursions, then the Pb and Pr colour difference components are used:
Pb = (0.5/0.886) * (Bgamma - Y)
Pr = (0.5/0.701) * (Rgamma - Y)
These scale factors limit the excursion of EACH colour difference component to -0.5 .. +0.5 with respect to unity Y excursion: 0.886 is just unity less the luma coefficient of blue. In the analog domain Y is usually 0 mV (black) to 700 mV (white), and Pb and Pr are usually +- 350 mV.
YPbPr is part of the CCIR Rec. 709 HDTV standard, although different luma coefficients are used, and it is denoted E'Pb and E'Pr with subscript arrangement too complicated to be written here.
YPbPr is employed by component analog video equipment such as M-II and BetaCam; Pb and Pr bandwidth is half that of luma.
The U and V signals above must be carried with equal bandwidth, albeit less than that of luma. However, the human visual system has less spatial acuity for magenta-green transitions than it does for red-cyan. Thus, if signals I and Q are formed from a 123 degree rotation of U and V respectively [sic], the Q signal can be more severely filtered than I (to about 600 kHz, compared to about 1.3 MHz) without being perceptible to a viewer at typical TV viewing distance. YIQ is equivalent to YUV with a 33 degree rotation and an axis flip in the UV plane. The first edition of W.K. Pratt "Digital Image Processing", and presumably other authors that follow that bible, has a matrix that erroneously omits the axis flip; the second edition corrects the error.
Since an analog NTSC decoder has no way of knowing whether the encoder was encoding YUV or YIQ, it cannot detect whether the encoder was running at 0 degree or 33 degree phase. In analog usage the terms YUV and YIQ are often used somewhat interchangeably. YIQ was important in the early days of NTSC but most broadcasting equipment now encodes equiband U and V.
The D-2 composite digital DVTR (and the associated interface standard) conveys NTSC modulated on the YIQ axes in the 525-line version and PAL modulated on the YUV axes in the 625-line version.
In composite NTSC, PAL or S-Video, it is necessary to scale (B-Y) and (R-Y) so that the composite NTSC or PAL signal (luma plus modulated chroma) is contained within the range -1/3 to +4/3. These limits reflect the capability of the composite signal recording or transmission channel. The scale factors are obtained by two simultaneous equations involving both B-Y and R-Y, because the limits of the composite excursion are reached at combinations of B-Y and R-Y that are intermediate to primary colours. The scale factors are as follows:
U = 0.493 * (B - Y)
V = 0.877 * (R - Y)
U and V components are typically modulated into a chroma component:
C = U*cos(t) + V*sin(t)
where t represents the ~3.58 MHz NTSC colour sub-carrier. PAL coding is similar, except that the V component switches Phase on Alternate Lines (+-1), and the sub-carrier is at a different frequency, about 4.43 MHz.
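A numeric sketch of the scaling and quadrature modulation above; the colour values are an example (a saturated red, with Y taken as the NTSC red luma weight 0.299, i.e. unity less the 0.701 factor quoted earlier), and a single subcarrier phase instant is evaluated:

```python
import math

def uv(B, R, Y):
    """Scale the colour differences for composite transmission."""
    U = 0.493 * (B - Y)
    V = 0.877 * (R - Y)
    return U, V

def chroma(U, V, t):
    """Quadrature-modulate U and V onto subcarrier phase t (radians)."""
    return U * math.cos(t) + V * math.sin(t)

U, V = uv(B=0.0, R=1.0, Y=0.299)   # a saturated red
c = chroma(U, V, t=math.pi / 2)    # at this phase only V contributes
```

For PAL, the V term's sign would alternate line by line, and the subcarrier frequency differs; the modulation itself is the same.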
It is conventional for an NTSC luma signal in a composite environment (NTSC or S-Video) to have 7.5% setup:
Y_setup = (3/40) + (37/40) * Y
A PAL signal has zero setup. The two signals Y (or Y_setup) and C can be conveyed separately across an S-Video interface, or Y and C can be combined (encoded) into composite NTSC or PAL:
NTSC = Y_setup + C
PAL = Y + C
U and V are only appropriate for composite transmission as 1-wire NTSC or PAL, or 2-wire S-Video. The UV scaling (or the IQ set, described below) is incorrect when the signal is conveyed as three separate components. Certain component video equipment has connectors labelled YUV that in fact convey YPbPr signals.
2B+D Data Rate. The 144Kbps Basic Rate Interface data rate: two B channels at 64Kbps each plus one 16Kbps D channel.