Transform Coding

Transform coding is used to convert spatial image pixel values to transform coefficient values.  Since this is a linear process and no information is lost, the number of coefficients produced is equal to the number of pixels transformed.

The desired effect is that most of the energy in the image will be contained in a few large transform coefficients.  If it is generally the same few coefficients that contain most of the energy in most pictures, then the coefficients may be further coded by lossless   entropy coding .  In addition, it is likely that the smaller coefficients can be coarsely quantized or deleted ( lossy  coding ) without doing visible damage to the reproduced image.

Note: the energy of a pixel may be defined as the square of its value times some scaling factor.  Similarly, the energy of a transform coefficient may be defined as the square of its value times some scaling factor.  With the proper scaling factor, the total energy of the pixels in a picture will always equal the total energy in the transform coefficients.  The transform process simply concentrates the energy into particular coefficients, generally the "low frequency" ones.

Many types of transforms have been tried for picture coding, including for example Fourier, Karhonen-Loeve, Walsh-Hadamard, lapped orthogonal, discrete cosine (DCT), and recently, wavelets.  The various transforms differ among themselves in three basic ways that are of interest in picture coding:

1) the degree of concentration of energy in a few coefficients;

2) the region of influence of each coefficient in the reconstructed picture;

3) the appearance and visibility of coding noise due to coarse quantization of the coefficients.

Karhunen-Loeve is a statistically based transform method that can be tailored to one image or group of images, and therefore has the optimum energy concentration.  However, it generally will not have this optimum concentration for images not in the basis set.  

Fourier transforms (discrete) have good energy concentration characteristics, but become unwieldy when dealing with large images requiring large numbers of coefficients.  Block transforms, which work on a small portion of the image at a time, are therefore preferred.  The discrete Fourier transform may be applied to a block of pixels.  Other transforms which fall in this category are Walsh-Hadamard, and the DCT.  The lapped orthogonal transforms are a special case in which the coefficients' influence is confined to a few adjacent blocks, with a tapering-off influence toward the edges.

Because of ease of hardware computation and generally very good energy concentration for a wide range of natural images, the DCT has become the transform of choice for many image coding schemes, including MPEG.  The DCT, unlike the Fourier transform, is spatially variant.  A portion of a sine wave coded with a Fourier transform has all the energy concentrated at the same frequency coefficients regardless of the phase of the sinusoid (although the energy will be apportioned differently between the sine and cosine components).  The DCT, on the other hand, is sensitive to phase, so that an object moving across the screen will have different frequency content from frame to frame.  This also means that the visibility of coding artifacts due to coefficient quantization will vary somewhat depending on the position of an object (edge) in the image.  Also, because the DCT is a strictly bounded block transform, lossy coding will produce block-edge mismatch which will be visible at some level of quantization even if there is only low frequency content in that area.

NEXT - Lossy Coding