Lossy Coding

For lossless coding to be successful, the coder must be presented with source material that generally fits the statistics for which it was designed.  In most practical cases of audio or video coding, this means a source in which low-frequency content dominates.

Because the statistics of real sources may not be constant (for example, music may contain cymbal crashes, and pictures may contain high-contrast texture or text), the bit rate of the coder output will generally vary.  Buffering between the coder output and the transmission or storage medium helps absorb peaks in bit rate.  However, it is always possible to "break" the system by presenting it with source material that does not match the preferred statistics.  In a system designed to be lossless, these extreme cases may need a controlled failure mode, that is, control over which bits are discarded; the design goal, however, is to avoid such a failure for the great majority of source material.
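
As a rough illustration of how such a buffer absorbs short-term peaks, the following Python sketch (with purely illustrative numbers for frame sizes, channel rate and buffer capacity) drains a variable-rate coder output at a constant channel rate and reports when the buffer would overflow:

    # Minimal sketch of a rate buffer between a variable-rate coder and a
    # constant-rate channel.  All numbers are illustrative, not from any system.
    frame_bits = [40_000, 42_000, 160_000, 170_000, 150_000, 41_000]  # coder output per frame
    channel_bits_per_frame = 50_000   # constant drain toward the channel
    buffer_capacity = 200_000         # bits the buffer can hold
    occupancy = 0

    for n, bits in enumerate(frame_bits):
        occupancy = max(occupancy + bits - channel_bits_per_frame, 0)
        if occupancy > buffer_capacity:
            # The source has stayed "difficult" for too long: bits must be
            # discarded or the coder forced to produce fewer of them.
            print(f"frame {n}: overflow by {occupancy - buffer_capacity} bits")
            occupancy = buffer_capacity
        else:
            print(f"frame {n}: occupancy {occupancy} bits")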

Lossy coding assumes that some information can always be discarded.  The result is a controlled degradation of the decoded signal: instead of failing abruptly, the system is designed to degrade gradually as less and less bit rate is available for transmission (or as more and more is required for difficult sources).  The goal is a reproduction that is visually or aurally indistinguishable from the source or, failing that, one whose artifacts are as unobjectionable as possible.

Lossy coding of pictures could be based on completely eliminating some DCT coefficients, but it has been found better to adjust the quantizing coarseness of the coefficients, the extreme case being a quantizing step so large that the coefficient in question is quantized to zero.
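
As a minimal sketch of this idea (the coefficient values and step sizes below are illustrative, not taken from any standard), a uniform quantizer with an adjustable step size drives progressively more coefficients to zero as the step grows:

    import numpy as np

    def quantize(coeffs, step):
        """Uniform quantization with step size `step`; a large enough step
        drives small coefficients to zero."""
        return np.round(coeffs / step) * step

    # Illustrative DCT coefficients for one block.
    coeffs = np.array([812.0, -45.3, 12.7, 3.2, -1.4, 0.6])
    print(quantize(coeffs, 2.0))    # fine step: almost every coefficient survives
    print(quantize(coeffs, 16.0))   # coarser step: the small coefficients become zero
    print(quantize(coeffs, 64.0))   # extreme step: all but the largest coefficients are zeroed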

The design of a system that discards or coarsely quantizes data depends on several factors, a few of which are:

1) what form of the data is most efficient to calculate (e.g., DCT coefficients);

2) how much energy is concentrated in particular coefficients (so that others, containing only a small part of the signal energy, are likely to go unnoticed if discarded - see the sketch after this list);

3) what is the relative visibility/audibility of discarding various data/coefficients of equal energy;

4) is the visibility/audibility of artifacts strongly affected by joint spectral/spatial/temporal effects, or can data discard be based only on transform coefficients;

5) if joint effects are a problem, do they need to be explicitly considered, or is there a method based on the transform coefficients that takes the other factors (at least partly) into account by default;

6) in the chosen design(s), what is the resulting trade-off between artifact level and data-rate reduction.
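
As an illustration of item 2, the following sketch (using an arbitrary smooth 8x8 test block and SciPy's DCT) shows how strongly the energy of low-frequency-dominated material is concentrated in the lowest-frequency coefficients, so that discarding or coarsely quantizing the rest removes very little signal energy:

    import numpy as np
    from scipy.fft import dct

    # An 8x8 block dominated by low-frequency content (illustrative values).
    x = np.linspace(0.0, 1.0, 8)
    block = 100.0 + 50.0 * np.outer(np.cos(np.pi * x), np.cos(np.pi * x))

    # Separable 2-D DCT (type II, orthonormal).
    coeffs = dct(dct(block, norm='ortho', axis=0), norm='ortho', axis=1)

    energy = coeffs ** 2
    # Fraction of the block's energy carried by the four lowest-frequency coefficients.
    print(f"low-frequency fraction of energy: {energy[:2, :2].sum() / energy.sum():.4f}")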

Item 6 above is the subject of "rate-distortion theory".  It can usually be studied mathematically only by using statistical models of the system, which yield predictions of root-mean-square (rms) coding noise (the difference between the original and decoded signals - the coding error) as a function of bits/pel (or bits/audio sample) and the entropy of the source.  In computer simulations of a system, rate-distortion curves can be measured directly.  In this case, peak signal-to-noise ratio (PSNR) is often used, as it may correlate slightly better with visibility.
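
As a sketch of how these measurements are made in simulation (the array names, the 8-bit peak value of 255, and the crude quantizer standing in for a real coder are assumptions for illustration only):

    import numpy as np

    def rms_error(original, decoded):
        """Root-mean-square coding error between original and decoded frames."""
        diff = original.astype(float) - decoded.astype(float)
        return np.sqrt(np.mean(diff ** 2))

    def psnr(original, decoded, peak=255.0):
        """Peak signal-to-noise ratio in dB, assuming 8-bit samples (peak = 255)."""
        mse = np.mean((original.astype(float) - decoded.astype(float)) ** 2)
        return 10.0 * np.log10(peak ** 2 / mse)

    # Illustrative use: a random "original" frame and a coarsely quantized "decoded" one.
    rng = np.random.default_rng(0)
    original = rng.integers(0, 256, size=(64, 64))
    decoded = (original // 16) * 16   # crude stand-in for a lossy coder
    print(rms_error(original, decoded), psnr(original, decoded))

Here PSNR is simply 10 log10(peak^2 / MSE), so a larger value means less coding noise.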

While rms signal-to-noise ratio or PSNR can give a reasonable comparison of the performance of a particular system as some of its parameters are varied, these criteria fail when comparing different systems, because the visibility/audibility of different kinds of artifacts will generally differ even when their PSNR or rms signal-to-noise ratio is the same.  This is because of the interaction of spatial, temporal and frequency effects raised in item 4 above.  In actuality, the human visual and auditory systems have many successive layers of processing as well as some parallel processing functions at particular layers.  The task of designing a lossy coding system that produces undetectable errors in all these processes with the minimum of transmitted bits is difficult and nowhere near completely solved.  Note that even in this case we are only trying to make a good reproduction of a video image, which in its best original condition contains enough artifacts to be instantly distinguishable from a real-world scene.

The result is that it is extremely difficult to develop an objective measure of lossy coder performance.  The only reliable method available to date requires subjective viewing/listening tests under carefully controlled, repeatable conditions with a large group of viewers/listeners.  Some early measurement methods and hardware gave results reasonably well correlated with subjective tests when confined to a particular system under non-extreme conditions.

Commercial image-quality measurement hardware has been introduced based on visual models of luminance, temporal effects and full color effects; it reports results in terms of just-noticeable differences (JNDs) between an uncoded reference picture and the encoded/decoded version.  These units are especially useful when there is a need to contractually guarantee a measured quality of service.

NEXT - Masking