DRM audio coding
All perceptual audio compression is based on the psychoacoustic ‘masking effect’: the ear cannot hear certain frequencies within a critical band when a louder signal is present nearby in frequency. This critical band gets wider as the frequency gets higher. If the ear cannot hear these masked frequencies, why transmit that audio information?
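To illustrate how the critical band widens with frequency, the sketch below uses Zwicker and Terhardt's empirical approximation of critical bandwidth. This formula is a standard psychoacoustic fit, not something taken from the DRM or aacPlus specifications. Real encoders use their own band tables, and the function name is only illustrative.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth (Hz) at centre frequency f_hz.

    Zwicker & Terhardt fit: 25 + 75 * (1 + 1.4 * (f/1000)^2) ** 0.69.
    A textbook approximation, not the band layout used by aacPlus.
    """
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


if __name__ == "__main__":
    for f in (100, 1000, 4000, 10000):
        print(f"{f:>6} Hz -> critical band ~{critical_bandwidth_hz(f):.0f} Hz wide")
```

The band comes out roughly 100 Hz wide at 100 Hz, about 160 Hz wide at 1 kHz and over 2 kHz wide at 10 kHz, which matches the statement that the critical band widens as frequency rises.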
DRM uses Coding Technologies' aacPlus audio compression. Essentially, aacPlus combines two perceptual audio codecs (coder-decoders):
MPEG-4 AAC (Advanced Audio Coding) gives more efficient audio compression than the earlier MPEG Layer 2 and Layer 3 codecs.
SBR (Spectral Band Replication) is a technique that extends the bandwidth of the received audio by combining the lower-bandwidth AAC signal (sampled at 24 kHz) with SBR data that reconstructs the higher audio frequencies. The SBR output uses a 48 kHz sample rate.
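The idea of rebuilding the high band from the low band plus a small amount of side information can be shown with a toy model. In this simplified sketch, which is not the actual aacPlus algorithm, the decoder copies ("patches") groups of low-band spectral magnitudes up into the high band. It then scales each group so that its energy matches a transmitted envelope value. The names `sbr_reconstruct`, `envelope` and `band_size` are hypothetical.

```python
import math


def sbr_reconstruct(low_band, envelope, band_size):
    """Toy spectral band replication (simplified, hypothetical layout).

    low_band:  magnitudes of the decoded low-frequency (AAC) spectrum bins.
    envelope:  one target energy per high-frequency band, standing in for
               the transmitted SBR data.
    band_size: number of spectral bins covered by each envelope value.
    Returns the full-band magnitude spectrum (low band followed by high band).
    """
    if not low_band:
        raise ValueError("low_band must not be empty")
    high_band = []
    for i, target_energy in enumerate(envelope):
        # Patch: take band_size consecutive low-band bins, wrapping around
        # if the high band is wider than the low band.
        start = (i * band_size) % len(low_band)
        patch = [low_band[(start + k) % len(low_band)] for k in range(band_size)]
        # Envelope adjustment: scale the patch so its energy equals the target.
        patch_energy = sum(m * m for m in patch)
        gain = math.sqrt(target_energy / patch_energy) if patch_energy > 0 else 0.0
        high_band.extend(m * gain for m in patch)
    return list(low_band) + high_band


if __name__ == "__main__":
    low = [4.0, 3.0, 2.0, 1.0]  # decoded AAC magnitudes (made-up values)
    env = [6.0, 1.5]            # transmitted high-band energies (made-up values)
    print(sbr_reconstruct(low, env, band_size=2))
```

The saving comes from sending only one number per high-frequency band instead of every spectral coefficient. This is why SBR side information costs so much less than coding the high band with AAC.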
When encoding music, the fundamental frequencies are usually below 6-8 kHz (depending on the audio source), and these lower frequencies are coded using AAC. Frequencies above this range mostly contain harmonics. Although these frequencies are needed to reconstruct the sound, the ear is less sensitive to them, so far less SBR data needs to be transmitted than AAC data.
This is an example of ‘lossy’ coding, where some information is not transmitted and is permanently lost. The source and destination audio data are not bit-identical but, in theory, we cannot hear any difference. In practice some artefacts are audible when transmitting music, particularly in stereo. Some types of music compress better than others; at low bit rates AAC+SBR seems to work best on classical music.
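Here is a minimal illustration of ‘lossy’ in this sense. If you quantise values to a coarser step and then reconstruct them, the results are close to the originals but not bit-identical, and the rounded-away detail cannot be recovered. Real perceptual codecs quantise transform coefficients with step sizes guided by the masking model. The uniform quantiser, the function names and the `step` value here are simplifying assumptions.

```python
def quantize(samples, step):
    """Map each value to an integer index (this is what would be transmitted)."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Reconstruct approximate values from the transmitted indices."""
    return [i * step for i in indices]


if __name__ == "__main__":
    original = [0.12, -0.53, 0.98, 0.31]
    step = 0.25  # hypothetical step: coarser step = fewer bits, more error
    decoded = dequantize(quantize(original, step), step)
    print(decoded)
    print("max error:", max(abs(d - o) for d, o in zip(decoded, original)))
```

The reconstruction error is never more than half the step size. A perceptual encoder tries to keep that error below the masking threshold so that it is inaudible.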
SBR can also be used to extend the audio bandwidth of other codecs. However, I have heard some AAC+SBR DRM broadcasts with excessive SBR sibilance, which makes the audio sound very harsh.
Approximately 3 kbps (kilobits per second) of SBR data is added. This carries information on the high-frequency spectrum and on audio that is not harmonically related to the content transmitted by the AAC codec.
aacPlus is increasingly used for internet radio streaming, where 48 kbps gives near CD quality. This is similar in audio quality to 128 kbps MPEG Layer 3 (MP3), or to 192 kbps MPEG Layer 2 (as used by DAB).
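For a sense of scale, uncompressed CD audio runs at 44.1 kHz × 16 bits × 2 channels = 1411.2 kbps. The short sketch below computes the compression ratio that each quoted bit rate represents relative to CD audio. It says nothing about perceived quality, which is the comparison the paragraph is making. The constant and function names are only illustrative.

```python
CD_BITRATE_KBPS = 44.1 * 16 * 2  # 44.1 kHz sample rate, 16-bit, stereo = 1411.2 kbps

# Bit rates quoted in the text, in kbps
RATES_KBPS = {
    "aacPlus": 48,
    "MPEG Layer 3": 128,
    "MPEG Layer 2 (DAB)": 192,
}


def compression_ratio(rate_kbps: float) -> float:
    """How many times smaller than uncompressed CD audio a stream is."""
    return CD_BITRATE_KBPS / rate_kbps


if __name__ == "__main__":
    for name, rate in RATES_KBPS.items():
        print(f"{name:>20}: {rate:>3} kbps, about {compression_ratio(rate):.1f}:1")
```

At 48 kbps, aacPlus compresses the signal by roughly 29:1, compared with about 11:1 for 128 kbps MP3 and about 7:1 for 192 kbps Layer 2.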
Parametric Stereo
With Parametric Stereo (PS) the data describing the stereo image is transmitted alongside a mono signal derived from the stereo audio.
Parametric Stereo coding uses two parameters to describe the stereo image: Panorama (Pan) and Stereo Ambiance (SA). The Pan parameter carries frequency-selective level differences between the left and right audio channels. The SA parameter carries frequency-selective information about the stereo ambience, which tends to be lost when stereo signals are mixed down to mono.
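The following is a rough sketch of what a PS encoder measures in each frequency band: the left/right level difference (the Pan information) and the normalised cross-correlation between the channels. Using the correlation as a stand-in for the Stereo Ambiance parameter is an assumption made for illustration. The MPEG-4 PS tool defines its own parameters and quantisation. The band grouping and the function names `ps_parameters` and `mono_downmix` are hypothetical.

```python
import math


def mono_downmix(left, right):
    """Mono signal sent in place of the two channels (simple average)."""
    return [(a + b) / 2 for a, b in zip(left, right)]


def ps_parameters(left_bands, right_bands, eps=1e-12):
    """Per-band (level difference in dB, correlation) pairs.

    left_bands / right_bands: lists of bands, each a list of samples or
    spectral values for that band (hypothetical layout).
    A level difference > 0 dB means the band is louder on the left.
    Correlation near 1 means the channels carry the same waveform (a
    point-like image); near 0 means they are largely unrelated (ambience).
    """
    params = []
    for lb, rb in zip(left_bands, right_bands):
        e_left = sum(x * x for x in lb)
        e_right = sum(x * x for x in rb)
        cross = sum(a * b for a, b in zip(lb, rb))
        level_diff_db = 10 * math.log10((e_left + eps) / (e_right + eps))
        correlation = cross / math.sqrt((e_left + eps) * (e_right + eps))
        params.append((level_diff_db, correlation))
    return params


if __name__ == "__main__":
    left = [[1.0, 2.0, 3.0], [2.0, 0.0], [1.0, -1.0]]
    right = [[1.0, 2.0, 3.0], [1.0, 0.0], [1.0, 1.0]]
    for band, (pan_db, corr) in enumerate(ps_parameters(left, right)):
        print(f"band {band}: pan {pan_db:+.1f} dB, correlation {corr:.2f}")
```

Only these two numbers per band travel alongside the mono downmix. This is why the stereo side information stays so small compared with the audio itself.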
The Pan and SA parameters and some control bits are updated every SBR frame. Control data overhead is only 50 bps (bits per second).
For complex stereo music, the average PS bit rate is around 1.2 kbps, with peaks of up to 2.5 kbps (the maximum can be preset in the encoder). Less is needed when the stereo image is closer to mono. Because the stereo information takes only a small fraction of the available bit rate, most of it remains for the mono signal, giving that signal the highest possible quality.
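Putting the side-information figures from this page together, a simple budget shows how much of a service's bit rate is left for the AAC core. The 3 kbps SBR figure and the 1.2 kbps / 2.5 kbps PS figures come from the text. The 20 kbps service rate is a hypothetical example. Real encoders allocate bits dynamically rather than by fixed subtraction, so treat this as arithmetic only.

```python
SBR_KBPS = 3.0      # approximate SBR side information (from the text)
PS_AVG_KBPS = 1.2   # average PS rate for complex stereo music (from the text)
PS_PEAK_KBPS = 2.5  # peak PS rate, an encoder-configurable maximum (from the text)


def aac_core_kbps(service_kbps: float, ps_kbps: float = PS_AVG_KBPS) -> float:
    """Bit rate left for the AAC core after SBR and PS side information."""
    core = service_kbps - SBR_KBPS - ps_kbps
    if core <= 0:
        raise ValueError("service bit rate too low for SBR + PS overhead")
    return core


if __name__ == "__main__":
    service = 20.0  # hypothetical DRM service bit rate, kbps
    for label, ps in (("average", PS_AVG_KBPS), ("peak", PS_PEAK_KBPS)):
        core = aac_core_kbps(service, ps)
        print(f"PS {label}: AAC core gets {core:.1f} of {service:.0f} kbps "
              f"({100 * core / service:.0f}%)")
```

Even at the PS peak, roughly three quarters of this hypothetical 20 kbps service is left for the AAC core.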
The actual sound quality is determined by the bit rate allocated to the audio service. This in turn is determined by the DRM mode, QAM level, error protection level (code rate), and spectrum bandwidth used for the broadcast (minimum is 4.5 kHz).
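In a generic OFDM system of this kind, the net bit rate depends on three things: how many data cells are sent per second (set by the DRM mode and spectrum bandwidth), how many bits each cell carries (set by the QAM level) and the code rate (set by the protection level). The sketch below shows only that generic relationship. The cell count is a made-up round number, not a value from the DRM specification, which also reserves cells for pilots and signalling. The function name is hypothetical.

```python
import math


def net_bitrate_bps(data_cells_per_second: float, qam_order: int, code_rate: float) -> float:
    """Net bit rate = data cells per second * bits per QAM cell * code rate.

    data_cells_per_second depends on the DRM mode and bandwidth (the value
    used below is hypothetical); bits per cell is log2 of the QAM order.
    """
    bits_per_cell = math.log2(qam_order)
    if not bits_per_cell.is_integer():
        raise ValueError("qam_order must be a power of two")
    return data_cells_per_second * bits_per_cell * code_rate


if __name__ == "__main__":
    cells = 4000  # hypothetical data cells per second, for illustration only
    for qam, rate in ((16, 0.5), (64, 0.5), (64, 0.6)):
        kbps = net_bitrate_bps(cells, qam, rate) / 1000
        print(f"{qam}-QAM, code rate {rate}: {kbps:.1f} kbps")
```

Moving from 16-QAM to 64-QAM raises the bits per cell from 4 to 6. A higher code rate means less error protection, so extra audio bit rate is always bought with reduced robustness.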
CELP and HVXC codecs
For speech-only broadcasts, the aacPlus codec might not be the best choice. The DRM specification also includes the MPEG-4 CELP (Code Excited Linear Prediction) and MPEG-4 HVXC (Harmonic Vector eXcitation Coding) codecs, which are suitable for speech only (mono only). These are based on codecs used in cell/mobile phones.
There have been very few broadcasts using these codecs, and no regular DRM broadcast uses either CELP or HVXC. In theory, SBR can be used with either codec.
CELP supports an 8 kHz or 16 kHz sample rate. At 8 kHz it gives 100 Hz to 3800 Hz (telephone quality) audio, and at the higher rate it gives 50 Hz to 7000 Hz audio. HVXC uses an 8 kHz sample rate and is telephone quality only (100 Hz to 3800 Hz), suitable for speech.
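In each case, the quoted audio bandwidth sits just below the Nyquist limit (half the sample rate). The small table below restates the figures from the text and checks that relationship. The data structure and function name are only for illustration.

```python
# (codec, sample rate in Hz) -> (lowest, highest) audio frequency in Hz, from the text
SPEECH_MODES = {
    ("CELP", 8000): (100, 3800),
    ("CELP", 16000): (50, 7000),
    ("HVXC", 8000): (100, 3800),
}


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for (codec, fs), (low, high) in SPEECH_MODES.items():
        headroom = nyquist_hz(fs) - high
        print(f"{codec} @ {fs} Hz: {low}-{high} Hz audio, "
              f"Nyquist {nyquist_hz(fs):.0f} Hz, {headroom:.0f} Hz headroom")
```

The gap between the top of the audio band and the Nyquist frequency (200 Hz at 8 kHz, 1000 Hz at 16 kHz) typically leaves room for the anti-aliasing filter's roll-off.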
As DRM can carry up to 4 services, it is possible to transmit a separate voice service as well as the main audio/data service.
Example of a Deutsche Welle DW-Bonn broadcast from Sines, Portugal, with two different audio services. The second audio service uses the HVXC speech audio codec.

Another example of DW-Bonn (also via the Sines transmitter) broadcasting 4 different audio streams (German, English, French and Russian) using the HVXC speech-only audio codec.