Monday, June 27, 2011

Bearer Protocols in Detail


The bearer protocols are where the real work in voice gets done. The bearer channel carries the voice, sampled by microphones as digital data, compressed in some manner, and then placed into packets which need to be coordinated as they fly over the networks.
Voice, as you know, starts off as sound waves (Figure 1). These sound waves are picked up by the microphone in the handset, and are then converted into electrical signals, with the voltage of the signal varying with the pressure the sound waves apply to the microphone.
The signal (see Figure 2) is then converted to digital form using an analog-to-digital converter. Most of the content of a speaking voice lies at frequencies up to around 3000 Hz. Some sounds go higher (music especially needs the higher frequencies), but voice can be represented without significant distortion in the 3000 Hz range. Digital sampling works by measuring the voltage of the signal at precise, evenly spaced instants. Because sound waves are, well, wavy, as are the electrical signals produced by them, the sampling must occur at a high enough rate to capture the highest frequency of the voice.
As you can see in the figure, the signal has a major oscillation at roughly what we would call the pitch of the voice. Finer variations exist, however, as can be seen on closer inspection, and these variations make up the depth or richness of the voice. Voice for telephone communications is usually limited to 4000 Hz, which is high enough to capture the major pitch and enough of the texture to make the voice sound human, if a bit tinny. Capturing higher frequencies, as is done on compact discs and music recordings, provides an even stronger sense of the original voice.

 
Figure 2: Example Voice Signal, Zoomed in Three Times
Sampling audio so that frequencies up to 4000 Hz can be preserved requires sampling the signal at twice that rate, or 8000 times a second. This follows from the Nyquist sampling theorem. The intuition behind it is fairly straightforward. Sampling at regular intervals means recording the value of the signal at those given instants and nothing in between. The worst case would be sampling, say, a 4000 Hz sine wave 4000 times a second. Every sample would land on the same point of the cycle, so the result is guaranteed to be flat, as the top pair of graphs in Figure 3 shows. This is a severe case of undersampling, leading to aliasing effects. On the other hand, a more likely signal, with a more likely sampling rate, is shown in the bottom pair of graphs in the same figure. Here, the overall form of the signal, including its fundamental frequency, is preserved, but most of the higher-frequency texture is lost. The sampled signal would have the right pitch, but would sound off.
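The undersampling case is easy to try numerically. The following is a minimal sketch, not taken from any real codec: the helper name `sample_sine`, the phase offset, and the 1000, 3000, and 5000 Hz test tones are all choices made for illustration.

```python
import math

def sample_sine(freq_hz, rate_hz, count, phase=0.0):
    """Return `count` samples of a unit sine wave taken `rate_hz` times a second."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz + phase)
            for n in range(count)]

# Worst case from the text: a 4000 Hz tone sampled 4000 times a second.
# Every sample hits the same point of the cycle, so the result is flat.
flat = sample_sine(4000, 4000, 8, phase=math.pi / 4)
flat_spread = max(flat) - min(flat)

# A 1000 Hz tone sampled at 8000 Hz gets eight samples per cycle
# and swings through its full range.
good = sample_sine(1000, 8000, 8)
good_spread = max(good) - min(good)

# Aliasing: at 8000 samples a second, a 5000 Hz tone produces the same
# sample values as an inverted 3000 Hz tone, so the two cannot be told apart.
aliased = sample_sine(5000, 8000, 8)
reference = sample_sine(3000, 8000, 8)
indistinguishable = all(abs(a + r) < 1e-9 for a, r in zip(aliased, reference))

print(flat_spread, good_spread, indistinguishable)
```

The last comparison is why anything above 4000 Hz has to be filtered out before sampling at 8000 Hz: once sampled, it is no longer distinguishable from a lower tone.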

 
Figure 3: Sampling and Aliasing
The other aspect of digital sampling, besides the 8000 samples-per-second rate, is the amount of detail captured vertically, in the intensity. The question becomes how many bits of information should be used to represent the intensity of each sample. In the quantization process, the infinitely variable, continuous scale of intensities is reduced to a discrete, quantized scale of digital values. Up to a constant factor, corresponding to the maximum intensity that can be represented, the common choice for voice is to quantize to 16 bits, giving a number between $-2^{15} = -32{,}768$ and $2^{15} - 1 = 32{,}767$.
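Quantization to 16 bits can be sketched in a few lines. This is one possible convention rather than the only one: the function names, the `full_scale` parameter (standing in for the "constant factor" above), and the choice to scale by 32,767 so that positive and negative peaks are symmetric are all assumptions made here.

```python
INT16_MIN = -(2 ** 15)     # -32768, lowest value a signed 16-bit sample can hold
INT16_MAX = 2 ** 15 - 1    #  32767, highest value

def quantize(sample, full_scale=1.0):
    """Map a continuous intensity to a signed 16-bit integer.

    Values beyond +/- full_scale are clipped to the nearest limit.
    """
    clipped = max(-full_scale, min(full_scale, sample))
    return round(clipped / full_scale * INT16_MAX)

def quantization_error(sample, full_scale=1.0):
    """Difference between a sample and the value recovered from its 16-bit code."""
    restored = quantize(sample, full_scale) / INT16_MAX * full_scale
    return abs(sample - restored)

print(quantize(0.0), quantize(1.0), quantize(-1.0), quantize(5.0))
```

With this scaling, an in-range sample is never off by more than half a step, which is `full_scale / (2 * 32767)`.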

The overall result is a digital stream of 16-bit values, and the process is called pulse code modulation (PCM), a term originating in other methods of encoding audio that are no longer used.
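To get a feel for the size of that stream, here is a rough sketch that produces one second of 16-bit PCM at 8000 samples per second. The 440 Hz test tone, the little-endian byte order, and the `pcm_encode` helper are assumptions made for illustration; the text does not specify any of them.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000     # samples per second, from the text
BITS_PER_SAMPLE = 16      # bits per sample, from the text

# Raw data rate of the uncompressed stream: 8000 * 16 = 128000 bits per second.
bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

def pcm_encode(samples):
    """Pack floats in [-1.0, 1.0] as little-endian signed 16-bit PCM bytes."""
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

# One second of a 440 Hz tone (an arbitrary choice of test signal).
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ)
        for n in range(SAMPLE_RATE_HZ)]
pcm_bytes = pcm_encode(tone)

print(bits_per_second, len(pcm_bytes))
```

One second of uncompressed voice at these settings is 16,000 bytes, or 128 kbit/s, which gives a sense of why the stream is usually compressed before being placed into packets.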
