Jitter is the variation in the delays that the receiver experiences. The user does not hear jitter directly, because phones employ a jitter buffer to absorb it. Jitter can be defined in a number of ways. One way is to take the standard deviation, or the maximum deviation, of each packet's delay around the mean delay. Another way is to use the known packetization interval (such as 20 ms): measure the spacing between consecutive packets that were not lost, subtract the expected interval, and then take the standard deviation or the maximum deviation of those differences. Either way, the jitter, measured in units of time or as a percentage of the mean delay, tells how variable the network is.
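To make the two definitions concrete, here is a minimal Python sketch; the function names, the millisecond timestamps, and the example numbers are illustrative assumptions, not measurements from the original. One function takes the deviation of each packet's one-way delay around the mean; the other takes the deviation of arrival spacing from the known 20 ms interval.

```python
import statistics

INTERVAL_MS = 20.0  # assumed packetization interval

def jitter_from_delays(send_ms, arrive_ms):
    # Method 1: deviation of each packet's one-way delay around the mean delay.
    delays = [a - s for s, a in zip(send_ms, arrive_ms)]
    mean = statistics.mean(delays)
    return statistics.pstdev(delays), max(abs(d - mean) for d in delays)

def jitter_from_spacing(arrive_ms, interval_ms=INTERVAL_MS):
    # Method 2: deviation of consecutive arrival spacing from the known
    # interval, using only packets that actually arrived.
    gaps = [b - a for a, b in zip(arrive_ms, arrive_ms[1:])]
    devs = [g - interval_ms for g in gaps]
    return statistics.pstdev(devs), max(abs(d) for d in devs)

# Illustrative timestamps (ms): packets sent every 20 ms, arrivals wobble slightly.
send   = [0.0, 20.0, 40.0, 60.0, 80.0]
arrive = [35.0, 57.0, 74.0, 98.0, 115.0]
print(jitter_from_delays(send, arrive))   # (std dev, max deviation) in ms
print(jitter_from_spacing(arrive))
```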
Jitter is introduced by variable queuing delays within network equipment. Phones and PBXs are well known for transmitting at very regular intervals, but the intervening network carries variable traffic. As queue depths change, as network loads fluctuate, and as contention-based media such as Wi-Fi links clog with density, packets are forced to wait. Wireless links are the biggest culprit for introducing this variable delay into an enterprise private network, because wireless packets can be lost and retransmitted, and the time it takes to retransmit a packet is usually measured in units of a millisecond.
A jitter buffer's job is to sit on the receiver and prevent the jitter from causing an underrun of the voice decoder. An underrun is an awkward period of silence that happens when the phone has finished playing the previous packet and needs another packet to play, but one has not yet arrived. These underruns count as a form of error or loss, even if every packet does make it to the receiver, and loss concealment will work to disguise them. The problem with jitter is that, assuming no packets are actually lost, an underrun must be followed by an increase in playout delay of the same amount: the late packet holds up the line for every packet behind it.
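A small worked sketch shows why, assuming 20 ms packets and no loss (the timestamps are made up for illustration): once one packet arrives 25 ms late and stalls the decoder, every later packet plays 25 ms later than it otherwise would.

```python
INTERVAL_MS = 20.0  # assumed packet duration

def playout_delays(send_ms, arrive_ms, interval_ms=INTERVAL_MS):
    # Play each packet as soon as it arrives, but never before the previous
    # packet has finished playing; a late packet therefore stalls the decoder
    # (an underrun) and pushes every later packet back by the same amount.
    delays, prev_end = [], float("-inf")
    for s, a in zip(send_ms, arrive_ms):
        start = max(a, prev_end)
        prev_end = start + interval_ms
        delays.append(start - s)
    return delays

send   = [0, 20, 40, 60, 80]
arrive = [30, 50, 95, 97, 110]          # third packet arrives 25 ms late
print(playout_delays(send, arrive))     # [30, 30, 55, 55, 55]
```

The 25 ms underrun on the third packet shows up as a permanent 25 ms increase in the playout delay of every packet that follows.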
Here, the value of the jitter buffer can be seen. The jitter buffer lets the receiver build up a slight delay in the output. If this delay is greater than the amount of actual jitter on the network, the jitter buffer will be able to smooth things out without underrunning.
In this sense, the jitter buffer converts jitter directly into delay. If the jitter grows too large, the jitter buffer may run out of room and start dropping earlier samples in the buffer to let the call catch back up toward real time. In this way, the jitter buffer converts jitter directly into loss.
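The trade-off can be sketched with a hypothetical fixed-cap buffer; the cap value, function name, and timestamps are assumptions for illustration, not a description of any particular phone's implementation. Late packets push the playout point back until the cap is reached, after which audio is discarded so the call catches back up.

```python
INTERVAL_MS = 20.0  # assumed packet duration

def capped_jitter_buffer(send_ms, arrive_ms, max_delay_ms, interval_ms=INTERVAL_MS):
    # Hypothetical simplification: a packet that would push the playout delay
    # past the cap is discarded instead of played, so the buffer trades delay
    # for loss once its headroom is exhausted.
    delays, dropped = [], 0
    prev_end = float("-inf")
    for s, a in zip(send_ms, arrive_ms):
        start = max(a, prev_end)
        if start - s > max_delay_ms:
            dropped += 1              # jitter converted into loss
            continue
        prev_end = start + interval_ms
        delays.append(start - s)      # jitter converted into delay
    return delays, dropped

send   = [0, 20, 40, 60, 80]
arrive = [30, 50, 95, 97, 110]        # same late packet as before
print(capped_jitter_buffer(send, arrive, max_delay_ms=40))
# ([30, 30, 37, 37], 1): delay stays under the 40 ms cap, but one packet is lost
```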
Because jitter is always converted into delay first, then loss, it does not have a direct impact on the E-model by itself; instead it can be folded into the other measures. The complication is that the user or administrator does not usually know the exact parameters of the jitter buffer. How many samples, how much delay, will the jitter buffer accept before it starts to drop audio? Does the jitter buffer start off with a fixed delay? Does it build up delay only as jitter forces it to? Or does it proactively build in some delay, which can grow or shrink as underruns occur? All of these affect the E-model call quality.
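To see how those parameters feed the E-model, here is a hedged sketch using the widely cited simplified approximations for the delay impairment (Id) and the effective equipment impairment (Ie-eff); the buffer delays, underrun-loss figure, and codec constants are assumptions for illustration, not measurements.

```python
def r_factor(one_way_delay_ms, loss_pct, ie=0.0, bpl=25.0):
    # Simplified E-model: start from the default R0 of roughly 93.2 and
    # subtract the delay impairment Id and the effective equipment
    # impairment Ie-eff (random-loss form). ie and bpl are codec-dependent
    # constants; the values here are assumptions for illustration.
    d = one_way_delay_ms
    id_ = 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)
    ie_eff = ie + (95.0 - ie) * loss_pct / (loss_pct + bpl)
    return 93.2 - id_ - ie_eff

# Same network, two hypothetical jitter buffers:
# a deep fixed buffer adds delay; a shallow one adds discard loss.
network_delay_ms, network_loss_pct = 80.0, 0.5
print(r_factor(network_delay_ms + 60.0, network_loss_pct))        # deep buffer
print(r_factor(network_delay_ms + 20.0, network_loss_pct + 1.0))  # shallow buffer
```

Depending on the buffer's behavior, the same measured jitter lands in the delay term, the loss term, or some mix of the two, which is why the buffer's parameters matter to the final score.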
As a result, a rule of thumb here is to match the jitter tolerance to the delay tolerance. The network, at least, should not introduce more than 50 ms of jitter.