The codec defines only how the voice is compressed and packaged. The voice still needs to be placed into well-defined packets and sent over the network.
The Real-time Transport Protocol (RTP), defined in RFC 3550, defines how voice is packetized on most IP-based networks. RTP is a general-purpose framework for sending real-time streaming traffic across networks, and is used for nearly all media streaming, including voice and video, where real-time delivery is essential.
RTP is usually sent over UDP, on any port that the applications negotiate. The typical RTP packet has the structure given in Table 1.
Flags | Sequence Number | Timestamp | SSRC | CSRCs | Extensions | Payload |
---|---|---|---|---|---|---|
2 bytes | 2 bytes | 4 bytes | 4 bytes | 4 bytes × number of contributors | variable | variable |
The idea behind RTP is that the sender sends the timestamp that the first byte of data in the payload belongs to. This timestamp gives a precise time that the receiver can use to reassemble incoming data. The sequence number also increases monotonically, and can also establish the order of incoming data. The SSRC, for Synchronization Source, is the stream identifier of the sender, and lets devices with multiple streams coming in figure out who is sending. The CSRCs, for Contributing Sources, are other devices that may have contributed to the packet, such as when a conference call has multiple talkers at once.
The most important fields are the timestamp (see Table 2) and the payload type (see Table 3). The payload type field usually specifies the type of codec being used in the stream.
Version | Padding | Extension (X) | Contributor Count (CC) | Marked | Payload Type (PT) | |
---|---|---|---|---|---|---|
Bit: | 0-1 | 2 | 3 | 4-7 | 8 | 9-15 |
Table 3 shows the most common voice RTP types. Numbers greater than 96 are allowed, and are usually set up by the endpoints to carry some dynamic stream.
Payload Type | Encoded Name | Meaning |
---|---|---|
0 | PCMU | G.711 with μ-law |
3 | GSM | GSM |
8 | PCMA | G.711 with A-law |
18 | G729 | G.729 or G.729a |
When the codec's output is packaged into RDP, it is done so to both avoid splitting necessary information and causing too many packets per second to be sent. For G.711, an RTP packet can be created with as many samples as desired for the given packet rate. Common values are 20ms and 30ms. Decoders know to append the samples across packets as if they were in one stream. For G.729, the RTP packet must come in 10ms multiples, because G.729 only encodes 10ms blocks. An RTP packet with G.729 can have multiple blocks, and the decoder knows to treat each block separately and sequentially. G.729 phones commonly stream with RTP packets holding 20ms or larger, to avoid having too many packets in the network.
1 Secure RTP
RTP itself has a security option, designed to allow the contents of the RTP stream to be protected while still allowing the quick reassembly of a stream and the robustness of allowing parts of the stream to be lost on the network. Secure RTP (SRTP) uses the Advanced Encryption Standard (AES) to encrypt the packets. (AES will later have a starring role in Wi-Fi encryption, as well as for use with IPsec.) The RTP stream requires a key to be established. Each packet is then encrypted with AES running in counter mode, a mode where intervening packets can be lost without disrupting the decryptability of subsequent packets in the sequence. Integrity of the packets is ensured by the use of theHMAC-SHA1 keyed signature, for each packet.
How the SRTP stream gets its keys is not specified by SRTP. However, SIPS provides a way for this to be set up that is quite logical.
No comments:
Post a Comment