Monday, July 4, 2011

Real-time Transport Protocol | RTP


The codec defines only how the voice is compressed and packaged. The voice still needs to be placed into well-defined packets and sent over the network.
The Real-time Transport Protocol (RTP), defined in RFC 3550, defines how voice is packetized on most IP-based networks. RTP is a general-purpose framework for sending real-time streaming traffic across networks, and is used for nearly all media streaming, including voice and video, where real-time delivery is essential.
RTP is usually sent over UDP, on any port that the applications negotiate. The typical RTP packet has the structure given in Table 1.
Table 1: RTP Format 
Flags
Sequence Number
Timestamp
SSRC
CSRCs
Extensions
Payload
2 bytes
2 bytes
4 bytes
4 bytes
4 bytes × number of contributors
variable
variable
The idea behind RTP is that the sender sends the timestamp that the first byte of data in the payload belongs to. This timestamp gives a precise time that the receiver can use to reassemble incoming data. The sequence number also increases monotonically, and can also establish the order of incoming data. The SSRC, for Synchronization Source, is the stream identifier of the sender, and lets devices with multiple streams coming in figure out who is sending. The CSRCs, for Contributing Sources, are other devices that may have contributed to the packet, such as when a conference call has multiple talkers at once.
The most important fields are the timestamp (see Table 2) and the payload type (see Table 3). The payload type field usually specifies the type of codec being used in the stream.
Table 2: The RTP Flags Field 
 
Version
Padding
Extension (X)
Contributor Count (CC)
Marked
Payload Type (PT)
Bit:
0-1
2
3
4-7
8
9-15
Table 3 shows the most common voice RTP types. Numbers greater than 96 are allowed, and are usually set up by the endpoints to carry some dynamic stream.
Table 3: Common RTP Packet Types 
Payload Type
Encoded Name
Meaning
0
PCMU
G.711 with μ-law
3
GSM
GSM
8
PCMA
G.711 with A-law
18
G729
G.729 or G.729a
When the codec's output is packaged into RDP, it is done so to both avoid splitting necessary information and causing too many packets per second to be sent. For G.711, an RTP packet can be created with as many samples as desired for the given packet rate. Common values are 20ms and 30ms. Decoders know to append the samples across packets as if they were in one stream. For G.729, the RTP packet must come in 10ms multiples, because G.729 only encodes 10ms blocks. An RTP packet with G.729 can have multiple blocks, and the decoder knows to treat each block separately and sequentially. G.729 phones commonly stream with RTP packets holding 20ms or larger, to avoid having too many packets in the network.

Secure RTP

RTP itself has a security option, designed to allow the contents of the RTP stream to be protected while still allowing the quick reassembly of a stream and the robustness of allowing parts of the stream to be lost on the network. Secure RTP (SRTP) uses the Advanced Encryption Standard (AES) to encrypt the packets. (AES will later have a starring role in Wi-Fi encryption, as well as for use with IPsec.) The RTP stream requires a key to be established. Each packet is then encrypted with AES running in counter mode, a mode where intervening packets can be lost without disrupting the decryptability of subsequent packets in the sequence. Integrity of the packets is ensured by the use of theHMAC-SHA1 keyed signature, for each packet.
How the SRTP stream gets its keys is not specified by SRTP. However, SIPS provides a way for this to be set up that is quite logical. 

No comments:

Post a Comment