Telecom Architectures: real time protocol

Monday, July 4, 2011

Real-time Transport Protocol | RTP

The codec defines only how the voice is compressed and packaged. The voice still needs to be placed into well-defined packets and sent over the network.

The Real-time Transport Protocol (RTP), defined in RFC 3550, defines how voice is packetized on most IP-based networks. RTP is a general-purpose framework for sending real-time streaming traffic across networks, and is used for nearly all media streaming, including voice and video, where real-time delivery is essential.

RTP is usually sent over UDP, on any port that the applications negotiate. The typical RTP packet has the structure given in Table 1.

Table 1: RTP Format
Flags	Sequence Number	Timestamp	SSRC	CSRCs	Extensions	Payload
2 bytes	2 bytes	4 bytes	4 bytes	4 bytes × number of contributors	variable	variable

The idea behind RTP is that the sender sends the timestamp that the first byte of data in the payload belongs to. This timestamp gives a precise time that the receiver can use to reassemble incoming data. The sequence number also increases monotonically, and can also establish the order of incoming data. The SSRC, for Synchronization Source, is the stream identifier of the sender, and lets devices with multiple streams coming in figure out who is sending. The CSRCs, for Contributing Sources, are other devices that may have contributed to the packet, such as when a conference call has multiple talkers at once.
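The layout in Table 1 can be read with a few lines of code. The following is a minimal sketch, not a full RTP implementation: the function name and the dictionary keys are made up for illustration, and any extension header is left unparsed inside the remaining bytes.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Split a raw RTP packet into the fields listed in Table 1."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: 2-byte flags, 2-byte sequence number,
    # 4-byte timestamp, 4-byte SSRC.
    flags, sequence, timestamp, ssrc = struct.unpack("!HHII", packet[:12])
    # The contributor count (bits 4-7 of the flags) gives the number of CSRCs.
    csrc_count = (flags >> 8) & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return {
        "flags": flags,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        # Extension header (if any) plus payload, left untouched here.
        "rest": packet[end:],
    }
```

A receiver would typically key its per-stream state on the returned SSRC and hand the sequence number and timestamp to its reordering and playout logic.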

The most important fields are the timestamp and the payload type. The payload type is carried in the flags field (see Table 2) and usually specifies the type of codec being used in the stream (see Table 3).

Table 2: The RTP Flags Field
	Version	Padding	Extension (X)	Contributor Count (CC)	Marker (M)	Payload Type (PT)
Bit:	0-1	2	3	4-7	8	9-15
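The bit positions in Table 2 translate directly into shifts and masks. This sketch assumes the flags field has already been read as a 16-bit big-endian integer, with bit 0 in the table being the most significant bit; the function name and keys are illustrative.

```python
def decode_rtp_flags(flags: int) -> dict:
    """Decode the 16-bit RTP flags field according to Table 2."""
    return {
        "version": (flags >> 14) & 0x3,          # bits 0-1
        "padding": bool((flags >> 13) & 0x1),    # bit 2
        "extension": bool((flags >> 12) & 0x1),  # bit 3
        "csrc_count": (flags >> 8) & 0xF,        # bits 4-7
        "marker": bool((flags >> 7) & 0x1),      # bit 8
        "payload_type": flags & 0x7F,            # bits 9-15
    }
```

For example, the value 0x8008 decodes to version 2 with payload type 8, which Table 3 lists as PCMA.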

Table 3 shows the most common voice RTP payload types. Payload types 96 through 127 are dynamic: they have no fixed meaning, and the endpoints assign them during session setup to carry whatever stream they negotiate.

Table 3: Common RTP Payload Types
Payload Type	Encoded Name	Meaning
0	PCMU	G.711 with μ-law
3	GSM	GSM
8	PCMA	G.711 with A-law
18	G729	G.729 or G.729a
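A receiver usually turns the payload type number into a codec choice with a simple lookup. The sketch below covers only the entries in Table 3 and assumes the dynamic range is 96 through 127; the names are illustrative, and a real endpoint would resolve dynamic types from whatever was negotiated at session setup.

```python
# Static payload types from Table 3 only; real tables are longer.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 3: "GSM", 8: "PCMA", 18: "G729"}


def describe_payload_type(pt: int) -> str:
    """Map a payload type number to its encoded name, if known."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type is a 7-bit field (0-127)")
    if pt in STATIC_PAYLOAD_TYPES:
        return STATIC_PAYLOAD_TYPES[pt]
    if pt >= 96:
        # Meaning is defined by the endpoints, not by a fixed table.
        return "dynamic"
    return "unknown"
```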

When the codec's output is packaged into RTP, the packet size is chosen both to avoid splitting information that must stay together and to avoid sending too many packets per second. For G.711, an RTP packet can carry as many samples as desired for the given packet rate; common packet durations are 20 ms and 30 ms. Decoders append the samples across packets as if they were one stream. For G.729, the RTP payload must be a multiple of 10 ms, because G.729 encodes only 10 ms blocks. An RTP packet with G.729 can hold multiple blocks, and the decoder treats each block separately and in sequence. G.729 phones commonly stream RTP packets holding 20 ms or more of audio, to avoid putting too many packets on the network.
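The arithmetic behind these packet sizes is easy to check. The sketch below assumes the standard codec parameters, which are not stated in the text above: G.711 produces 8000 one-byte samples per second, and G.729 produces one 10-byte frame per 10 ms block. The function names are illustrative.

```python
def g711_payload_bytes(packet_ms: int) -> int:
    """Payload size for G.711: 8000 samples/s, one byte per sample."""
    return 8000 * packet_ms // 1000


def g729_payload_bytes(packet_ms: int) -> int:
    """Payload size for G.729: one 10-byte frame per 10 ms block."""
    if packet_ms <= 0 or packet_ms % 10 != 0:
        raise ValueError("G.729 packets must hold a multiple of 10 ms")
    return (packet_ms // 10) * 10


def packets_per_second(packet_ms: int) -> float:
    """Packet rate in one direction for a given packet duration."""
    return 1000 / packet_ms
```

Under these assumptions a 20 ms G.711 packet carries 160 bytes of payload and a 20 ms G.729 packet carries 20 bytes, and both run at 50 packets per second in each direction.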

1 Secure RTP

RTP itself has a security option, designed to protect the contents of the RTP stream while still allowing quick reassembly of the stream and tolerating the loss of parts of the stream on the network. Secure RTP (SRTP) uses the Advanced Encryption Standard (AES) to encrypt the packets. (AES will later have a starring role in Wi-Fi encryption, as well as for use with IPsec.) The RTP stream requires a key to be established. Each packet is then encrypted with AES running in counter mode, a mode in which intervening packets can be lost without preventing the decryption of subsequent packets in the sequence. The integrity of each packet is ensured by an HMAC-SHA1 keyed message authentication code.
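The per-packet integrity check can be sketched with the Python standard library alone. This is a simplified illustration, not a complete SRTP implementation: it assumes the authentication key has already been derived, that the tag is truncated to 80 bits, and that the 32-bit rollover counter (ROC) is appended to the packet before hashing. Encryption and key derivation are left out, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct

TAG_LENGTH = 10  # assumed 80-bit truncated tag


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    """Compute a truncated HMAC-SHA1 tag over the packet and the ROC."""
    message = packet + struct.pack("!I", roc)
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:TAG_LENGTH]


def srtp_verify(auth_key: bytes, packet: bytes, roc: int, tag: bytes) -> bool:
    """Check a received tag in constant time."""
    expected = srtp_auth_tag(auth_key, packet, roc)
    return hmac.compare_digest(expected, tag)
```

Because each tag depends only on its own packet and the rollover counter, a receiver can verify any packet that arrives, regardless of which earlier packets were lost.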

How the SRTP stream gets its keys is not specified by SRTP. However, SIPS provides a way for this to be set up that is quite logical.

Tuesday, March 30, 2010

Real-Time Transport | IP Telephony-Related Standards

This section deals with the standards that govern how voice and video are carried over IP networks. These standards are essential to interworking with the PSTN because IP telephony gateways need to convert the IP voice and video payload into a form that is accepted by the PSTN and, conversely, translate the PSTN voice and video payload into a form that is accepted by IP networks. The gateways also need to reconstruct the voice or video stream to be as close to the original as possible. Naturally, such reconstruction should retain the real-time properties of the original stream. In addition, an interactive application, such as a two-person voice call, also requires that the transport service itself be fast, reliable, and perceived as "free" of jitter (that is, high variation in delay) to maintain the perception of a real-time interaction.

These real-time transport requirements explain why the protocol suite developed by the Audio/Video Transport (avt) IETF working group (see www.ietf.org/html.charters/avt-charter.html) is called the Real-Time Transport Protocol (RTP). It was first specified in RFC 1889 (Schulzrinne et al., 1996), which RFC 3550 has since replaced. RTP is designed for multicast as well as point-to-point transmission and is accompanied by its quality control component, the RTP Control Protocol (RTCP). Both protocols are carried by the User Datagram Protocol (UDP).

RTP specifies the header of the packets that carry streams of encoded audio or video samples. This encoding is performed by a device (or software module) called a coder; the subsequent decoding is performed by a decoder, but for full-duplex communications, both are usually combined in a codec. RTP specifies the payload format, which, in turn, identifies a specific codec. (The avt working group has also developed a number of RFCs that deal with payload formats.) The codec header, which is appended to the RTP header, determines the format of the attached encoded data unit (called a frame).

Since UDP does not guarantee sequencing (that is, arrival of packets in the order they were sent), this function is assisted by RTP, which stipulates the inclusion of sequence numbers in packets. Sequence numbers are used at the receiver not only to reconstruct the original sequence, but also to keep count of lost packets (one of the quality of service statistics fed back to the sender via RTCP).
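The loss count can be derived from sequence numbers alone. The following is a simplified sketch of the idea (expected packets minus received packets), assuming a single stream, 16-bit sequence numbers that wrap around, and no duplicate packets; the function name is illustrative.

```python
def count_lost(sequence_numbers: list) -> int:
    """Return expected minus received packets for one stream."""
    if not sequence_numbers:
        return 0
    base = sequence_numbers[0]
    highest = base   # highest sequence number seen, modulo 65536
    cycles = 0       # number of 16-bit wraparounds observed
    received = 0
    for seq in sequence_numbers:
        received += 1
        # Forward distance from the highest number seen so far.
        delta = (seq - highest) & 0xFFFF
        if 0 < delta < 0x8000:
            if seq < highest:
                cycles += 1  # the counter wrapped past 65535
            highest = seq
        # Otherwise the packet is late or repeated; it does not
        # advance the highest sequence number.
    expected = cycles * 65536 + highest - base + 1
    return expected - received
```

For the arrivals 65534, 65535, 1, 2 the function reports one lost packet (sequence number 0), while a merely reordered run such as 10, 12, 11, 13 reports none.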

RTP deals with any jitter by time-stamping packets. At the receiving end, the play-out devices buffer the packets and then reconstruct the stream at the original rate. Another synchronization mechanism is the marker bit of the header, which, according to RFC 1889, “is intended to allow significant events such as frame boundaries to be marked in the packet stream.”
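Timestamps also let a receiver measure jitter. The sketch below follows the running estimate described in the RTP specification, in which each new difference in transit time moves the estimate by one sixteenth of the gap. It assumes arrival times and RTP timestamps are expressed in the same clock units; the function name and the input format are illustrative.

```python
def interarrival_jitter(packets) -> float:
    """Estimate jitter from (arrival_time, rtp_timestamp) pairs."""
    jitter = 0.0
    previous_transit = None
    for arrival, timestamp in packets:
        # Relative transit time; a constant clock offset cancels out below.
        transit = arrival - timestamp
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            # Smooth the estimate with a gain of 1/16.
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return jitter
```

A perfectly paced stream yields an estimate of zero, and a play-out buffer can be sized as some multiple of the estimate when delay varies.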

The RTCP packets are sent to exactly the same addresses as the RTP ones, but on different ports. The primary function of RTCP is to carry, from receivers to senders, the statistics on the number of lost packets, jitter, and round-trip delays. RTCP carries sender reports in the opposite direction. The statistics are used by senders to adjust encoding rates (and, possibly, the choice of codecs) in order to use less bandwidth. In addition, the statistics are useful for network management as the mechanism to detect the type and location of network problems (such as congestion). In addition to supporting quality control, RTCP performs the following functions:

· Synchronization of video and audio streams

· Identification of session participants (by their full names, telephone numbers, and e-mail addresses)

· Session control (through indication that a user is leaving the session and user-to-user control messages)

Real-Time Streaming Protocol (RTSP), developed in the Multiparty Multimedia Session Control (mmusic) working group, is a network remote control for multimedia services, as defined in RFC 2326 (Schulzrinne et al., 1998). The main purpose of the protocol is to control a device for so-called stored media [for example, a compact disc (CD) player, tape recorder, and so on]. But the control here actually encompasses playing the device, which involves the transfer of the stream across the network. The applicability of this protocol to the task of integrating the PSTN and the Internet can be found in the areas of voice and video messaging. Like SIP, RTSP is also a descendant of HTTP, but unlike SIP, RTSP maintains a virtual connection by assigning a session identifier at the beginning of the session and then carrying it in all messages relevant to the session. RTSP defines its own URL scheme for referring to media servers. RTSP can also interwork with SIP, as explained in Schulzrinne and Rosenberg (1999).
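The way the session identifier travels with each message can be sketched as follows. This is a minimal illustration of the HTTP-like text format, with a hypothetical helper name, a made-up media URL, and a made-up session value; a real client would take the session identifier from the server's response to its setup request.

```python
def build_rtsp_request(method: str, url: str, cseq: int, session=None) -> str:
    """Build a bare RTSP/1.0 request, adding the Session header if known."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        # Every request after setup repeats the same session identifier.
        lines.append(f"Session: {session}")
    # Header lines end with CRLF; a blank line ends the request.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical values, for illustration only.
play = build_rtsp_request("PLAY", "rtsp://media.example.com/greeting", 3, "12345678")
```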