RTP only carries the voice, and there must be some associated way to signal the codecs which are supported by each end. This is fundamentally a property of signaling, but, unlike call progress messages and advanced PBX features, is tied specifically to the bearer channel.
SIP uses SDP to negotiate codecs and RTP endpoints, including transports, port numbers, and every other aspect necessary to start RTP streams flowing. SDP, defined in RFC 4566, is a text-based protocol, as SIP itself is, for setting up the various legs of media streams. Each line represents a different piece of information, in the format of type = value.
Table 1 shows an example of an SDP description. This description is for a phone at IP address 192.168.0.10, who wishes to receive RTP on UDP port 9000. Let's go through each of the fields.
- Type "v" represents the protocol version, which is 0.
- Type "o" holds information about the originator of this request, and the session IDs. Specifically, it is divided up into the username, session ID, session version, network type, address type, and address. "7010" happens to be the dialing phone number. The two large numbers afterward are identifiers, to keep the SDP exchanges straight. The "IN" refers to the address being an Internet protocol address; specifically, "IP4" for IPv4, of "192.168.0.10". This is where the originator is.
- Type "s" is the session name. The value given here, "A_conversation", is not particularly meaningful.
- Type "c" specifies how the originator must be reached at—its connection data. This is a repetition of the IP address and type specifications for the phone.
- The "m" line specifies the media needed. In this case, as with most voice calls, there is only one voice stream from the device, so there is only one media line. The next parameters are the media type, port, application, and then the list of RTP types, for RTP. This call is an "audio" call, and the phone will be listening on port 9000. This is a UDP port, because the application is "RTP/AVP", meaning that it is plain RTP. ("AVP" means that this is standard UDP with no encryption. There is an "RTP/SAVP" option, mentioned shortly.) Finally, the RTP formats the phone can take are 0, 8, and 18.
- The next three lines are the codecs that are supported in detail. The "a" field specifies an attribute. The "a=rtpmap" attribute means that the sender wants to map RTP packet types to specific codec setups. The line is formatted as packet type, encoded name/bitrate/parameters. In the first line, RTP packet type "0" is mapped to "PCMU" at 8000 samples per second. The default mapping of "0" is already PCM (G.711) with μ-law, so the new information is the sample rate. The second line asks for A-law, mapping it to 8. The third line asks for G.729, asking for 18 as the mapping. Because the phone only listed those three types, those are the only types it supports.
- The last line is also an attribute. "a=ptime" is requesting that the other party send 20ms packets. The other party is not required to submit to this request, as it is only a suggestion. However, this is a pretty good sign that the sender of the SDP message will also send at 20ms.
v=0 |
The setup message in Table 1 was originally given in a SIP INVITE message. The responding SIP OK message from the other party gave its SDP settings.
Table 2 shows this example response. Here, the other party, at IP address 10.0.0.10, wants to receive on UDP port 11690 an RTP stream with the three codecs PCMU, GSM, and PCMA. It can also receive a format known as "telephone-event." This corresponds to the RTP payload format for sending digits while in the middle of a call (RFC 4733). Some codecs, like G.729, can't carry a dialed digit as the usual audio beep, because the beep gets distorted by the codec. Instead, the digits have to be sent over RTP, embedded in the stream. The sender of this SDP is stating that they support it, and would like to be sent in RTP type 101, a dynamic type that the sender was allowed to choose without restriction.
v=0 |
Corresponding to this is the attribute "a=fmtp", which applies to this 101-digit type, "fmtp" lines don't mean anything specific to SDP; instead, the request of "0-16" gets forwarded to the telephone event protocol handler. It is not necessary to go into further details here on what "0-16" means. The "a=silenceSupp" line would activate silence suppression, in which packets are not sent when the caller is not talking. Silence suppression has been disabled, however. Finally, the "a=sendrecv" line means that the originator can both send and receive streaming packets, meaning that the caller can both talk and listen. Some calls are intentionally one-way, such as lines into a voice conference where the listeners cannot speak. In that case, the listeners may have requested a flow with "a=recvonly".
After a device gets an SDP request, it knows enough information to send an RTP stream back to the requester. The receiver need only choose which media type it wishes to use. There is no requirement that both parties use the same codec; rather, if the receiver cannot handle the codec, the higher-layer signaling protocol needs to reject the setup. With SIP, the called party will not usually stream until it accepts the SIP INVITE, but there is no further handshaking necessary once the call is answered and there are packets to send.
For SRTP usage with SIPS, SDP allows for the SRTP key to be specified using a special header:
a=crypto:1 AES_CM_128_HMAC_SHA1_32 Þ
inline:c3bFaGA+Seagd117041az3g113geaG54aKgd50Gz
This specifies that the SRTP AES counter with HMAC_SHA1 is to be used, and specifies the key, encoded in base-64, that is to be used. Both sides of the call send their own randomly generated keys, under the cover of the TLS-protected link. This forms the basis of RTP/SAVP.
No comments:
Post a Comment