What are media gateways and how do H.323, SIP, MGCP and other support protocols work?

The gateway controller or media gateway controller (MGC) carries out the signaling function on VoIP circuits. Some texts call an MGC a softswitch, even though an MGC is not truly a switch but a server that controls gateways. This function is illustrated in Figure 6.

Figure 6. A media gateway controller (MGC) provides a signaling interface for media gateways (MGs), thence to the IP network.

An MGC can control numerous gateways, but to improve reliability and availability, several MGCs may be employed in separate locations, with their functions duplicated across the gateways they control. Thus, if one MGC fails, others can take over its functions. We must keep in mind that the basic topic of Section 4 of this chapter is signaling -- that is, establishing telephone connectivity, maintaining that connectivity, and taking down the circuit when the users have finished their conversation. There is a basic discussion of signaling in Chapter 4 of this text.

There are four possible signaling protocol options between an MGC and gateways. These are H.323, SIP, MGCP and Megaco (ITU-T Rec. H.248). Each is described in turn.

  Overview of the ITU-T Rec. H.323 standard
In May 1996, the ITU ratified the H.323 specification, which defines how voice, video and data traffic should be transported over IP-based LANs. It also incorporates the ITU-T Rec. T.120 (Ref. 21) data-conferencing standard. The H.323 recommendation is based on RTP/RTCP (real-time transport protocol/RTP control protocol) for managing audio and video signals.

What sets H.323 apart is that it addresses core Internet applications by defining how delay-sensitive traffic such as voice and video gets priority transport to ensure real-time communication service over the Internet. A related protocol is ITU-T Rec. H.324 (Ref. 14), which defines the transport of voice, data and video over regular telephone networks. Another is ITU-T Rec. H.320 (Ref. 15), which covers the transport of voice, video and data over the integrated services digital network (ISDN).

H.323 deals with three basic functional elements of VoIP. These are:

H.323 is an umbrella protocol covering:

The standard H.323 (Ref. 3) prefers the use of the term gatekeeper to media gateway controller. Some of the more important responsibilities of a gatekeeper are:

H.323 assumes that the transmission medium is a LAN that does not provide guaranteed delivery of packets. In the ITU H.323 standard we will find the term entity. An entity carries out a function. For example, a terminal is an endpoint on a LAN that can support real-time communications with another entity on that LAN. It has a capability provided by a voice or audio codec such as a G.711 or G.728 codec. It will also provide a signaling function for VoIP circuit setup, maintenance and takedown. A VoIP terminal optionally can support video and data streams, including compression and decompression of these streams. Media streams are carried on RTP, with RTCP running alongside: RTP deals with media content; RTCP works with the signaling functions of status and control. Both are normally carried in UDP datagrams over IP. UDP is connectionless and, unlike TCP, neither acknowledges nor retransmits packets. Other VoIP entities are gateways, and there is an optional gatekeeper.
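To make the RTP layer concrete, the following is a minimal sketch of the 12-byte fixed RTP header as laid out in RFC 3550, packed in front of one 20 ms frame of G.711 audio. The function names, the SSRC value and the frame size are illustrative assumptions, not taken from the text; CSRC lists, header extensions and padding are left out.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static payload type for G.711 mu-law audio (RFC 3551)


def pack_rtp_header(seq, timestamp, ssrc, payload_type=PT_PCMU, marker=False):
    """Pack the 12-byte fixed RTP header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical values: 20 ms of G.711 at 8 kHz is 160 one-byte samples.
header = pack_rtp_header(seq=1, timestamp=160, ssrc=0x12345678)
packet = header + bytes(160)  # silence stands in for real audio here
fields = unpack_rtp_header(packet)
```

The sequence number and timestamp are what let a receiver detect loss and reorder or play out packets at the right moment; nothing in the header asks for retransmission.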

The leading issue in VoIP implementation is guaranteed quality of service (QoS). H.323 is based on RTP, which is comparatively new. RTP-compliant equipment includes control mechanisms for synchronizing different traffic streams. On the other side of the coin, RTP has no mechanisms for ensuring on-time delivery of traffic signals or for recovering lost packets. It does not address the QoS issue related to guaranteed bit rate availability for specific applications. The IEC (Ref. 19) reports that there is a draft signaling proposal to strengthen the Internet's ability to handle real-time traffic reliably. This would dedicate end-to-end transport paths for specific sessions, much as the circuit-switched PSTN does. This is the resource reservation protocol (RSVP). It will be implemented in routers to establish and maintain requested transmission paths and QoS levels.

  SIP
SIP (Session Initiation Protocol) was originally specified in RFC 2543 (Ref. 3), since superseded by RFC 3261, and is an application layer signaling protocol. It deals with interactive multimedia communication sessions between end users, called user agents. It defines their initiation, modification and termination. SIP calls may be terminal-to-terminal, or they may require a server to intercede. If a server is to be involved, it is only required to locate the called party. For interworking with non-IP networks, Megaco and H.323 are required. Often, vendors of VoIP equipment integrate all three protocols on a single platform.

SIP is closely related to IP. SIP borrows most of its syntax and semantics from the familiar HTTP (hypertext transfer protocol). A SIP message looks very much like an HTTP message, especially in its message formatting, headers and support for MIME (multipurpose Internet mail extensions). It uses addresses that are very similar to URLs and to email addresses. For example, a call may be made to so-and-so@such-and-such. SIP messages are text-based rather than binary. This makes software easier to write and more straightforward to debug.
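The HTTP-like, text-based format can be illustrated with a short sketch that builds a bare INVITE request and parses it back. The addresses, tag, branch and Call-ID values are made up for illustration, the Via host is simply taken from the caller's address, and the parser is deliberately naive (it ignores repeated headers and line folding), so treat this as a picture of the message shape rather than a usable SIP stack.

```python
def build_invite(caller, callee, call_id, branch):
    """Build a minimal SIP INVITE with no body (illustrative values only)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line: method, URI, version
        f"Via: SIP/2.0/UDP {caller.split('@')[1]};branch={branch}",
        f"From: <sip:{caller}>;tag=1928301774",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Max-Forwards: 70",
        "Content-Length: 0",
    ]
    # As in HTTP, lines end in CRLF and a blank line separates headers and body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text):
    """Split a SIP message into start line, header dict and body."""
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


message = build_invite(
    caller="alice@atlanta.example.com",   # hypothetical user agents
    callee="bob@biloxi.example.com",
    call_id="a84b4c76e66710",
    branch="z9hG4bK776asdhds",
)
start_line, headers, body = parse_message(message)
```

Because every field is readable text, a captured message can be inspected directly, which is the debugging advantage noted above.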

There are two modes with which a user can set up a call with SIP. These are called redirect and proxy, and servers are designed to handle these modes. Both modes issue an invite message for another user to participate in a call. The redirect server is used to supply the address (URL) of an unknown called addressee. In this case, the "invite" message is sent to the redirect server, which consults the location server for address information. Once this address information is sent to the calling user, a second invite message is issued, now with the correct address.
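The redirect sequence just described can be sketched as a small simulation: the first invite goes to the redirect server, which consults a location service and answers with the current address, and the caller then issues a second invite to that address. The addresses and the in-memory lookup table are hypothetical; the status codes (302 Moved Temporarily, 404 Not Found, 200 OK) are standard SIP response codes, and the ACK messages of a real exchange are omitted.

```python
# Hypothetical location service: public address -> where the user is now.
LOCATION_DB = {"bob@example.com": "bob@pc42.example.com"}

# Hypothetical set of addresses at which a user agent will answer.
REACHABLE = {"bob@pc42.example.com"}


def location_lookup(address):
    """Return the current contact address, or None if unknown."""
    return LOCATION_DB.get(address)


def redirect_server(invite_to):
    """Answer an INVITE with a redirect instead of forwarding it."""
    contact = location_lookup(invite_to)
    if contact is None:
        return 404, None
    return 302, contact  # caller must retry at the returned contact


def called_party(invite_to):
    """The called user agent accepts the call if the address is correct."""
    return 200 if invite_to in REACHABLE else 404


def place_call(callee):
    """Run the two-step redirect call setup and record each step."""
    trace = []
    status, contact = redirect_server(callee)
    trace.append(("INVITE", callee, status))
    if status != 302:
        return trace
    status = called_party(contact)  # second invite, now with the right address
    trace.append(("INVITE", contact, status))
    return trace


steps = place_call("bob@example.com")
```

The point of the design is that the redirect server drops out after the first answer; the caller, not the server, carries the call forward.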

One specific type of SIP is called SIP-T (T for telephone). This is a function that allows calls from CCITT Signaling System 7 (SS7) to interface with telephony in an IP-based network. The particular user part of SS7 for this application is ISUP (ISDN user part).

  Media gateway control protocol (MGCP)
This protocol was the predecessor to Megaco (see Section 12.3.4) and still holds sway with a number of carriers and other VoIP users. MGCP (Ref. 20) assumes a call-control architecture where the call-control intelligence is outside the gateways (i.e., at the network edge) and handled by external call-control elements. Thus, the MGCP assumes that these call-control elements, or call agents, will synchronize with each other to send coherent commands to the gateways under their command. There is no mechanism defined in MGCP for synchronizing call agents. It is, in essence, a master/slave protocol where the gateways are expected to execute commands sent by the call agents.
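The master/slave relationship can be sketched with a toy gateway that does nothing except execute commands from a call agent. The command line shape (verb, transaction ID, endpoint name, protocol version) and the verbs CRCX (create connection) and DLCX (delete connection) follow the MGCP text encoding as commonly documented; the class, the endpoint name, the parameter handling and the response wording are simplified assumptions, and a real gateway would also return session description data.

```python
def call_agent_command(verb, txn, endpoint, **params):
    """Format a command as the call agent (the master) would send it."""
    lines = [f"{verb} {txn} {endpoint} MGCP 1.0"]
    lines += [f"{key}: {value}" for key, value in params.items()]
    return "\n".join(lines)


class Gateway:
    """Toy gateway (the slave): holds no call logic, only executes commands."""

    def __init__(self, endpoints):
        self.connections = {name: [] for name in endpoints}
        self._next_id = 1

    def handle(self, command):
        lines = command.strip().split("\n")
        verb, txn, endpoint, *_ = lines[0].split()
        params = dict(line.split(": ", 1) for line in lines[1:])
        local_name = endpoint.split("@")[0]
        if local_name not in self.connections:
            return f"500 {txn} endpoint unknown"
        if verb == "CRCX":
            conn_id = format(self._next_id, "X")
            self._next_id += 1
            self.connections[local_name].append(conn_id)
            return f"200 {txn} OK\nI: {conn_id}"  # I: carries the connection ID
        if verb == "DLCX":
            conn_id = params.get("I")
            if conn_id in self.connections[local_name]:
                self.connections[local_name].remove(conn_id)
                return f"250 {txn} OK"
            return f"515 {txn} incorrect connection-id"
        return f"504 {txn} unknown or unsupported command"


gateway = Gateway(["aaln/1"])  # hypothetical analog-line endpoint
create = call_agent_command("CRCX", 1204, "aaln/1@gw.example.net",
                            C="A3C47F21456789F0", M="recvonly")
create_reply = gateway.handle(create)
delete_reply = gateway.handle(
    call_agent_command("DLCX", 1205, "aaln/1@gw.example.net", I="1"))
```

Note that nothing in this exchange coordinates two call agents; if a second agent sent conflicting commands, the gateway would simply execute them in arrival order, which is the synchronization gap the text points out.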

In the MGCP protocol, an assumption is made that the connection model consists of constructs that are basic endpoints and connections. Endpoints are sources or sinks of data and could be physical or virtual. The following are two examples of endpoints:

An example of a virtual endpoint is an audio source in an audio-content server. Creation of physical endpoints requires a hardware installation, whereas creation of virtual endpoints can be done in software (Ref. 20).

  Megaco or ITU-T Rec. H.248 (Ref. 13)
Megaco is a call-control protocol that communicates between a gateway controller and a gateway. It evolved from and replaces SGCP (simple gateway control protocol) and MGCP (media gateway control protocol). Megaco addresses the relationship between a media gateway (MG) and a media gateway controller (MGC). An MGC is sometimes called a softswitch or call agent. Both Megaco and MGCP are relatively low-level protocols that instruct MGs to connect streams coming from outside the cell or packet data network onto a packet or cell stream governed by RTP.

A Megaco (H.248) connection model is illustrated in Figure 7. Two principal abstractions relate to the model: terminations and contexts. A termination sources and/or sinks one or more data streams. In a multimedia conference, a termination can be multimedia and can source or sink multiple media streams. The media stream parameters, as well as modem and bearer parameters, are encapsulated within the termination.

A context is an association among a collection of terminations. There is a special type of context, called the null context, which contains all terminations that are not associated with any other termination. For example, in a decomposed access gateway, all idle lines are represented by terminations in the null context.

Let's look at three context possibilities.

The maximum number of terminations in a context is a media gateway (MG) property. MGs that offer only point-to-point connectivity might allow at most two terminations per context. MGs that support multi-point conferences might allow three or more terminations per context.

The attributes of contexts are:

Megaco uses a series of commands to manipulate terminations, contexts, events and signals. For example, the add command adds a termination to a context and may be used to create a new context at the same time. As we would expect, the subtract command removes a termination from a context, and the context is released if no terminations remain.
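The connection model and the add and subtract commands can be sketched as a small in-memory model. The class name, the "null" marker for the null context, the integer context IDs and the trunk-style TerminationIDs are all illustrative assumptions; the behavior follows the description here: idle terminations sit in the null context, add can create a context, the per-context limit is a property of the MG, and a context is released when its last termination is subtracted.

```python
class MediaGateway:
    """Toy model of Megaco terminations and contexts on one MG."""

    NULL_CONTEXT = "null"  # stand-in label for the null context

    def __init__(self, termination_ids, max_per_context=2):
        self.max_per_context = max_per_context  # an MG property
        self.contexts = {}  # context id -> list of termination ids
        # Every termination starts idle, i.e. in the null context.
        self.location = {tid: self.NULL_CONTEXT for tid in termination_ids}
        self._next_context = 1

    def add(self, termination_id, context_id=None):
        """Add a termination to a context, creating the context if needed."""
        if self.location[termination_id] != self.NULL_CONTEXT:
            raise ValueError(f"{termination_id} is already in a context")
        if context_id is not None and len(
                self.contexts[context_id]) >= self.max_per_context:
            raise ValueError(f"context {context_id} is full")
        if context_id is None:
            context_id = self._next_context
            self._next_context += 1
            self.contexts[context_id] = []
        self.contexts[context_id].append(termination_id)
        self.location[termination_id] = context_id
        return context_id

    def subtract(self, termination_id):
        """Remove a termination; release the context if it becomes empty."""
        context_id = self.location[termination_id]
        if context_id == self.NULL_CONTEXT:
            raise ValueError(f"{termination_id} is not in a context")
        self.contexts[context_id].remove(termination_id)
        self.location[termination_id] = self.NULL_CONTEXT
        if not self.contexts[context_id]:
            del self.contexts[context_id]


# Hypothetical point-to-point MG with structured IDs (trunk group / trunk).
mg = MediaGateway(["tg1/trunk1", "tg1/trunk2", "tg1/trunk3"])
call = mg.add("tg1/trunk1")   # creates a new context
mg.add("tg1/trunk2", call)    # joins the two terminations
```

Raising `max_per_context` to three or more would turn the same model into a gateway that supports multi-point conferences.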

There is also the modify command, used to modify the description of a termination -- e.g., the type of voice compression in use. Notify is used to inform the gateway controller if an event occurs on a termination such as a telephone in an off-hook condition or digits being dialed. There is also a service change command.

Terminations are referenced by a TerminationID, which is an arbitrary schema selected by the MG. TerminationIDs of physical terminations are provisioned by the media gateway. The TerminationIDs may be chosen to have structure. For example, a TerminationID may consist of a trunk group and a trunk within the group (Ref. 9).


Figure 7. An example of the H.248/Megaco connection model. SCN = switched circuit network. The asterisk in each box in each of the contexts represents the logical association of terminations implied by the context (based on Figure 1, RFC 3015, Ref. 9).