If you have used VoIP over internet solutions in the past then this will be no surprise. However, if you are new to Microsoft Teams with little or no UC heritage, then this post may be of value to you.
Even though Teams uses its own protocols over HTTPS to communicate, underneath the bonnet these packets still rely on the underlying protocols and principles of media connectivity over the internet. With Teams this is harder to see because the packets are encrypted with a certificate, but there are ways in which you can view it.
When I talk about media establishment over the internet, what I really mean is media connectivity when a client is behind NAT.
NAT is a problem when trying to connect two clients together over a third-party network because we need a way to transport the data packets from one peer to the other, and behind a NAT firewall we cannot do this directly because each peer’s IP address changes as its traffic passes through the firewall. For instance, Client A has an IP of 10.0.0.1 and Client B has an IP of 192.168.0.1, and they exist on completely different networks connected by NAT firewalls over the internet. Using their private IP addresses would fail because those addresses are not routable over the internet; the firewall changes the source IP to the public IP of its outside interface.
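As a quick illustration of why the two example addresses cannot be used directly, here is a minimal Python sketch using the standard `ipaddress` module. The helper name `is_internet_routable` is made up for this post, and 8.8.8.8 is included only as a well-known public address for contrast.

```python
import ipaddress


def is_internet_routable(address: str) -> bool:
    """Return True if the address is globally routable (not private/reserved)."""
    return ipaddress.ip_address(address).is_global


# Client A and Client B are the example addresses from the paragraph above.
for label, address in [("Client A", "10.0.0.1"),
                       ("Client B", "192.168.0.1"),
                       ("Public example", "8.8.8.8")]:
    verdict = "routable" if is_internet_routable(address) else "private, needs NAT"
    print(f"{label}: {address} is {verdict}")
```

Both client addresses fall in private ranges, which is why each peer needs some other address to advertise to the other side.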
Before we solve this problem, it’s probably a good time to bring up the term ICE. No, this is not In Car Entertainment, but Interactive Connectivity Establishment. ICE is a protocol that VoIP clients use to establish media in any network, internal or external. The ICE information is carried inside another protocol called Session Description Protocol, or SDP. SDP carries the ICE candidates, client capabilities and codec negotiation during call setup.
When we talk about ICE, you’ll hear the term ICE candidates. What this means is that in order to establish media, the client sends an invitation to a conversation, via a SIP proxy, to the peer it is trying to contact. The SDP inside that invitation contains a list of IP addresses and ports that the peer can use to connect to it.
These IP addresses and Ports are called ICE Candidates. When looking at the SDP you’ll see something like this
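The original capture is not reproduced here, so what follows is an illustrative reconstruction rather than a real Teams trace. It assumes the standard `a=candidate` line layout from the ICE specification. The IP addresses and the srflx and relay ports are the ones discussed in this post; the host ports, priorities and the relay candidate number are made up, and the related address fields are left out. The small parser (`Candidate` and `parse_candidate` are hypothetical names) simply splits each line into its fields.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative SDP fragment. Host ports, priorities and the relay
# foundation number are assumptions, not values from a real trace.
SAMPLE_SDP = """\
a=candidate:1 1 UDP 2130706431 192.168.0.103 50000 typ host
a=candidate:1 2 UDP 2130706430 192.168.0.103 50001 typ host
a=candidate:4 1 UDP 1694498815 220.127.116.11 1230 typ srflx
a=candidate:4 2 UDP 1694498814 220.127.116.11 1231 typ srflx
a=candidate:5 1 UDP 16777215 18.104.22.168 59268 typ relay
a=candidate:5 2 UDP 16777214 18.104.22.168 51695 typ relay
"""


@dataclass
class Candidate:
    foundation: str
    component: int          # 1 = RTP (media), 2 = RTCP
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str          # host, srflx or relay
    raddr: Optional[str] = None
    rport: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Split one 'a=candidate:...' line into its fields."""
    body = line.strip()
    if body.startswith("a="):
        body = body[2:]
    if not body.startswith("candidate:"):
        raise ValueError(f"not a candidate line: {line!r}")
    fields = body[len("candidate:"):].split()
    foundation, component, transport, priority, address, port = fields[:6]
    # Everything after the port is a sequence of "key value" pairs.
    extras = dict(zip(fields[6::2], fields[7::2]))
    rport = extras.get("rport")
    return Candidate(
        foundation=foundation,
        component=int(component),
        transport=transport.upper(),
        priority=int(priority),
        address=address,
        port=int(port),
        cand_type=extras["typ"],
        raddr=extras.get("raddr"),
        rport=int(rport) if rport is not None else None,
    )


candidates = [parse_candidate(line) for line in SAMPLE_SDP.splitlines()]
for c in candidates:
    role = "RTP" if c.component == 1 else "RTCP"
    print(f"Candidate:{c.foundation} {role:4} {c.cand_type:5} {c.address}:{c.port}")
```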
The first thing that you will notice is that each ICE candidate has a pair of addresses and ports. The first entry in a pair is the IP address and port the client has opened for receiving media, whilst the second is the IP address and port the client has opened for RTCP, which handles media quality monitoring. The IP addresses must match, but the ports should be different if using UDP (preferred).
Candidate:1 will always be the private IP address of the local client, and this is what is used when both clients are internal to each other, i.e. on the same network. The next candidate pair will be the IP address and ports opened over the internet, and this is where I take a pause and turn to STUN for a moment.
If you look at the Candidate:4 pair in the above example, we can see the IP 220.127.116.11 with ports 1230 and 1231 opened respectively. This IP address is the public IP of the firewall the client is routing over, and the ports are the ones the firewall has opened on its outbound interface for that connection. Natively, clients are not capable of discovering this IP and port on their own.
Instead they send out a discovery request to an external server to say “Hey, tell me my public IP and port please”. This request goes to a STUN server, usually on UDP port 3478. The STUN server replies to the client with the public IP and port opened on the firewall, and this becomes what’s called a server reflexive address (srflx), listed alongside a related address (raddr) and related port (rport). The firewall takes care of the NAT: the private IP of the client, 192.168.0.103, is the real destination, and in this example the rport is 1230 and 1231 (the ports open on the firewall).
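To make the request and reply concrete, here is a minimal sketch of a standard STUN Binding exchange as defined in RFC 5389. It is not necessarily the exact message format Teams puts on the wire, and nothing is sent over the network: `fake_binding_response` is a hypothetical helper that plays the part of the STUN server so the parsing can be shown end to end.

```python
import ipaddress
import os
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> bytes:
    """20-byte STUN header: type, length 0, magic cookie, random transaction ID."""
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + os.urandom(12)


def fake_binding_response(request: bytes, public_ip: str, public_port: int) -> bytes:
    """Stand-in for the STUN server (hypothetical helper, no network involved)."""
    transaction_id = request[8:20]
    xport = public_port ^ (MAGIC_COOKIE >> 16)
    xaddr = int(ipaddress.IPv4Address(public_ip)) ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


def parse_xor_mapped_address(message: bytes):
    """Return (public_ip, public_port) from a Binding success response (IPv4 only)."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            (xaddr,) = struct.unpack("!I", value[4:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            address = ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)
            return str(address), port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")


request = build_binding_request()
response = fake_binding_response(request, "220.127.116.11", 1230)
print(parse_xor_mapped_address(response))
```

In a real client the request would be sent from the same local UDP socket that will later carry media, so the address the server reports is the one the firewall has mapped for that socket.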
So STUN is just the process of allowing the client to discover its public IP and port. However, in order for STUN to work properly, the firewall needs to support certain NAT types. In a nutshell, STUN does not work with firewalls configured to enforce symmetric NAT. Other forms of NAT are OK, e.g. full cone, restricted cone and port restricted cone NAT.
The reason why symmetric NAT does not work is that a new port is opened on the firewall for each new destination the client connects to, so the port the STUN server reports is not the port the peer would need to reach, whereas the other forms of NAT reuse the mapping already opened. If symmetric NAT is enforced, you will experience audio dropouts because the client will need to renegotiate and switch connections each time the port changes.
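The difference can be shown with a small toy model. This is a simplified sketch, not how any particular firewall is implemented: the `Nat` class is hypothetical, it only tracks which public port is handed out, and the STUN server and peer addresses are taken from the documentation ranges.

```python
import itertools


class Nat:
    """Toy NAT that only models how outbound port mappings are allocated."""

    def __init__(self, public_ip: str, symmetric: bool, first_port: int = 1230):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._next_port = itertools.count(first_port)
        self._mappings = {}

    def outbound(self, src, dst):
        """Return the (public_ip, public_port) the destination would see."""
        # Cone NAT: one mapping per internal socket, reused for every destination.
        # Symmetric NAT: a separate mapping per (internal socket, destination).
        key = (src, dst) if self.symmetric else src
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return self.public_ip, self._mappings[key]


client = ("192.168.0.103", 50000)       # hypothetical local media socket
stun_server = ("203.0.113.10", 3478)    # documentation-range address
peer = ("198.51.100.20", 40000)         # documentation-range address

for name, symmetric in [("cone", False), ("symmetric", True)]:
    nat = Nat("220.127.116.11", symmetric)
    seen_by_stun = nat.outbound(client, stun_server)
    seen_by_peer = nat.outbound(client, peer)
    print(f"{name}: STUN sees {seen_by_stun}, peer sees {seen_by_peer}")
```

With the cone NAT both destinations see the same public port, so the address learned from STUN is usable by the peer. With the symmetric NAT the peer sees a different port from the one STUN reported.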
Now, assuming everything is as it should be, the ICE candidates with the server reflexive addresses are used to attempt media negotiation between the two clients. If the firewalls on both sides pass connectivity checks, then these candidates are used and media is established.
However, if media cannot be established using these ICE candidates, then we need another way to connect media. This is typically done using a media relay server, which is also referred to as a TURN server.
Going back to the SDP example, we see another candidate pair with IP 18.104.22.168 and ports 59268 and 51695. This is the IP of the TURN server and the ports assigned to the client for media traversal through the TURN server. We can tell this is a TURN server IP because the type is marked as relay. In this event the client sends media from its server reflexive address and port to the TURN server on the port assigned. The TURN server accepts the media and relays it internally to the peer’s connection, which is also connected to the TURN server. The TURN server then sends the media out to the peer’s reflexive address, which hits the peer’s firewall, and NAT takes place to deliver the media to the client.
Putting this all together the media establishment process should look like this
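In code form, the order of preference described in this post (host, then server reflexive, then relay) can be sketched as below. This is a simplification: real ICE checks pairs of local and remote candidates rather than candidate types, and the type preference values are the ones recommended in the ICE specification (RFC 8445), which Teams may or may not use. The function names are made up for illustration.

```python
from typing import Dict, Optional

# Recommended type preferences from RFC 8445; an assumption as far as Teams goes.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, component: int, local_pref: int = 65535) -> int:
    """Priority formula from the ICE specification."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component)


def select_media_path(check_results: Dict[str, bool]) -> Optional[str]:
    """Pick the highest-priority candidate type whose connectivity check passed."""
    reachable = [cand_type for cand_type, ok in check_results.items() if ok]
    if not reachable:
        return None
    return max(reachable, key=lambda cand_type: candidate_priority(cand_type, 1))


# Both clients on the same network: the private (host) address wins.
print(select_media_path({"host": True, "srflx": True, "relay": True}))
# Clients behind different NAT firewalls that pass connectivity checks.
print(select_media_path({"host": False, "srflx": True, "relay": True}))
# Direct checks fail, for example with symmetric NAT: fall back to TURN.
print(select_media_path({"host": False, "srflx": False, "relay": True}))
```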
Obviously you can go into more depth on this subject, but at least this should give you a good baseline understanding of how Teams establishes media over NAT.