Being a Skype and Teams consultant, I seem to spend my life talking about why it is important to implement Quality of Service, even for cloud systems routed over the internet.
My mantra is simple: if you can do it and it’s going to make an improvement, however small it may seem, then it’s probably better to do it.
Specifically with cloud there is always an argument that QoS implementation is not worth the effort because of the middle carrying network that is the internet. As we all know, the internet cannot perform QoS. So what is the point?
Firstly, the assumption that there is no point is wrong. The second incorrect assumption is that Quality of Service is only possible once you’ve implemented Office 365 ExpressRoute.
My stance comes from looking at the different types of communication between different peers. I look at the intended media route over your network and calculate that a certain percentage of media will always be local to your network. With the implementation of Direct Routing, this percentage increases quite significantly.
Because these media paths remain on your controlled network, you can apply traffic treatment policies to prioritise important data packets end to end and in both directions. Your network may be geographically large with different types of inter-site connections such as MPLS, point to point, or managed ethernet. As a result, implementing QoS is no small feat, and this is where I get the most push back on why QoS cannot be implemented at a customer. It’s not the technology. It’s either the effort involved, or the fact that the network team have historically used other treatment methods and don’t want to pivot away from them.
Once we get over the first hurdle of agreeing that it makes sense to deploy QoS for Microsoft Teams because 70% of the media is going to go end to end over the customer LAN, we then start talking about how this can be implemented.
First off, LAN and WAN QoS are fundamentally the same, just on different networks, and the endpoints for each of those networks may treat QoS differently. The most important thing from my perspective is that the traffic is treated in exactly the same way over each network regardless. Some WAN networks use accelerators and application inspection to classify data packets based on what the device has determined to be the application, e.g. Microsoft Teams. The problem is that in order for these devices to determine the application type, they must inspect the data packet. As Teams transmits media using the Secure Real-time Transport Protocol (SRTP), the data payload is encapsulated in an encrypted packet. This means the device has to decrypt the data packet, inspect it, decide what to do with it and then re-encrypt it and send it on. This requires CPU and memory but, more importantly for us, increases latency, packet reordering and jitter. All bad where media quality is concerned. It is for this reason Microsoft do not support deep packet inspection for Microsoft Teams payloads.
The other challenge we have is WAN acceleration and packet reshaping. Network engineers will want to do this because it means that they can squash more data through the available bandwidth than would otherwise be possible. WAN accelerators basically compress the data packet and then send it over the network. The problem with compression is that the payload has already been compressed by the voice codec agreed in the SDP negotiation between endpoints, so when the WAN compresses the packet again, you have double compression. This leads to lost bits and even entire lost packets, resulting in poor media quality. Again, Microsoft do not support or recommend this for Microsoft Teams.
This leaves us with what needs to be done. Microsoft support policy-based QoS using DSCP. Nothing new there. The LAN needs to be configured to transmit packets based on their DSCP classification, as does the WAN. Do not try to re-mark data packets between networks, for instance configuring EF for audio on the LAN but AF41 over the WAN. If you do that you are not gaining anything and you are contradicting the purpose of Quality of Service. Pick the classification and trust the packet end to end.
Microsoft publish their recommended QoS classifications for each media type. For Teams, this is EF (46) for audio, AF41 (34) for video and AF21 (18) for app sharing. It is an incorrect assumption that you must assign these exact values to Teams traffic to get Quality of Service.
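For reference, the decimal values in brackets are not arbitrary. A minimal Python sketch of how they are derived is below. The function names are my own for illustration; the arithmetic follows the standard DiffServ definitions (AFxy is 8x + 2y, EF is fixed at 46), and the DSCP sits in the top six bits of the old IPv4 ToS byte, which is the value you sometimes see in packet captures instead.

```python
# Illustrative helpers (names are hypothetical, not from any Microsoft tooling).

EF = 46  # Expedited Forwarding is a single fixed codepoint


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the decimal DSCP value for AF<class><drop_precedence>."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF classes are 1-4 and drop precedence is 1-3")
    return 8 * af_class + 2 * drop_precedence


def tos_byte(dscp: int) -> int:
    """DSCP occupies the top six bits of the IPv4 ToS / Traffic Class byte."""
    return dscp << 2


for name, value in [("EF", EF), ("AF41", af_dscp(4, 1)), ("AF21", af_dscp(2, 1))]:
    print(f"{name}: DSCP {value}, ToS byte 0x{tos_byte(value):02X}")
```

This also makes it easy to work out the decimal value for whichever AF class your network team actually agrees to give you.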
Yes, it would be nice to have, but the reality is often very different. Most enterprises have very strict controls over what type of application can use EF. For instance, the most common entry criterion is that the application must have call admission control. Microsoft Teams does not have this ability.
EF is an expensive classification for them, both in the way that it operates and because, if they have a managed WAN, they are probably paying for a preset, static amount of EF bandwidth. This bandwidth is precious to them as other business critical applications could be using this prioritisation. If you go ahead and deploy Teams on EF then you could bring down several systems as a result.
The reality is that you are aiming for the best classification you can get in the AF band. You want the top classification that no other application is using, so the data packet you are transmitting gets the best treatment and prioritisation possible. The net result is almost the same experience as EF. In some ways it is better: because EF is expensive, classifying in AF means you can use the general bandwidth available, which is usually much larger, to your heart’s content and get full prioritisation over it at no additional cost to your customer. It’s a win-win compromise.
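As a rough illustration of that selection, here is a small Python sketch. It assumes that your network treats AF4 as the most favoured class and AF1 as the least (that ordering is a local policy choice, not something the standard guarantees), and that an AF class counts as taken if any application is already marked into it. The function name and the example inventory are hypothetical.

```python
from typing import Iterable, Optional


def best_free_af_class(in_use: Iterable[str]) -> Optional[str]:
    """Pick the highest AF class with no existing application in it.

    Returns the low-drop codepoint of that class (AFx1), or None if
    every AF class is already occupied.
    """
    used_classes = {
        name[2]
        for name in in_use
        if len(name) == 4 and name.upper().startswith("AF")
    }
    for af_class in "4321":  # assumed order: AF4 best, AF1 worst
        if af_class not in used_classes:
            return f"AF{af_class}1"
    return None


# Hypothetical inventory of markings already used on the customer network
existing = ["EF", "AF41", "AF31"]
print(best_free_af_class(existing))
```

In practice this is a conversation with the network team rather than a script, but it captures the rule: take the best class nobody else is sitting in, and use it consistently.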
Once you have this implemented at the network level, you need some way to mark the data packets accordingly. You do this today using Group Policy.
Don’t forget that if you want to implement QoS then you should do it properly. It doesn’t just end with this GPO for Teams.exe. Where will these users be calling? Desk phones? Direct Routing SBCs?
You’ll need to ensure that these devices are themselves configured for QoS, otherwise you are only getting QoS on the sending stream from the Teams client and potentially none on the receiving stream. The end result is 50% of the possible experience for each of the users.
You can test whether data packets are being correctly marked by using Wireshark to capture them. You are looking for a UDP stream to the target endpoint with a source port within the media range, and checking the DSCP value in its IP header.
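If you would rather check a capture programmatically, the same test can be sketched in Python. This is only a sketch under assumptions: the port ranges below are the client source-port ranges Microsoft commonly document as a starting point (50000-50019 audio, 50020-50039 video, 50040-50059 sharing), so substitute whatever your own policy uses, and the function names and sample packet are made up for illustration. It reads the DSCP from the second byte of a raw IPv4 header and the UDP source port that follows the header.

```python
import struct
from typing import Optional, Tuple

# Assumed client media source-port ranges; replace with your own policy values.
MEDIA_PORT_RANGES = {
    "audio": range(50000, 50020),
    "video": range(50020, 50040),
    "sharing": range(50040, 50060),
}


def dscp_and_source_port(ip_packet: bytes) -> Tuple[int, int]:
    """Return (dscp, udp_source_port) for a raw IPv4 packet carrying UDP."""
    version_ihl, tos = struct.unpack_from("!BB", ip_packet, 0)
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    if ip_packet[9] != 17:  # IP protocol number 17 is UDP
        raise ValueError("not a UDP packet")
    header_len = (version_ihl & 0x0F) * 4
    (src_port,) = struct.unpack_from("!H", ip_packet, header_len)
    return tos >> 2, src_port  # DSCP is the top six bits of the ToS byte


def media_type(src_port: int) -> Optional[str]:
    """Map a source port to a media type using the assumed ranges."""
    for name, ports in MEDIA_PORT_RANGES.items():
        if src_port in ports:
            return name
    return None


def build_sample_packet(tos: int, src_port: int) -> bytes:
    """Build a minimal IPv4 + UDP header pair for demonstration only."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, tos, 28, 0, 0, 64, 17, 0,
        bytes([10, 0, 0, 5]), bytes([192, 0, 2, 1]),
    )
    udp_header = struct.pack("!HHHH", src_port, 3479, 8, 0)
    return ip_header + udp_header


dscp, port = dscp_and_source_port(build_sample_packet(0xB8, 50004))
print(f"{media_type(port)} stream from port {port} is marked DSCP {dscp}")
```

A packet in the audio range that comes back with DSCP 0 tells you the policy is not being applied on that endpoint.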
But remember, any packet that is destined to be transmitted over the internet will only be prioritised on your network, up to your boundary. After that, QoS does not come into play and the packet is sent like any other data packet.
The same is true of any inbound data packets from the internet. For instance, you receive a PSTN call from Microsoft Phone System. The packet is being transmitted from Microsoft via the internet to your network. It is not prioritised, and any DSCP markings that were stamped by Microsoft’s media network are stripped by the internet routers. This means the inbound stream has a DSCP value of 0.
Therefore, you are effectively only getting 25% of the total streams treated for Quality of Service, i.e. the outbound stream from the client to your boundary.
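The 50% and 25% figures are simple counting, which a few lines of Python make explicit. The breakdown into legs is my reading of the scenarios described above: a LAN call where only the Teams client marks its traffic has two streams with one treated, while an internet call has four legs of which only the first is treated (the inbound stream arrives with DSCP 0, so it is assumed not to be prioritised on the LAN either unless you re-mark it at your edge).

```python
def treated_fraction(legs: dict) -> float:
    """Share of media legs that receive QoS treatment (True means treated)."""
    return sum(1 for treated in legs.values() if treated) / len(legs)


# LAN call where the far end (desk phone or SBC) is not configured for QoS
lan_call_one_side_marked = {
    "Teams client -> desk phone / SBC": True,
    "desk phone / SBC -> Teams client": False,
}

# Call to or from the cloud over the internet
internet_call = {
    "outbound: client -> your boundary": True,
    "outbound: your boundary -> Microsoft": False,
    "inbound: Microsoft -> your boundary": False,
    "inbound: your boundary -> client": False,
}

print(f"LAN call, one side marked: {treated_fraction(lan_call_one_side_marked):.0%}")
print(f"Internet call: {treated_fraction(internet_call):.0%}")
```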
Network inefficiencies cannot be hidden from media traffic
If you’re going 100% cloud for calling and meetings, then you really should consider your internet breakout design, capacity and performance. It may be more cost effective to implement local breakouts at sites, rather than purchasing ExpressRoute. But one thing is for sure: in an enterprise organisation, if you want enterprise grade voice quality then you need to guarantee your media quality end to end. Otherwise, there will be times when users get a degraded experience.
Lately, Microsoft have been rolling out meeting settings to the Teams admin portal, and one of those settings enables Quality of Service markings for real-time media.
You would assume that this setting would mark the traffic coming out of the Microsoft network and replace the need for group policy based QoS?
At the time of writing, this appears not to be the case. Perhaps this feature has not yet made it to the client. The setting certainly suggests it should.
However, this setting will presumably apply the recommended DSCP markings to data packets and that could be in breach of your design. In this case, you would still rely on the GPO method.
From the testing I have done so far, this setting does not actually mark any inbound or outbound data packet to the client.
In any case, when this feature is fully working it is still not going to solve your problems without you putting the effort in to support it. While you can be pretty consistent and controlled for LAN to LAN communication, you need to remember that anything going to the cloud or coming from the cloud is not going to benefit from QoS, unless you have ExpressRoute.
Deciding whether you need that or not depends on your usage prediction.
In summary, Quality of Service is still an important element of deploying a cloud voice solution, but you must understand your usage profile to weigh up the reward vs effort to implement. If you’re going 100% cloud then QoS will only play a meaningful role in internal P2P comms. You must focus your efforts on ensuring there is sufficient capacity and performance in your external network to support a good standard of quality, albeit uncontrolled. If you want the best experience in this scenario, then ExpressRoute may be a consideration for you, but it is not necessarily mandatory.