I recently experienced another DNS-as-root-cause issue in a Skype for Business federation model. Because this one wasn’t as obvious as it looks in hindsight, I’d like to share the details.
Intermittent results with federated calls were bugging us at first. Some calls were successful, but others failed while setting up audio. One thing the failing calls had in common was that they also showed up as failures in the Skype for Business Reports:
A BYE/22, or “Call failed to establish due to a media connectivity failure when both endpoints are internal”, is recorded for every failing call.
The description doesn’t exactly fit: in our model, the call can’t be internal, as the user accepting the call is a federated, external user/endpoint on Skype for Business Online. The caller is forwarded from the on-premises environment, which is where we see this error. So how is this possible when there are no two endpoints that could both be ‘internal’?
Further investigation shows the issue is ‘common’, and some other blogs also mention it, e.g. ucprimer and ucsorted. However, there’s one interesting theory to add.
A DNS issue or lookup failure can also cause this error! In this specific case, it was quite complicated to find:
Due to a VPN bug, clients with active VPN software could send out DNS requests twice: once to the locally configured DNS server(s), and once to the VPN-configured DNS server(s). In some instances, the VPN-configured DNS server responded with “not found” (the lookup was blocked by a firewall, as clients were supposed to look up external records through their own configuration), and that response took precedence over the response received from the local DNS server.
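To make the failure mode a bit more concrete, here is a minimal Python sketch of it. This is only a model and not the actual VPN client logic, which I don't know in detail: the names `DnsResponse`, `buggy_pick` and `preferred_pick`, the host name and the address are all hypothetical and just there for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DnsResponse:
    server: str           # hypothetical label for the resolver that answered
    rcode: str            # "NOERROR" or "NXDOMAIN"
    addresses: List[str]  # empty when the name was "not found"
    via_vpn: bool         # True if the answer came from the VPN-configured DNS


def buggy_pick(responses: List[DnsResponse]) -> Optional[DnsResponse]:
    """Assumed model of the bug: a VPN answer wins, even a negative one."""
    for response in responses:
        if response.via_vpn:
            return response
    return responses[0] if responses else None


def preferred_pick(responses: List[DnsResponse]) -> Optional[DnsResponse]:
    """What we would rather see: any positive answer beats 'not found'."""
    for response in responses:
        if response.rcode == "NOERROR" and response.addresses:
            return response
    return responses[0] if responses else None


# Hypothetical answers for the same lookup (e.g. "sipfed.example.com"),
# using an address from the documentation range.
local = DnsResponse("local", "NOERROR", ["203.0.113.10"], via_vpn=False)
vpn = DnsResponse("vpn", "NXDOMAIN", [], via_vpn=True)

print(buggy_pick([local, vpn]).rcode)          # the negative VPN answer wins
print(preferred_pick([local, vpn]).addresses)  # the local answer is used
```

With both answers present, the modelled bug hands the client a “not found” even though a perfectly valid answer was received as well, which would explain why only some calls failed.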
This is hard to find in some cases, like this one, and there are no really clear clues in either the Skype for Business Reports or the UCCAPILOG.
Fixing DNS, by making sure that DNS requests are not blocked but are received and answered by the same DNS server, with the ability to look up ‘online’ records, fixed this issue!
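If you want to check for this kind of mismatch yourself, the idea is simply to ask every DNS server a client might use for the same record and compare the answers. Below is a rough Python sketch of that comparison. The function `find_inconsistent` and the `Lookup` type are hypothetical names of my own; the lookups are passed in as plain functions, so in practice you would wrap whatever tool or library you use to query one specific DNS server.

```python
from typing import Callable, Dict, List, Optional

# A lookup takes a host name and returns its addresses, or None for "not found".
Lookup = Callable[[str], Optional[List[str]]]


def find_inconsistent(name: str, resolvers: Dict[str, Lookup]) -> List[str]:
    """Return the resolvers that say 'not found' while another one resolves the name."""
    answers = {label: lookup(name) for label, lookup in resolvers.items()}
    # If nobody can resolve the name, the resolvers at least agree with each other.
    if not any(answers.values()):
        return []
    return sorted(label for label, answer in answers.items() if not answer)


# Hypothetical stand-ins for real queries against the local and the VPN DNS.
resolvers = {
    "local": lambda name: ["203.0.113.10"],
    "vpn": lambda name: None,  # blocked by the firewall, so "not found"
}

print(find_inconsistent("sipfed.example.com", resolvers))  # ['vpn']
```

An empty list means all servers agree; any label in the result is a server that should either be able to resolve the record or should not be asked for it at all.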