Protecting good quality voice services in 5G network setups

Feature

10 December 2021

By Reiner Stuhlfauth, Technology Manager Wireless, Rohde & Schwarz

Technological advances have brought about a major change in communications, from circuit-switched 2G networks with an initial focus on telephony to fully packet-switched 4G networks for Internet data transfer. Today we are talking 5G technology, with its flexible and sophisticated architecture, pitched to provide enhanced data services for mobile devices.

Despite the focus on exchanging high volumes of data, voice services remain a key element, and there are two major circumstances to consider here. First, there’s the radio access network (RAN) – i.e. whether 5G new radio (NR) is offered in addition to LTE as non-standalone access (NSA, or option 3 deployment), or whether there is a 5G standalone network (SA mode, or option 2 deployment). The NSA mode enables dual-connectivity scenarios with either LTE as the primary radio access technology (EN-DC) or 5G as the primary radio access technology (NE-DC).

The second consideration is the type of core network used – EPC or 5G core (5GC). In a dual-connectivity scenario, there can be a voice-service restriction indicated by the radio access technology (RAT). This article concentrates on voice or speech services, although 5G may certainly offer video or other communication services, so-called “Rich Communication Services” (RCS), which are managed in a similar way to voice services. The only difference here is in the support of emergency services, where the network distinguishes between an emergency call and a general voice call through the signalling. Regarding protocol and transport, emergency and voice calls are handled in a similar way, except for their quality of service (QoS) profiles, but a network may indicate the support of both services as separate offerings.

There is a difference between legacy networks and a 5G network offering voice services, as the latter exchanges connection parameters and service access policies during the registration procedure. The user equipment (UE) indicates its capabilities to the network and, in the reverse direction, the network offers the subscribed services, e.g. voice or video calls.

VoNR

Voice over New Radio (VoNR) is voice over IP that incorporates the IP multimedia subsystem (IMS) infrastructure already used in legacy technologies like LTE; see VoNR’s protocol structure in Figure 1. IMS offers a management and orchestration system that guarantees end-to-end QoS for each application, in contrast to plain VoIP with its traffic-channel-only approach. IMS establishes, controls and maintains a protocol data unit (PDU) session, including its data bearers, with a corresponding QoS flow for the best end-user experience. At least two data bearers are established: one for content (i.e., speech packets containing the encoded audio itself) and one for IMS signalling. As in VoLTE, there is a major difference between voice over IMS in the 5G system (5GS) and voice services offered by external applications, e.g. so-called over-the-top (OTT) speech services: OTT speech may operate transparently to the connectivity network, with no IMS management to ensure QoS. This raises the question of how to connect IMS to the 5G core network.

Figure 1: Voice over NR – protocol structure

For reasons such as fast market adoption, network deployments, disaggregation of network entities and coexistence with legacy technologies, there is no single 5G deployment scenario; see Figure 2. The evolutionary paths describe whether, in an NSA connection, voice is supported by E-UTRA only and whether the simultaneous NR data connection is sustained or suspended, an option referred to as “voice over LTE in EN-DC”. EPS fallback is for scenarios where the 5GC doesn’t provide voice services; when needed, the voice call is transferred to an EPS connection (VoLTE), which also involves a RAT change from 5G NR to LTE. The advantage is that the UE camps on 5G NR, and the handover to the legacy network is executed only when a voice call is set up.

Another mode is RAT fallback, where the assumption is that the core network supports voice connections but the current RAT, presumably NR, does not. As a consequence, a voice connection is transferred from NR to E-UTRA, which represents a RAT change only. Voice over NR (VoNR) denotes the scenario where the NR network supports voice services and the 5GC offers a connection to IMS. The primary deployment focus of VoNR is standalone (SA) operation, where the 5GC connects to an IMS supporting voice services. However, VoNR also works in non-standalone (NSA) operation modes like E-UTRA and NR dual connectivity (EN-DC).

Figure 2: Deployment scenarios supporting voice in 5G
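
The choice between these voice modes can be sketched as a small decision function. This is a simplified illustration, not an operator's actual policy: the names `VoiceMode` and `select_voice_mode` are hypothetical, and the sketch assumes that every NSA connection carries voice over E-UTRA, although the text notes that VoNR can also work in NSA modes.

```python
from enum import Enum


class VoiceMode(Enum):
    """Voice deployment options named in the text (hypothetical enum)."""
    VONR = "VoNR"
    RAT_FALLBACK = "RAT fallback"
    EPS_FALLBACK = "EPS fallback"
    VOLTE_IN_EN_DC = "voice over LTE in EN-DC"


def select_voice_mode(standalone: bool,
                      core_supports_voice: bool,
                      nr_supports_voice: bool) -> VoiceMode:
    """Pick a voice mode from three yes/no facts about the deployment."""
    if not standalone:
        # Simplifying assumption: in NSA, E-UTRA carries the voice call.
        return VoiceMode.VOLTE_IN_EN_DC
    if not core_supports_voice:
        # 5GC offers no voice: move the call to EPS (core and RAT change).
        return VoiceMode.EPS_FALLBACK
    if not nr_supports_voice:
        # Core supports voice but NR does not: change the RAT only.
        return VoiceMode.RAT_FALLBACK
    return VoiceMode.VONR


print(select_voice_mode(standalone=True,
                        core_supports_voice=False,
                        nr_supports_voice=True).value)
```

The order of the checks mirrors the text: the core network decision comes before the RAT decision, because EPS fallback already implies a RAT change.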

5G supports the multimedia telephony service for IMS (MTSI), which represents the application layer; see Figure 3. The media flow consists of audio, video and text (i.e., general data like images, text, websites, etc.), leveraging modern collaboration and communications tools. To preserve QoS, the real-time transport protocol (RTP) and its control protocol (RTCP) coordinate the media transport and tackle impairments like delayed, out-of-order or misrouted packets. The transport and network layers are realised by the well-known TCP, UDP and IP (IPv4 and IPv6) protocols. The RAT functions are provided by either E-UTRA or 5G NR. The session initiation protocol (SIP) and the session description protocol (SDP) handle the control plane of the voice connection.

Figure 3: Protocol layer for MTSI
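
To make the role of SDP more concrete, the sketch below assembles a minimal audio-only session description of the kind carried in a SIP message body. It is an illustration under assumptions rather than a complete MTSI offer: the function name `build_sdp_offer`, the address, the port and the payload type are hypothetical, and the `EVS/16000/1` entry is assumed to follow the EVS RTP payload format, which should be checked against the 3GPP specification in use.

```python
def build_sdp_offer(ip_address: str, rtp_port: int, payload_type: int = 97) -> str:
    """Return a minimal audio-only SDP body (illustrative, not a full MTSI offer)."""
    if not 96 <= payload_type <= 127:
        # Dynamic RTP payload types are taken from the range 96 to 127.
        raise ValueError("payload_type must be a dynamic type (96-127)")
    addr_type = "IP6" if ":" in ip_address else "IP4"
    lines = [
        "v=0",                                          # protocol version
        f"o=- 0 0 IN {addr_type} {ip_address}",         # origin
        "s=-",                                          # session name (unused)
        f"c=IN {addr_type} {ip_address}",               # connection address
        "t=0 0",                                        # unbounded session time
        f"m=audio {rtp_port} RTP/AVP {payload_type}",   # one audio stream over RTP
        f"a=rtpmap:{payload_type} EVS/16000/1",         # assumed EVS mapping
        "a=sendrecv",                                   # two-way media
    ]
    # SDP lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


print(build_sdp_offer("2001:db8::1", 49152))
```

A real offer would typically list several codecs and further attributes so that both ends can negotiate a common configuration.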

Supporting voice services

The incorporation of IMS services in 5G, including network interfaces, protocol layers and signalling scenarios, is the prerequisite for voice services offered in 5G. To ensure QoS, a so-called QoS flow is established between the UE and the network, characterised by parameters such as latency, priority, packet error rate and guaranteed bit rate. To reduce signalling overhead, 5G assigns a 5G QoS identifier (5QI) to each QoS flow, a single value that stands for a standardised set of these parameters. All protocol layers and network functions are aware of this 5QI. The recommended 5QI profiles are: 5QI = 1 for conversational voice, 5QI = 2 for conversational video requiring certain QoS values, 5QI = 5 for IMS signalling and, optionally, 5QI = 6 to 9 for concurrent media flows with lower QoS requirements.
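
The recommended profiles can be written down as a small lookup table. The sketch below only restates the values given in the text; the dictionary keys and function names are hypothetical, and the split into guaranteed bit rate (GBR) and non-GBR resource types reflects the standardised 5QI table as an assumption to be verified against the 3GPP release in use.

```python
# Recommended 5QI per traffic type, as listed in the text (keys are hypothetical).
FIVE_QI_BY_TRAFFIC = {
    "conversational_voice": 1,
    "conversational_video": 2,
    "ims_signalling": 5,
}

# Optional 5QIs for concurrent media flows with lower QoS requirements.
OPTIONAL_LOWER_QOS_5QIS = (6, 7, 8, 9)

# Assumption: of the values above, only 5QI 1 and 2 use GBR resources.
GBR_5QIS = frozenset({1, 2})


def recommended_5qi(traffic: str) -> int:
    """Look up the recommended 5QI for a traffic type."""
    try:
        return FIVE_QI_BY_TRAFFIC[traffic]
    except KeyError:
        raise ValueError(f"no recommended 5QI for traffic type {traffic!r}") from None


def is_gbr(five_qi: int) -> bool:
    """Tell whether a 5QI from this table uses guaranteed bit rate resources."""
    return five_qi in GBR_5QIS


for name in FIVE_QI_BY_TRAFFIC:
    qi = recommended_5qi(name)
    print(name, qi, "GBR" if is_gbr(qi) else "non-GBR")
```

Because the 5QI is a single number, each network function can resolve the full parameter set locally instead of receiving it in every signalling message.
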
As voice is considered an application in 5G, no protocol-layer configuration is mandatory; the settings described here are better seen as recommendations. The voice service prioritises latency over reliability, and aspects such as efficient usage of radio resources and energy consumption play a pivotal role in a voice connection. Semi-persistent scheduling allows a quasi-constant allocation of guaranteed bit-rate radio resources with low signalling overhead. Additionally, slot aggregation allows the automatic repetition of a speech packet, with a focus on reducing latency. Energy consumption is reduced by discontinuous reception and transmission (DRX and DTX). The priority of latency over reliability is clear: it is recommended to set the RLC layer to unacknowledged mode and to skip the integrity check at the PDCP layer, leaving only ciphering enabled for security.
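
These recommendations can be captured as a configuration record and a check against it, for example in a test script. The sketch is illustrative only: the class `VoiceBearerConfig`, its field names and the helper `deviations` are hypothetical and do not correspond to 3GPP information element names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceBearerConfig:
    """Radio-layer settings for a voice bearer; defaults follow the text."""
    rlc_mode: str = "UM"                      # unacknowledged mode: no RLC retransmissions
    pdcp_integrity_check: bool = False        # skipped to save latency
    pdcp_ciphering: bool = True               # kept enabled for security
    semi_persistent_scheduling: bool = True   # quasi-constant resource allocation
    slot_aggregation: bool = True             # automatic repetition of speech packets
    drx: bool = True                          # discontinuous reception saves energy


RECOMMENDED = VoiceBearerConfig()


def deviations(cfg: VoiceBearerConfig) -> list[str]:
    """List the field names where cfg differs from the recommended settings."""
    return [
        f.name
        for f in fields(cfg)
        if getattr(cfg, f.name) != getattr(RECOMMENDED, f.name)
    ]


# A data-style configuration that favours reliability over latency.
print(deviations(VoiceBearerConfig(rlc_mode="AM", pdcp_integrity_check=True)))
```

Since none of these settings is mandatory, a deviation is a finding to review rather than an error.
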
The industry body 3rd Generation Partnership Project (3GPP) had already developed the enhanced voice services (EVS) speech codec, which is now mandatory for 5G voice. EVS uses the higher data rates offered by 5GS for enhanced encoded audio signals. Technically, EVS increases the audio bandwidth and covers the audible frequency range from 20 Hz to 20 kHz, corresponding to the typical range of the human ear. To convert the audio signal into digital form, EVS applies the same methods as earlier codecs, amplitude quantisation and discrete sampling, but with finer quantisation levels and higher sample rates. One important aspect of EVS is its interoperability codec mode, which adjusts the EVS speech codec to legacy voice codec rates, enabling a smooth move to VoNR.
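
The link between audio bandwidth, sample rate and quantisation can be shown with a short calculation. The sketch below applies the sampling theorem (the sample rate must be at least twice the highest audio frequency) and the step size of a uniform quantiser. The function names are hypothetical, the listed input sampling rates of 8, 16, 32 and 48 kHz are an assumption about EVS to be checked against the codec specification, and the quantiser is a textbook model rather than the codec's internal processing.

```python
# Assumed EVS input sampling rates in Hz (narrowband up to fullband).
EVS_SAMPLING_RATES_HZ = (8_000, 16_000, 32_000, 48_000)


def min_sampling_rate_hz(audio_bandwidth_hz: float) -> float:
    """Sampling theorem: at least twice the highest audio frequency."""
    return 2 * audio_bandwidth_hz


def smallest_sufficient_rate_hz(audio_bandwidth_hz: float) -> int:
    """Pick the lowest listed sampling rate that covers the audio bandwidth."""
    needed = min_sampling_rate_hz(audio_bandwidth_hz)
    for rate in EVS_SAMPLING_RATES_HZ:
        if rate >= needed:
            return rate
    raise ValueError("audio bandwidth exceeds the highest listed sampling rate")


def quantisation_step(full_scale_range: float, bits: int) -> float:
    """Step size of a uniform quantiser with 2**bits levels."""
    return full_scale_range / (2 ** bits)


print(smallest_sufficient_rate_hz(20_000))   # full audible range
print(smallest_sufficient_rate_hz(3_400))    # classic telephony band
print(quantisation_step(2.0, 16))
```

Covering the audible range up to 20 kHz needs at least 40,000 samples per second, which is why the highest rate in the list is required, while the classic telephony band fits into the lowest one.
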
On the infrastructure side, having voice services requires some adaptation, and the flexible architecture provides new optional interfaces and functions. Firstly, the operator needs to decide which core network is incorporated and whether it should support voice services or not, which leads to the decision of offering either EPS fallback or VoNR.

Secondly, the core network, EPS or 5GC, needs to be connected to IMS via several interfaces that carry user and signalling data. Since there’s no default 5G system, different entities and interfaces can be deployed optionally, which might impact the UE:

The N6 interface provides the data transfer between 5GC and IMS. In 5G the N6 interface is already used to exchange data between the 5GC and an external data network; with the introduction of voice services, it is extended to connect to a further data network, namely the IMS.
If both core networks, EPS and 5GC, are deployed, the N26 interface may share some signalling information between the EPS mobility management entity (MME) and the 5GC access and mobility management function (AMF). Here, the UE uses a single registration procedure, as the two core network entities coordinate mobility and registration.
The S5 interface exchanges and coordinates user data between the session management function and user plane function on one side and the serving gateway on the other.
A common home subscriber server (HSS) coordinates subscription profiles and access policies.

T&M for voice services in 5G

Testing verifies the proper functioning and implementation of voice services in 5G and helps assure the application quality of experience (QoE) for the end user.

Since voice over 5G operates similarly to voice over LTE, the general test setup does not differ greatly. However, several fields of testing should be considered.

Testing voice services in 5G typically starts with a basic verification of proper implementation and functional behaviour; i.e., can a call be established and is the voice signal audible? Following that, T&M is used to determine the quality of the audio under well-known and reproducible conditions. Besides device-orientated voice testing, mobile network testing and benchmarking of deployed services in a live network help assure the quality experienced by the user.

A voice-quality test setup includes mobile radio testing capability that supports signalling and functional testing as well as enhanced protocol procedures, such as interoperability, multi-connectivity and mobility scenarios; see Figure 4. A test setup may also include audio quality test equipment with digital and analogue interfaces. For stress tests, a setup may apply fading on the radio interface and emulate network impairments like IP-packet reordering, delay or discarded packets.
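
The network impairments mentioned above can be emulated in software on a stream of speech packets. The sketch below is a minimal model under stated assumptions, not the behaviour of any particular test instrument: the names `Packet`, `impair` and `count_out_of_order` are hypothetical, loss is independent per packet, and jitter is drawn from a uniform distribution. The 20 ms packet spacing is an assumed speech frame interval.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int                      # sequence number assigned by the sender
    send_time_ms: float           # time the packet leaves the sender
    arrival_time_ms: float = 0.0  # filled in by the impairment model


def impair(packets, loss_rate, base_delay_ms, jitter_ms, seed=0):
    """Drop, delay and possibly reorder packets; returns them in arrival order."""
    rng = random.Random(seed)     # seeded so that a test run is reproducible
    delivered = []
    for p in packets:
        if rng.random() < loss_rate:
            continue              # packet discarded
        delay = base_delay_ms + rng.uniform(0.0, jitter_ms)
        delivered.append(Packet(p.seq, p.send_time_ms, p.send_time_ms + delay))
    # Jitter larger than the packet spacing lets later packets overtake earlier ones.
    delivered.sort(key=lambda p: p.arrival_time_ms)
    return delivered


def count_out_of_order(delivered):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest_seen = -1
    late = 0
    for p in delivered:
        if p.seq < highest_seen:
            late += 1
        else:
            highest_seen = p.seq
    return late


stream = [Packet(i, 20.0 * i) for i in range(50)]   # one packet every 20 ms
received = impair(stream, loss_rate=0.05, base_delay_ms=40.0, jitter_ms=60.0)
print(len(received), "delivered,", count_out_of_order(received), "out of order")
```

Fixing the seed keeps the impairment pattern identical between runs, which matches the requirement for reproducible lab conditions.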

Figure 4: Example of an audio quality analysis test setup

In addition to the RATs 5G and LTE, such a setup may also support legacy RATs such as 2G or 3G and non-cellular technologies like Bluetooth or Wi-Fi, since they also offer voice services.

An obvious requirement is to emulate the IMS network with its signalling protocols SIP and SDP, and its data provisioning. The audio quality is typically indicated as a mean opinion score (MOS) value, derived by algorithms such as the perceptual voice-quality algorithms published by the ITU. The advantage of a lab-based test setup is that the tests are repeatable and run under predefined, reproducible conditions.

To monitor the quality of certain applications like voice or video and to fulfil the KPI requirements, field or drive testing is necessary. Here, a passive device like a scanner is complemented by a device that can actively set up a connection, so that the application quality can be analysed. In addition, network operators may wish to compare their network quality against other networks in a benchmarking process, or monitor the entire network via multiple samples and statistical analysis to obtain a summarised view.
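
A summarised view of this kind can be computed from the collected MOS samples with basic statistics. The sketch below is an illustration only: the function names, the network labels, the sample values and the 3.5 threshold are hypothetical, and the MOS values are assumed to lie on the usual scale from 1 to 5.

```python
import statistics


def summarise_mos(samples, threshold=3.5):
    """Summarise MOS samples from one network (threshold is a hypothetical KPI)."""
    if not samples:
        raise ValueError("at least one MOS sample is required")
    if any(not 1.0 <= s <= 5.0 for s in samples):
        raise ValueError("MOS samples are expected on the scale 1 to 5")
    return {
        "count": len(samples),
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        "worst": min(samples),
        "share_below_threshold": sum(s < threshold for s in samples) / len(samples),
    }


def benchmark(samples_by_network):
    """Rank networks by mean MOS, best first."""
    means = {name: statistics.fmean(s) for name, s in samples_by_network.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up drive-test samples for two hypothetical networks.
drive_test = {
    "network_a": [4.1, 4.3, 3.9, 4.4],
    "network_b": [3.2, 3.8, 3.6, 2.9],
}
for name, mean in benchmark(drive_test):
    print(name, round(mean, 2), summarise_mos(drive_test[name])["worst"])
```

Reporting the worst sample and the share below a threshold alongside the mean helps show poor calls that an average alone would hide.
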
Hence, test and measurement plays a pivotal role in successfully supporting voice services in commercial 5G networks.