Announcing Improved Voice Quality for Plivo SDK-based Apps

Feb 23, 2017

Following the December rollout of our new Voice 2.0 Infrastructure, our team continues to add features and upgrades to improve voice quality. As of today, our Browser SDK supports the Opus codec, which is the best-in-class audio codec for a wide range of voice applications. All applications that use the Plivo Browser SDK for inbound and outbound calling will automatically use Opus as their default codec. Soon we’ll launch Opus support for our Mobile SDK and Zentrunk SIP trunking as well.

Why the Opus codec?

We’ve reduced the complexity of the Opus encoder on the Plivo network side, which in turn reduces decoder complexity in customers’ browsers. We’ve also optimized sampling rates, which lowers the sampling frequency in the browser and improves browser performance. All of these optimizations also save bandwidth.

We’ve captured these audio samples that compare how Opus and the PCMU (G.711) codec deal with varying degrees of packet loss. The first sample, with 0% packet loss, sets the baseline for comparison. With increases in packet loss, Opus’s superiority stands out. Even at 30% packet loss, Opus still delivers comprehensible dialogue.

Packet Loss	Audio Samples
0%	Opus
0%	PCMU
10%	Opus
10%	PCMU
25%	Opus
25%	PCMU
30%	Opus
30%	PCMU

Opus is also efficient. Instead of the 100Kbps of bandwidth our prior WebRTC SDK codecs used, Opus uses 50Kbps. Your application will experience decreases in jitter, latency, and packet loss, and despite poor network connections, your users will experience better voice quality.
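To put those two figures in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes 1 Kbps means 1,000 bits per second, counts a single direction of audio, and treats the 100Kbps and 50Kbps figures above as the full stream bitrate. The helper name `megabytes_per_minute` is made up for this illustration.

```python
def megabytes_per_minute(bitrate_kbps: float) -> float:
    """Convert a stream bitrate in kilobits per second to megabytes per minute."""
    bits_per_minute = bitrate_kbps * 1000 * 60  # assumes 1 Kbps = 1,000 bits/s
    return bits_per_minute / 8 / 1_000_000      # bits -> bytes -> megabytes


prior_codecs = megabytes_per_minute(100)  # figure quoted for the prior codecs
opus = megabytes_per_minute(50)           # figure quoted for Opus

print(f"Prior codecs: {prior_codecs:.3f} MB per minute")
print(f"Opus:         {opus:.3f} MB per minute")
```

Under these assumptions, a one-minute stream drops from 0.75 MB to 0.375 MB in each direction.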

What is Opus?

Opus is an open source audio codec that’s optimized for speech and music transmission over the internet. Audio codecs are software that compress and decompress digital audio signals for transmission. These codecs depend on mathematical algorithms and are graded on their ability to retain audio quality while encoding and compressing audio signals.

Opus is highly effective at reducing bandwidth consumption and CPU usage during audio transmission while maintaining high-fidelity audio signals. It’s known for its ability to handle a variety of VoIP audio applications, including conferencing, help desks, and click-to-call applications.

Why is Opus awesome?

Opus was built to fill the gaps of existing audio codecs, which were not optimized for bandwidth, CPU usage, and the varying bitrates and frame sizes that are needed for next-generation WebRTC-based audio applications. Even though Opus is not new, its high quality and low latency performance have propelled its popularity among applications that use WebRTC. Google’s Chrome browser has adopted Opus as its default codec, and Firefox, Opera, and Chromium browsers all support Opus for WebRTC as well. Because of this broad support, more and more WebRTC applications have been adopting Opus to transmit speech over the internet.

Here’s how Opus stacks up against other popular codecs, per its creators Jean-Marc Valin (Mozilla/Xiph.Org), Koen Vos (vocTone), and Timothy B. Terriberry (Mozilla/Xiph.Org). As illustrated below, Opus has the lowest delay (26.5ms by default), flexible bitrate, and a broad range of bandwidth support (narrowband to fullband), and it’s optimized for real-time communication.

Extreme audio optimization

Opus can adjust bitrate, audio bandwidth, and frame size dynamically on live calls. This support for a range of bitrates, frame sizes, audio bandwidths, sampling rates, and multistream frames ensures that a wide variety of applications can use Opus to transmit audio. This flexibility allows Opus to compensate for varying internet speeds and issues that users could experience without notice. For example, if a user has a congested Wi-Fi router or experiences low network bandwidth, Opus can automatically and seamlessly switch to a lower bitrate to consume less bandwidth.

Errors and packet losses are unavoidable when complex systems interact. Opus has many features and strategies to mitigate poor audio quality over weak network connections.

Reduced jitter

In an ideal high-bandwidth, low-latency environment, the network delivers a steady, continuous stream of packets. However, even when audio data is transmitted and played in the right order, sound distortions can occur if it isn’t played with the exact timing. Here’s an illustration comparing a steady stream of packets during zero congestion versus the same audio transmission (i.e., same packet stream) in a congested environment.

Flow chart of jitter vs no jitter comparison

VoIP applications can experience a lot of jitter because high-fidelity audio requires high bandwidth. However, even in the event of packet loss, Opus has built-in features such as Packet Loss Concealment (PLC) and dynamic frame sizes that mitigate the symptoms of jitter and make it harder for the human ear to detect.
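If you want to put a number on jitter, one common approach is the interarrival jitter estimator defined for RTP in RFC 3550. The sketch below is not part of Opus or the Plivo SDK; it simply applies that running estimate to two hypothetical packet streams. The function name and the sample timestamps are invented for illustration.

```python
from typing import Sequence


def interarrival_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D: how much the spacing between two packets changed in transit.
        spacing_change = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(spacing_change) - jitter) / 16.0
    return jitter


# Hypothetical timestamps: one packet sent every 20 ms.
sent = [0, 20, 40, 60]
steady_arrivals = [50, 70, 90, 110]     # spacing preserved: no jitter
congested_arrivals = [50, 85, 92, 140]  # same packets, uneven spacing

print(interarrival_jitter(sent, steady_arrivals))
print(interarrival_jitter(sent, congested_arrivals))
```

The steady stream yields an estimate of zero, while the congested one yields a positive value even though no packet was lost or reordered.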

When voice is transmitted over IP, packets can be lost in transit before they reach the decoder. Opus can use PLC to mask the effects of that loss. When the codec detects that a packet is missing, it can use several PLC strategies to hide the resulting gaps. Opus can replace lost speech frames with zeros (i.e., zero insertion), fill gaps by repeating a portion that has been successfully received (i.e., waveform substitution), or use speech models and algorithms to fill gaps in speech (i.e., model-based methods). These strategies are especially important for calls to and from areas with low-bandwidth networks or Wi-Fi congestion.
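To make the first two strategies concrete, here is a toy sketch of zero insertion and waveform substitution operating on frames of PCM samples. It is a simplified illustration of the general ideas, not the concealment logic inside the Opus decoder. The function names, the `None` marker for a lost frame, and the `fade` factor are all assumptions made for this example.

```python
from typing import List, Optional

Frame = List[int]  # one frame of PCM samples; None marks a lost frame


def conceal_zero_insertion(frames: List[Optional[Frame]], frame_len: int) -> List[Frame]:
    """Replace every lost frame with silence."""
    return [f if f is not None else [0] * frame_len for f in frames]


def conceal_waveform_substitution(
    frames: List[Optional[Frame]], frame_len: int, fade: float = 0.5
) -> List[Frame]:
    """Replace a lost frame with the last good frame, faded on each repeat."""
    out: List[Frame] = []
    last_good: Optional[Frame] = None
    gain = 1.0
    for f in frames:
        if f is not None:
            out.append(f)
            last_good, gain = f, 1.0
        elif last_good is None:
            out.append([0] * frame_len)  # nothing received yet to repeat
        else:
            gain *= fade  # attenuate so long gaps decay toward silence
            out.append([int(s * gain) for s in last_good])
    return out


received = [[100, -100], None, None, [40, 40]]
print(conceal_zero_insertion(received, frame_len=2))
print(conceal_waveform_substitution(received, frame_len=2))
```

Zero insertion leaves an audible hole, while waveform substitution keeps something plausible playing and lets it fade if the gap persists.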

Lower latency

The human ear can detect latency greater than 250ms. While 300ms is considered poor latency industry-wide, the International Telecommunication Union recommends keeping latency below 150ms to ensure that symptoms of poor voice quality don’t affect calls. Our platform is optimized to deliver connectivity under 50ms to all customers around the globe, and support for high-value audio codecs such as Opus plays a large role. Opus addresses latency issues by supporting variable and constant bitrates and by adjusting bitrates dynamically.

Support for variable bitrate (VBR) and constant bitrate (CBR). Voice transmission benefits from a variable bitrate, which is the ability to change bitrate dynamically to adapt to the audio being encoded. VBR can help achieve a lower bitrate for the same voice quality, which means that it can consume less bandwidth than CBR, leading to improvements in audio quality.
Dynamic bitrates from 6Kbps to 510Kbps. Opus will adjust its bitrate between 6 and 510 kilobits per second (Kbps) according to packet loss and round-trip time (RTT) reports during live audio transmission. If an audio call is experiencing increased packet loss and long RTT, then Opus will automatically switch to a lower bitrate to compensate and reduce congestion. The ability to change bitrates dynamically ensures that applications consistently deliver high voice quality and clarity.
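The adaptation described above can be pictured as a small control loop. The sketch below is a hypothetical controller, not the logic that Opus or the Plivo SDK actually runs. Only the 6Kbps to 510Kbps range comes from the text; the function name, thresholds, and step sizes are invented for illustration.

```python
MIN_KBPS = 6.0    # lower bound of the bitrate range quoted above
MAX_KBPS = 510.0  # upper bound of the bitrate range quoted above


def next_bitrate_kbps(
    current_kbps: float,
    loss_fraction: float,
    rtt_ms: float,
    loss_threshold: float = 0.05,     # hypothetical: back off above 5% loss
    rtt_threshold_ms: float = 300.0,  # hypothetical: back off above 300 ms RTT
) -> float:
    """Pick the next target bitrate from the latest loss and RTT report."""
    if loss_fraction > loss_threshold or rtt_ms > rtt_threshold_ms:
        target = current_kbps * 0.75  # network looks congested: step down
    else:
        target = current_kbps * 1.05  # network looks healthy: probe upward
    return max(MIN_KBPS, min(MAX_KBPS, target))


print(next_bitrate_kbps(40, loss_fraction=0.10, rtt_ms=100))  # lossy report
print(next_bitrate_kbps(40, loss_fraction=0.00, rtt_ms=80))   # healthy report
```

Backing off quickly and recovering slowly is a common design choice for this kind of loop, because overshooting on a congested link tends to make the loss worse.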

Better packet loss concealment

Mitigating packet loss is especially important in real-time communication, because there’s no time to resend missing packets. Even low levels of packet loss can cause noticeable breaks in audio; when packet loss is severe, complete sentences could be missing. Even though Opus cannot prevent packet loss, it can mask the symptoms with reconstruction algorithms like forward error correction (FEC) and other PLC strategies.

Forward error correction (FEC). FEC can improve audio quality because it can reconstruct a missing packet using information carried in neighboring packets that were previously or subsequently transmitted.
Flexible error propagation. In the event of packet loss, other audio codecs utilize long-term prediction (LTP) filter states that spend more bits throughout the packet, which requires significant increases in bitrate and delay. To mitigate this, Opus reduces LTP filter states to the beginning of a packet, spending more bits only during the first pitch period, but saving bits throughout the packet transmission. This decreases potential voice quality issues and allocates more bandwidth to transmission.
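The FEC idea can be sketched with a deliberately simplified scheme in which each packet carries its own frame plus a spare copy of the previous one. This is an illustration under stated assumptions, not the Opus bitstream: the redundancy here is a verbatim copy and the frames are plain strings standing in for encoded audio. The `packetize` and `receive` helpers are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# (sequence number, primary frame, redundant copy of the previous frame)
Packet = Tuple[int, str, Optional[str]]


def packetize(frames: List[str]) -> List[Packet]:
    """Attach a copy of frame n-1 to the packet that carries frame n."""
    packets: List[Packet] = []
    for seq, frame in enumerate(frames):
        previous = frames[seq - 1] if seq > 0 else None
        packets.append((seq, frame, previous))
    return packets


def receive(packets: List[Packet], total: int) -> List[Optional[str]]:
    """Rebuild the frame sequence, using redundant copies to fill single gaps."""
    recovered: Dict[int, str] = {}
    for seq, primary, redundant in packets:
        recovered[seq] = primary
        if redundant is not None and (seq - 1) not in recovered:
            recovered[seq - 1] = redundant  # previous packet was lost
    return [recovered.get(i) for i in range(total)]


frames = ["f0", "f1", "f2", "f3"]
arrived = [p for p in packetize(frames) if p[0] != 1]  # packet 1 is lost
print(receive(arrived, total=len(frames)))
```

A single lost packet is fully recovered from its successor. If two consecutive packets are lost, only the later one can be rebuilt, which is why FEC is typically paired with other PLC strategies.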

Reduced audio bandwidth

Bandwidth is the amount of information that can be transmitted over a period of time. The larger the bandwidth, the more data can be transmitted. Increasing bandwidth can lead to better audio quality. Strategies for better utilizing bandwidth include transmitting more data each time, transmitting the same amount of data faster, or reducing the amount of data that needs to be transmitted. Opus deploys discontinuous transmission (DTX) to reduce the amount of data being transmitted during periods of silence.

Most audio calls have intermittent pauses and periods of silence, so reducing the packet rate during silence can save bandwidth and CPU usage. DTX gives Opus the ability to detect silence and reduce packet rates when no one is speaking. Then, when audio resumes, Opus can increase the packet rate seamlessly.
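Here is a rough sketch of the DTX idea: decide per frame whether anyone is speaking, send every frame during speech, and send only an occasional frame during silence. The energy-based detector, the threshold, and the `keepalive_every` interval are assumptions chosen for illustration. The voice activity detection inside Opus is more sophisticated than this.

```python
from typing import List


def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of one non-empty frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def frames_to_send(
    frames: List[List[int]],
    silence_threshold: float = 100.0,  # hypothetical energy cutoff
    keepalive_every: int = 20,         # hypothetical: 1 in N silent frames is sent
) -> List[bool]:
    """Return, for each frame, whether a packet would be transmitted."""
    decisions: List[bool] = []
    silent_run = 0
    for frame in frames:
        if frame_energy(frame) >= silence_threshold:
            silent_run = 0
            decisions.append(True)  # speech: always transmit
        else:
            # Silence: send the first silent frame, then only every Nth one.
            decisions.append(silent_run % keepalive_every == 0)
            silent_run += 1
    return decisions


speech, silence = [1000, -1000], [1, -1]
call = [speech] + [silence] * 5 + [speech]
print(frames_to_send(call, keepalive_every=3))
```

In this example only two of the five silent frames are transmitted, and the packet rate returns to normal as soon as speech resumes.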

Opus’s adaptability and robustness make the codec suitable for VoIP applications running on stand-alone software or web browsers. See for yourself and let us know what you think.