Announcing Improved Voice Quality for Plivo SDK based Apps [with Audio Samples]

After the December roll-out of our new Voice 2.0 Infrastructure, our team continues to add features and upgrades to improve voice quality. We’re happy to announce that as of today, our existing WebRTC SDK supports the Opus codec, a best-in-class audio codec for a wide range of voice applications. All applications currently using the Plivo Web SDK for inbound and outbound calling will automatically use Opus as the default codec. Plivo mobile SDK support for Opus is coming soon.

What Does This Mean for Your Application?

Opus’ adaptability and robustness make the codec well suited to VoIP applications running in standalone software or web browsers. Today, we offer Opus support on our WebRTC SDK completely free of cost to all of our current and future customers. We will also soon launch Opus support for our mobile SDK and Zentrunk SIP Trunking. Please sign up to our blog to get notified.

Based on our benchmark tests, we have tuned the Opus encoder on the Plivo network side to run at reduced complexity, which in turn reduces decoder complexity in the customer’s browser and improves browser performance. We have also optimized the sampling rates, which lowers the sampling frequency the browser has to handle and further improves performance. Together, these optimizations let Opus run efficiently in the browser while delivering its most important benefit: bandwidth savings.

Below are audio samples that compare how Opus and PCMU deal with varying degrees of packet loss. The audio samples with 0% packet loss set the baseline for comparison. As packet loss increases, Opus’s superiority stands out, and at 30% packet loss Opus clearly comes out ahead of PCMU. Even at such a high level of packet loss, Opus still delivers audio in which the dialogue remains comprehensible.

Packet Loss    Audio Samples
0%             Opus, PCMU
10%            Opus, PCMU
25%            Opus, PCMU
30%            Opus, PCMU

With this new audio codec, your application will suffer fewer audible effects from jitter, latency, and packet loss, and your users will experience better voice quality on poor network connections. For example, where prior WebRTC SDK codecs used 100 kbps of bandwidth, Opus requires only 50 kbps.
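To put the two bandwidth figures in perspective, here is a small arithmetic sketch. It assumes a constant bitrate in one direction and ignores packet header overhead; the function name is only illustrative.

```python
def data_per_minute_kb(bitrate_kbps: float) -> float:
    """Kilobytes sent in one minute, one direction, at a constant bitrate."""
    kilobits = bitrate_kbps * 60  # 60 seconds of audio
    return kilobits / 8           # 8 bits per byte


before = data_per_minute_kb(100)  # figure quoted for prior codecs
after = data_per_minute_kb(50)    # figure quoted for Opus
savings = 1 - after / before

print(f"{before:.0f} kB/min vs {after:.0f} kB/min ({savings:.0%} less data)")
```

Halving the bitrate halves the data each call puts on the network, which matters most on congested or metered connections.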

Unfortunately, slow-speed connections and congestion are unavoidable, but the VoIP community is constantly working to help solve issues of packet loss with new technology, such as the Opus codec. Going forward, we will continue to dedicate more resources to improve upon our voice quality by upgrading our technology, optimizing our infrastructure, and connecting to more local in-country carriers.

What is Opus?

Opus is an open source audio codec that’s optimized for speech and music transmission over the internet. Audio codecs are software that compresses and decompresses digital audio signals for transmission. These codecs take the form of mathematical algorithms and are graded on their ability to retain audio quality while encoding and compressing audio signals.

Compared to other codecs, Opus is highly effective at reducing bandwidth consumption and CPU usage during audio transmission while maintaining high-fidelity audio signals. That’s why Opus is known for its ability to handle a wide variety of VoIP (voice over IP) audio applications including conferencing, CRMs, help desks, and click-to-call applications.

Why is Opus Awesome?

Opus was built to fill the gaps of existing audio codecs, which were not optimized for the bandwidth, CPU usage, and varying bitrates and frame sizes that next-generation WebRTC-based audio applications need. Even though Opus is not new, its high quality and low latency have propelled its popularity, especially among applications that use WebRTC. Google’s Chrome browser has adopted Opus as the default codec, while Firefox, Opera, and Chromium browsers all support Opus for WebRTC as well. Because of this, more and more WebRTC applications have been adopting Opus to transmit speech over the internet.

Here’s how Opus stacks up against other popular codecs currently in the market, as described by its creators Jean-Marc Valin (Mozilla/Xiph.Org), Koen Vos (vocTone), and Timothy B. Terriberry (Mozilla/Xiph.Org). As illustrated below, Opus has the lowest delay (26.5 ms by default), a flexible bitrate, and a broad range of bandwidth support (narrowband to fullband), and it is optimized for real-time communication.

[Chart: Opus codec support comparison]

Extreme Audio Optimization

Opus is extremely flexible because it can adjust bitrate, audio bandwidth, and frame size dynamically on live calls. This support for a range of bitrates, frame sizes, audio bandwidths, sampling rates, and multistream frames ensures that a wide variety of applications can use Opus to transmit audio. This flexibility allows Opus to compensate for varying internet speeds and issues that any user could experience without notice. For example, if your user has a congested WiFi router or is experiencing low network bandwidth, Opus can seamlessly switch to a lower bitrate to consume less bandwidth.

Errors and packet losses are unavoidable when complex systems interact. That’s why Opus has many features and strategies to mitigate poor audio quality under poor network conditions. Audio engineers can geek out on how it fixes important audio issues during transmission.

Reduced Jitter

Ideally, in a perfect high-bandwidth, low-latency environment, a steady stream of packets is delivered on a continuous basis. However, even if audio data is transmitted and played in the right order, sound distortions can occur when it is not played with the exact timing. Below is an illustration comparing a steady stream of packets during zero congestion versus the same audio transmission (i.e., same packet stream) in a congested environment.

[Illustration: packet flow without jitter vs. with jitter]
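Jitter can also be expressed as a number. The sketch below uses the running interarrival jitter estimate defined for RTP in RFC 3550, where each new transit-time difference nudges the estimate by one sixteenth. The timestamps are made-up values in milliseconds, chosen only to contrast a steady stream with a congested one.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550 (J += (|D| - J) / 16)."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D: how much the spacing between two packets changed in transit
        sent_gap = send_times_ms[i] - send_times_ms[i - 1]
        recv_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        d = recv_gap - sent_gap
        jitter += (abs(d) - jitter) / 16.0
    return jitter


send = [0, 20, 40, 60]              # one packet every 20 ms
steady_recv = [50, 70, 90, 110]     # constant 50 ms transit time
congested_recv = [50, 75, 90, 120]  # transit time varies per packet

print(interarrival_jitter(send, steady_recv))     # 0.0: spacing preserved
print(interarrival_jitter(send, congested_recv))  # greater than zero
```

A constant delay, however large, produces zero jitter. Only variation in delay from packet to packet raises the estimate.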

VoIP applications can experience a lot of jitter because they require high bandwidth to transmit high-fidelity audio. However, even in the event of packet loss, Opus has built-in features such as Packet Loss Concealment (PLC) and dynamic frame sizes that make the symptoms of jitter less detectable by the human ear.

  • Good Loss Robustness and Packet Loss Concealment (PLC). When voice is transmitted over IP, packets can be lost in transit before they reach the decoder. However, Opus can use PLC to mask the effects of packet loss. When the codec detects that a packet is missing, Opus has several PLC strategies to hide gaps in lost information. Opus can replace lost speech frames with zeros (i.e., zero insertion), reconstruct missing gaps by repeating a portion that has been successfully received (i.e., waveform substitution), or use speech models and algorithms to fill gaps in speech (i.e., model-based methods). These strategies are especially important for calls to and from areas of low bandwidth networks or WiFi congestion.
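The first two concealment strategies are simple enough to sketch in a few lines. This is a toy illustration and not the Opus decoder's actual PLC, which is considerably more sophisticated. Frames are modeled as short lists of samples, a lost frame is represented by `None`, and the `conceal` function and its `strategy` parameter are hypothetical names.

```python
from typing import List, Optional

Frame = List[int]


def conceal(frames: List[Optional[Frame]], frame_size: int,
            strategy: str = "repeat") -> List[Frame]:
    """Fill lost frames (None) by zero insertion or waveform substitution."""
    output: List[Frame] = []
    last_good: Optional[Frame] = None
    for frame in frames:
        if frame is not None:
            output.append(frame)
            last_good = frame
        elif strategy == "repeat" and last_good is not None:
            # Waveform substitution: replay the last frame that arrived
            output.append(list(last_good))
        else:
            # Zero insertion: play silence for the missing frame
            output.append([0] * frame_size)
    return output


received = [[1, 2], None, [5, 6]]  # the middle frame was lost
print(conceal(received, 2, strategy="zero"))
print(conceal(received, 2, strategy="repeat"))
```

Zero insertion leaves an audible gap, while repeating the previous frame usually sounds smoother for short losses. Model-based methods go further by predicting what the missing speech would have sounded like.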

Lower Latency

The human ear can detect latency greater than 250 ms. While 300 ms is widely considered poor latency across the industry, the International Telecommunication Union recommends that latency be kept below 150 ms to ensure that symptoms of poor voice quality don’t affect calls. Our platform is optimized to deliver connectivity under 50 ms to all customers around the globe, and support for high-value audio codecs such as Opus plays a large role. Opus addresses latency issues by supporting variable and constant bitrates and by adjusting bitrates dynamically.

  • Support for Variable Bitrate (VBR) and Constant Bitrate (CBR). Variable bitrate is the ability to change bitrate dynamically to adapt to the audio being encoded, which suits voice transmission well. VBR can achieve a lower bitrate for the same voice quality, which means that it can consume less bandwidth than CBR or deliver better audio quality within the same bandwidth.
  • Dynamic Bitrates from 6 kbps to 510 kbps. Opus will adjust its bitrate between 6 and 510 kilobits per second (kbps) according to packet loss and round-trip time (RTT) reports during live audio transmission. If the audio call is experiencing increased packet loss and long RTT, then Opus will automatically switch to a lower bitrate to compensate and reduce congestion. The ability to change bitrates dynamically ensures that applications consistently deliver high voice quality and clarity.
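The adaptation loop described above can be sketched as a small controller. Only the 6 to 510 kbps range comes from the text; the thresholds, the step sizes, and the `next_bitrate` name are assumptions made for illustration, and a real WebRTC stack uses a more elaborate congestion-control algorithm.

```python
MIN_BITRATE_KBPS = 6
MAX_BITRATE_KBPS = 510


def next_bitrate(current_kbps: float, loss_fraction: float, rtt_ms: float,
                 loss_threshold: float = 0.05,
                 rtt_threshold_ms: float = 300.0) -> float:
    """Pick the next target bitrate from the latest loss and RTT report.

    Thresholds and step sizes are hypothetical values for this sketch.
    """
    if loss_fraction > loss_threshold or rtt_ms > rtt_threshold_ms:
        target = current_kbps * 0.75  # back off quickly under congestion
    else:
        target = current_kbps * 1.05  # probe upward slowly when healthy
    # Never leave the range the codec supports
    return max(MIN_BITRATE_KBPS, min(MAX_BITRATE_KBPS, target))


print(next_bitrate(64, loss_fraction=0.10, rtt_ms=100))  # congested: lower
print(next_bitrate(64, loss_fraction=0.00, rtt_ms=50))   # healthy: higher
```

Backing off faster than ramping up is a common design choice, since overshooting on a congested link makes the packet loss worse.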

Better Packet Loss Concealment

Mitigating packet loss is especially important in real-time communication because there is no time to resend missing packets. Even low levels of packet loss can cause unnecessary breaks in audio, and when packet loss is severe, complete sentences can go missing. Even though Opus cannot directly alleviate packet loss, it can mask the symptoms with reconstruction techniques like forward error correction (FEC) and other packet loss concealment (PLC) strategies.

  • Good Loss Robustness and Packet Loss Concealment (PLC). When the codec detects that a packet is missing, Opus has several PLC strategies to hide gaps in lost information. Opus can replace lost speech frames with zeros (i.e., zero insertion), reconstruct missing gaps by repeating a portion that has been successfully received (i.e., waveform substitution), or use speech models and algorithms to fill gaps in speech (i.e., model-based methods). These strategies are especially important for calls to and from areas of low bandwidth networks or WiFi congestion.
  • Forward Error Correction (FEC). FEC is another method of addressing packet loss. This feature can significantly improve audio quality because it can reconstruct a missing packet from information from neighboring packets that were previously or subsequently transmitted.
  • Flexible Error Propagation. In the event of packet loss, other audio codecs in the market utilize long-term prediction (LTP) filter states that spend more bits throughout the packet, which requires significant increases in bitrate and delay. To mitigate this, Opus reduces LTP filter states to the beginning of a packet, spending more bits only during the first pitch period, but saving bits throughout the packet transmission. This decreases potential voice quality issues and allocates more bandwidth to transmission.
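As a rough illustration of the FEC idea, the sketch below has each packet carry a redundant copy of the previous frame, so a single lost packet can be rebuilt from the packet that follows it. This is a simplified model under stated assumptions: frames are plain strings, the redundant copy is identical to the original (a real codec would typically send it at a lower bitrate), and `packetize` and `receive` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

# (sequence number, primary frame, redundant copy of the previous frame)
Packet = Tuple[int, str, Optional[str]]


def packetize(frames: List[str]) -> List[Packet]:
    """Attach a copy of the previous frame to every packet."""
    packets: List[Packet] = []
    for seq, frame in enumerate(frames):
        previous = frames[seq - 1] if seq > 0 else None
        packets.append((seq, frame, previous))
    return packets


def receive(packets: List[Packet], total: int) -> List[Optional[str]]:
    """Rebuild the frame sequence, filling gaps from redundant copies."""
    recovered: List[Optional[str]] = [None] * total
    for seq, primary, redundant in packets:
        recovered[seq] = primary
        if redundant is not None and recovered[seq - 1] is None:
            recovered[seq - 1] = redundant  # the neighbor repairs the gap
    return recovered


sent = packetize(["a", "b", "c", "d"])
arrived = [p for p in sent if p[0] != 1]  # packet 1 is lost in transit
print(receive(arrived, total=4))
```

The scheme recovers isolated losses at the cost of extra bits per packet and one packet of added delay. Two consecutive lost packets still leave a gap, which is where PLC takes over.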

Reduced Audio Bandwidth

Bandwidth is the amount of information that can be transmitted over a period of time. The larger the bandwidth, the more data can be transmitted. Therefore, increasing bandwidth can lead to better audio quality. Strategies for better utilizing bandwidth include transmitting more data each time, transmitting the same amount of data faster, or reducing the amount of data that needs to be transmitted. Opus uses discontinuous transmission (DTX) to reduce the amount of data being transmitted during periods of silence.

  • Discontinuous Transmission (DTX). Most audio calls have intermittent pauses and periods of silence, so reducing the packet rate during silence can save bandwidth and CPU usage. Discontinuous transmission gives Opus the ability to detect silence and reduce the packet rate when no one is speaking. Then, when audio resumes, Opus can increase the packet rate seamlessly.
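A minimal sketch of the DTX idea follows, assuming a simple energy threshold as the silence detector. A real encoder uses a proper voice activity detector, and the threshold, the keep-alive interval, and the function names here are illustrative assumptions.

```python
from typing import List


def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of the samples in a frame."""
    return sum(sample * sample for sample in frame) / len(frame)


def frames_to_send(frames: List[List[int]],
                   silence_threshold: float = 100.0,
                   keepalive_every: int = 20) -> List[int]:
    """Return the indices of the frames that would actually be transmitted."""
    sent: List[int] = []
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_energy(frame) >= silence_threshold:
            silent_run = 0
            sent.append(index)  # speech: always transmit
        else:
            # Silence: send only an occasional frame so the receiver
            # knows the call is alive, and skip the rest
            if silent_run % keepalive_every == 0:
                sent.append(index)
            silent_run += 1
    return sent


speech, silence = [1000, -1000], [0, 0]
call = [speech, speech] + [silence] * 5 + [speech]
print(frames_to_send(call, keepalive_every=4))
```

In this toy call, five silent frames shrink to two transmitted frames, and full-rate transmission resumes as soon as speech returns.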