Whitepaper

Mobile voice control that listens and understands: DSP Group

February 13, 2013

Telecom Lead Europe: Lior Blanka, corporate vice president and chief technology officer for DSP Group, says 3D noise cancellation technologies coming soon to mobile devices will make voice interfaces more reliable.

Voice control is the future. Just as touch screen technology revolutionized smart phones, and the ability to use gestures launched new gaming and fitness applications, the possibilities for hands-free operation are only beginning to emerge.

ARCchart forecasts that 1.8 billion mobile phones with some form of voice control functions will ship globally by 2016, accounting for almost 90 percent of the handsets shipping that year.

The increased availability of voice control applications can be attributed to a number of factors: advances in ASR (automatic speech recognition), cloud computing that enables very high processing power with more robust and reliable algorithms, noise cancellation that reduces ambient noise, and language processing technologies.

However, while ASR-enabled applications work well in quiet environments, their performance tends to degrade drastically in the presence of background noise, whether in noisy cafés, on public transportation, or even when walking on a busy street. Without intelligible speech, automatic speech recognition can’t function properly or be considered a reliable input method.

These problems look set to be addressed by the next generation of mobile devices. A new approach, known as 3D voice processing, will become increasingly prevalent and promises to make voice applications a far more practical proposition. 3D voice processing enables ASR to achieve far better accuracy by suppressing background noise while preserving the natural voice of the speaker with only minor distortion. The distortion is so minor that the user experience is hardly affected when operating applications such as Siri or a text message dictation application in noisy public venues. So what is 3D voice processing, and how does it differ from what’s in use today?

The new dimension of voice processing – 3D

Traditional noise cancellation suffers from a tradeoff between the degree of noise reduction and voice quality: the higher the noise reduction level, the greater the potential for voice distortion. Attempting to minimize this tradeoff, engineers have developed noise reduction algorithms that perform well mainly in stationary noise but poorly in non-stationary noise such as street noise.
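
To make the tradeoff concrete, here is a minimal sketch of a textbook power spectral subtraction gain for a single frequency band. It is an illustration only, not DSP Group's algorithm. The function names, the over-subtraction factor alpha, the gain floor and the per-frame power values are all assumptions chosen for this example, and speech and noise powers are assumed simply to add.

```python
def suppression_gain(noisy_power, noise_estimate, alpha=1.0, floor=0.05):
    """Power-domain spectral subtraction gain for one band in one frame.

    alpha is a hypothetical over-subtraction factor; floor is a
    hypothetical minimum gain that keeps the result non-negative.
    """
    if noisy_power <= 0.0:
        return floor
    gain = 1.0 - alpha * noise_estimate / noisy_power
    return max(floor, gain)


def process(speech, noise, noise_estimate, alpha):
    """Return (residual noise power, speech power lost), summed over frames.

    Assumes speech and noise are uncorrelated, so their powers add and
    the same gain scales both components of the noisy band.
    """
    residual_noise = 0.0
    speech_lost = 0.0
    for s, n in zip(speech, noise):
        g = suppression_gain(s + n, noise_estimate, alpha)
        residual_noise += g * n
        speech_lost += (1.0 - g) * s
    return residual_noise, speech_lost


# Made-up per-frame powers: speech is present in frames 1 and 3 only.
speech = [4.0, 0.0, 6.0, 0.0]
stationary = [1.0, 1.0, 1.0, 1.0]        # constant noise power
non_stationary = [0.2, 1.8, 0.2, 1.8]    # same average, fluctuating
noise_estimate = 1.0                     # long-term average of either noise

mild = process(speech, stationary, noise_estimate, alpha=1.0)
aggressive = process(speech, stationary, noise_estimate, alpha=2.0)
street = process(speech, non_stationary, noise_estimate, alpha=1.0)

print("stationary, alpha=1:     ", mild)
print("stationary, alpha=2:     ", aggressive)
print("non-stationary, alpha=1: ", street)
```

With these illustrative numbers, raising alpha leaves less residual noise but removes more of the speech, and the fluctuating noise leaves more residual noise than the constant noise at the same setting, because a single long-term noise estimate cannot follow it.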

Noise cancellation techniques took a leap forward with the introduction of a second microphone in smart phones, enabling the two microphones to operate in a manner similar to the human auditory system. However, this capability does not provide sufficient noise cancellation to eliminate all background noise for voice calls or voice control while driving or riding on public transportation, or even at home when, for instance, music is turned up loud.

Advanced noise cancellation technology adds a further sensor to the standard two audio microphones, and then applies a 3D-Vocal algorithm to perform multiple voice processing tasks, including background noise cancellation, loudness equalization and general voice enhancement. Removing background noise significantly improves the accuracy of ASR and the performance of voice-call applications for smart phones, tablets and other mobile devices.

By incorporating advanced noise cancellation capabilities into smart phones for voice communication, voice quality can be improved significantly, from poor to very good. When the audio quality of 3D voice processing was compared with standard 2D noise cancellation techniques using the ETSI EG 202 396-1 standard, 3D voice processing scored significantly higher. The improvement was most dramatic for road conditions, where 3D voice processing was rated very good, compared with poor for 2D.

3D voice processing provides a variety of benefits to consumers in addition to improved voice control. It takes the strain out of hearing, being heard and being understood in any surroundings, whether speaking directly into the microphone or in speaker mode. Conference calls can be taken on the go, in an office or in a noisy public venue, without compromising voice quality or intelligibility. Background sound replacement, with music or other sounds, could become a revenue generator for operators through new services offered at a premium, much like ring tones.

For safety and convenience reasons, hands-free operation will often be the first choice for consumers. And yet voice control is only beginning to reach its true potential. Test results indicate that 3D voice processing can significantly improve the reliability and usability of voice control, enabling it to become a valuable differentiator. With the latest technology, consumers can realize these additional benefits, while operators and consumer electronics manufacturers gain a new series of revenue-generating products and services.

By Lior Blanka, corporate vice president and chief technology officer, DSP Group

With more than 25 years of technological experience, DSP Group corporate VP and CTO Lior Blanka previously served as 3.5G Cellular Phone Cross-Sites Manager at Intel, and CDMA Phone Reference Design Manager at DSP Communications.
