Like much of the tech featured in the original Star Trek and other futuristic shows, talking to a computer has long been a reality. The interface is dominated by Nuance, the company behind the female virtual-assistant voice many people hear on their smartphones and computers. But Google is upping the game with software that promises to deliver more than the competition does, both now and down the road.
In March, Google unveiled the Google Cloud Speech API and opened it to third-party developers so they can apply its voice capabilities to their own enterprise needs. Access is free during the Limited Preview phase, which has no set end date, according to Google's site.
Google's product may supplant Nuance, which currently dominates the voice-activation market. In addition to supplying speech technology for Apple's Siri, Nuance created Nina, a voice-activated virtual assistant for businesses. Taking mobile applications into the connected car, Nuance also built Dragon Drive, which it describes as "car meets cloud." The vendor promises voice commands that provide a "connected yet distraction-free driving" experience. More than 90 million cars currently use Nuance technology, according to the company.
Get Off My Roof, David
We may never perform stunts like David Hasselhoff, but one day our IoT cars could talk to us like KITT. (Sources: KITT Interior - Wikipedia; Hasselhoff - Saad Faruque)
But that doesn't necessarily mean 90 million happy customers. According to studies cited by the Wall Street Journal, some drivers are deeply frustrated with their cars' voice-activation systems. Even Nuance Vice President Arnd Weil acknowledged that the company is still working on picking up commands accurately in noisy driving environments and on improving the system's understanding of languages, dialects and accents.
Nuance lists 42 languages for its automatic speech recognition (ASR), though the number drops once you discount country-specific variants of the same language, such as three versions each of Arabic and English and two each of Mandarin, French, Portuguese and Spanish. That leaves many languages without the possibility of voice activation.
In contrast, Google's Cloud Speech API supports more than 80 languages and variants. According to Google, it is also designed to filter out what you don't want, such as the background noise that plagues voice-activated functions in connected cars.
Audio can come directly from a device's microphone or from a previously recorded file. Cloud Speech API supports a number of audio formats, including FLAC, AMR, PCMU and linear-16. Users can upload audio files, and later versions of the software may integrate with Google Cloud Storage, the vendor says.
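For developers, a recognition request boils down to sending audio plus a few settings describing it. The sketch below assumes the later, generally available v1 REST interface (a POST to `https://speech.googleapis.com/v1/speech:recognize` with `encoding`, `sampleRateHertz` and `languageCode` fields); the Limited Preview version described here may have used different field names. It only builds the JSON body and does not send it or handle authentication, and the helper name `build_recognize_request` is hypothetical.

```python
import base64
import json

# Map the article's format names to v1 RecognitionConfig encoding values (assumed v1 API).
# PCMU (G.711 mu-law) corresponds to the MULAW encoding.
ENCODINGS = {
    "FLAC": "FLAC",
    "AMR": "AMR",
    "PCMU": "MULAW",
    "linear-16": "LINEAR16",
}


def build_recognize_request(audio_bytes, fmt, sample_rate_hz, language_code="en-US"):
    """Hypothetical helper: build a JSON body for a v1 speech:recognize call."""
    if fmt not in ENCODINGS:
        raise ValueError(f"unsupported format: {fmt}")
    return {
        "config": {
            "encoding": ENCODINGS[fmt],
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        "audio": {
            # Inline audio travels as base64-encoded text inside the JSON body.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    body = build_recognize_request(b"\x00\x01", "linear-16", 16000)
    print(json.dumps(body, indent=2))
```

Recorded files stored elsewhere would typically be referenced by URI instead of inlined, which is where the Cloud Storage integration mentioned above would come in.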
It also uses built-in machine learning to become more accurate the more it is used. Built on deep-learning neural networks, the API gets better at recognizing a user's voice and vocabulary over the course of its interactions, Google claims.
Perhaps Google's product will solve the problems that persist in current voice recognition for connected devices and enterprise applications, further enhancing the Internet of Things and human relationships with machines. What do you think?
— Ariella Brown, Freelance Contributor. Special to The New IP