Alexa and Siri Won't Work in Noisy Environments

Matthew Goldey

July 17, 2018

Voice assistants are everywhere, especially in people’s homes. We use Alexa, Siri, and Google Home to control our lights, answer trivia questions, and play music. Notably, sales of voice assistant devices have more than doubled in the last year.1

Beyond the home, more people are using voice commands on their phones. By 2020, nearly half of all internet searches are predicted to be voice driven.2 Why type when you can talk? Speech recognition is now three times faster than a human typing short phrases.3

Not far behind the consumer market, the business world is catching on. Unlike households (well, some households), business environments are often noisy. What's more, daily activities do not transfer from business to business. A financial trader wants to pull up a chart of Apple’s stock price. A first responder needs to look up warrants for an address. A logistics professional wants to pull up orders from the last three months. Noise and specialized vocabulary together make developing skills for these workplaces challenging.

What’s a skill?

To help people, voice assistants have specific “skills” they can understand. These skills have an “intent” and “entities” related to that “intent”. You might tell your Amazon Echo “Alexa, turn on the bedroom lights”. The “intent” of the voice command is what Alexa should do. The entities are the action “turn on” and the object “bedroom lights.”
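As a rough illustration of how an intent and its entities might be represented, here is a minimal Python sketch. The `ParsedCommand` class, the `control_lights` intent name, and the toy regular expression are all assumptions made for this example; they are not taken from the Alexa, Siri, or Google Home SDKs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedCommand:
    """Hypothetical container for one understood voice command."""
    intent: str                                   # what the assistant should do
    entities: dict = field(default_factory=dict)  # details the intent needs


def parse_light_command(text):
    """Toy parser for 'turn on/off the <room> lights' commands."""
    match = re.search(r"turn (on|off) the (.+?) lights", text.lower())
    if match is None:
        return None  # the command does not belong to this skill
    return ParsedCommand(
        intent="control_lights",  # hypothetical intent name
        entities={
            "action": f"turn {match.group(1)}",
            "object": f"{match.group(2)} lights",
        },
    )


command = parse_light_command("Alexa, turn on the bedroom lights")
print(command)
```

A real skill would rely on the platform's language-understanding service rather than a hand-written pattern, but the shape of the result (one intent plus named entities) is the same idea.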

These skills often fail in noisy environments. Let’s say you were building a skill to react to the command:

Get me directions to 55 West Monroe Street

Behind each voice command is a voice-to-text transcription engine. In practice, such an engine is at most 95% accurate, and 75% or less accurate in noisy environments. You may get transcripts back like this:

Get me directions to 55 West Monroe Street
Get me directions to 55 West Rose Street
Give me directions to 55 ... Monroe...
Give me directions to 55 Western Row Street

If you build a skill with the entities direction and address, your skill can miss the address – or find a wrong address – if you only rely on the most likely transcript.
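To see why relying on a single transcript is fragile, the sketch below runs a simple address pattern over each of the example transcripts, as if each in turn were the only one returned. The regular expression and the `extract_address` function are illustrative assumptions, not part of any voice assistant SDK.

```python
import re

# Hypothetical pattern: a street number, one or more words, then "Street".
ADDRESS_PATTERN = re.compile(r"\b\d+(?: [A-Za-z]+)+ Street\b")


def extract_address(transcript):
    """Return the first street address found in a transcript, or None."""
    match = ADDRESS_PATTERN.search(transcript)
    return match.group(0) if match else None


# The four example transcripts from above.
transcripts = [
    "Get me directions to 55 West Monroe Street",
    "Get me directions to 55 West Rose Street",
    "Give me directions to 55 ... Monroe...",
    "Give me directions to 55 Western Row Street",
]

# What the skill would see if each transcript were the only one it received.
for transcript in transcripts:
    print(repr(extract_address(transcript)))
```

Only the first transcript yields the intended address. Two of the others yield a plausible but wrong address, and the one with dropped words yields nothing at all.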

Enter Scribe Discovery Engine

We built Scribe Discovery to help developers write skills for noisy environments.

Scribe’s real-time transcription engine provides hundreds of possible transcripts.

Scribe ranks these by how much they sound like the audio it heard. Discovery searches the rich output to find the targeted terms.

In our example, Discovery finds not just the target address, but also these other possible addresses:

55 West Monroe Street
5 West Monroe Street
55 West Monroe
55 Monroe
5 Monroe
55 West Rose
55 Western
5 Western
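The general idea of searching every candidate transcript for targeted terms, rather than trusting only the top-ranked one, can be sketched in a few lines of Python. This is not Scribe's or Discovery's actual algorithm or API. The token-overlap score, the function names, and the ordering of the hypotheses are assumptions chosen to keep the example small.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens, ignoring punctuation such as '...'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(hypothesis, target):
    """Fraction of the target's tokens that appear in the hypothesis."""
    target_tokens = tokens(target)
    return len(target_tokens & tokens(hypothesis)) / len(target_tokens)


def search_hypotheses(hypotheses, target):
    """Return (score, hypothesis) pairs, best coverage first.

    sorted() is stable, so ties keep the engine's original ranking.
    """
    scored = [(coverage(h, target), h) for h in hypotheses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# The example transcripts, in an assumed ranking where the top one is wrong.
hypotheses = [
    "Get me directions to 55 West Rose Street",
    "Give me directions to 55 ... Monroe...",
    "Give me directions to 55 Western Row Street",
    "Get me directions to 55 West Monroe Street",
]

results = search_hypotheses(hypotheses, "55 West Monroe Street")
best_score, best_hypothesis = results[0]
print(best_score, best_hypothesis)
```

Even though the correct transcript is ranked last in this assumed ordering, searching all hypotheses surfaces it as the best match for the target address. A production system would also weigh how well each hypothesis matches the audio, which this sketch leaves out.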

As a developer, you can build skills as if transcripts had no mistakes. Discovery sorts through all possible transcripts for you, uncovering 10-25% more entities than are present in the most likely transcript, a clear advantage in noisier environments.


Check out the video below to see Discovery in action. 




Interested in learning more? Check out our documentation on Discovery.