Expanding Scribe's Accent Coverage

Amy Geojo

August 14, 2018

Voice recognition is only as good as the inputs used to train the technology. That's why exposing Scribe to hundreds of accents and dialects has been crucial to ensuring its accuracy as a transcription service.

We started with this line:

Please call Stella. Ask her to bring these things with her from the store: six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station.

Of course, if you ask a room full of people to say those words out loud, the line will sound different coming out of each of their mouths (for a quick reference on how we all talk differently, check this out). These differences are what we wanted to pinpoint, to make Scribe as smart as possible: able to say with confidence that no matter where someone comes from, or what their background is, their speech will be understood.

Some background: Automatic Speech Recognition (ASR) systems identify and process human speech. One primary use of such technology is to convert speech to text. This is no mean feat. For starters, there isn’t a 1:1 mapping between phonemes (linguistically meaningful speech sounds) and graphemes (characters comprising an alphabet). For instance, English has (debatably) 44 phonemes but only 26 letters. Moreover, the acoustic signature associated with a phoneme varies due to individual speaker differences and contextual effects, among other reasons.
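To make the mismatch concrete, here is a toy sketch (the word list and transcriptions are illustrative, written in ARPAbet, a common machine-readable phoneme notation; exact transcriptions vary by dictionary and dialect):

```python
# Toy illustration of the lack of a 1:1 mapping between letters and phonemes.
# Transcriptions are approximate ARPAbet, for illustration only.
lexicon = {
    "cheese": ["CH", "IY", "Z"],       # 6 letters, only 3 phonemes
    "snake":  ["S", "N", "EY", "K"],   # silent final "e"
    "frog":   ["F", "R", "AA", "G"],   # happens to be 1:1, but not in general
}

for word, phonemes in lexicon.items():
    print(f"{word}: {len(word)} letters -> {len(phonemes)} phonemes")
```

The "cheese" row is the interesting one: six graphemes collapse into three phonemes, so an ASR system cannot simply read letters off the acoustic signal.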

Still, phonemes are categories and categories have boundaries. If everyone spoke the same dialect, an ASR system could theoretically use acoustic differences (formant differences etc.) to distinguish phonemes. (Background reading here and here.)
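The idea of acoustic category boundaries can be sketched with a tiny nearest-centroid classifier over the first two formants. The centroid values below are rough averages for two American English vowels, in the spirit of the classic Peterson & Barney measurements; they are approximate and purely illustrative, not Scribe's actual method:

```python
import math

# Rough average formant frequencies (F1, F2) in Hz for two American English
# vowels. Approximate values, used only to illustrate category boundaries.
CENTROIDS = {
    "iy (as in 'heed')": (270, 2290),
    "aa (as in 'hod')":  (730, 1090),
}

def classify_vowel(f1, f2):
    """Assign a measured (F1, F2) pair to the nearest vowel centroid."""
    return min(CENTROIDS, key=lambda v: math.dist((f1, f2), CENTROIDS[v]))

print(classify_vowel(300, 2200))  # lands nearest the "iy" centroid
```

With a single dialect, a boundary like this is workable; once dialects shift the formant targets themselves, the centroids move and a fixed boundary starts misclassifying, which is exactly the problem the next paragraph describes.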

Dialects pose an additional challenge for ASR systems because the same word can be associated with different phonemes entirely. (Interesting blog from two linguists about vowel shifts.)

We leveraged data from the Speech Accent Archive, curated by linguists at George Mason University (Weinberger, S., 2013), which contains recordings from hundreds of people speaking the line above.

Considering participants born in the United States:

  • The data includes recordings from individuals born in Washington D.C. and all states except Delaware. Of these, 25% are from individuals born in Virginia, California, New York or Pennsylvania.
  • Ninety-five percent (342) of participants spoke English from birth. The native languages of the other 18 speakers were: Arabic (3), Mandarin (2), Greek (2), and one each of Farsi, French, Kikongo, Korean, Russian, Spanish, Tagalog, Twi, Urdu, Yiddish, and Yupik.
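A quick sanity check on the bullet above, using only the numbers quoted there:

```python
# Figures quoted for U.S.-born participants in the Speech Accent Archive.
native_english = 342
other_native = 18
total = native_english + other_native

share = native_english / total
print(f"{share:.0%} of {total} U.S.-born participants spoke English from birth")
```

342 of 360 is exactly 95%, matching the quoted figure.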

Worldwide, 163 countries were represented, with 191 different native languages. A complication (or a wonder of human achievement) is that individuals born in the same country do not necessarily share a native language. For instance, 11 native languages appear among the audio recordings from Chinese-born speakers: 23 native speakers of Cantonese, 52 of Mandarin, and the remaining 13 speakers split across the other 9 languages.
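The Chinese-born breakdown can be tallied directly from the counts quoted above (the "other" bucket lumps together the 13 speakers of the remaining 9 languages):

```python
from collections import Counter

# Native-language counts quoted above for speakers born in China.
chinese_born = Counter({"Cantonese": 23, "Mandarin": 52, "other (9 languages)": 13})

total_speakers = sum(chinese_born.values())
total_languages = 2 + 9  # Cantonese and Mandarin, plus the 9 others
print(f"{total_speakers} Chinese-born speakers across "
      f"{total_languages} native languages")
```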

Running and optimizing Scribe against this dataset produced very promising results: we are very well covered in 49 of the 50 states (sorry, Oklahoma), and well covered in most countries around the world.


United States Accent Accuracy (English)


World Accent Accuracy (English)


Here's why this matters: our clients conduct business worldwide. We are dedicated to providing accurate transcription and entity discovery no matter where a speaker comes from or where they live. To do so, we must develop robust models that are as insensitive to accent-based variation as possible.

And we’re on our way.
