Shared Machine Learning Across Wall Street Without Sacrificing Data Security

Tejas Shastry

December 19, 2018

Machines are quickly learning to understand human speech as well as humans do, and rapid gains across the industry are fueled by a singular focus: data. Speech recognition engines achieve their best accuracy by training on thousands of hours of recorded audio.

While audio training data is easy to obtain in some industries, those who work in financial and emergency services know how sensitive audio data can be. Personally identifying information, industry trade secrets, and other sensitive details may be scattered throughout otherwise valuable audio, making it difficult for an external party to train a speech recognition model on it.

Like many machine-learned speech recognition engines, GreenKey learns to recognize speech by training on human-generated audio and human-generated transcripts. Unlike other engines, GreenKey’s base model can learn from customized models without having access to the original raw data. This ability allows everyone in the GreenKey community to benefit from shared learnings without sacrificing data privacy.
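
To make the idea of learning from customized models rather than raw data more concrete, here is a minimal sketch of one well-known approach, federated averaging, in which only model parameters leave each firm. This post does not say that GreenKey uses this technique; the function name, the parameter layout, and the two example firms are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(custom_models: List[Weights],
                      hours_of_audio: List[float]) -> Weights:
    """Combine customized models into one update for a shared base model.

    Each model is weighted by how much audio it was trained on.
    Only parameters are exchanged; no audio or transcripts are needed.
    """
    if not custom_models or len(custom_models) != len(hours_of_audio):
        raise ValueError("need one audio-hours figure per customized model")
    total = sum(hours_of_audio)
    if total <= 0:
        raise ValueError("total hours of audio must be positive")

    merged: Weights = {}
    for name in custom_models[0]:
        length = len(custom_models[0][name])
        merged[name] = [
            sum(m[name][i] * h for m, h in zip(custom_models, hours_of_audio)) / total
            for i in range(length)
        ]
    return merged


# Two hypothetical firms, each of which trained privately on its own audio.
firm_a = {"layer.weight": [0.2, 0.4]}   # trained on 100 hours
firm_b = {"layer.weight": [0.6, 0.8]}   # trained on 300 hours

base_update = federated_average([firm_a, firm_b], [100.0, 300.0])
print(base_update)  # approximately {'layer.weight': [0.5, 0.7]}
```

In this sketch the firm with more audio pulls the shared parameters closer to its own, while neither firm ever exposes a recording or a transcript to the other or to the party maintaining the base model.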