How close are we to Westworld? An A.I. and Voice Recognition Expert Weighs In

Tejas Shastry

June 22, 2018

Most AI researchers I know are addicted to Westworld. Here’s why: it’s a representation of how far technology could go while staying within the realm of plausible reality. The seamless interaction between humans and AI portrayed by HBO is something my fellow AI researchers and I strive to achieve.

When Logan Delos (a potential investor) meets a room full of Hosts (what Westworld calls robots) and doesn’t realize it until he’s told, his expression is one of amazement and wonder. It’s a profound moment – AI seamlessly passing as human. I still remember when Facebook debuted facial recognition by placing boxes over faces in your photos and suggesting who they were. The first time I saw it, I had the same feeling as Logan – “wow, AI has been analyzing my photos, knows my friends’ faces, and I didn’t even realize it.”

More and more, I find examples of current technology edging toward the world portrayed in Westworld. The Hosts’ ability to communicate with and learn from each other maps onto real-world technologies: speech recognition, speech synthesis, and natural language understanding. We implement concepts like these with Scribe, our speech recognition technology. Scribe learns from thousands of hours of data, and also from an AI speech synthesis model that makes up phrases and speaks to it. Similar to how the Hosts’ personas develop in Westworld, Scribe and its speech-synthesis companion can reinforce each other’s learning.
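
That back-and-forth can be pictured with a toy loop – purely illustrative, and not a reflection of Scribe’s actual internals: a synthesizer invents phrases, a recognizer transcribes them, and any disagreement between the two becomes the signal both models could learn from.

```python
import random

# Toy sketch of a recognizer/synthesizer feedback loop.
# (Illustrative only; both "models" here are trivial stand-ins.)

PHRASES = ["hello world", "open the door", "play some music"]

def synthesize(text):
    """Stand-in for a TTS model: map text to a fake 'audio' signal."""
    return [ord(c) for c in text]

def recognize(audio, noise=0.0):
    """Stand-in for an ASR model: decode the fake signal back to text,
    occasionally garbling a character to simulate recognition errors."""
    return "".join(
        chr(a) if random.random() >= noise else "?" for a in audio
    )

def feedback_round(noise):
    """One round: synthesis feeds recognition; a mismatch is the
    kind of error signal that would drive further training."""
    text = random.choice(PHRASES)
    hypothesis = recognize(synthesize(text), noise=noise)
    return text == hypothesis

random.seed(0)
# A perfect recognizer agrees on every round trip...
assert all(feedback_round(noise=0.0) for _ in range(100))
# ...while a noisy one surfaces disagreements to learn from.
errors = sum(not feedback_round(noise=0.3) for _ in range(100))
print(f"disagreements to learn from: {errors}")
```

The point of the sketch is the loop shape, not the models: neither network needs a human in the middle to generate an endless stream of labeled examples for the other.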

While current technology is catching up, there are still some hurdles in achieving Westworld-quality AI and some differences in approaches. Natural language understanding (parsing what a human says and deciding how to respond) is still in its infancy. In the show, “understanding” is accomplished by meticulous programming by the staff, telling the Host how to react in a multitude of situations. In reality, this is an unfathomably large task, and most AI research instead focuses on having AI learn how to interact with the world without human intervention.
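
The show’s “meticulous programming” approach can be caricatured in a few lines (the phrases and responses below are hypothetical): every situation a Host might face has to be written out by hand, and anything unscripted falls through.

```python
# A caricature of hand-scripted "understanding": each anticipated
# utterance maps to a canned response. (Hypothetical examples.)

SCRIPTED_RESPONSES = {
    "where are you from": "I'm from Sweetwater, born and raised.",
    "what is your name": "They call me Dolores.",
}

def respond(utterance):
    key = utterance.lower().strip("?!. ")
    # Anything not explicitly scripted hits the fallback -- the reason
    # hand-written rules can't cover open-ended conversation.
    return SCRIPTED_RESPONSES.get(key, "...doesn't look like anything to me.")

print(respond("Where are you from?"))
print(respond("What do you dream about?"))  # unscripted: fallback
```

Scaling this table to every possible phrasing of every possible situation is the unfathomably large task; learning-based approaches exist precisely to avoid enumerating it.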

A more realistic (and powerful) concept is the idea that Hosts become conscious through memory. Real-world AI doesn’t necessarily remember how or why it got something wrong when it’s out in production. Having that constant feedback loop, the internal voice, is an unsolved issue in AI, but there are an increasing number of approaches to it.
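
One simple shape that feedback loop could take – a sketch under my own assumptions, not any production system – is a deployed model that records its own mistakes so a later training pass can revisit them.

```python
from collections import deque

# Sketch of the "memory" idea: a deployed model logs its own
# mistakes for a later training pass. (Hypothetical design.)

class ModelWithMemory:
    def __init__(self, max_memories=1000):
        # Bounded memory: oldest mistakes fall off the end.
        self.memories = deque(maxlen=max_memories)

    def predict(self, x):
        # Stand-in for a real model: guess that even inputs are positive.
        return x % 2 == 0

    def observe(self, x, label):
        """Compare a prediction against ground truth; remember misses."""
        guess = self.predict(x)
        if guess != label:
            self.memories.append((x, label, guess))
        return guess

model = ModelWithMemory()
# Stream of production examples whose true label disagrees
# with the model's rule some of the time:
for x in range(10):
    model.observe(x, label=(x % 3 == 0))

# The remembered failures are exactly what a feedback loop would replay.
print(len(model.memories), "mistakes stored for later learning")
```

The hard, unsolved part is not storing the mistakes but having the model use them on its own – the internal voice, rather than an engineer scheduling a retraining run.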

Both of these concepts – memory and natural language understanding – could be key to reaching Westworld-level AI. One of my favorite scenes [warning: season one spoiler] is the one where Bernard realizes he’s a Host. He’s a Host who manages Hosts. He blurs the line between the roles of humans and Hosts and shows just how much Hosts are capable of beyond a specific scene or script. Perhaps with increased research, we too can have AI models managing and teaching other AI models, just like Bernard.

So how close are we? Very far. We have a lot of pieces of AI that are good at single things – neural nets that can recognize faces, transcribe human speech, or play games at superhuman levels. But we don’t have a great, general way of putting all those pieces together.

In the meantime, I’m excited for the finale (and to see if they made Spaceworld).