Today’s episode of the Future Ear Podcast features Jeff Adams, founder and CEO of Cobalt Speech & Language. Jeff has been a speech technologist for 25+ years, with stints at Nuance, Yap, and Amazon. At Amazon, Jeff founded the Alexa Speech Group and led the team of speech and language scientists that built the automatic speech recognition and natural language understanding engines that Alexa runs on. During this conversation, we walk through the major hurdles that Jeff’s team had to overcome to bring Alexa to life, and look at what’s on the horizon for conversational voice assistants more broadly.
We kick things off by looking at a blog post that Jeff’s successor at Amazon, Rohit Prasad, wrote in November of 2019 as the company celebrated Alexa’s 5th birthday. In his post, Rohit cites four fundamental AI tasks that allowed Alexa to be “magical”: wake word detection, automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech synthesis. The first half of our conversation covers how Jeff’s team ultimately solved the problems behind each of these four pillars.
As with so much great technology, we tend to take for granted just how challenging Alexa was to build. In these early days of voice assistants, it’s easy to get frustrated by their limitations, but as we discuss, it’s worth noting how many technical problems have already been solved that seemed nearly impossible only a decade ago. One of the big breakthroughs, according to Jeff, was the advances in deep learning that were taking place right around the time his team was building Alexa’s ASR and NLU. As Jeff mentions, deep neural networks provided a novel way of characterizing the “acoustic model,” allowing for much more accurate recognition of speech and sounds.
After walking through how the problems behind the four pillars were solved and establishing how the technology got to where it is now, Jeff sheds light on where we go next. We use the COVID-19 pandemic to highlight how conversational AI can be applied to public health crises, based on what voice assistants are capable of today. From there, we explore the breakthroughs we should be on the lookout for that will support the conversational AI capabilities of tomorrow.
It was incredibly cool for me to interview one of the key scientists who built Alexa and hear exactly how he and his team did it. By better understanding how we got to now, I hope my listeners and I can better understand where we might go from here. For now, we can thank Jeff Adams and the Alexa Speech Group for the ability to access this podcast simply by saying, “Alexa, play the most recent episode of the Future Ear Radio podcast.”
-Thanks for Reading-