
Voice Summit Day One Recap (Future Ear Daily Update 7-24-19)


Although the Voice Summit technically started on Monday, that day was considered “Day 0” and Tuesday marked Day One of the summit. The thing that makes this summit so impressive and valuable is that it’s packed full of speakers from start to finish. It led off with three back-to-back keynotes from three of the most influential people in the voice space.

I took to Twitter yesterday to live-tweet each of the sessions as best I could, so feel free to click into the embedded Twitter thread for each session for my real-time thoughts.

First up was Dave Isbitski, chief evangelist of Alexa. Dave talked a lot about the first 5 years of the Alexa journey, highlighting various phases along the way to where we are today with Alexa. We’ve moved from single-turn conversations to multi-turn, and as Dave detailed, the next phase is multi-session, which means that Alexa will start to understand the user’s context and, in time, learn things about the user, like preferences. This is all achieved through deep learning modules.

Dave also unveiled a new workflow tool called “Skill Flow Builder” that allows anyone, developer or non-developer, to easily input and adjust dialogue within skills. The use case that Dave highlighted for this was interactive storytelling games. As I tweeted, this really harkens back to Brian Roemmele talking about the need for the technology to be simple enough to “bring the creatives into the fold.” Skill Flow Builder does just that.

One of my favorite portions of Dave’s talk was around flash briefings and some creative ways that people are starting to use them, such as for internal corporate communications. Flash briefings continue to strike me as one of the most distinctive aspects of Alexa, and one where we’re only just starting to scratch the surface.

Next was Adam Cheyer, who co-founded Siri and sold it to Apple, then moved on to Viv Labs, which was purchased by Samsung, where Adam now works. Adam heads up the Bixby division, and Bixby 2.0 is the first iteration of the voice assistant under Adam’s leadership. Obviously, when one of the founding fathers of voice assistants is presenting, you’re due for some interesting insight.

To round out the initial keynotes, we had the pleasure of Noelle LaCharite of Microsoft talking about Cortana’s makeover. I think Microsoft is smart to have pivoted Cortana away from competing with Google and Alexa as a “master assistant” and instead positioned Cortana as “the master of Microsoft.” As Noelle pointed out, Cortana is wonderful when it’s tasked to do things housed inside Microsoft’s properties, such as scheduling meetings with Outlook. Additionally, I appreciate the focus Microsoft has on accessibility, which is clearly a motivation for Noelle personally.

After the first three keynotes, the breakout sessions began. The one downside about this conference is that there are about seven sessions going at once, and so it can be really hard to choose which session to attend. I decided to go see Mark C. Webster’s talk on “why conversational interfaces are designed to fail.”

This was one of the better talks I’ve heard in the voice space, and the reason was that Mark was really straight with the room about the state of conversational interfaces. One of the key points that he made was that the metaphor we use for assistants as “people” might be leading to confusion and poor experiences among users. In previous computing interfaces, images allowed us to create metaphors (whether they be desktop icons or app icons) to communicate the intent of the icon. Voice, on the other hand, does not really offer a similar construct.

The issue with creating the expectation that you can just “speak to Alexa as you would a person” is that it’s not really true. Alexa and Google Assistant exist today because the natural language processing engines that these assistants run on have advanced considerably in the past decade, allowing them to capture our speech with high accuracy. But just because they can accurately capture what we’re saying does not mean that Alexa knows what to do with that input, and so the user ends up hearing, “I’m sorry, I do not understand that.” That was the crux of Mark’s presentation: maybe we shouldn’t be setting the expectation that these are “conversational devices” quite yet.

The last session of the day that I saw was Brielle Nickoloff of Witlingo talking about the evolution of the voice web. This was an awesome talk that included a really stellar demo of Buildlingo’s newest update. One of the key points from this talk was that as tools that facilitate easier and faster audio content creation continue to surface (e.g. Buildlingo and Castlingo), audio content creation begins to be democratized. Brielle did a great job drawing parallels between the voice web and the evolution of the internet through its various phases, and how it became progressively easier to share content on the web, to the point that anyone could quickly and easily share anything on sites like Twitter, Facebook and YouTube.


All in all, it was an awesome day. I learned a lot, met a ton of people, connected with old pals, and got a good understanding of where we are with voice technology in its various pockets. On to day two!

-Thanks for Reading-


To listen to the broadcast on your Alexa device, enable the skill here

To add to your flash briefing, click here

To listen on your Google Assistant device, enable the skill here 

and then say, “Alexa/Ok Google, launch Future Ear Radio.”
