A Meeting of the Minds
Last week, I made my annual trek to Chattanooga, Tennessee to gather with a wide variety of Voice technology enthusiasts at the Alexa Conference. Along with the seismic growth of smart speaker and voice assistant adoption, attendance grew dramatically too, from roughly 200 people last year to more than 600 people this year. We outgrew last year’s venue, the very endearing Chattanooga Public Library, and moved to the city’s Marriott convention center. The conference’s growth was accompanied by an exhibit hall and sponsorships from entities as large as Amazon itself. We even had a startup competition among five startups, which my guest, Larry Guterman, won with his amazing Sonic Cloud technology.
In other words, this year felt like the Alexa Conference took a huge step forward. Cheers to Bradley Metrock and his team for literally building this conference from scratch into what it has become today and for bringing the community together. That’s what makes this conference so cool; it has a very communal feel to it. My favorite part is just getting to know all the different attendees and understanding what everyone is working on.
This Year’s Theme
Bret Kinsella, the editor of the de facto news source for all things Voice, Voicebot.ai, presented the idea that we’ve moved into phase two of the technology. Phase one of Voice was all about introducing the technology to the masses and then increasing adoption and overall access to the technology. You could argue that this phase started in 2011 when Siri was introduced, but the bulk of the progress of phase one was post-2014, when Amazon rolled out the first Echo and introduced Alexa.
Since then, we’ve seen Google enter the arena in a very considerable way, culminating in the recent announcement that it would have one billion devices with Google Assistant enabled. We also saw smart speaker sales soar to ultimately represent the fastest adoption of any consumer technology product ever. If the name of the game for phase one was introducing the technology and growing the user base, then I’d say mission accomplished. On to the next phase of Voice.
According to Bret, phase two is about a wider variety of access (new devices), new segments that smart assistants are moving into, and increasing the frequency with which people use the technology. This next phase will revolve around habituation and specialization.
In a lot of different ways, the car is the embodiment of phase two. The car already represents the second most highly accessed type of device, behind only the smartphone, yet it offers a massive pool of untapped access points through integrations and newer model cars with smart assistants built into the car’s console. It’s a perfect environment for a voice interface, as we need to be hands-free and eyes-free while driving. Finally, from a habituation standpoint, the car, like the smart speaker, will serve as “training wheels” for people to get used to the technology as they build the habit.
There were a number of panelists in the breakout sessions and general attendees who helped open my eyes to some of the unique ways that education, healthcare, business, and hospitality (among other areas) are all going to yield interesting integrations and contributions during this second phase. All of these segments offer new areas for specialization and opportunities for people to increasingly build the habit and get comfortable using smart assistants.
The Communal Phase Two
Metaphorically speaking, this year’s show felt like a transition from phase one to phase two too. As I already mentioned, the conference itself grew up, but so have all of the companies and concepts that were first emerging last year. Last year, we saw the first Alexa-driven, interactive content companies like Select a Story and Tellables starting to surface, which helped shine a light on what the future of story-telling might look like in this new medium.
This year we had the founder of Atari, Nolan Bushnell, delivering a keynote talk on the projects he and his colleague, Zai Ortiz, are building at their company, X2 Games. One of the main projects, St. Noire, is an interactive, murder-mystery board game that fuses Netflix-quality video content for your character (through an app on a TV) with an interactive element in which the players have to make certain decisions (issued through a smart speaker). The players’ decisions are what will ultimately impact the trajectory of the game and allow the players to progress far enough to solve the mystery. It was a phenomenal demo of a product that certainly made me think, “wow, this interactive story-telling concept sure is maturing fast.”
Witlingo now has a serious product on its hands with Castlingo (micro-Alexa content generated by the user). While podcasts represent long-form content akin to blogging, there seems to be a gap to fill for micro-form audio content creation that is more akin to tweeting. I’m not sure if this gap will ultimately be filled by something like Castlingo or Flash Briefings, but it would be awesome if a company like Witlingo emerged as the Twitter for audio.
Companies like Soundhound continue to give me hope that white-label assistant offerings will thrive in the future, especially as brands will want to extend their identities to their assistants, and not have something bland and generic. Katie McMahon’s demos of Hound never cease to amaze me either, and its newest feature, Query Glue, displays the most advanced level of conversational AI that I’ve seen to date.
Magic + Co’s presence at the show indicated that digital agencies are beginning to take Voice very seriously and will be at the forefront of the creative ways brands and retailers integrate and use smart assistants and VUI. We also had folks from Vayner Media at this year’s conference, which was just another example that some of the most cutting-edge agencies are thinking deeply about Voice.
Finally, there seemed to be a transition to a higher phase on an individual level too. Brian Roemmele, the man who coined the term #VoiceFirst, continues to peel back the curtain on what he believes the long-term future of Voice looks like (check out his podcast interview with Bret Kinsella). Teri Fisher seemed to be on just about every panel and was teaching everyone how to produce different types of audio content. For example, he provided a workshop on how to create a Flash Briefing, which makes me believe we’ll see a lot of people from the show begin making their own audio content (myself included!).
From a personal standpoint, I guess I’ve entered into my own phase two as well. Last year I attended the conference on a hunch that this technology would eventually impact my company and the industry I work in, and after realizing my hunch was right, I decided that I needed to start contributing in the area of expertise that I know best: hearables.
This year, I was really fortunate to have the opportunity to present the research I’ve been compiling and writing about why I believe hearables play a critical role in a VoiceFirst future. I went from sitting in a chair, watching and admiring people like Brian, Bret and Katie McMahon share their expertise last year, to being able to share some of my own knowledge with those same people this year, which was one of the coolest moments in my professional career. (Stay tuned, as I will be releasing my 45-minute talk as a series of blog posts in which I break down each aspect of my presentation.)
For those of you reading this piece who haven’t been able to make this show and feel like it might be valuable but aren’t sure how, my advice is to just go. You’ll be amazed at how inclusive and communal the vibe is, and I bet you’ll even walk away from it thinking differently about your role and your business’s role as we enter the 2020s. If you do decide to go, be sure to reach out, as I will certainly be in attendance next year and the years beyond.
-Thanks for Reading-