Last Thursday, I was fortunate for the opportunity to travel to Chattanooga, TN to attend the second annual Alexa Conference and join a group of some of the smartest people working on Voice technology. The cool thing about the Alexa Conference is that it’s not sponsored by Amazon (or Google or any other major tech company), it’s fully-independent, sponsored by third parties, and therefore it truly feels objective and unbiased. The attendees and speakers ranged from third party “skill” agencies, skill developers (domestic and international), certified Alexa champions, skill analytic and diagnostic providers, a representative from the FTC, insurance and healthcare reps, to futurists, Internet of Things specialists, digital transformation experts, behavioral economists, doctors, PhD scientists, former NASA employees, and a random dude from the Audiology industry who writes a blog called FuturEar.
I have been following the #VoiceFirst movement, which includes progress in the Voice User Interface (VoiceUI), the devices that house our smart assistants (smart speakers, smartphones and wearable technology), devices that work in conjunction and respond to smart assistants (Internet of Things), and our smart assistants as a whole for the past few years. I think I may have learned more in the 48 hours that I attended this conference than I have in the thousands of hours leading up to it. Ok, that’s probably some hyperbole there, but there was a ton of insight and these were my favorite takeaways from the show:
Context is King
One of the big questions that I had heading into Chattanooga was, “how do we take this all to the next level?” I now have the answer and it all derives from context. Deep, personalized contextual awareness. What does that mean? Well, for starters, let’s establish that smart assistants feed and grow stronger on “personal data.” The only way that these assistants ever get any more useful or “smarter” is by learning more about us.
A really good way to think about this is through the lens of ordering your favorite pizza. My favorite pizza (shoutout Imo’s) is probably pretty different than your favorite pizza. The web, as we know it, is built on a pay-per-click model, so when I search on Google for pizza options around me, the results are going to show advertised options at the top. These are not in any way personalized to me and therein lies the big difference. When I’ve ordered pizza 20 times through my smart assistant, 15 of which have been Imo’s, and then I’m in Chattanooga for work (where Imo’s does not exist) and I want to order a pizza, my smart assistant will provide me results similar to Imo’s in Chattanooga. The smart assistant knows my preferences and therefore will actively distill the options for me to cater to my personal preferences.
Taking it one step further, think about all the other personal information that you probably share or are having shared with you that can broaden the assistant’s contextual awareness. If your friends have been to Chattanooga and they raved about a pizza spot on Instagram months or years ago, your smart assistant could retrieve that and factor that into your results. So now it’s not just based on your own pizza preferences, but also factoring in other variables such as your friends’ experiences and preferences.
This begins to bring privacy and security front and center. One of the really interesting presentations was from the German guys at 169Labs. While the attitude in the US around privacy is pretty lax and apathetic, it’s important to understand that our attitude here in the States is quite different than how many Europeans feel. They take their privacy way more seriously and it’s a top of mind issue that permeates in any tech discussion. Privacy will continue to be a topic of discussion as our smart assistants evolve and we become increasingly more aware of just how much data we are sharing. I believe the pros outweigh the cons when it comes to sharing your personal data with your smart assistant(s), but the key is going to be feeling safe that it is all encrypted and protected from being hacked.
The beginnings of Conversational Interfaces
One of the more frustrating aspects of smart speakers and smart assistants is the lack of continuity. Currently, our smart assistants function in a way that is more or less executing single commands or single questions. There isn’t really any dialogue, it’s typically, “Alexa shuffle my playlist” or “Alexa set a timer for 15 minutes” or “Alexa what’s the weather?” or “Alexa how many ounces are in a gallon?” Asking a question or issuing a command and having the device turn off afterward is not the goal for our smart assistants. Brian Roemmele compared this level of sophistication to the command line in the PC era. It’s super primitive and we’re in the first inning of a double-header with this technology.
Instead, what we need is, again, contextual awareness in order to have a dialogue. Katie McMahon of Soundhound did an awesome job demoing Soundhound’s own smart assistant, Hound, with some real contextual awareness:
So she starts off by saying, “Show me Asian restaurants, excluding Japanese and Chinese, that are open right now.” An accomplishment in itself that Hound so quickly answered accurately. Then she goes on to further refine the search, “Ok Hound, show those with outdoor seating.” The key word there is, “those,” as the assistant is now aware of the context because it recognizes that “those” is a demonstrative pronoun representing the Asian restaurants from the previous query. This is HUGE! In a dialogue with another person, you’re constantly using pronouns and language that references context from earlier in the conversation or conversations prior. It’s an essential part to how we communicate and we’re already seeing with smart assistants like Hound demonstrate that these assistants are more than capable of this type of complex contextual awareness. Without this ability, I doubt smart assistants will ever be taken that seriously.
Next, she goes one step further. Katie says, “I need an Uber to the first one.” So not only does the assistant recognize that “the first one” is in reference to the first result from the previous search, but its capable of using multiple “domains” or skills in conjunction. This is a significant step forward on something that we’re pretty limited with today. How many people would have been able to accomplish that all in one app on their phone? Most likely, you’d use multiple apps like Yelp, grab the address of where you want to go and pop it in Uber. Furthermore, if your assistant is factoring in more data for a more personalized result list, by retrieving your friends’ Instagram and/or Swarm data, then we’re comparing that to you going into each app and hunting for the relevant posts. This is clearly an improvement in time and efficiency.
It’s honestly amazing what’s going on over at Hound and the ability its assistant has with retaining information throughout the dialogue. It was the first time I really saw this level of continuity and contextual awareness in a smart assistant and it made me very optimistic about the potential of smart assistants when they’re able to have this type of dialogue. When I wrote about the key to the adoption of a new user interface being the reduction of friction, this is ultimately what I was referring to. Even a primitive conversational interface would still dramatically reduce the time that we currently spend app toggling, tapping and searching on our phones for many things. We’re moving far, far beyond just using assistants to set timers.
(Here’s a link to another video of Katie asking Hound for hypothetical monthly mortgage payments based on home values, interest rates and down payment percentages. I was blown away by these demos and have been using Hound since I’ve gotten back. It’s encouraging to see that all of this innovation is not limited to just Amazon, Google, Facebook and Apple.)
Proactive & Personalized
Another huge advantage of feeding your smart assistant personal data is that it can begin to proactively engage you on what it’s finding. This is where the Internet of Things (IoT) begins to get really interesting. If you have a smart fridge and your smart assistant is accessing that data, it then knows stuff like your fruit being spoiled. Couple that with the possibility that you’ve given your assistant access to all of your calendar data, so it knows that you have a dinner party that night and you’ve, in some way or another, logged that you need the fruit for your recipe for said dinner party. So, what we’re moving toward is a scenario where the smart assistant in your car or hearable pings you and says something along the lines of, “Hey Dave, it looks like your blackberries have gone bad and you need them for your dessert recipe for Julie’s dinner party tonight. It looks like they’re currently on sale at Mariano’s down the street, would you like to go?” and then navigate you down to grocery store.
This was a big aha! moment for me. So much of #VoiceFirst is happening in disparate areas that it’s hard to bring all of it together into one specific use case like I just mentioned above. When they’re silo’d off on their own, you hear, “smart fridge” and you think, “Really? What’s the point of that?” But when you start looking at all of these IoT devices as data entry points for your smart assistant, which your assistant can then actively retrieve, assess and then provide actionable insight (all very quickly), it all then becomes a whole lot more compelling. This is, “small data” as opposed to big data. It’s personal to each of us and therefore invaluable to each of us. This opens the door to serendipity and assistants proactively providing you with suggestions and reminders from data that you likely aren’t even aware of.
Some other Takeaways
- Brian Roemmele harped a lot on the idea of “enabling creatives to enter into the fold.” He used the analogy of Steve Jobs empowering the graphic designers with the iPhone, as the iPhone created a massive abundance of opportunity for that profession. The same will be done with voice for many more creative types that include comedians, poets, psychologists, storytellers, artists, historians, writers, etc. Therefore we need to a set of tools that are easy enough for anyone to use and create with.
- VoiceXP demonstrated a number of unique skills specifically for the Echo Show. I also appreciated that Bob Stolzberg really emphasized the fact that the Echo Show is version one of Amazon’s multi-modal strategy. We’re quite literally scratching the surface here with what’s possible when you add in screens and mixed-modality into the #VoiceFirst equation. Some really exciting opportunities around this.
- Keynote speaker Ahmed Bouzid presented a plethora of fascinating facts and charts, but the one that stood out to me were two of the demographics that can benefit the most from a #VoiceFirst world: seniors and physically incapacitated. This is at the heart of why I’m so passionate about spreading awareness to the #audpeeps, audiology and hearing aid industry about #VoiceFirst. Smart assistant integration is coming to hearing aids and this new use case for hearing aids, hearables and all our ear-computers, stands to really benefit those who struggle in a mobile world. Mobile computing is not conducive to these two demographics and I’m ecstatic about the possibility that these demographics will soon be empowered in a way they never have before. It’s an awesome value-add that you can advocate for free and it will dramatically improve the patient experience over time.
This was an amazing two days and I’ll definitely be back next year to continue to gather everything I can about what’s happening in the #VoiceFirst world and how they’ll ultimately impact those of you who work with the little computers that go in the ear.
-Thanks for reading-