Yesterday’s update was based on Cathy Pearl’s fantastic talk at the Voice Summit about democratizing voice technology and using it to empower disabled individuals. Today, I want to highlight another cohort that stands to gain from voice tech: our aging population. Before Cathy’s talk, I attended an equally awesome session led by Davis Park of Front Porch and Derek Holt of K4Connect. I live-tweeted this talk as well (sorry if I overloaded your Twitter feed while at the summit!).
To add some context, K4Connect is a tech startup focused specifically on building “smart” solutions for older adults. Front Porch, on the other hand, is a group of retirement facilities in California that has been piloting a series of programs to bring Alexa-enabled devices into its residents’ homes. The two are now working together to expand Front Porch’s pilot into phase two, in which K4Connect is helping to outfit Front Porch’s residents with IoT devices, such as connected lights and thermostats.
From my perspective, this was one of the most important sessions of the entire Voice Summit, because it homed in on two key facts that have been recurring themes throughout FuturEar:
America’s population is getting considerably older, because we’re living longer and 10,000 baby boomers are turning 65 every day for a 20-year stretch (2011-2030).
The older our population gets, the higher the demand for caregivers to look after our aging adults. According to the presentation, we as a nation will need to recruit and retain 300,000 additional caregivers to meet demand by 2026. And because of the first point above, that demand will only continue to rise.
The takeaway from this talk, similar to Cathy Pearl’s, was that voice technology (namely, voice assistants and the IoT) can be used to offset the demand for caregivers by empowering older adults. One message that overlapped between the two talks was that caregivers are largely burdened by menial tasks (turn on the light, close the blinds, change the TV channel), and the people receiving care are hyper-conscious of this. It’s exhausting for the caregiver, and it’s exhausting for those receiving care, because they know how much they’re asking of the caregiver. Siri, Alexa and Google, on the other hand, don’t get exhausted; they’re little AI bots, so who cares if you’re issuing hundreds of commands a day? That’s the beauty of this.
Following the talk, I spoke with Davis Park about the pilot and asked him what the Front Porch residents are using their Alexa devices for. “It’s completely different based on the resident. For example, one woman said she loves it because she can now make the perfect hard-boiled egg,” Davis said. This was a total aha! moment for me, because we sometimes fail to appreciate the nuanced ways people find value in the oft-cited, often-belittled use cases of voice assistants today (weather, timers, news, scores, etc.). On the surface, sure, she’s finding value in being able to set a timer, but dig a little deeper and you’ll find that the real value is that she’s no longer overcooking her hard-boiled eggs.
The slide pictured above from the session illustrates why I see so much potential for voice technology, specifically for older adults. It’s becoming increasingly apparent through numerous research studies that loneliness and social isolation are severely detrimental to us as individuals, as well as to the broader economy.
The industry I come from, the world of hearing aids and hearing loss, understands these co-morbidities all too well, as hearing loss is often correlated with social isolation. If your hearing is so diminished that you can no longer engage in social situations, you’re more likely to withdraw and become socially isolated and lonely.
This is ultimately why I think we’ll see voice assistants integrated into this new generation of hearing aids. It kills two birds with one stone: the device augments one’s physical sound environment by providing amplification and the ability to hear more clearly, while also serving as an access point to a digital assistant that can be used to communicate with one’s technology. One of the best solutions on the horizon for offsetting the rising demand for caregivers might be “digital caregivers” in the form of Alexa or Google Assistant housed in hearing aids or other hearable devices.
-Thanks for Reading-
To listen to the broadcast on your Alexa device, enable the skill here
One of the best sessions I attended at the Voice Summit last week was Cathy Pearl’s “Democratization of Voice.” This talk resonated with me because it highlighted one of the strongest themes at this year’s conference: one of the cohorts that benefits most from voice technology, even in its infancy and primitive state, is our disabled population. I live-tweeted this session if you’re interested in seeing my real-time takeaways.
Back at Google I/O, Google’s developer conference in May, Google rolled out a handful of new accessibility programs, which I wrote about in a May daily update. Each of these programs made an appearance in Cathy’s talk, which was largely centered on how voice technology can be leveraged by a wide range of people with disabilities. For example, we saw a video of a 19-year-old with a rare type of muscular dystrophy that greatly restricts his mobility. He was able to use Google Assistant in conjunction with a number of connected IoT devices to outfit his bedroom and control everything from the lights to the TV to the blinds using his voice.
The ability to outfit one’s home like this is life-changing for both the user and their caregivers. That’s what makes me so excited about this technology. Being able to control my lights with my voice might seem trivial to me, but it’s a godsend for people who live with debilitating diseases and disabilities, not to mention the caregivers whose burden the technology reduces.
Which is why I think it’s so cool that Google announced it will give away 100,000 free Google Home Minis through the Christopher & Dana Reeve Foundation to celebrate the 29th anniversary of the Americans with Disabilities Act. How cool is that? My main takeaway from Cathy’s talk is that one of the most obvious impacts voice technology has had over the past few years is its ability to empower our disabled populations, restore dignity, and reduce the burden on caregivers. So when I see Google stepping up and pledging 100,000 devices to help amplify this movement, well, that’s good by me.
Although the Voice Summit technically started on Monday, that day was considered “Day 0” and Tuesday marked Day One of the summit. The thing that makes this summit so impressive and valuable is that it’s packed full of speakers from start to finish. It led off with three back-to-back keynotes from three of the most influential people in the voice space.
I took to Twitter yesterday to live-tweet each session as best I could, so feel free to click into the embedded Twitter thread for each session to see my real-time thoughts.
First up was Dave Isbitski, chief evangelist for Alexa. Dave talked a lot about the first five years of the Alexa journey, highlighting the various phases that led to where Alexa is today. We’ve moved from single-turn conversations to multi-turn, and as Dave detailed, the next phase is multi-session, which means that Alexa will start to understand the user’s context and, in time, learn things about the user, like preferences. This is all achieved through deep learning models.
Dave also unveiled a new workflow tool called “Skill Flow Builder” that allows anyone, developer or non-developer, to easily input and adjust dialogue within skills. The use case Dave highlighted was interactive storytelling games. As I tweeted, this really harkens back to Brian Roemmele talking about the need for the technology to be simple enough to “bring the creatives into the fold.” Skill Flow Builder does just that.
One of my favorite portions of Dave’s talk was about flash briefings and some of the creative ways people are starting to use them, such as for internal corporate communications. Flash briefings continue to strike me as one of the most unique aspects of Alexa, and something whose surface we’re only just starting to scratch.
Next was Adam Cheyer, who co-founded Siri and sold it to Apple, then moved on to Viv Labs, which was acquired by Samsung, where Adam now works. Adam heads up the Bixby division, and Bixby 2.0 is the first iteration of the voice assistant under his leadership. Obviously, when one of the founding fathers of voice assistants is presenting, you’re in for some interesting insight.
To round out the initial keynotes, we had the pleasure of hearing Noelle LaCharite of Microsoft talk about Cortana’s makeover. I think Microsoft is smart to have pivoted Cortana away from competing with Google and Alexa as a “master assistant” and instead positioned Cortana as “the master of Microsoft.” As Noelle pointed out, Cortana is wonderful when it’s tasked with things housed inside Microsoft’s properties, such as scheduling meetings in Outlook. Additionally, I appreciate Microsoft’s focus on accessibility, which is clearly a personal motivation for Noelle.
After the first three keynotes, the breakout sessions began. The one downside of this conference is that there are about seven sessions going at once, so it can be really hard to choose which one to attend. I decided to go see Mark C. Webster’s talk on “why conversational interfaces are designed to fail.”
This was one of the better talks I’ve heard in the voice space, because Mark was really straight with the room about the state of conversational interfaces. One of his key points was that the metaphor we use for assistants, as “people,” might be leading to confusion and poor experiences among users. In previous computing interfaces, images allowed us to create metaphors (whether desktop icons or app icons) to communicate what each icon does. Voice, on the other hand, doesn’t really offer a similar construct.
The issue with setting the expectation that you can just “speak to Alexa as you would a person” is that it’s not really true. Alexa and Google Assistant exist today because the speech recognition and natural language processing engines these assistants run on have advanced considerably in the past decade, allowing them to capture our speech with high accuracy. But just because they can accurately capture what we’re saying doesn’t mean Alexa knows what to do with that input, which leads to, “I’m sorry, I do not understand that.” That was the crux of Mark’s presentation: maybe we shouldn’t be setting the expectation that these are “conversational devices” quite yet.
The last session I saw that day was Brielle Nickoloff of Witlingo talking about the evolution of the Voice web. This was an awesome talk that included a really stellar demo of Buildlingo’s newest update. One of the key points was that as tools surface (e.g., Buildlingo and Castlingo) that make audio content creation easier and faster, the world of audio content creation begins to be democratized. Brielle did a great job drawing parallels between the Voice web and the internet’s various phases of evolution, during which it became progressively easier to share content on the web, to the point that anyone could quickly and easily share anything on sites like Twitter, Facebook and YouTube.
All in all, it was an awesome day. I learned a lot, met a ton of people, connected with old pals, and got a good understanding of where voice technology stands in its various pockets. On to day two!
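To make the single-turn, multi-turn and multi-session distinction concrete, here is a minimal Python sketch built around a toy weather request. The `UserProfile` and `Session` classes and the `ask_weather` method are hypothetical illustrations, not the Alexa Skills Kit API, and the “learning” is deliberately naive (remember the last city asked about) rather than the deep learning Dave described.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    # Hypothetical long-lived memory that survives from one session to the next.
    preferences: dict = field(default_factory=dict)


@dataclass
class Session:
    # Hypothetical short-lived context that exists only for one conversation.
    profile: UserProfile
    slots: dict = field(default_factory=dict)

    def ask_weather(self, city: Optional[str] = None, day: str = "today") -> str:
        # Single-turn: the city is stated explicitly in this request.
        # Multi-turn: otherwise reuse the city mentioned earlier in this session.
        # Multi-session: otherwise reuse a preference learned in a past session.
        city = city or self.slots.get("city") or self.profile.preferences.get("home_city")
        if city is None:
            return "Which city?"
        self.slots["city"] = city
        # Naive "learning": treat the most recently requested city as the user's home city.
        self.profile.preferences["home_city"] = city
        return f"Weather for {city}, {day}"
```

Within one session, a follow-up like `ask_weather(day="tomorrow")` reuses the city from the earlier turn (multi-turn); a brand-new `Session` sharing the same `UserProfile` still knows the city even though its own slots start empty (multi-session).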
One week from now, at the VOICE Summit, I will be joining an all-star group of hearables experts on a panel titled “How On-the-Go Hearables Expand Opportunities with Voice,” moderated by Claire Mitchell of VaynerMedia. After spending some time getting to know my fellow panelists, I can safely say this is going to be an awesome discussion, as each panelist brings a different type of expertise to the table.
Andrew Bellavia of Knowles Corp. knows more about the hardware side of hearables than anyone I know. Knowles manufactures the microphones and DSP chips found in various Alexa devices. Along with his expertise in all things hearables hardware, Andy wears top-of-the-line Bluetooth hearing aids (Phonak Marvels) that he routinely uses to communicate with Google Assistant. He’s at the forefront of experimenting with and experiencing the future of #VoiceFirst + hearables.
Rachel Batish of Audioburst will provide a unique perspective on how Audioburst is enabling a future where we can catalog, search and curate audio clips from various radio shows and podcasts. To put it in simple terms, Audioburst is striving to be Google, but for audio. Imagine driving in the car, asking Audioburst to “read you updates on Netflix,” and then getting a curated feed of short podcast and radio clips in which the broadcasters are specifically talking about Netflix. That’s the type of perspective Rachel will be able to provide.
Eric Seay of AudioUX co-founded a creative audio agency that aims to shape the world of audio and sonic branding. As we move into a world that is less reliant on screens and visual overlays, Eric believes many of the visual cues we’ve become familiar with will have audio counterparts: logos and sonic logos, social media likes and nano audio “likes.” Eric will surely offer some interesting ways to start thinking about the burgeoning world of audio + voice-only.
Finally, Andreea Danielescu of Accenture Labs and the startup Antilipsi works as a founder, engineer, researcher and architect across the full software stack. She’s done extensive work on experience design and gesture interaction with all types of emerging technologies, including voice assistants. She’ll bring real-world experience with early implementations of hearable technologies that incorporate features like voice assistant access, sharing what limitations exist and how we can work to overcome them.
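Audioburst’s actual pipeline isn’t described here, but the “Google for audio” idea can be sketched with a classic inverted index over clip transcripts. This minimal Python example assumes the clips have already been transcribed by some upstream speech-to-text step; the `Clip` type, the show names and the transcripts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Clip:
    source: str       # hypothetical show or podcast name
    start_sec: int    # where the relevant segment begins in the episode
    transcript: str   # assumed output of an upstream speech-to-text step


def build_index(clips: List[Clip]) -> Dict[str, Set[int]]:
    # Inverted index: each word maps to the positions of the clips that mention it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for i, clip in enumerate(clips):
        for word in clip.transcript.lower().split():
            index[word.strip(".,!?;:")].add(i)
    return index


def search(index: Dict[str, Set[int]], clips: List[Clip], query: str) -> List[Clip]:
    # Return clips whose transcripts contain every query word, in catalog order.
    terms = [t.lower() for t in query.split()]
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [clips[i] for i in sorted(hits)]
```

Given a small catalog where only two of three clips mention Netflix, `search(index, clips, "Netflix")` returns just those two. A real service would also rank results by relevance and recency rather than by catalog position.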
It’s sure to be an awesome discussion next Thursday. If you’re attending the Voice Summit, I hope that you’re able to join us, and if not, I’ll do my best to recap some of the key takeaways from our panel on all things hearables + voice.
Last week, I made my annual trek to Chattanooga, Tennessee to gather with a wide variety of Voice technology enthusiasts at the Alexa Conference. Along with the seismic growth of smart speaker and voice assistant adoption, attendance grew dramatically too, from roughly 200 people last year to more than 600 this year. We outgrew last year’s venue, the very endearing Chattanooga Public Library, and moved to the city’s Marriott convention center. The conference’s growth was accompanied by an exhibit hall and sponsorships from entities as large as Amazon itself. There was even a startup competition between five startups, which my guest, Larry Guterman, won with his amazing Sonic Cloud technology.
In other words, this year felt like the moment the Alexa Conference took a huge step forward. Cheers to Bradley Metrock and his team for literally building this conference from scratch into what it has become today and for bringing the community together. That’s what makes this conference so cool; it has a very communal feel to it. My favorite part is just getting to know all the different attendees and understanding what everyone is working on.
This Year’s Theme
Bret Kinsella, editor of Voicebot.ai, the de facto news source for all things Voice, presented the idea that we’ve moved into phase two of the technology. Phase one of Voice was all about introducing the technology to the masses and then increasing adoption of and overall access to the technology. You could argue that this phase started in 2011, when Siri was introduced, but the bulk of phase one’s progress came after 2014, when Amazon rolled out the first Echo and introduced Alexa.
Since then, we’ve seen Google enter the arena in a considerable way, culminating in the recent announcement that Google Assistant would be available on one billion devices. We’ve also seen smart speaker sales soar, ultimately representing the fastest adoption of any consumer technology product ever. If the name of the game for phase one was introducing the technology and growing the user base, then I’d say mission accomplished. On to the next phase of Voice.
According to Bret, phase two is about a wider variety of access points (new devices), new segments that smart assistants are moving into, and increasing the frequency with which people use the technology. This next phase will revolve around habituation and specialization.
In a lot of ways, the car is the embodiment of phase two. The car already represents the second most highly accessed type of device, behind only the smartphone, but it offers a massive pool of untapped access points through integrations and newer-model cars with smart assistants built into the console. It’s a perfect environment for a voice interface, as we need to be hands- and eyes-free while driving. Finally, from a habituation standpoint, the car, like smart speakers, will serve as “training wheels” for people to get used to the technology as they build the habit.
A number of panelists in the breakout sessions and general attendees helped open my eyes to some of the unique ways that education, healthcare, business, and hospitality (among other areas) are all going to yield interesting integrations and contributions during this second phase. All of these segments offer new areas for specialization and new opportunities for people to build the habit and get comfortable using smart assistants.
The Communal Phase Two
Metaphorically speaking, this year’s show felt like a transition from phase one to phase two too. As I already mentioned, the conference itself grew up, but so have all of the companies and concepts that were first emerging last year. Last year, we saw the first Alexa-driven, interactive content companies like Select a Story and Tellables starting to surface, which helped shine a light on what the future of story-telling might look like in this new medium.
This year we had Atari co-founder Nolan Bushnell deliver a keynote on the projects he and his colleague, Zai Ortiz, are building at their company, X2 Games. One of the main projects, St. Noire, is an interactive murder-mystery board game that fuses Netflix-quality video content for your character (through an app on a TV) with an interactive element in which the players make certain decisions (issued through a smart speaker). The players’ decisions ultimately shape the trajectory of the game and determine whether they progress far enough to solve the mystery. It was a phenomenal demo of a product that certainly made me think, “wow, this interactive storytelling concept sure is maturing fast.”
Witlingo now has a serious product on its hands with Castlingo (micro-Alexa content generated by the user). While podcasts represent long-form content akin to blogging, I feel there’s a gap to fill for micro-form audio content creation more akin to tweeting. I’m not sure whether this gap will ultimately be filled by something like Castlingo or by Flash Briefings, but it would be awesome if a company like Witlingo emerged as the Twitter for audio.
Companies like SoundHound continue to give me hope that white-label assistant offerings will thrive in the future, especially as brands will want to extend their identities to their assistants rather than settle for something bland and generic. Katie McMahon’s demos of Hound never cease to amaze me either, and its newest feature, Query Glue, displays the most advanced conversational AI I’ve seen to date.
Magic + Co’s presence at the show indicated that digital agencies are beginning to take Voice very seriously and will be at the forefront of the creative ways brands and retailers integrate and use smart assistants and VUI. We also had folks from VaynerMedia at this year’s conference, which was another sign that some of the most cutting-edge agencies are thinking deeply about Voice.
Finally, there seemed to be a transition to a higher phase on an individual level too. Brian Roemmele, the man who coined the term #VoiceFirst, continues to pull back the curtain on what he believes the long-term future of Voice looks like (check out his podcast interview with Bret Kinsella). Teri Fisher seemed to be on just about every panel and was teaching everyone how to produce different types of audio content. For example, he led a workshop on how to create a Flash Briefing, which makes me believe we’ll see a lot of people from the show begin making their own audio content (myself included!).
From a personal standpoint, I guess I’ve entered into my own phase two as well. Last year I attended the conference on a hunch that this technology would eventually impact my company and the industry I work in, and after realizing my hunch was right, I decided that I needed to start contributing in the area of expertise that I know best: hearables.
This year, I was really fortunate to have the opportunity to present the research I’ve been compiling and writing about on why I believe hearables play a critical role in a VoiceFirst future. I went from sitting in a chair last year, watching and admiring people like Brian, Bret and Katie McMahon share their expertise, to sharing some of my own knowledge with those same people this year, which was one of the coolest moments of my professional career. (Stay tuned, as I will be breaking my 45-minute talk into a series of blog posts, each covering one aspect of my presentation.)
For those of you who haven’t made it to this show and suspect it might be valuable but aren’t sure how, my advice is to just go. You’ll be amazed at how inclusive and communal the vibe is, and I bet you’ll walk away thinking differently about your own role, and your business’s role, as we enter the 2020s. If you do decide to go, be sure to reach out, as I will certainly be in attendance next year and the years beyond.
Last week, I was fortunate to travel to Boston to attend the Voice of Healthcare Summit at Harvard Medical School. My motivation for attending was to better understand how smart assistants are currently being implemented in the various segments of our healthcare system and to learn what’s on the horizon in the coming years. If you’ve been following my blog or Twitter feed, you’ll know that I envision a near-term future in which smart assistants become integrated into our in-the-ear devices (both hearables and Bluetooth hearing aids). Once that integration becomes commonplace, I imagine we’ll see a number of really interesting and unique health-specific use cases that leverage the combination of the smartphone, sensors embedded in the in-the-ear device, and smart assistants.
Bradley Metrock, Matt Cybulsky and the rest of the summit team that put on this event truly knocked it out of the park, as the speaker set and the attendees included a wide array of different backgrounds and perspectives, which resulted in some very interesting talks and discussions. Based on what I gathered from the summit, smart assistants will yield different types of value to three groups: patients, remote caregivers, and clinicians and their staff.
At this point in time, none of our mainstream smart assistants are HIPAA-compliant, which limits the types of skills and actions that can be developed specifically for healthcare. Companies like Orbita are working around this limitation by taking the same building blocks required to create voice skills and building secure voice skills from scratch on their own platform. Developers who want to create skills or actions for Alexa or Google that use HIPAA-protected data, however, will have to wait until the smart assistant platforms become HIPAA-compliant, which could happen this year or next.
It’s easy to imagine the upside that will come with HIPAA-compliant assistants, as they would be able to retrieve one’s medical data. If I had a chronic condition that required me to take five separate medications, Alexa could audibly remind me to take each of the five, by name, and respond to any questions I might have about them. If I told Alexa about a side effect I was having, Alexa might even be able to identify which of the five medications could be causing it and loop in my physician for her input. As Brian Roemmele has pointed out repeatedly, the future of our smart assistants runs through each of our own personalized, contextual information, and until these assistants are HIPAA-compliant, they have to operate at a general level rather than a personalized one.
That’s not to say there isn’t value in generalized skills, or in skills that can be personalized because the data they use doesn’t fall under the HIPAA umbrella. Devin Nadar from Boston Children’s Hospital walked us through the hospital’s KidsMD skill, which allows parents to ask general questions about their children’s illness, recovery, symptoms, etc. and have peace of mind that the answers they receive are sourced and vetted by Boston Children’s Hospital, not just random responses retrieved from the internet. Cigna’s Rowena Track showed how Cigna’s skill lets you check things such as your HSA balance or urgent care wait times.
Caregivers and “Care Assistants”
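The medication scenario above can be sketched in a few lines of Python, assuming the assistant already has (HIPAA-compliant) access to the patient’s medication list. Everything here is hypothetical: the `Medication` type, the placeholder names “Medication A” through “Medication C,” the dose times and the side-effect sets are illustrative, not medical data, and the sketch only shows the lookup logic, not a clinical tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Medication:
    name: str
    dose_times: List[str]                                # "HH:MM", 24-hour clock
    side_effects: Set[str] = field(default_factory=set)  # hypothetical, not medical data


def reminders_due(meds: List[Medication], now: str) -> List[str]:
    # Name each medication scheduled at this time, as a spoken reminder would.
    return [f"It's {now}. Time to take {m.name}." for m in meds if now in m.dose_times]


def possible_causes(meds: List[Medication], symptom: str) -> List[str]:
    # Medications whose listed side effects include the reported symptom.
    symptom = symptom.strip().lower()
    return [m.name for m in meds if symptom in m.side_effects]


def report_side_effect(meds: List[Medication], symptom: str) -> str:
    # Summarize the likely culprits and hand off to the physician either way.
    causes = possible_causes(meds, symptom)
    if not causes:
        return f"None of your medications list {symptom}. I'll note it for your physician."
    return f"{symptom.capitalize()} is listed for {', '.join(causes)}. I'll loop in your physician."
```

With three placeholder medications where A and C list dizziness, reporting dizziness flags exactly those two, which is the point at which the physician would be looped in.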
By 2029, 18% of Americans will be over the age of 65, and average US life expectancy is already approaching 80. That number will likely continue to climb, which raises the question, “how are we going to take care of our aging population?” As Laurie Orlov, industry analyst and writer of the popular Aging In Place blog, so eloquently stated during her talk, “The beneficiaries of smart assistants will be disabled and elderly people…and everyone else.” Given that sentiment, and the rising demand to support our aging population, enter what John Loughnane of CCA described as “care assistants.”
As Laurie’s slide above illustrates, smart assistants, or “care assistants” in this scenario, help to triangulate the relationship between the doctor, the patient and those taking care of the patient, whether caregivers or family. These “care assistants” can be programmed with helpful responses about medication cadence, what the patient can or can’t do and for how long, what they can eat, and when and how to change bandages. In essence, the “care assistant” serves as an extension of the caregiver and the trust they provide, allowing for more self-sufficiency and, therefore, less of a burden on the caregiver.
As I have written about before, the beauty of smart assistants is that even today, in their infancy and primitive state, they can empower disabled and elderly people in ways no previous interface has. This matters from a fiscal standpoint too: Nate Treloar, President of Orbita, pointed out that social isolation costs Medicare $6.7 billion per year. Smart assistants act as a tether to our collective social fabric for these groups, and multiple doctors at the summit cited disabled or elderly patients who described their experience using a smart assistant as “life changing.” What might seem trivial to you or me, like being able to send a message with your voice, might be truly groundbreaking to someone who has never had that type of control.
The Clinician and the System
The last group that stands to gain from this integration is doctors and others working in the healthcare system. According to the Annals of Internal Medicine, for every hour a physician spends with patients, they must spend two hours on related administrative work. That’s terribly inefficient and something I’m sure drives physicians insane. The drudgery of clerical work seems ripe for smart assistants to provide efficiencies: dictating notes, quickly retrieving past medical information, sharing that information across systems, etc. Less time doing clerical work means more time helping people.
Boston Children’s Hospital uses an internal system called ALICE and by layering voice onto this system, admins, nurses and other staff can very quickly retrieve vital information such as:
“Who is the respiratory therapist for bed 5?”
“Which beds are free on the unit?”
“What’s the phone number of the MSICU Pharmacist?”
“Who is the Neuro-surgery attending?”
And boom, you quickly get the answer to any of these. That’s removing friction in a setting where time might really be of the essence. As Dr. Teri Fisher, host of the VoiceFirst Health podcast, pointed out during his presentation, our smart assistants can be used to reduce the strain on the overall system by playing the role of triage nurse, admin assistant, healthcare guide and so on.
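Boston Children’s Hospital didn’t share how its voice layer on ALICE works, so as an illustration only, here is a minimal Python sketch of how spoken questions like the ones above might map onto structured unit data using simple regular-expression intents. The `BEDS` roster, the role names and the `answer` function are hypothetical, and a production system would use a proper natural language understanding service rather than regexes.

```python
import re
from typing import Dict

# Hypothetical unit roster; the real ALICE data model is not described in this post.
BEDS: Dict[int, dict] = {
    1: {"occupied": True, "respiratory therapist": "Therapist A"},
    2: {"occupied": False},
    3: {"occupied": False},
    5: {"occupied": True, "respiratory therapist": "Therapist B"},
}


def answer(question: str) -> str:
    q = question.lower()

    # Intent 1: "Who is the <role> for bed <n>?"
    match = re.search(r"who is the (.+?) for bed (\d+)", q)
    if match:
        role, bed = match.group(1), int(match.group(2))
        person = BEDS.get(bed, {}).get(role)
        if person:
            return f"{person} is the {role} for bed {bed}."
        return f"No {role} is listed for bed {bed}."

    # Intent 2: "Which beds are free?"
    if "which beds are free" in q:
        free = sorted(bed for bed, info in BEDS.items() if not info["occupied"])
        return "Free beds: " + ", ".join(str(b) for b in free) + "."

    # Anything else falls through to a polite fallback.
    return "Sorry, I don't know that yet."
```

Here “Who is the respiratory therapist for bed 5?” resolves to Therapist B and “Which beds are free on the unit?” returns beds 2 and 3, while questions outside the two patterns hit the fallback, a small-scale version of the “I’m sorry, I do not understand that” problem.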
What Lies Ahead
It’s always important with smart assistants and Voice to simultaneously temper current expectations while remaining optimistic about the future. Jeff Bezos joked in 2016 that, “not only are we in the first inning of this technology, we might even be at the first batter.” It’s early, but as Bret Kinsela of VoiceBot displayed during his talk, smart speakers represent the fastest adoption of any consumer technology product ever:
The same goes for how smart assistants are being integrated into our healthcare system. Much like Bezos’ joke, very little of this is even HIPAA-compliant yet. With that being said, you still have companies and hospitals the size Cigna and Boston Children’s Hospital putting forth resources to start building out their offerings in an impending VoiceFirst world. We might not be able to offer true, personalized engagement with the assistant yet, but there’s still lots of value that can be derived at the general level.
As this space matures, so too will the level of which we can unlock efficiencies within our healthcare system across the board. Patients of all ages and medical conditions will be more empowered to receive information, prompts and reminders to better manage their conditions. This means that those taking care of the patients are less burdened too, as they can offload the information aspect of their care giving to the “care assistant.” This then frees up the system as a whole, as there are less general inquiries (and down the line, personal inquiries), meaning less patients who need to come in and can be served at home. Finally, the clinicians can be more efficient too, as they can offload clerical work to the assistant and better retrieve data and information on a patient-to-patient basis, and also more efficiently communicate with their patient, even remotely.
As smart assistants become more integral to our healthcare system, my belief is that on-body access to the assistant will be desired. Patients, caregivers, clinicians and medical staff all have their own reasons for wanting their assistant right there with them at all times. What better a place than a discreet, in-the-ear device that allows for one-to-one communication with the assistant?
Last Thursday, I was fortunate to have the opportunity to travel to Chattanooga, TN, to attend the second annual Alexa Conference and join some of the smartest people working on Voice technology. The cool thing about the Alexa Conference is that it’s not sponsored by Amazon (or Google or any other major tech company); it’s fully independent and sponsored by third parties, so it truly feels objective and unbiased. The attendees and speakers ranged from third-party “skill” agencies, skill developers (domestic and international), certified Alexa Champions, skill analytics and diagnostics providers, a representative from the FTC, and insurance and healthcare reps, to futurists, Internet of Things specialists, digital transformation experts, behavioral economists, doctors, PhD scientists, former NASA employees, and a random dude from the Audiology industry who writes a blog called FuturEar.
For the past few years, I have been following the #VoiceFirst movement, which includes progress in the Voice User Interface (VoiceUI), the devices that house our smart assistants (smart speakers, smartphones and wearable technology), the devices that work in conjunction with and respond to smart assistants (Internet of Things), and our smart assistants as a whole. I think I may have learned more in the 48 hours I spent at this conference than in the thousands of hours leading up to it. Ok, that’s probably some hyperbole, but there was a ton of insight, and these were my favorite takeaways from the show:
Context is King
One of the big questions I had heading into Chattanooga was, “How do we take this all to the next level?” I now have the answer, and it all derives from context: deep, personalized contextual awareness. What does that mean? Well, for starters, let’s establish that smart assistants feed on, and grow stronger with, “personal data.” The only way these assistants ever get more useful or “smarter” is by learning more about us.
A really good way to think about this is through the lens of ordering your favorite pizza. My favorite pizza (shoutout Imo’s) is probably pretty different from yours. The web, as we know it, is built on a pay-per-click model, so when I search on Google for pizza options around me, the results show advertised options at the top. These are not in any way personalized to me, and therein lies the big difference. If I’ve ordered pizza 20 times through my smart assistant, 15 of them from Imo’s, and then I’m in Chattanooga for work (where Imo’s does not exist) and want to order a pizza, my smart assistant will offer me options in Chattanooga similar to Imo’s. The smart assistant knows my preferences and will actively narrow the options to fit them.
Taking it one step further, think about all the other personal information you share, or that others share with you, that could broaden the assistant’s contextual awareness. If your friends have been to Chattanooga and raved about a pizza spot on Instagram months or years ago, your smart assistant could retrieve that and factor it into your results. Now the results are based not just on your own pizza preferences, but also on other variables such as your friends’ experiences and preferences.
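To make the pizza example a bit more concrete, here is a minimal Python sketch of how an assistant might rank nearby options by how closely they match your order history, nudged by friends’ recommendations. Everything here is an illustrative assumption, not how Alexa, Google or Hound actually work: the Restaurant type, the style labels, the friend_mentions count, the rank_options function and the 0.1 friend weight are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Restaurant:
    name: str
    style: str                # hypothetical style label, e.g. "st-louis"
    friend_mentions: int = 0  # hypothetical count of friends' posts praising it

def rank_options(order_history, style_of, candidates, friend_weight=0.1):
    """Rank nearby candidates by similarity to past orders, nudged by friends' raves.

    order_history: list of restaurant names from past orders
    style_of: maps each past restaurant to a style label (assumed to exist)
    """
    style_counts = Counter(style_of[name] for name in order_history)
    total = sum(style_counts.values()) or 1  # avoid dividing by zero with no history

    def score(candidate):
        preference = style_counts[candidate.style] / total  # share of past orders in this style
        social = friend_weight * candidate.friend_mentions   # friends' recommendations
        return preference + social

    return sorted(candidates, key=score, reverse=True)

# 20 past orders, 15 of them from Imo's; now searching in a city without an Imo's.
history = ["Imo's"] * 15 + ["National Chain"] * 5
style_of = {"Imo's": "st-louis", "National Chain": "chain"}
nearby = [
    Restaurant("National Chain", "chain"),
    Restaurant("Local Thin Crust", "st-louis"),
    Restaurant("Wood-Fired Spot", "neapolitan", friend_mentions=3),
]
print([r.name for r in rank_options(history, style_of, nearby)])
# ['Local Thin Crust', 'Wood-Fired Spot', 'National Chain']
```

Keeping personal history and social signals as separate terms means an assistant (or a privacy-conscious user) could turn the friend weight up, down or off entirely.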
This brings privacy and security front and center. One of the really interesting presentations was from the German guys at 169Labs. While the attitude toward privacy in the US is pretty lax and apathetic, it’s important to understand that our attitude here in the States is quite different from how many Europeans feel. They take their privacy far more seriously, and it’s a top-of-mind issue that permeates any tech discussion. Privacy will remain a topic of discussion as our smart assistants evolve and we become increasingly aware of just how much data we are sharing. I believe the pros outweigh the cons when it comes to sharing your personal data with your smart assistant(s), but the key will be feeling confident that it is all encrypted and protected from being hacked.
The beginnings of Conversational Interfaces
One of the more frustrating aspects of smart speakers and smart assistants is the lack of continuity. Currently, our smart assistants more or less execute single commands or answer single questions. There isn’t really any dialogue; it’s typically “Alexa, shuffle my playlist,” “Alexa, set a timer for 15 minutes,” “Alexa, what’s the weather?” or “Alexa, how many ounces are in a gallon?” Asking a question or issuing a command and having the device turn off afterward is not the goal for our smart assistants. Brian Roemmele compared this level of sophistication to the command line of the PC era. It’s super primitive, and we’re in the first inning of a double-header with this technology.
Instead, what we need is, again, contextual awareness in order to have a dialogue. Katie McMahon of SoundHound did an awesome job demoing SoundHound’s own smart assistant, Hound, which showed some real contextual awareness.
So she starts off by saying, “Show me Asian restaurants, excluding Japanese and Chinese, that are open right now.” It’s an accomplishment in itself that Hound answered so quickly and accurately. Then she refines the search further: “Ok Hound, show those with outdoor seating.” The key word there is “those,” because the assistant is now aware of the context: it recognizes that “those” is a demonstrative pronoun referring to the Asian restaurants from the previous query. This is HUGE! In a dialogue with another person, you constantly use pronouns and language that reference context from earlier in the conversation, or from prior conversations. It’s an essential part of how we communicate, and assistants like Hound are already demonstrating that they are more than capable of this type of complex contextual awareness. Without this ability, I doubt smart assistants will ever be taken that seriously.
Next, she goes one step further. Katie says, “I need an Uber to the first one.” Not only does the assistant recognize that “the first one” refers to the first result from the previous search, but it’s also capable of using multiple “domains,” or skills, in conjunction. This is a significant step forward in an area where we’re pretty limited today. How many people could have accomplished all of that in one app on their phone? Most likely, you’d use multiple apps: search Yelp, grab the address of where you want to go and pop it into Uber. Furthermore, if your assistant is factoring in more data for a more personalized result list by retrieving your friends’ Instagram and/or Swarm data, the alternative is going into each app and hunting for the relevant posts yourself. This is clearly an improvement in time and efficiency.
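Here is a rough Python sketch of the kind of bookkeeping the Hound demo implies: the assistant keeps the previous turn’s results around so that “those” and “the first one” can be resolved against them, then hands the chosen place to a different domain. The Place and DialogueContext types, the request_ride function and the sample restaurants are all hypothetical, and real assistants resolve references with far more sophisticated language understanding than a stored list.

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    name: str
    category: str      # e.g. "asian"
    cuisine: str       # e.g. "thai"
    open_now: bool
    outdoor_seating: bool
    address: str

@dataclass
class DialogueContext:
    """Remembers the last result list so later turns can refer back to it."""
    last_results: list = field(default_factory=list)

    def search(self, places, category, exclude=(), open_now=True):
        # New query with no back-reference: filter the full catalog.
        self.last_results = [
            p for p in places
            if p.category == category
            and p.cuisine not in exclude
            and (p.open_now or not open_now)
        ]
        return self.last_results

    def refine_those(self, keep):
        # "those" resolves to the previous turn's results, not the full catalog.
        self.last_results = [p for p in self.last_results if keep(p)]
        return self.last_results

    def resolve_ordinal(self, position):
        # "the first one" -> position 1 in the most recent result list.
        return self.last_results[position - 1]

def request_ride(destination):
    # Hypothetical ride domain; a real assistant would hand off to a partner service here.
    return f"Ride requested to {destination.name} at {destination.address}"

places = [
    Place("Thai Garden", "asian", "thai", True, True, "1 Main St"),
    Place("Pho Corner", "asian", "vietnamese", True, False, "2 Main St"),
    Place("Sushi Bar", "asian", "japanese", True, True, "3 Main St"),
    Place("Curry Night", "asian", "indian", False, True, "4 Main St"),
]
ctx = DialogueContext()
ctx.search(places, "asian", exclude={"japanese", "chinese"})  # Thai Garden, Pho Corner
ctx.refine_those(lambda p: p.outdoor_seating)                 # Thai Garden
print(request_ride(ctx.resolve_ordinal(1)))
# Ride requested to Thai Garden at 1 Main St
```

The point of the sketch is that each follow-up filters the previous answer rather than starting over, which is exactly what makes “those” and “the first one” meaningful.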
It’s honestly amazing what’s going on over at Hound and how well its assistant retains information throughout the dialogue. It was the first time I really saw this level of continuity and contextual awareness in a smart assistant, and it made me very optimistic about what smart assistants can do once they’re able to hold this type of dialogue. When I wrote that the key to the adoption of a new user interface is the reduction of friction, this is ultimately what I was referring to. Even a primitive conversational interface would dramatically reduce the time we currently spend toggling between apps, tapping and searching on our phones for many things. We’re moving far, far beyond just using assistants to set timers.
(Here’s a link to another video of Katie asking Hound for hypothetical monthly mortgage payments based on home values, interest rates and down payment percentages. I was blown away by these demos and have been using Hound since I got back. It’s encouraging to see that all of this innovation is not limited to Amazon, Google, Facebook and Apple.)
Proactive & Personalized
Another huge advantage of feeding your smart assistant personal data is that it can begin to proactively engage you about what it finds. This is where the Internet of Things (IoT) gets really interesting. If you have a smart fridge and your smart assistant can access its data, the assistant knows things like whether your fruit has spoiled. Couple that with the possibility that you’ve given your assistant access to all of your calendar data, so it knows you have a dinner party that night, and that you’ve, in one way or another, logged that you need the fruit for your recipe for said dinner party. What we’re moving toward is a scenario where the smart assistant in your car or hearable pings you and says something along the lines of, “Hey Dave, it looks like your blackberries have gone bad and you need them for your dessert recipe for Julie’s dinner party tonight. They’re currently on sale at Mariano’s down the street. Would you like to go?” and then navigates you to the grocery store.
This was a big aha! moment for me. So much of #VoiceFirst is happening in disparate areas that it’s hard to bring it all together into one specific use case like the one I just described. When these devices are siloed off on their own, you hear “smart fridge” and think, “Really? What’s the point of that?” But when you start looking at all of these IoT devices as data entry points for your smart assistant, which it can then actively retrieve, assess and turn into actionable insight (all very quickly), it becomes a whole lot more compelling. This is “small data” as opposed to big data. It’s personal to each of us and therefore invaluable to each of us. This opens the door to serendipity, with assistants proactively providing suggestions and reminders based on data you likely aren’t even aware of.
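The blackberry scenario boils down to cross-referencing a few small data sources. Here is a minimal Python sketch, assuming hypothetical feeds for fridge expiry dates, calendar events with logged recipe ingredients, and local store sales; none of these types, nor the proactive_alert function, correspond to a real assistant or appliance API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class FridgeItem:
    name: str
    expires: date  # hypothetical expiry date reported by a smart fridge

@dataclass
class Event:
    title: str
    day: date
    ingredients: set  # hypothetical: ingredients the user logged for this event's recipe

def proactive_alert(fridge, events, sales, today) -> Optional[str]:
    """Combine fridge, calendar and local-sale data into one suggestion, or None."""
    spoiled = {item.name for item in fridge if item.expires < today}
    for event in events:
        if event.day != today:
            continue  # only warn about today's events
        for ingredient in sorted(spoiled & event.ingredients):
            message = (f"Your {ingredient} have gone bad and you need them "
                       f"for {event.title} tonight.")
            store = sales.get(ingredient)  # hypothetical store-sale feed
            if store:
                message += f" They're on sale at {store}. Would you like directions?"
            return message  # stop at the first match (alphabetical order)
    return None

today = date(2018, 1, 20)
fridge = [FridgeItem("blackberries", date(2018, 1, 18)), FridgeItem("milk", date(2018, 1, 25))]
events = [Event("Julie's dinner party", today, {"blackberries", "flour", "sugar"})]
print(proactive_alert(fridge, events, {"blackberries": "Mariano's"}, today))
# Your blackberries have gone bad and you need them for Julie's dinner party tonight. They're on sale at Mariano's. Would you like directions?
```

None of the three inputs is interesting on its own; the suggestion only exists because the assistant can see all of them at once, which is the “small data” point above.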
Some other Takeaways
Brian Roemmele harped a lot on the idea of “enabling creatives to enter into the fold.” He used the analogy of Steve Jobs empowering graphic designers with the iPhone, which created a massive abundance of opportunity for that profession. Voice will do the same for many more creative types, including comedians, poets, psychologists, storytellers, artists, historians, writers, etc. Therefore, we need a set of tools that are easy enough for anyone to use and create with.
VoiceXP demonstrated a number of unique skills built specifically for the Echo Show. I also appreciated that Bob Stolzberg emphasized that the Echo Show is version one of Amazon’s multi-modal strategy. We’re barely scratching the surface of what’s possible when you add screens and mixed modality into the #VoiceFirst equation. There are some really exciting opportunities here.
Keynote speaker Ahmed Bouzid presented a plethora of fascinating facts and charts, but the one that stood out to me highlighted two of the demographics that can benefit the most from a #VoiceFirst world: seniors and the physically incapacitated. This is at the heart of why I’m so passionate about spreading awareness of #VoiceFirst among the #audpeeps and the audiology and hearing aid industry. Smart assistant integration is coming to hearing aids, and this new use case for hearing aids, hearables and all our ear-computers stands to really benefit those who struggle in a mobile world. Mobile computing is not conducive to these two demographics, and I’m ecstatic about the possibility that they will soon be empowered in a way they never have been before. It’s an awesome value-add that you can advocate for at no cost, and it will dramatically improve the patient experience over time.
This was an amazing two days, and I’ll definitely be back next year to continue gathering everything I can about what’s happening in the #VoiceFirst world and how it will ultimately impact those of you who work with the little computers that go in the ear.