This Thursday, I was fortunate to be invited by Bradley Metrock, the host of the podcast, “This Week in Voice,” to sit down with him and discuss the top stories of the week that pertain to Voice technology. I was joined by fellow panelist, Sarah Storm, who is the Head of the cloud studio, SpokenLayer, and the three of us went back and forth around what’s new in the VoiceFirst world.
The great thing about this podcast is that Bradley brings on people from a wide variety of backgrounds, so each week you get a different perspective on the stories of the week. This week, we talked about the following five stories:
New York Times: Why We May Soon Be Living in Alexa’s World
This story serves as a revelation of sorts: the realization that Alexa and the other smart assistants are not merely new gadgets, but represent a shift in how we communicate with computers as a whole.
VoiceBot.ai: Spotify Working on New Smart Speaker?
The fact that Spotify posted two separate job openings for senior positions around a new hardware division turned a lot of heads. This is particularly interesting given the impending IPO, as Spotify might be looking to make some pretty dramatic moves prior to going public. Would Spotify be better off vertically integrating itself via partnerships/acquisitions, or is it possible for them to create a hardware division from scratch?
Forbes: Meet the Voice Marketer
Voice represents an entirely new opportunity for brands to market themselves, but the question is how best do you use this new medium? With more personal data than ever at many of these brands’ disposal, it will be a challenge to balance the “creepy” with the truly proactive and engaging.
The Voice of Healthcare Summit
Held at the Martin Conference Center at Harvard Medical School in Boston this August, this summit promises to be one of the best opportunities to gather with fellow Voice enthusiasts and healthcare professionals, to collaborate and learn about applying Voice to healthcare. This will be an awesome event and I encourage anyone to go who thinks this might be up their alley!
This was a great experience getting to sit in on this podcast and chat with Bradley and Sarah. I hope you enjoy this episode and cheers to more in the future!
Monday night, Twitter proved yet again that despite all its shortcomings, it is still the king of where some of the best discussions and debates go down for all to see. Normally, I wouldn’t base a blog post around a Twitter debate, but this specific thread was a culmination of a lot of content and discussion over the past few weeks around smart assistants, the smart home and the overall effort to understand how Voice will evolve and mature. Given who was debating and the way it dovetails so nicely with the Alexa Conference, I thought it worthy of a blog post.
Before I jump into the thread, I want to provide some context here around some of the precursors to this discussion. This really all stems from the past few CES, but mainly the most recent show. To start, here’s a really good A16z podcast by two of the prominent people in the thread, Benedict Evans and Steven Sinofsky, talking about the smart home coming out of this year’s CES and the broader implications of Voice as a platform:
As they both summarize, one of the main takeaways from CES this year was that seemingly every product is in some way tied to Voice (I wrote about this as the “Alexification of Everything”). The question isn’t really whether we’re going to keep converting our rudimentary, “dumb” devices into internet-connected “smart” devices, but rather, “what does that look like from a user standpoint?” There are a LOT of questions that begin to emerge when you start poking into the idea of Voice as a true computing platform. For example, does this all flow through one central interface (assistant) or multiple?
Benedict Evans followed up on the podcast by writing this piece, further refining the ideas on the above podcast, and tweeted out the article in the tweet below. He does a really good job of distilling down a lot of the high-level questions, using history as a reference, to contest the validity of Voice as a platform. He makes a lot of compelling points, which is what led to this fascinating thread of discussion.
To help understand who’s who in this thread, the people I want to point out are as follows: Benedict Evans (a16z), Steven Sinofsky (board partner @ a16z), Brian Roemmele (Voice expert for 30+ years), and Dag Kittlaus (co-founder of Siri and ViV). Needless to say, it’s pretty damn cool to be chilling at home on a Monday night in St. Louis and casually observe some of the smartest minds in this space debate the future of this technology out in the open. What a time to be alive. (I love you Twitter you dysfunctional, beautiful beast.)
So it starts off with this exchange between Dag and Benedict:
As Dag points out, once the 3rd party ecosystem really starts to open up, we’ll start seeing a Cambrian explosion of what our smart assistants can do via network effects. Benedict, however, is alluding to the same concern that many others have brought up – people can only remember so many skills, or “invocations.” It’s not sustainable to assume that we can create 1 million skills and that users will be able to remember every single one. This guy’s response to Brian encapsulates the concern perfectly:
So what’s the way forward? Again, this all goes back to the big takeaway at the Alexa Conference, something that Brian was hammering home. It’s all about the smart assistant having deeply personalized, contextual awareness of the user:
“The correct answer or solution is the one that is correct to you.” This is the whole key to understanding what appears to be the only way we move forward in a meaningful way with #VoiceFirst. It’s all about this idea of the smart assistant using the contextual information that you provide to better serve you. We don’t need a ton of general data, each person just needs their smart assistant to be familiar with their own personal “small data.” Here Dag expands on this in his exchange with Steven:
So when we’re talking about, “deeply personalized, contextual awareness” what we’re really saying is whether the smart assistant can intelligently access and aggregate all of your disparate data together and understand the context in which you’re referring to said data. For example, incorporating your geo-location to give context to your “where” so that when you say, “book me on the first flight back home tomorrow,” your smart assistant will understand where you are currently by using your geo-location data, and where “home” is for you based on a whole different set of geo-location data that you’ve identified to your assistant as home. Factor in more elements like all of the data you save to your airline profiles, and the assistant will make sure you’re booked with all your preferences and your TSA-precheck number included. Therefore, you’re not sitting there telling the assistant to do each aspect of the total task, you’re having it accomplish the total task in one fell swoop. That is a massive reduction in friction when you subtract all the time you spend doing these types of tasks manually each day.
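To make the flight example a little more concrete, here is a minimal Python sketch of what “resolving context” could look like. Everything in it is an assumption for illustration: the UserContext fields, the resolve_flight_request function and the keyword matching are hypothetical, and a real assistant would use far more capable language understanding than simple substring checks.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical 'small data' an assistant might hold about one user."""
    current_city: str       # where the user is now, from live geo-location
    home_city: str          # the place the user has labeled as "home"
    airline_profile: dict   # saved preferences (seat, traveler number, ...)


def resolve_flight_request(utterance: str, ctx: UserContext) -> dict:
    """Turn a vague spoken request into a fully specified booking intent.

    Keyword matching stands in for real language understanding here.
    """
    text = utterance.lower()
    if "flight" not in text:
        raise ValueError("not a flight request")
    return {
        "origin": ctx.current_city,                                # the "where"
        "destination": ctx.home_city if "home" in text else None,  # what "home" means for this user
        "when": "tomorrow" if "tomorrow" in text else None,
        "pick": "earliest" if "first flight" in text else None,
        **ctx.airline_profile,                                     # preferences come along for free
    }


ctx = UserContext(
    current_city="Chattanooga",
    home_city="St. Louis",
    airline_profile={"seat": "aisle", "known_traveler_number": "PLACEHOLDER"},
)
print(resolve_flight_request("Book me on the first flight back home tomorrow", ctx))
```

The point of the sketch is that the user says one sentence, and the missing pieces (origin, destination, preferences) are filled in from data the assistant already has rather than from follow-up questions.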
I don’t think we’re really talking about general AI that’s on par with HAL 9000. That’s something way more advanced and something that’s probably much further out. In order for this type of personalized, contextual awareness to be enabled, the smart assistant would really just need to be able to quickly access and combine all of the data you have stored across your disparate apps. Therefore, APIs become essential. In the example described above, your assistant would need to be able to call the APIs of all of your apps (e.g. the Southwest app where your profile is stored and Google’s app where you have indicated your “home” location) or the 3rd-party skill ecosystem whenever a request is made. It would use what’s already at its disposal via API integrations from apps, in conjunction with retrieving information or functions built into skills. Therefore, the skill ecosystem is paramount to the success of the smart assistant, as skills serve as entirely new functions that the assistant can perform.
It’s really, really early with this technology so it’s important to temper expectations a bit. We’re not at this point of “deeply personalized, contextual awareness” quite yet, but we’re getting closer. As a random observer of this #VoiceFirst movement, it’s pretty awesome to have your mind blown by Brian Roemmele at the Alexa Conference talking about this path forward, and then even more awesome to have the guy who co-founded Siri & ViV completely validate everything Brian said a few weeks later on Twitter. I think that, as Benedict and Steven pointed out, the current path we’re on is not sustainable, but based on the learnings from Brian and Dag, it’s exciting to know that there is an alternate path ahead that keeps our smart assistants progressing and brings to life a vision that is much richer and more intuitive for the user.
Eventually, many of us will prefer to have our smart assistants handy all the time, and what better a spot than in our little ear computers?
Last Thursday, I was fortunate for the opportunity to travel to Chattanooga, TN to attend the second annual Alexa Conference and join a group of some of the smartest people working on Voice technology. The cool thing about the Alexa Conference is that it’s not sponsored by Amazon (or Google or any other major tech company), it’s fully-independent, sponsored by third parties, and therefore it truly feels objective and unbiased. The attendees and speakers ranged from third party “skill” agencies, skill developers (domestic and international), certified Alexa champions, skill analytic and diagnostic providers, a representative from the FTC, insurance and healthcare reps, to futurists, Internet of Things specialists, digital transformation experts, behavioral economists, doctors, PhD scientists, former NASA employees, and a random dude from the Audiology industry who writes a blog called FuturEar.
I have been following the #VoiceFirst movement, which includes progress in the Voice User Interface (VoiceUI), the devices that house our smart assistants (smart speakers, smartphones and wearable technology), devices that work in conjunction and respond to smart assistants (Internet of Things), and our smart assistants as a whole for the past few years. I think I may have learned more in the 48 hours that I attended this conference than I have in the thousands of hours leading up to it. Ok, that’s probably some hyperbole there, but there was a ton of insight and these were my favorite takeaways from the show:
Context is King
One of the big questions that I had heading into Chattanooga was, “how do we take this all to the next level?” I now have the answer and it all derives from context. Deep, personalized contextual awareness. What does that mean? Well, for starters, let’s establish that smart assistants feed and grow stronger on “personal data.” The only way that these assistants ever get any more useful or “smarter” is by learning more about us.
A really good way to think about this is through the lens of ordering your favorite pizza. My favorite pizza (shoutout Imo’s) is probably pretty different than your favorite pizza. The web, as we know it, is built on a pay-per-click model, so when I search on Google for pizza options around me, the results are going to show advertised options at the top. These are not in any way personalized to me and therein lies the big difference. When I’ve ordered pizza 20 times through my smart assistant, 15 of which have been Imo’s, and then I’m in Chattanooga for work (where Imo’s does not exist) and I want to order a pizza, my smart assistant will provide me results similar to Imo’s in Chattanooga. The smart assistant knows my preferences and therefore will actively distill the options for me to cater to my personal preferences.
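Here is a rough Python sketch of that idea, with the caveat that it is only one simple way it could work. The restaurant tags, the place names in Chattanooga and the choice of Jaccard similarity are all my own assumptions for illustration; nothing here reflects how any actual assistant ranks results.

```python
from collections import Counter


def favorite(order_history: list) -> str:
    """Return the most frequently ordered restaurant."""
    return Counter(order_history).most_common(1)[0][0]


def jaccard(a: set, b: set) -> float:
    """Overlap between two tag sets: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_local_options(order_history: list, known_tags: dict, local_options: dict) -> list:
    """Sort local restaurants by how closely they resemble the user's favorite."""
    fav_tags = known_tags[favorite(order_history)]
    return sorted(
        local_options,
        key=lambda name: jaccard(fav_tags, local_options[name]),
        reverse=True,
    )


# 20 past orders, 15 of them from the same place (hypothetical data).
history = ["Imo's"] * 15 + ["Other Pizza"] * 5
known_tags = {
    "Imo's": {"thin crust", "square cut", "provel"},
    "Other Pizza": {"deep dish"},
}
# Made-up options in a city where the favorite does not exist.
chattanooga = {
    "Place A": {"thin crust", "square cut"},
    "Place B": {"deep dish", "stuffed crust"},
}
print(rank_local_options(history, known_tags, chattanooga))  # ['Place A', 'Place B']
```

Contrast this with a pay-per-click result list, where the ordering depends on who paid for placement rather than on anything in the user's own history.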
Taking it one step further, think about all the other personal information that you probably share or are having shared with you that can broaden the assistant’s contextual awareness. If your friends have been to Chattanooga and they raved about a pizza spot on Instagram months or years ago, your smart assistant could retrieve that and factor that into your results. So now it’s not just based on your own pizza preferences, but also factoring in other variables such as your friends’ experiences and preferences.
This begins to bring privacy and security front and center. One of the really interesting presentations was from the German guys at 169Labs. While the attitude in the US around privacy is pretty lax and apathetic, it’s important to understand that our attitude here in the States is quite different than how many Europeans feel. They take their privacy way more seriously and it’s a top-of-mind issue that permeates any tech discussion. Privacy will continue to be a topic of discussion as our smart assistants evolve and we become increasingly aware of just how much data we are sharing. I believe the pros outweigh the cons when it comes to sharing your personal data with your smart assistant(s), but the key is going to be feeling safe that it is all encrypted and protected from being hacked.
The beginnings of Conversational Interfaces
One of the more frustrating aspects of smart speakers and smart assistants is the lack of continuity. Currently, our smart assistants function in a way that is more or less executing single commands or single questions. There isn’t really any dialogue, it’s typically, “Alexa shuffle my playlist” or “Alexa set a timer for 15 minutes” or “Alexa what’s the weather?” or “Alexa how many ounces are in a gallon?” Asking a question or issuing a command and having the device turn off afterward is not the goal for our smart assistants. Brian Roemmele compared this level of sophistication to the command line in the PC era. It’s super primitive and we’re in the first inning of a double-header with this technology.
Instead, what we need is, again, contextual awareness in order to have a dialogue. Katie McMahon of Soundhound did an awesome job demoing Soundhound’s own smart assistant, Hound, with some real contextual awareness:
So she starts off by saying, “Show me Asian restaurants, excluding Japanese and Chinese, that are open right now.” It’s an accomplishment in itself that Hound answered so quickly and accurately. Then she goes on to further refine the search, “Ok Hound, show those with outdoor seating.” The key word there is “those,” as the assistant is now aware of the context because it recognizes that “those” is a demonstrative pronoun representing the Asian restaurants from the previous query. This is HUGE! In a dialogue with another person, you’re constantly using pronouns and language that references context from earlier in the conversation or conversations prior. It’s an essential part of how we communicate, and we’re already seeing smart assistants like Hound demonstrate that they are more than capable of this type of complex contextual awareness. Without this ability, I doubt smart assistants will ever be taken that seriously.
Next, she goes one step further. Katie says, “I need an Uber to the first one.” So not only does the assistant recognize that “the first one” is in reference to the first result from the previous search, but it’s capable of using multiple “domains” or skills in conjunction. This is a significant step forward on something that we’re pretty limited with today. How many people would have been able to accomplish that all in one app on their phone? Most likely, you’d use multiple apps like Yelp, grab the address of where you want to go and pop it in Uber. Furthermore, if your assistant is factoring in more data for a more personalized result list, by retrieving your friends’ Instagram and/or Swarm data, then we’re comparing that to you going into each app and hunting for the relevant posts. This is clearly an improvement in time and efficiency.
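For anyone curious what “retaining context” means mechanically, the three turns of Katie’s demo can be sketched in a few lines of Python. To be clear, this is my own toy illustration and not how Hound is actually built: the Dialogue class, the restaurant names and the request_ride stand-in are all hypothetical. The only idea it is meant to show is that the assistant remembers the previous result list, so that “those” and “the first one” have something to point at.

```python
from dataclasses import dataclass, field


@dataclass
class Restaurant:
    name: str
    cuisine: str
    outdoor_seating: bool


@dataclass
class Dialogue:
    """Keeps the previous results so later turns can refer back to them."""
    last_results: list = field(default_factory=list)

    def search(self, restaurants, exclude_cuisines):
        """Turn 1: a fresh query against the full catalog."""
        self.last_results = [r for r in restaurants if r.cuisine not in exclude_cuisines]
        return self.last_results

    def refine_those(self, predicate):
        """Turn 2: 'those' filters the previous results, not the whole catalog."""
        self.last_results = [r for r in self.last_results if predicate(r)]
        return self.last_results

    def the_first_one(self):
        """Turn 3: 'the first one' is the top item of the previous results."""
        if not self.last_results:
            raise LookupError("no earlier results to refer to")
        return self.last_results[0]


def request_ride(destination: Restaurant) -> str:
    """Stand-in for handing the result to a second domain (a ride service)."""
    return f"Ride requested to {destination.name}"


catalog = [
    Restaurant("Thai Spot", "Thai", False),
    Restaurant("Sushi Bar", "Japanese", True),
    Restaurant("Pho House", "Vietnamese", True),
    Restaurant("Korean Grill", "Korean", True),
]
dialogue = Dialogue()
dialogue.search(catalog, exclude_cuisines={"Japanese", "Chinese"})
dialogue.refine_those(lambda r: r.outdoor_seating)
print(request_ride(dialogue.the_first_one()))  # Ride requested to Pho House
```

Without the stored last_results, each turn would have to restate the whole query, which is exactly the single-command behavior described earlier.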
It’s honestly amazing what’s going on over at Hound and the ability its assistant has with retaining information throughout the dialogue. It was the first time I really saw this level of continuity and contextual awareness in a smart assistant and it made me very optimistic about the potential of smart assistants when they’re able to have this type of dialogue. When I wrote about the key to the adoption of a new user interface being the reduction of friction, this is ultimately what I was referring to. Even a primitive conversational interface would still dramatically reduce the time that we currently spend app toggling, tapping and searching on our phones for many things. We’re moving far, far beyond just using assistants to set timers.
(Here’s a link to another video of Katie asking Hound for hypothetical monthly mortgage payments based on home values, interest rates and down payment percentages. I was blown away by these demos and have been using Hound since I’ve gotten back. It’s encouraging to see that all of this innovation is not limited to just Amazon, Google, Facebook and Apple.)
Proactive & Personalized
Another huge advantage of feeding your smart assistant personal data is that it can begin to proactively engage you on what it’s finding. This is where the Internet of Things (IoT) begins to get really interesting. If you have a smart fridge and your smart assistant is accessing that data, it then knows stuff like your fruit being spoiled. Couple that with the possibility that you’ve given your assistant access to all of your calendar data, so it knows that you have a dinner party that night and you’ve, in some way or another, logged that you need the fruit for your recipe for said dinner party. So, what we’re moving toward is a scenario where the smart assistant in your car or hearable pings you and says something along the lines of, “Hey Dave, it looks like your blackberries have gone bad and you need them for your dessert recipe for Julie’s dinner party tonight. It looks like they’re currently on sale at Mariano’s down the street, would you like to go?” and then navigates you down to the grocery store.
This was a big aha! moment for me. So much of #VoiceFirst is happening in disparate areas that it’s hard to bring all of it together into one specific use case like I just mentioned above. When they’re silo’d off on their own, you hear, “smart fridge” and you think, “Really? What’s the point of that?” But when you start looking at all of these IoT devices as data entry points for your smart assistant, which your assistant can then actively retrieve, assess and then provide actionable insight (all very quickly), it all then becomes a whole lot more compelling. This is, “small data” as opposed to big data. It’s personal to each of us and therefore invaluable to each of us. This opens the door to serendipity and assistants proactively providing you with suggestions and reminders from data that you likely aren’t even aware of.
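The fridge scenario boils down to joining a few small data sources and speaking up only when they line up. Here is a minimal sketch in Python, assuming made-up inputs: the CalendarEvent type, the proactive_suggestion function and the shape of the fridge and store data are all hypothetical, and real devices would expose this information through their own APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalendarEvent:
    title: str
    needed_ingredients: set  # ingredients the user logged for this event


def proactive_suggestion(spoiled_items: set, events_today: list, store_sales: dict) -> Optional[str]:
    """Return a prompt only when a spoiled item is needed for an event today."""
    for event in events_today:
        # Intersect what the fridge reports as spoiled with what the event needs.
        for item in sorted(event.needed_ingredients & spoiled_items):
            message = f"Your {item} have gone bad and you need them for {event.title}."
            store = store_sales.get(item)
            if store:
                message += f" They are on sale at {store}. Would you like to go?"
            return message
    return None  # nothing worth interrupting the user about


print(proactive_suggestion(
    spoiled_items={"blackberries"},                       # from the smart fridge
    events_today=[CalendarEvent("Julie's dinner party", {"blackberries", "cream"})],
    store_sales={"blackberries": "Mariano's"},            # from a store's sale data
))
```

Each input on its own is not very interesting, which is the point: the value shows up when the assistant combines them.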
Some other Takeaways
Brian Roemmele harped a lot on the idea of “enabling creatives to enter into the fold.” He used the analogy of Steve Jobs empowering the graphic designers with the iPhone, as the iPhone created a massive abundance of opportunity for that profession. The same will be done with voice for many more creative types that include comedians, poets, psychologists, storytellers, artists, historians, writers, etc. Therefore we need a set of tools that are easy enough for anyone to use and create with.
VoiceXP demonstrated a number of unique skills specifically for the Echo Show. I also appreciated that Bob Stolzberg really emphasized the fact that the Echo Show is version one of Amazon’s multi-modal strategy. We’re quite literally scratching the surface here with what’s possible when you add in screens and mixed-modality into the #VoiceFirst equation. Some really exciting opportunities around this.
Keynote speaker Ahmed Bouzid presented a plethora of fascinating facts and charts, but the one that stood out to me highlighted two of the demographics that can benefit the most from a #VoiceFirst world: seniors and the physically incapacitated. This is at the heart of why I’m so passionate about spreading awareness to the #audpeeps, audiology and hearing aid industry about #VoiceFirst. Smart assistant integration is coming to hearing aids and this new use case for hearing aids, hearables and all our ear-computers stands to really benefit those who struggle in a mobile world. Mobile computing is not conducive to these two demographics and I’m ecstatic about the possibility that these demographics will soon be empowered in a way they never have been before. It’s an awesome value-add that you can advocate for free and it will dramatically improve the patient experience over time.
This was an amazing two days and I’ll definitely be back next year to continue to gather everything I can about what’s happening in the #VoiceFirst world and how it will ultimately impact those of you who work with the little computers that go in the ear.
The annual Consumer Electronics Show (CES) took place this past week in Las Vegas, bringing together 184,000 attendees and a whole host of vendors in the consumer electronics space to showcase all of the new, innovative things each is working on. Once again, smart assistants stole the show, making this the third year in a row where smart assistants seem to be gradually dominating the overall theme of the show. Along with the Alexa-fication of everything, there were a number of significant hearable announcements, each in some way or another incrementally improving and expanding on our mini ear-computers. Although I was not in attendance, these are my five takeaways from CES 2018:
1. The Alexa-fication of Everything
It seemed like just about every story coming out of this year’s show was in some way tied to an Alexa (or Google…but mainly Alexa) integration. We saw Kohler introduce the “connected bathroom” complete with a line of smart, Alexa-enabled mirrors, showers, toilets (yes, toilets), bathtubs and faucets. First Alert debuted its new Onelink Safe & Sound carbon monoxide and smoke detector with Alexa built-in. Harman revealed an Echo Show competitor, the JBL LINK View, powered by Google’s assistant.
My personal favorite of the smart-assistant integrations around the home was the inconspicuous smart light switch, the Instinct, by iDevices. By converting the standard light switches around your home to the Instinct, you enhance the utility of the switch by an order of magnitude, as it allows for motion-detection lighting, energy savings, and all the benefits of Alexa built right into your walls.
And that’s just the integrations that emerged for the home, as the car became another area of focus of smart assistant integration at this year’s show. Toyota announced that it would be adding Alexa to a number of its Toyota and Lexus cars, starting this year. Kia partnered with Google Assistant to begin rolling that feature out this year too. Add these integrations to the list that also includes Ford, BMW and Nissan from previous announcements. Mercedes decided it doesn’t need Google or Amazon, and unveiled its own assistant. And finally, Anker debuted a bluetooth smart charger, Roav Viva, that can access Alexa in whatever car you’re in for only $50.
Alexa, Google and the other smart assistants are showing no sign of slowing down in their quest to enter every area that we exist.
2. Bragi Announces “Project Ears”
Bragi’s “Project Ears” is a combination of tinnitus relief and personalized hearing enhancement. This announcement was exciting for two reasons.
The first is Bragi’s partnership with “Big 6” hearing aid manufacturer Starkey, and the byproducts of that partnership that we’re beginning to see. Last week, I wrote about Starkey’s announcement of the “world’s first hearing aid with inertial sensors” and how that was likely a byproduct of the Bragi partnership, as Bragi has been on the forefront of embedding sensors into small, ear-worn devices. Fast-forward one week to CES, and we see Bragi’s Project Ears initiative, which includes “tinnitus relief” by embedding tinnitus masking into the device to help relieve the ringing in one’s ears. So, we see Bragi incorporating elements of hearing aids into their devices, just as we saw Starkey incorporating elements of hearable technology into their hearing aids. The two seem to be leveraging each other’s expertise to further differentiate in their respective markets.
The second aspect of this announcement stems from Bragi’s newly announced partnership with Mimi Hearing Technologies. Mimi specializes in “personalized hearing and sound personalization” and as a result, Bragi’s app will include a “scientific hearing test to measure your unique Earprint™.” This is ultimately to say that the hearing test issued by Bragi’s app will be iterated and improved via this partnership with Mimi. Bragi wants to match you as accurately as possible to your own hearing profile, and this announcement shows that they’re continuing to make progress in doing so.
3. Nuheara Unveils New Products & Utilization of NAL-NL2
Nuheara, the hearable startup from Down Under, introduced two new products at this year’s show. The first was the LiveIQ, a pair of wireless earbuds priced under $200. These earbuds will use some of the same technology that Nuheara’s flagship hearable, IQBuds, uses, as well as providing active noise cancelling.
The second device introduced was the IQBuds Boost, which will essentially serve as an upgrade to the current IQBuds. The IQBuds Boost will use what Nuheara has dubbed “EarID™” which will provide for a more “personalized experience unique to the user’s sound profile.” Sounds familiar, right? Bragi’s “Earprint™” technology and Nuheara’s “EarID™” both aim to serve as a way in which the user can further personalize their experience via each company’s companion app.
In addition to the new product announcements, Nuheara also announced a partnership with the National Acoustic Lab (NAL), “to license its international, industry-recognized NAL-NL2 prescription procedure, becoming the only hearable company globally to do this.”
Here’s what Oaktree Products’ in-house PhD audiologist, AU Bankaitis, had to say about the significance of this announcement:
“Kudos to NuHeara for partnering with the National Acoustic Lab (NAL), the research arm of a leading rehabilitation research facility that developed the NAL-NL2 prescriptive formula commonly applied to hearing instruments. It will be interesting to see how this partnership will influence future IQBud upgrades. Whether or not this approach will result in a competitive advantage to other hearables remains to be seen. Research has clearly shown that relying on a fitting algorithm without applying objective verification with probe-mic measurements often times results in missing desired targets for inputs and frequencies most critical for speech. “
4. Qualcomm Introduces New Chipset for Hearables
Some of the most exciting innovation happening in the whole wearable market, and in particular the hearable sub-market, is taking place under the hood of the devices. Qualcomm’s new chipset, the QCC5100, is a good representation of the innovation occurring inside the devices, as these chips will reduce power consumption by 65%, allowing for increased battery life. Per Qualcomm’s SVP of Voice & Music, Andy Murray:
“This breakthrough single-chip solution is designed to dramatically reduce power consumption and offers enhanced processing capabilities to help our customers build new life-enhancing, feature-rich devices. This will open new possibilities for extended-use hearable applications including virtual assistants, augmented hearing and enhanced listening,”
It’s wild to think that it was only back in 2016 (pre-AirPods) when battery life and connectivity stood as major barriers to entry for hearable technology. AirPods’ W1 chip dramatically improved both, and now we see other chip makers rolling out incremental improvements, further reducing those initial roadblocks.
5. Oticon wins Innovation Award for its Hearing Fitness App
Oticon’s upcoming “hearing fitness app,” which will be used in conjunction with Oticon’s Opn hearing aids, illustrates the potential for this new generation of hearing aids that are able to harness the power of user data. The app gathers data from your hearing aid usage and lets you view it in an app that looks somewhat similar to Fitbit’s slick data readouts. The app will display the user’s hearing aid usage, which can then be used to further enhance the user’s experience based on the listening environments the user is experiencing. So, not only will this empower users, but this will also serve as a great tool for Audiologists to further customize the device for their patient using real data.
Furthermore, this app can integrate other data from other wearable devices, so that all of the data is housed together in one app. It’s important to look at this as another step toward bringing to fruition the idea that hearing aids are undergoing a makeover into multi-function devices, including “biometric data harvesting” to provide actionable insight into one’s data. For example, if my hearing aids are recording my biometric data, and my app notifies me that my heart rate is acting funky or my vitals are going sideways, I can send that data to my doctor and see what she recommends. That’s what this type of app ultimately could be, beyond measuring one’s “hearing fitness.”
What were your favorite takeaways from this year’s show? Feel free to comment or share on Twitter!
I will be traveling to the Alexa Conference this week in Chattanooga, Tennessee and will surely walk away from there with a number of exciting takeaways from #VoiceFirst land, so be sure to check back in for another rundown next week.
Editor’s Note: In my initial post, I mentioned that along with the long-form assessments I’ve been publishing, I’d also be doing short, topical updates. This is the first of those updates.
In the first week of 2018, we saw a handful of significant updates that pertain to various trends converging around ears. Here’s a rundown of what you need to know:
Amazon introduces the Alexa Mobile Accessory Kit (AMAK)
As Voicebot.ai reported from an Amazon blog post, Amazon’s new Alexa Mobile Accessory Kit will allow for much easier (and cheaper) Alexa integration into OEMs’ devices, such as hearables. It’s been possible in the past to integrate Alexa into third party devices, but this kit will make it much simpler to convert any type of hardware into Alexa-integrated hardware. This is great news for this new use case, as it will surely put Alexa in more and more of our ear-worn devices.
Per Amazon’s senior product manager, Gagan Luthara:
“With the Alexa Mobile Accessory Kit, OEM development teams no longer need to perform the bulk of the coding for their Alexa integration. Bluetooth audio-capable devices built with this new kit can connect directly to the Alexa Voice Service (AVS) via the Amazon Alexa App (for Android and iOS) on the customer’s mobile device.”
Starkey Announces Exciting Additions to Next Generation Hearing Aids
There were a number of exciting revelations at Starkey’s Biennial Expo, but among all the announcements, there were two that really intrigued me. The first was the inclusion of “fall detection” sensors in Starkey’s next generation of hearing aids. This will be the first hearing aid with inertial sensors:
On the surface, this is really great, as every 11 seconds an older adult is treated in the emergency room for a serious fall. The purpose of these sensors is to detect those types of falls, so that the user can get immediate help. What’s even more intriguing is the fact that we’re now beginning to see advanced sensors being built into this new wave of hearing aids. As I will write about soon, the preventative health benefits, combined with smart assistants, offer some very exciting possibilities and another promising use case for our ear-worn devices.
The second announcement, was the upcoming live-language translation feature to be added to this same, next generation of Starkey hearing aids. This stems from Starkey’s partnership with hearable manufacturer, Bragi, which has this feature available with its Bragi Dash Pro. The live-language translation is not Bragi’s proprietary software, as Bragi currently uses the third party application, iTranslate to power this feature for its device. Although it has not been announced formerly, I expect that Starkey’s live-language translation feature will also be powered by iTranslate. Expect more features like this to become more widespread across our connected devices over time as more manufacturers support this type of integration.
As we move into week two of 2018, expect another wave of exciting announcements coming out of CES. Check back here next week as I will be doing a rundown of the most important takeaways coming out of Vegas this week.
If you examine the past 50 years of user interfaces in computing, what you’ll see is that a new one surfaces every 10 years or so. Each of these new interfaces has been an incremental step away from hardware-based interfaces, to ones that are more software-based. From the 1970s to the early 1980s, in order to “communicate” with a computer and issue your intended command, you’d need to use Punch Cards and Command Lines.
PCs were introduced in the 1980s, and as computers began to migrate from the military, government and academia into our homes, the Graphical User Interface took hold, as it was far more user-friendly for casual computer users than Command Lines. This was the preferred user interface until the mid-’90s, when the Internet began to really take off.
As the Internet opened the door to an endless amount of new uses and functions for computers, the Hypertext interface (HTML) bloomed as we needed an interface that was more conducive to web-based functionality, such as hyperlinking and connecting parts of the web together.
Then in 2007, Steve Jobs famously ushered in the mobile computing era with the unveiling of the iPhone. Along with the introduction to our pocket-sized supercomputers, we were also presented with the Multi-Touch interface which has gone on to become the most widely preferred interface globally.
So, 10 years after the iPhone debuted and based on the history of new user interfaces surfacing every 10 years or so, it raises the question, “what’s next?” Since this is FuturEar after all, you better believe it will largely center around our ears, voices and how we naturally communicate.
There are two underlying factors to consider when looking at why we gravitate toward each evolution in user interfaces. The first is the tendency for users to prefer as little friction as possible. Friction essentially represents the clerical, tedious work that you’re required to do in order to fully execute your command. Let’s use maps as an example and the idea of trying to get from point A to B in an unknown area.
In the past, prior to the PC and internet, you were limited to good, old-fashioned maps or asking for directions. Then, technology enabled you to use the likes of MapQuest which allowed you to print off turn-by-turn directions. Today, in the mobile era, you can simply pull up your favorite map app, punch in your destination, and let your phone guide you. Each progression reduced friction for the user, requiring less time and energy to do what you were trying to do: get from point A to point B.
The second factor to look at is the type of computers being used in conjunction with the user interfaces. When we shrank our computers down to the size of a phone, it wasn’t feasible to use a mouse and keyboard, so we shifted to just using our fingers on the screen. Nor was HTML necessary prior to the internet. The interface adapts as the computers we’re using evolve.
Which brings us to our über-connected world where we’re bringing everything we possibly can online. Gartner estimates that in this age of the Internet of Things (IoT), we’ve brought 8.4 billion devices online and that figure will climb to 20.4 billion devices by 2020. So, how then do we control all of these connected devices, while continuing to reduce friction?
The answer lies in what tech pioneer Brian Roemmele has coined the “Voice First” interface. He hypothesizes that as we move into this next decade, we’ll increasingly shift from issuing commands with our fingers, to issuing them with our voice. Which is great, because speech and language are humans’ most natural form of communicating, meaning there’s no learning curve in adopting this habit. This is an interface that is truly for all ages and levels of sophistication. It’s built to be as simple as conversing with the people around us.
So, what are we actually conversing with? That would be our smart assistants, which are primarily housed in our smart speakers and phones currently. Amazon took an early lead in the smart speaker market, but it didn’t take long for Google to introduce its own line of “OK Google” speakers, resulting in sales of 20 million Alexa speakers and 7 million Google speakers thus far. This number will grow significantly before year’s end, as it’s estimated that 20% of US households will be purchasing a smart speaker for the holidays.
You might be asking, “but wait, we’ve had Siri in our iPhones since 2011, how is this different?” You’re right, but it wasn’t until recent machine learning breakthroughs that speech recognition became drastically more accurate at understanding us. Hence the recent popularity of these smart speakers and our voice assistants. There are far fewer “I’m sorry, I didn’t understand that” responses, and the assistants serve an increasingly important role in facilitating our commands to control the billions of connected IoT devices we keep bringing online.
So, let’s look at the two criteria that we need to check off in order for this interface to be mass-adopted. We need to ensure the interface is conducive to the computers we’re using and do so in a way that reduces friction beyond how we’re interacting with them today. Voice provides us the ability to quickly control all of our IoT devices with simple voice commands, trumping the finger tapping and app toggling that multi-touch offers. When it works properly, speaking to our assistants should feel like talking to a genie, “Abracadabra, your wish is my command.”
TV – “Alexa, change the channel to the Kansas basketball game.”
Thermostat – “Ok Google, bump the temperature up to 72°.”
Security Cam – “Hey Siri, send the last 10 minutes of my Dropcam footage to my phone.”
Groceries – “Alexa, order me all the ingredients for Dave’s Famous Souffle recipe.”
I believe that over the course of the next decade the Voice interface will continue to become more powerful and pervasive in all of our lives. Although we’re in the infancy of this new interface, we’ve quickly begun adopting it. Google confirmed 20% of its mobile searches are already conducted via voice, Pew Research found that 46% of Americans currently use a voice assistant, and Gartner projects that 75% of US households will own at least one smart speaker by 2020.
We’re also seeing smart speakers and voice assistants begin wading into new waters, such as the workplace, cars, and hotel rooms. This will likely open up brand new use cases, continue to increase the public’s exposure to smart assistants, and expand our understanding of how to better utilize this new technology. We’re already seeing an explosion of skills and applications, and as each assistant’s user network grows, so too do the network effects for each assistant’s platform (and the interface as a whole) as developers become increasingly incentivized to build out the functionality.
Just as we unloaded our various tasks from PCs to mobile phones and apps, so too will we unload more and more of what we currently depend on our phones for, to our smart assistants. This shift from typing to talking implies that as we increase our dependency on our smart assistants, so too will we increase our demand for an always-available assistant(s).
What better place to house an always-available assistant than our connected audio devices? This isn’t some new, novel idea, as 66% of all hearables already include smart assistant integration (this figure is almost entirely driven by Apple’s AirPods). In addition to AirPods, we saw Bose team up with Google to embed Ok Google in Bose’s next line of headphones, and Bragi integrate Alexa in the Dash Pro’s most recent update. Rather than placing smart speakers in every space we occupy, why not just consolidate all of that (or a portion) to an ear-worn device that grants you access whenever you want?
I originally surmised that our connected audio devices will give way to a multitude of new uses that extend way beyond streaming audio. Smart assistants provide one of the first, very visible use cases beginning to emerge. I believe that smart assistant integration will become standard in any connected audio device in the near future – be it ear-buds, over-the-ear headphones or hearing aids. This will provide a level of control over our environments that we have not yet seen before, as we simply need to whisper our commands for them to be executed.
Our own little personal genie, in the ear rather than the bottle… what better way to reduce friction than that?
Think back to 2007 when the iPhone debuted. If you recall, the device itself was pretty unique – multi-touch touchscreen, completely new user interface, iconic form factor – but, the initial functionality that it provided was not all that different from what existed at that time. Steve Jobs introduced the iPhone as, “an iPod, a phone and an internet communicator.” Aside from the iPod/music aspect of the iPhone, the other “smartphones” at that time provided the same combination of phone, email and (limited) internet. It just couldn’t do much more than what already existed.
The original iPhone was met with a lot of criticism and it was easy to point out the shortcomings and hard to see the potential. We know how this story goes, though, as the iPhone went on to be a smashing hit and many of us use one today. What’s interesting, however, is to look at why the iPhone was so successful. One of the primary reasons for its success was the power of network effects that Apple leveraged.
One year after the iPhone was released, Apple introduced the App Store with their iOS version 2.0. In less than two months, iPhone users had combined to download 3,000 different apps, 100 million times. This caught the attention of the software developer community, “100 million downloads in less than 60 days.” A pipe dream come true for anyone who had the technical wherewithal to develop mobile software. The gold rush was on, and thus began the virtuous cycle that is network effects.
Each new person who bought an iPhone became a potential candidate to download apps. This growing pool of users incentivized developers to create new apps, compete with existing apps by building better ones, and introduce new features that could generate revenue. The more users there were, the more potential customers developers could acquire.
Simultaneously, as this third-party app ecosystem grew, it spurred further adoption of the iPhone because of the constant influx of new apps, or better apps that could be downloaded through the App Store. The more stuff you could do with an iPhone, the more compelling it became to purchase one. The value just kept appreciating.
This is what was so revolutionary about the App Store – it created a marketplace that brought together third-party developers and users. By bridging the two, it allowed for the developers to produce an endless supply of utility, functionality and capabilities to be instantly downloaded and utilized by the users, enhancing the value of the device. It generated entirely new use cases and reasons to use a smartphone.
Ok, so what? Well, as I pointed out in my previous post, we’re all shifting to using “connected” audio devices. Furthermore, our connected audio devices as a whole represent a quasi-network, as one of the common denominators across these devices is the wireless connectivity to a smartphone. Therefore, these connected, ear-worn devices serve as new delivery mechanisms for software. Network effects can now begin to take hold because we’re using audio devices that can seamlessly access apps from our phones. In other words, we’ve erected additional bridges to allow developers to supply limitless value to our ears and wherever else we’re wearing computers on our bodies.
Thus, the virtuous cycle becomes enabled. As the number of connected audio device users steadily increases, developers become motivated to build apps specifically for said devices, resulting in more incentive to go buy AirPods, Pixel Buds, MFi Hearing Aids, or the many hearables to take advantage of all the new stuff you can do with these things. That’s why the shift to connected devices is so fundamentally significant. It has now become technically feasible and financially motivating for developers to create apps tailored to our little, ever-maturing ear computers.
We’re at day one of this new phase of software development, yet we’re already seeing applications specifically targeting and catering to this network: smart assistant integration, apps designed to collect and provide actionable insight on your biometric data, live-language translation, and augmented audio. These are some of the first new applications and use cases for connected audio devices that will transform our single-dimensional devices into more sophisticated and capable pieces of hardware, enhancing their value.
That’s why I think it’s so important to point out that regardless of whether your interests lie in hearing aids, AirPods, or hearables, you should be excited about the innovation that is taking place in any one facet of the connected-device network. Over time, software and features tend to become widely available throughout the network, so we shouldn’t really care where the innovation originated. Sure, some devices will be capable of things that others won’t, but for the most part, you’ll be able to do a lot more with your connected audio devices compared to what we’re used to with previous generation devices. Just as we learned over the past decade with our smartphones, network effects help to accelerate this pace of change.
Hello and welcome to FuturEar! As the name suggests, the purpose of this blog is to provide an ongoing account of the rapidly evolving audio landscape. My goal is to help make sense of all the trends that are converging toward the ear and then consider the implications of those progressions. This blog will feature both long-form assessments, as well as short, topical updates on news pertaining to the ear.
The inspiration for this blog was the realization that we’ve quickly begun wirelessly connecting our ears to the internet. For starters, at a broad level, Americans are buying more Bluetooth headphones than non-Bluetooth headphones.
If you look more specifically into any one segment of audio devices, you’ll see the trend applies there too. If we are considering hearing aids, ReSound introduced the first Made for iPhone (MFi) “connected” hearing aid back in 2013 – the Linx. Flash forward to today, and all six major hearing aid manufacturers sell an MFi hearing aid (Phonak’s Audeo B hearing aid is actually compatible with Android too). Similar to headphones, the majority of hearing aids now entering the market are connected devices.
Hearables, everyone’s favorite buzzword, have collectively attracted more than $50 million through crowdfunding on sites like Kickstarter and Indiegogo. These devices are inherently connected and wireless. Additionally, you have the 800 lb. gorilla in the room – Apple – that introduced AirPods last December. AirPods have accounted for 85% of totally wireless headphone dollar sales in the US since last December. Google launched its own flagship headphones, the Pixel Buds, at the beginning of October. So now we have two of the largest tech companies in the world competing and innovating in audio hardware.
This shift to Bluetooth-connected devices represents a fundamental change to these devices, as this new generation of connected devices is able to leverage the power of software. Essentially, previous, non-connected devices would be considered “entropic,” meaning that they flat-line then depreciate in value as the hardware deteriorates. There’s no new value created by the device.
On the flip-side, these connected devices are “exotropic,” meaning that they appreciate in value, so long as the hardware permits (all hardware eventually craps out). Through over-the-air software and firmware updates, as well as software app integration, new value is constantly being created. In other words, we’ve essentially gone from using headphones and hearing aids that are akin to flip phones to ones that more closely resemble iPhones.
This blog will explore all of that new value, homing in on specific new use cases, as well as piecing together how a multitude of seemingly disparate trends all relate and ultimately lead to the ear. Just like all software-powered hardware that’s connected to the cloud, these devices will evolve, iterate and advance quickly and shift in unexpected ways. Exciting times!