
A Journey to the Center of the Ear


The Road Starts Here

If you examine the past 50 years of user interfaces in computing, you’ll see that a new one surfaces every 10 years or so. Each of these new interfaces has been an incremental step away from hardware-based interfaces toward ones that are more software-based. From the 1970s through the early 1980s, in order to “communicate” with a computer and issue your intended command, you had to use punch cards and command lines.

PCs were introduced in the 1980s, and as computers began to migrate from the military, government, and academia into our homes, the graphical user interface began to permeate, as it was far more user-friendly for casual computer users than the command line. It remained the preferred user interface until the mid-1990s, when the Internet began to really take off.

As the Internet opened the door to endless new uses and functions for computers, the hypertext interface (HTML) bloomed, since we needed an interface more conducive to web-based functionality, such as hyperlinking and connecting parts of the web together.

Then, in 2007, Steve Jobs famously ushered in the mobile computing era with the unveiling of the iPhone. Along with the introduction of our pocket-sized supercomputers, we were also presented with the multi-touch interface, which has gone on to become the most widely used interface globally.

So, 10 years after the iPhone debuted, and given that a new user interface has surfaced every 10 years or so, the obvious question is: what’s next? Since this is FuturEar, after all, you’d better believe the answer will largely center on our ears, our voices, and how we naturally communicate.

User Interface Shift (from Brian Roemmele’s Read Multiplex, 9/27/17)

Reducing Friction

There are two underlying factors to consider when looking at why we gravitate toward each evolution in user interfaces. The first is the tendency for users to prefer as little friction as possible. Friction essentially represents the clerical, tedious work you’re required to do in order to fully execute your command. Let’s use maps as an example, and the idea of trying to get from point A to point B in an unknown area.

In the past, prior to the PC and internet, you were limited to good, old-fashioned maps or asking for directions. Then, technology enabled you to use the likes of MapQuest which allowed you to print off turn-by-turn directions. Today, in the mobile era, you can simply pull up your favorite map app, punch in your destination, and let your phone guide you. Each progression reduced friction for the user, requiring less time and energy to do what you were trying to do: get from point A to point B.

The second factor to look at is the type of computers being used in conjunction with the user interfaces. When we shrank our computers down to the size of a phone, it wasn’t feasible to use a mouse and keyboard, so we shifted to just using our fingers on the screen. Nor was HTML necessary prior to the internet. The interface adapts as the computers we’re using evolve.

Which brings us to our über-connected world, where we’re bringing everything we possibly can online. Gartner estimates that in this age of the Internet of Things (IoT), we’ve brought 8.4 billion devices online, and that figure will climb to 20.4 billion devices by 2020. So how, then, do we control all of these connected devices while continuing to reduce friction?


Abra Kadabra

The answer lies in what tech pioneer Brian Roemmele has dubbed the “Voice First” interface. He hypothesizes that as we move into the next decade, we’ll increasingly shift from issuing commands with our fingers to issuing them with our voice. That’s great, because speech and language are humans’ most natural form of communication, meaning there’s no learning curve to adopting this habit. This is an interface truly for all ages and levels of sophistication, built to be as simple as conversing with the people around us.

So, what are we actually conversing with? That would be our smart assistants, which are currently housed primarily in our smart speakers and phones. Amazon took an early lead in the smart speaker market, but it didn’t take long for Google to introduce its own line of “OK Google” speakers, resulting in roughly 20 million Alexa speakers and 7 million Google speakers sold thus far. That number will grow significantly before year’s end, as it’s estimated that 20% of US households will be purchasing a smart speaker for the holidays.

You might be asking, “But wait, we’ve had Siri in our iPhones since 2011; how is this different?” You’re right, but it wasn’t until recent machine learning breakthroughs that speech recognition became accurate enough to reliably understand us. Hence the recent surge in popularity of smart speakers and voice assistants. There are far fewer “I’m sorry, I didn’t understand that” responses, and these assistants serve an increasingly important role in facilitating our commands to the billions of connected IoT devices we keep bringing online.

So, let’s look at the two criteria that need to be met for this interface to be mass-adopted: the interface must be conducive to the computers we’re using, and it must reduce friction beyond how we interact with them today. Voice gives us the ability to quickly control all of our IoT devices with simple spoken commands, trumping the finger-tapping and app-toggling of multi-touch. When it works properly, speaking to our assistants should feel like talking to a genie, “Abra Kadabra, your wish is my command.” Picture commands like these (a rough sketch of how they might be routed follows the list):

  • TV – “Alexa, change the channel to the Kansas basketball game.”
  • Thermostat – “OK Google, bump the temperature up to 72°.”
  • Security cam – “Hey Siri, send the last 10 minutes of my Dropcam footage to my phone.”
  • Groceries – “Alexa, order me all the ingredients for Dave’s Famous Souffle recipe.”
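
Under the hood, most assistant platforms share the same basic plumbing: the platform turns speech into an “intent” (what to do) plus “slots” (the variable details, like a channel or a temperature), and developer-written handlers map each intent to a device action. Here’s a minimal sketch of that pattern in Python; every name in it is hypothetical, and it is not the actual API of Alexa, Google Assistant, or Siri.

```python
# Hypothetical sketch of voice-command routing: an assistant platform
# parses speech into an intent (what to do) plus slots (the details),
# and a handler executes the matching smart-home action.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Intent:
    name: str              # e.g. "SetTemperature"
    slots: Dict[str, str]  # e.g. {"temperature": "72"}

def set_temperature(slots: Dict[str, str]) -> str:
    # A real skill would call the thermostat vendor's cloud API here.
    return f"Setting the thermostat to {slots['temperature']} degrees."

def change_channel(slots: Dict[str, str]) -> str:
    return f"Changing the TV to {slots['channel']}."

# Route each recognized intent to the handler that executes it.
HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "SetTemperature": set_temperature,
    "ChangeChannel": change_channel,
}

def handle(intent: Intent) -> str:
    handler = HANDLERS.get(intent.name)
    if handler is None:
        return "I'm sorry, I didn't understand that."
    return handler(intent.slots)

if __name__ == "__main__":
    # "OK Google, bump the temperature up to 72°" arrives roughly as:
    print(handle(Intent("SetTemperature", {"temperature": "72"})))
```

The hard parts, speech-to-text and intent classification, sit with the platform itself; skill developers mostly just write handlers like these.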


Heading Home

I believe that over the course of the next decade the Voice interface will continue to become more powerful and pervasive in all of our lives. Although we’re in the infancy of this new interface, we’ve quickly begun adopting it. Google confirmed 20% of its mobile searches are already conducted via voice, Pew Research found that 46% of Americans currently use a voice assistant, and Gartner projects that 75% of US households will own at least one smart speaker by 2020.

We’re also seeing smart speakers and voice assistants begin wading into new waters, such as the workplace, cars, and hotel rooms. This will likely open up brand new use cases, continue to increase the public’s exposure to smart assistants, and expand our understanding of how to better utilize this new technology. We’re already seeing an explosion of skills and applications, and as each assistant’s user network grows, so too do the network effects for each assistant’s platform (and for the interface as a whole), as developers become increasingly incentivized to build out its functionality.

Just as we unloaded various tasks from PCs onto mobile phones and apps, so too will we unload more and more of what we currently depend on our phones for onto our smart assistants. This shift from typing to talking implies that as we increase our dependency on our smart assistants, so too will we increase our demand for always-available assistants.

What better place to house an always-available assistant than our connected audio devices? This isn’t some new, novel idea: 66% of all hearables already include smart assistant integration (a figure driven almost entirely by Apple’s AirPods). In addition to AirPods, we’ve seen Bose team up with Google to embed OK Google in Bose’s next line of headphones, and Bragi integrate Alexa into the Dash Pro in its most recent update. Rather than placing smart speakers throughout every space we occupy, why not consolidate all of that (or a portion of it) into an ear-worn device that grants you access whenever you want?

I originally surmised that our connected audio devices would give way to a multitude of new uses extending well beyond streaming audio. Smart assistants provide one of the first, very visible use cases beginning to emerge. I believe smart assistant integration will become standard in any connected audio device in the near future, be it earbuds, over-the-ear headphones, or hearing aids. This will provide a level of control over our environments that we have not seen before, as we’ll simply need to whisper our commands for them to be executed.

Our own little personal genie in the bottle (make that the ear)… what better way to reduce friction than that?

-Thanks for Reading-

Dave
