I like to occasionally use the daily update as a way to surface new endeavors in content creation by people working around the various areas that FuturEar covers. Today, I wanted to dig into the podcast that Nick Myers has recently launched, The Artificial Podcast, where he explores the world of AI and voice technology. He’s published three episodes thus far: the first detailing the intent of the podcast, the second featuring guest Scot Westwater, and the third going into depth on Alexa in particular.
Nick is the founder and creative director of RedFox Creative, a creative marketing and consulting agency. As someone who works with companies of all sizes to implement emerging technology, Nick brings a thoughtful perspective to the table with his podcast by being grounded in what his clients are asking for. That’s what I think distinguishes his podcast – the fact that it’s based around real interactions with clients.
If you’re someone who is hungry for another podcast centered around voice technology and the AI that drives it, give The Artificial Podcast a listen. I particularly enjoyed the episode with Scot Westwater; their discussion of how businesses and brands can immediately start planning around the implications voice technology is bringing to the market was fascinating. I’m looking forward to seeing what Nick has in store and to listening in as he brings a variety of guests on the show from around the industry. As the market for voice technology grows, so too does the community of experts on all the different aspects of voice and AI.
One of the persistent thoughts that runs through my head as it pertains to the emerging voice market and all the affiliated hardware is that the building blocks are coming together for a new type of social media experience, one that is predominantly built around audio. Today’s audio content, however, is pretty static. There’s no real native aspect of engagement, which we know is a fundamental pillar of today’s social media. Twitter, for better or worse, would not be Twitter without replies, retweets, and likes. The same goes for Instagram. So, to me, for an audio-based social medium to take hold, the content needs to become dynamic, something the audience can engage with.
Anchor is at the forefront of changing the way we interact with audio. Recently acquired by Spotify, Anchor is a podcasting platform designed from the ground up to help podcasters create and disseminate their content. Now, it would appear that engagement is becoming a focus for the company, as it makes the (rather hidden) voice messaging feature much more broadly accessible.
Listeners can send an audio message, no matter the device or browser, which can then be incorporated by the podcaster into future episodes. I believe this is only the start in terms of what can be done with engagement. Here are some examples of how we could see engagement being built out for an audio-based social medium:
Leave a review via your voice – Giving fellow listeners a chance to hear a review would be pretty cool. Maybe it then gets transcribed into a text review as well.
“Let’s take some callers” – blending some of the best aspects of radio with podcasting. Each week, you could designate a portion of your show to “calls” (voice messages) from your listeners. Maybe you pose a question to the listeners, ask them to send in their responses, and each week read off the answers to the previous episode’s question. I think this alone might birth a whole new set of podcasting formats.
The next-level step here would be to say, “Alexa/Google, let’s send in a voice message,” record it, send it, and then resume playing the episode.
Audio snippets – I believe this exists to an extent today for podcasters to share snippets of their episodes, but I would hope that feature gets opened up to listeners too. Being able to grab your favorite clip of a podcast and quickly share it out on the various legacy social channels would be awesome. It would also be cool for the podcaster to know which portions are resonating most with people.
These are just a few ideas and I’m super curious to know what others think about this concept of audio-engagement and other directions we can go with this. Tweet at me and let me know what type of audio-engagement features you’d like to see built out!
I was invited this weekend to speak on a panel discussing the impending OTC hearing aid law that will go into effect as soon as the FDA finishes drafting the guidelines and regulations for this new class of in-the-ear device (sometime in 2020). The panel consisted of Dr. Dawn de Neef, an ENT specialist; Dr. Ram Nileshwar, an audiologist who has been practicing for 30+ years; Kate Handley, VP of Sales at hearing aid manufacturer Unitron; and myself. We each spoke for about 15-20 minutes, providing our unique perspectives to the audience of hearing healthcare professionals.
The perspective that I shared with the audience was that all of our in-the-ear devices are becoming computerized. If a hearable is defined as a, “body-worn computer that resides in the ear,” then it should be understood that just about all of our in-the-ear devices are trending toward becoming hearables. The first major step toward this computerization was by making our ear-worn devices an extension of our smartphones, and therefore, reaping all the abilities and processing power from the smartphone. In the very first post I wrote for FuturEar, I pointed out that this fact was the motivation behind starting the blog – we had begun to seamlessly connect our ears to the internet.
As you can see from the two slides above, Bluetooth has become standard across consumer and medical in-the-ear devices over the last five years. We’re seeing companies like Sonova bring innovative approaches to connectivity with the SWORD chip, which is capable of handling five different Bluetooth protocols, so you can pair hearing aids embedded with that chip to Android and iOS devices (and soon, the Roger system).
Apple analysts are projecting that nearly 50 million AirPods will be sold in 2019 alone, with that number estimated to climb to 75 million in 2020. To put that in perspective, annual hearing aid sales in the US have never crossed 4 million units. We also need to be aware that Samsung and Google each have an AirPods competitor on the market, and Amazon will be unveiling its hearable in the second half of this year. These are companies with deep pockets, and each is aiming to put its smart assistant directly in your ear.
So, the question starts to become, “what happens when we’re all wearing mini, ear-computers?”
The first thing that becomes possible is to increase what the device can do and make it more multi-functional. We probably already take for granted that you can stream any audio to your Bluetooth hearing aids; in the last five years, the device has become capable of playing music and podcasts and streaming calls. In addition, the devices can log and share data with the smartphone, which can then be sent up to the cloud, or they can use the processing power of the phone. This is the backbone for new applications, such as automatic adjustments made by the hearing aid on-the-fly via machine learning in the cloud.
Another set of use cases that we’re beginning to see is derived from sensors that have finally become miniature enough to fit on a RIC hearing aid. These include inertial sensors and PPG optical-based sensors. These types of sensors can capture a wide variety of fitness and heart-related biometrics, so you can monitor everything from heart rate variability to the orientation of the user’s body, to detect whether the user has fallen down. We’re in the infancy of this new aspect of the devices, but it’s plausible that in five years, as the devices become more sensor-laden, hearing aids will act as a preventive health tool. Imagine wearing a hearable that alerts you to warning signs of a heart attack or a stroke – that’s where this technology might ultimately be headed.
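To give a rough sense of how an inertial sensor enables fall detection, one common baseline approach is to watch for a sharp spike in total acceleration (the impact) followed by a period of stillness (lying at rest, sensing only gravity). The sketch below is a generic illustration of that idea – the thresholds and window sizes are my own invented placeholders, not anything from an actual hearing aid’s firmware:

```python
import math

def detect_fall(samples, spike_g=2.5, rest_g=0.3):
    """Flag a possible fall: a sharp acceleration spike (impact)
    followed shortly by near-stillness (only ~1 g of gravity sensed).
    `samples` is a list of (ax, ay, az) accelerometer readings in g's."""
    magnitudes = [math.sqrt(ax**2 + ay**2 + az**2) for ax, ay, az in samples]
    for i, m in enumerate(magnitudes):
        if m > spike_g:  # impact detected
            # check the following readings for stillness
            window = magnitudes[i + 1:i + 6]
            if window and all(abs(v - 1.0) < rest_g for v in window):
                return True
    return False

# Normal walking: total acceleration hovers around 1 g
walking = [(0.1, 0.2, 1.0)] * 20
# A fall: a ~3 g impact, then lying still
fall = [(0.1, 0.2, 1.0)] * 5 + [(2.0, 1.5, 1.8)] + [(0.0, 0.0, 1.0)] * 8

print(detect_fall(walking))  # False
print(detect_fall(fall))     # True
```

Real devices fuse this with gyroscope data and learned models to cut down false alarms, but the spike-then-stillness pattern is the intuition behind the feature.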
If we’re to consider these ear-worn devices as “computers,” then it makes sense to think about what kind of operating system and user interface is conducive to something we’re not looking at (since it resides in our ears). To me, the most obvious candidate to play the role of both UI and, eventually, OS would be our smart assistants that act on our behalf via voice commands. Alexa, Google Assistant, Bixby and yes, even Siri. Those are just the top layer – “master assistants” – with a much larger class of specialized assistants residing a level below, serving as conversational interfaces to the companies that sit behind them (i.e. Mayo Clinic).
These are just two of many examples of how these devices and their use cases will evolve as the devices become standalone computers in their own right. The Apple Watch has gone through four product iterations, evolving from little more than a digital watch with a few features into a device that is now capable of supporting its own cellular LTE connection and is embedded with a medical-grade ECG monitor. It’s possible that our hearables will follow a similar trajectory to being standalone devices, with medical-grade sensors of their own, especially when you factor in what’s possible with the companion charging cases (that will be a fun piece to write one of these days).
As I pointed out in my talk, the computerization of all our ear-worn devices will act like a rising tide that lifts all ships. We’ll likely see OTC devices that have multiple qualities of hearables, such as Bluetooth connectivity and companion apps that are used to program and calibrate the device to the user’s hearing loss. Just as Bluetooth connectivity has become standardized across the past five years, we’ll see other elements of the devices become table-stakes across the next five years, and there’s not much reason to think that OTC devices will be excluded from the increasing sophistication. As the tide rises, premium hearing aids will be lifted in their capabilities too, such as performing better in situations like speech-in-noise, thanks to innovation around machine learning, augmented audio, and filtering.
The Role of the Professional with OTC Devices
At the end of the day, in a world of OTC, the professional’s value resides in their expertise and knowledge. As the market becomes more saturated with options, the consumer can end up paralyzed by too many choices and ultimately look to an expert. At Oaktree, our in-house audiologist, Dr. AU Bankaitis, has worked with Wash U to establish a protocol to run OTC and PSAP devices through, to determine how well the devices perform across nine frequencies in eight standard audiometric configurations.
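To make the shape of such a protocol concrete: picture each device’s measured output at the nine test frequencies being compared against a gain target for each audiometric configuration. The sketch below is purely illustrative – the frequencies, targets, tolerance, and numbers are invented placeholders to show the structure, not Oaktree’s actual protocol or data:

```python
# Nine hypothetical test frequencies (Hz) for evaluating a device
TEST_FREQS_HZ = [250, 500, 750, 1000, 1500, 2000, 3000, 4000, 6000]

def matches_configuration(measured_gain, target_gain, tolerance_db=5):
    """Return True if the device's measured gain (dB) falls within
    tolerance of the configuration's target at every test frequency."""
    return all(abs(measured_gain[f] - target_gain[f]) <= tolerance_db
               for f in TEST_FREQS_HZ)

# Invented gain target for one configuration (e.g. a mild sloping loss)
target = {250: 5, 500: 8, 750: 10, 1000: 12, 1500: 15,
          2000: 18, 3000: 20, 4000: 22, 6000: 20}

# Invented measurements for one OTC device
device_a = {250: 6, 500: 9, 750: 12, 1000: 14, 1500: 16,
            2000: 20, 3000: 22, 4000: 25, 6000: 18}

print(matches_configuration(device_a, target))  # True: within 5 dB everywhere
```

Running every device through every configuration like this is what turns one-off measurements into a database a busy clinician can query.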
We’ve started to build a database of our findings and are working with Wash U, the University of Pittsburgh, and Johns Hopkins to expand it. The idea is for the busy professional to use the database to help determine which devices might make sense to bring into their clinics and offer their patients.
For the hearing healthcare professional, I believe OTC allows them to act as experts and provide knowledgeable assistance. This is an entirely different business model, one that is service-based rather than device-based in the way it generates revenue. As the landscape of options becomes murkier and over-saturated, experts can step in to understand the patient’s needs and then guide them through their options, connecting them to the device best suited to those needs. In essence, OTC is a perfect opportunity to show more people the value of treating their hearing loss with the help of an expert.
I came across an article by Cory Treffiletti in MediaPost that posed the question, “can advertisers monetize voice?” Cory boiled the current possibilities down to two opportunities: voice search and custom skills. With search, advertisers will need to create more conversational content rather than traditional text content, since those results are retrieved audibly. This is a challenge in itself because of just how new and different this equation is from the incumbent method we’ve grown accustomed to.
As far as custom skills go, there are hundreds of thousands of registered developers creating voice skills, either to sell to businesses or as consumer-oriented skills, and some, such as Nick Schwab, have been quite successful. Companies like VoiceXP have taken the platform approach, enabling virtually any business, of any size, to create its own custom skills and establish a voice web presence.
In talking with VoiceXP CEO Bob Stolzberg for the Harvard Business Review article I wrote about businesses using smart speakers as a channel to communicate with their customers, he pointed out that creating a skill is only half the battle. The other half is creating consumer awareness. Just as Cory pointed out with the Pringles example, companies need to take it a step further and really drive awareness toward the skill, leveraging traditional marketing methods to make people aware of the new marketing channel.
It’s a really well-thought-out piece, and while it focused on monetization, Cory did a very good job of concisely articulating what makes voice simultaneously so exciting and promising, while also being quite challenging:
Voice is an interface. It is a UI. In fact, you could consider voice to eventually become something like an operating system in that it gives you a means to access the tools that are important to you, but it is not a tool in and of itself.
It’s also much larger than that. Voice is a way to interact or engage with technology and consumers. It is not a media format to directly monetize. You don’t see ads embedded in Windows or the Apple operating systems, so why would you expect to hear ads embedded in a voice UI?
This is spot on. It’s easy to forget that voice represents a multitude of things. It is akin to Windows and Apple’s OS in that it is the active environment where you interact with the technology and access its utility. The assistants that serve as the UI play the part of mediator and facilitator, and given the context of what you need from your assistant, that can lead to an environment where advertising is appropriate. This is no different from being advertised to through all the apps on your phone, whether it be Yelp, Facebook, Pandora, etc. We move in and out of different apps (environments) in which we knowingly expect to be advertised.
Sometimes, however, our smart assistants become the “app.” One minute I might be asking Alexa general inquiries, such as the weather or to play a podcast, and the next I’m asking to shop. When I’m shopping, is it suddenly appropriate for Alexa to advertise to me based on the context? If I indicate that I want a particular type of item, with no brand specificity, does that prompt Alexa to start suggesting different (paid?) results based on what it knows about me? The same goes for Google: when I ask it to tell me about the best restaurants in a given proximity, is it going to feed me paid results first?
The fact of the matter is that voice is a new type of web, a new type of computing, and a new user interface, and it’s seemingly on the path to being a full-blown OS. There are a few ways to monetize it today, but as at the dawn of the internet, we’re only seeing a glimpse of what’s possible, which will become more apparent as the technology matures over time.
Google released a number of very interesting updates around Google Assistant at this year’s Google I/O, such as Duplex on the web, which I wrote about last week. Another key revelation was the upgraded Google Assistant, dubbed “Assistant 2.0,” that will be made available with the release of Android Q. As you can see in the video below, Assistant 2.0 handles command after command in near real-time.
As Bob O’Donnell wrote in his Tech.pinions article yesterday, the underlying reason for this upgrade in speed is that Google has moved the assistant from the cloud onto the device. This was made possible by improvements in compressing the algorithms that process spoken commands; Google cited a 10x improvement over processing those commands in the cloud. The end result is near-zero latency, exactly the type of friction reduction necessary to compel users to interact with their devices via voice rather than tap/touch/swipe.
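Google didn’t spell out the compression technique, but one standard way to shrink a model for on-device use is quantization: storing 32-bit float weights as 8-bit integers, trading a tiny rounding error for a ~4x size reduction. The sketch below is a generic illustration of that idea, not Google’s actual method; the weight values are made up:

```python
def quantize_int8(weights):
    """Linearly map float weights onto the int8 range [-127, 127].
    Returns the quantized values plus the scale needed to recover them."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Approximate the original floats from the int8 values."""
    return [v * scale for v in q]

weights = [0.82, -0.41, 0.05, -0.93, 0.37]  # invented example weights
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# float32 is 4 bytes per weight; int8 is 1 byte per weight
print(len(weights) * 4, "bytes ->", len(q), "bytes")  # 20 bytes -> 5 bytes
# each recovered weight is within half a quantization step of the original
print(max(abs(a - b) for a, b in zip(weights, recovered)))
```

Shrinking the model is only half of why on-device works; the other half is that the command never has to make a network round trip at all, which is where the latency win comes from.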
The other notable aspect of moving the processing away from the cloud and onto the device itself is that it helps alleviate the privacy concerns surrounding voice assistants. As it stands now, when voice commands get sent to the cloud, they are typically logged, stored, and sometimes analyzed by teams inside Amazon and Google to enhance their machine learning and NLP models. This has caused quite the controversy, as publications like Bloomberg have stoked fears in the public that big brother is spying on them (although this article by Katherine Prescott does a very good job relaying what’s really going on).
Regardless, by localizing the processing to the smartphone, the majority of the commands fielded by the assistant no longer get sent to the cloud, and therefore can no longer be assessed by teams inside the smart assistant providers. The commands that do get sent to the cloud do so via a new technique Google announced called federated learning, which anonymizes the data and combines it with other people’s data in an effort to continue training the learning models.
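The core idea of federated learning is that each device trains on its own data locally and contributes only a model update; the server then averages those updates into a shared model, never seeing raw audio or commands. Here is a toy sketch of just that averaging step, with invented weight vectors standing in for each device’s local update (the production system adds secure aggregation and much more):

```python
def federated_average(client_weights):
    """Average the model weight vectors contributed by many devices.
    Each client sends only its weights -- never its raw user data."""
    n = len(client_weights)
    dim = len(client_weights[0])
    return [sum(w[i] for w in client_weights) / n for i in range(dim)]

# Three hypothetical devices, each with locally trained weights
clients = [
    [0.2, 0.5, -0.1],
    [0.4, 0.3,  0.0],
    [0.0, 0.4, -0.2],
]

global_update = federated_average(clients)
print(global_update)  # approximately [0.2, 0.4, -0.1]
```

The privacy property falls out of the structure: the server learns the aggregate, while any single user’s contribution is blended in with everyone else’s.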
Ultimately, Google I/O was a shot across Apple’s bow. Apple’s big theme across the past few years has been, “privacy, privacy, privacy.” Well, Google made privacy a focal point of this year’s developer conference, with Assistant 2.0 being one of the clearest examples. Additionally, Google is starting to paint a picture of how our assistants can be used from a utility standpoint with the massive reduction in latency in Google Assistant, along with the introduction of Duplex on the web. Apple has not yet shown Siri’s capacity to do anything near what Google is doing with Google Assistant from a utility standpoint.
The past ten years were all about shrinking our jobs-to-be-done into apps on a single, pocket-sized supercomputer: our smartphone. Google is making the case that the next ten years might very well be about having our assistants do those jobs for us by interconnecting all the bits of data stored on the smartphone and its apps, so that rather than spending time and effort tapping and swiping, we can just speak to the phone and tell it what to do.
This past weekend, I was invited by the organization, Speech-Language & Audiology Canada, to present at its annual conference held in Montreal on the topic of what the future of hearing technology will look like across the 2020’s. I presented alongside Dr. Gurjit Singh who is a senior research audiologist and program manager at hearing aid manufacturer, Sonova. The two of us presented for 35 minutes each, which was then followed by a fireside chat and Q&A from the audience of hearing healthcare professionals. The session was moderated by clinical audiologist, Remington Shandro.
My presentation was broken into three portions that all revolved around a central theme: all in-the-ear devices are becoming computerized. Nearly everything is trending toward being a hearable. The first portion focused on the question, “why now?” During this part, I laid out a series of consumer tech trends, hearing aid trends, and the innovation occurring within the devices that is making certain advancements feasible. For example, I pointed out the standardization of Bluetooth hearing aids across the past five years, and then extrapolated the various new use cases that standardization will allow for, along with the feasibility of multi-functional in-the-ear devices.
The second portion looked at two sets of use cases that I believe will increasingly enhance the value proposition of the devices: voice computing and biometric sensor tech. If you’ve been following my blog or my Twitter feed, then you know I am passionate about voice computing, so anytime I present on hearables, I try to make the audience aware of the significance of having our smart assistants right in our ear-worn devices. I typically use the Jobs-to-be-Done framework to illustrate the point that much of what we rely on our smartphones for, and before that our laptops and PCs, will migrate to our voice assistants.
In addition to focusing on voice computing, I also spent time talking about biometric sensors and the idea of converting one’s hearable into a biometric data collector and preventative health tool. Building off the underlying technology trends I had previously laid out, I touched on the idea that certain bio-sensors have only recently become miniature enough to be fit onto an in-the-ear device. These sensors allow for all types of data to be captured, which today can detect health risks such as atrial fibrillation and pulmonary embolisms, or whether a person has fallen down (a huge driver of hospital visits for older adults). As these sensors continue to shrink in size and become capable of capturing a wider variety of data, we’ll likely come to view the devices as tools that help keep us healthy by alerting us to potential dangers happening inside our bodies.
Finally, the last portion ties this all together to answer the question, “why does this convergence toward hearables matter?” For hearing healthcare professionals, who by and large deal with older adults (age tends to be one of the leading indicators of hearing loss), we’re witnessing the devices evolve into something much more profound than an amplification tool. When it comes to voice computing, I point out that older adults are one of the fastest-growing groups adopting the technology. Adults ages 55-75 are among the early adopters – when have we ever seen something like that? And it makes sense! Voice is so great because it’s conducive to the entire age spectrum – it’s natural and not limiting the way mobile might be for someone with poor vision or dexterity.
In addition, the transformation into a preventative health tool could turn out to be essential to our aging population. Every day, 10,000 US baby boomers turn 65, and that will continue until 2030. A combination of AI and sensor-laden hardware, such as a hearable, can serve as a guardian of one’s health. These devices can do so much more than they could even a few years ago, and the trends are not slowing down. We’ll continue to see advancements in the components housed in the devices, and as the tech giants all aim their sights at the ear, we’ll likely see Apple, Amazon, Google, etc. drive a lot of innovation that is duplicated by other manufacturers and ultimately reaped by the consumer.
It was awesome having the opportunity to bring all the concepts I write about to life. I can’t thank the SAC team enough for inviting me out to present on my thoughts around hearable technology. These are fast-changing times, so it’s great to share some of the trends that I follow closely with busy professionals who will ultimately be impacted by the changing nature of these devices. I firmly believe these devices will only become more and more compelling to the end user as these new use cases, and all the technology that goes into the devices, mature.
Oh, and by the way, Montreal is really cool too. 10/10 will go back!
Last year at Google I/O, Google introduced Google Duplex and everyone’s collective jaw dropped. It was sci-fi brought to life – need a reservation or an appointment? Just ask your Google Assistant, and voilà! Your assistant will literally call the restaurant for you, using a strikingly human-sounding voice, and book the reservation with the person who answers the phone.
Google’s Duplex technology is impressive to the point where it’s actually a little jarring. As cool as it is to ask your Google Assistant to book you a table for four at your favorite restaurant, and then five minutes later get an email from OpenTable confirming the reservation, it also opens up a whole ethical debate around whether the person on the other end of the line should be informed that they’re talking to an AI bot.
This year, however, Google introduced an update to Duplex that makes the application considerably more useful, without adding any fuel to the ethical debate fire, by focusing Duplex on helping to reduce the time spent booking all types of things online:
What makes Google Duplex and Google Assistant so powerful is Google’s ability to connect all of its properties together for the user, which results in a much more sophisticated level of assistance and a lot more utility for the user. If you use Gmail, it can read through your emails and discern information and context that way. If you use Google Maps, it will know your history of the addresses you’ve entered. If you use Chrome as your browser, Google can access things you’ve auto-filled or saved in your browser. Google calendar? Yep, it will access that.
This is the paradox with smart assistants: if we want them to continue to progress and become increasingly useful, it should be understood that it’s a trade-off. The more data we share with our assistants (and the companies that sit behind them), the more utility we can derive from them. Sharing our data with Google is what allows someone to simply say, “rent me a car for my trip,” and have their Google Assistant navigate the process of booking that car, understanding what they would want.
Google has upped the ante on what’s possible with smart assistants with Duplex on the web, and is differentiating Google Assistant on the basis of utility. Google is leveraging all of its legacy properties and fusing them together to create the ultimate productivity tool with Google Assistant. It’s going to be very interesting to see how Apple and Amazon each respond to the advances that Google rolled out at this year’s developer conference.
Google held its annual developer conference, Google I/O, yesterday, with a flurry of announcements that I will touch upon in later updates. Today, I want to focus on two of the initiatives Google announced around accessibility.
Google has introduced a new feature in its new Android operating system, Q, called Live Caption (I think they were interchanging the name “Live Relay” too). The feature is due out later this year, and according to CEO Sundar Pichai, “Live Caption makes all content, no matter its origin, more accessible to everyone. You can turn on captions for a web video, podcast, even on a video shot on your phone.”
Being able to caption virtually any video on an Android phone that’s running Q will be hugely valuable to the Deaf and hard-of-hearing community. It’s also really convenient for anyone who wants to watch a video without playing the audio. A shout out to KR Liu for her cameo in the video and her collaboration with Google in bringing this feature to life! She and the folks at Doppler Labs were pioneers in the hearables space, and it should come as no surprise when the Doppler alumni pop up here and there with contributions like this. Amazing stuff.
Project Euphonia is another initiative in which Google uses its machine learning technology to train its speech recognition systems for people who have speech impairments. Google is training this particular speech recognition model with help from people who have had strokes, have Multiple Sclerosis, stutter, or have other impairments – such as Dimitri Kanevsky, the individual in the video, who is a research scientist at Google and has a speech impairment of his own.
Dimitri alone has recorded 15,000 phrases to help train the model to better understand speech that isn’t traditionally fed into training models. According to Dimitri, his goal, and Google’s with Euphonia, is to “make all voice interactive devices be able to understand any person speaking to it.” This is really important work, as it will be crucial to ensuring that the #VoiceFirst world we’re trending toward is inclusive to as many people as possible.
In addition, this project aims to bring those who cannot speak into the fold as well, creating models that can be trained by those with ALS themselves to recognize facial cues or non-speech utterances (like grunts and hums), which then trigger sounds from companion computers, such as a cheer or a boo. As Dimitri points out, to understand and be understood is absolutely unbelievable.
This is tech for good. Apple’s been doing a lot of great work around accessibility too, and in light of all the tech-backlash, if these companies want to compete for positive PR by re-purposing their technology to empower those who need it most…well, then that’s fine by me!
Microsoft CEO Satya Nadella put the company’s enterprise smart assistant, Cortana, front and center during yesterday’s Microsoft Build developer conference. It appears that Microsoft is making great progress with Cortana, aided by the Semantic Machines acquisition from last year that has helped Cortana get a whole lot more conversational. Watch the Cortana demo in the clip from Nadella’s keynote to understand what I’m talking about.
This is equal parts brilliant and exciting from Microsoft as it continues to double down on its enterprise properties and works to integrate the new, conversation-capable Cortana into its software, starting with a big emphasis on Microsoft Outlook. In my Daily Update from two weeks ago, I pondered what our smart assistants will look like in the enterprise space, and suggested that one of the most obvious moves would be for companies like Microsoft to enhance all of their Office and Microsoft 365 properties by baking Cortana into the software. It’s amazing to see that two weeks later, we’re beginning to see exactly this, as the demo displays a 30-turn conversation between the user and Cortana, with the user managing her calendar through Cortana on-the-fly.
To me, the biggest takeaway from what Microsoft displayed with Cortana is that we’re moving toward a multi-assistant future. Microsoft is really good at enterprise software, so rather than trying to position Cortana as a general-use assistant, like Alexa and Google Assistant, Microsoft instead made Cortana a specialized assistant, capable of handling queries pertinent to Microsoft’s software very, very well. I think this is what we’ll see moving forward: companies taking the assets they have under their umbrellas and conversationally enabling those properties with a specialized smart assistant interface.
How Smart Assistants will Impact Each Provider’s Revenue: Amazon
I came across these awesome charts a little while ago from Visual Capitalist and I thought it might be interesting to write about how I see smart assistants impacting each of the major providers’ revenue (excluding Samsung and the Chinese assistant providers).
Let’s start with Amazon. Whenever writing about Amazon and Bezos’ vision for Alexa, it should always be noted that Amazon has 10,000 people working on Alexa. Think about that number for a minute. Clearly Alexa is set to play an important role in the company’s future, but what is that role? One possibility is that Alexa fits into the theme across all of Amazon’s offerings: Amazon always ends up as “The Tax Man.” I didn’t come up with this analogy; Ben Thompson did back in 2016, with a post that still resonates with me today.
Let me explain, starting with the e-commerce portion of the business. In 2017, more than 50% of all units sold on Amazon.com came from third-party sellers, and the “marketplace fees” Amazon “taxes” those sellers (commissions, shipping, and fulfillment) accounted for 18% of its total sales. In essence, if you want to be a merchant in Amazon’s gigantic marketplace, Amazon takes a cut of every transaction for facilitating the marketplace.
On the consumer side, Amazon Prime could be considered a tax as well. As Amazon continues to capture more and more of the total number of e-commerce transactions, and U.S. retail continues to trend toward e-commerce and away from physical stores, Amazon is effectively collecting a $99/year tax on users who prefer e-commerce to traditional brick & mortar retail.
Along the same lines, the scale of AWS allows network effects to compound to the point where Amazon’s offering is so appealing that most companies don’t think twice about having Amazon handle their infrastructure needs. As a business owner, would you rather build out your computing infrastructure yourself, pay the AWS “tax,” or pay the tax of an AWS competitor, such as Microsoft Azure? It’s becoming increasingly obvious that for the vast majority of businesses, the answer is either AWS or a competitor. Amazon has made it so that it doesn’t make much operational or financial sense to build your own, so you’re better off paying the AWS tax.
Sellers can push more units on Amazon than any other marketplace, but you have to pay the tax man. Customers can get two-day, free shipping on the majority of items sold through Amazon.com bundled with video and audio content, but you have to pay the $99 annual Prime “tax.” You can have all your computing infrastructure needs established and managed by AWS, but you have to pay the tax. As Ben puts it in his piece, “Amazon has created a bunch of primitives, gotten out of the way, and taken a nice skim of the top.”
So what does Amazon “tax” with Alexa? I don’t know what Amazon’s grand plan for Alexa is, but the most obvious area to me is the combination of Alexa and Amazon Pay. Shoppers can link their Amazon Pay account to Alexa, so that for anything purchased through Alexa (whether Amazon goods, or an exchange Alexa brokers between the user and some other merchant on any given platform), Amazon facilitates the transaction and therefore reaps the payment fees. Merchants can enable payments through Amazon Pay, and then take it a step further and allow purchases of their goods through Alexa-linked Amazon Pay accounts.
I think this is what’s flying under the radar with Alexa. It’s not as much about Amazon encouraging more buying from consumers on Amazon.com (although Amazon definitely wants that too), but more importantly, Amazon attempting to put Alexa right in the middle of any type of voice commerce transaction. This would effectively mean that Amazon is taxing any transaction that was brokered by Alexa, by fusing its payment offering for shoppers and merchants with its “master assistant” Alexa.
Stay tuned, as I will break down Apple and Siri next.
-Thanks for Reading-
To listen to the broadcast on your Alexa device, enable the skill here