
050 – Bret Kinsella – The State of Voice Tech & What Comes Next

Today marks the 50th episode of the podcast, so I want to lead off by saying THANK YOU to all the folks who have tuned in along the way and supported the podcast and blog in one way or another. All the feedback and support I receive serves as motivation to keep chugging along. I would never have gotten this far without all of you!

Since this is a landmark episode, I wanted to bring on someone special, and I thought, “who better than Bret Kinsella?!” For those who don’t know Bret, he’s the founder, CEO, and head editor of Voicebot. His website covers the news of the voice technology world and is home to some of the best voice tech analysis and research. Bret is arguably the most knowledgeable person I know in this space, and since voice tech is such a dominant theme here at Future Ear, I figured he’d be the perfect guest for this special episode.

We start with an assessment of the current state of voice technology. When I last saw Bret present his ideas, it was at the 2019 Alexa Conference, where he stated that we had entered phase two of conversational AI. The first phase pertained to the proliferation of voice assistants through the distribution of smart speakers, smart displays, hearables, connected cars, and other devices that serve as homes for voice assistants.

According to Bret, the second phase is characterized by habituation and specialization. The habituation piece speaks to building the habit of using voice assistants. I’ve alluded to this a number of times on the podcast as the “training wheel effect,” an analogy coined by Bret. The idea is that the habit starts with one use case: perhaps it’s browsing your smart TV with your voice, or controlling your music via voice. As the habit forms, the user begins migrating other use cases into this paradigm, and eventually the “training wheels” come off.

This is why it’s crucial to search for experiences that are presently 10X better via voice than via incumbent methods. As Bret points out, one glaringly obvious example is controlling your smart TV with your voice, not only because it’s faster, but because it’s more robust too. You can query in more conversational ways and retrieve a broader set of results than you can by typing. 10X better.

Media in general is a very interesting use case to think about for building the habit of using voice assistants. As Bret mentions, one of the best episodes of his podcast highlighting the potential of this use case is his conversation with Andy Webb of the BBC. Because the BBC generates such a vast amount of media content every day, searching for specific media within its catalogue can become cumbersome and overwhelming. So perhaps the way to innovate around this issue is to create a different method for searching and surfacing the specific types of content a listener might be looking for (e.g., “Hey Beeb, play me all the news clips from today about Manchester United”).

Again, to reiterate why this is important: it’s about habit-building. Each use case that migrates from mobile or the web to conversational AI represents the building of a habit that might eventually apply to a broad swath of use cases. This is why I write and podcast about Spotify so much, as I believe ambient media will be the number one driver of conversational AI habituation (I strongly believe Spotify will eventually be a major player in the voice ecosystem).

The other aspect of phase two is specialization. We’re seeing a lot of momentum from companies that are opting to create their own voice assistants rather than building Alexa skills or Google Actions. These custom assistants tend to be “narrow and deep” in their applications, whereas Alexa and Google are “broad and shallow.” As Bret points out, this trend could well continue, with specialty assistants multiplying until there are thousands of voice assistants, each with a very specific application set.

This makes a lot of sense to me. If you’ve heard my episode with Jeff Adams of Cobalt Speech, or the episode with Bruce Rasa of AgVoice (one of Cobalt’s clients), you’re familiar with some of these custom assistants. In AgVoice’s case, the assistant supports a highly specialized lexicon and natural language processing, which enables much more tailored use cases (e.g., a hands-free reporting tool built specifically for farmers).

From here, we begin to tackle what the next few years of voice assistants will look like. As Bret mentions, we’re likely to exist in phase two for some time, as more compelling habituation opportunities surface, and more specialty assistants are built. However, there are some notable areas to keep an eye on.

Google’s Duplex might ultimately be how businesses establish a voice presence en masse. Rather than having to build their own voice applications and experiences, businesses could rely on Duplex, which would enable nearly any business to exist via a voice experience. In addition, we’re likely to see new use cases enabled by new modalities. Smart displays offer an entire set of applications that smart speakers do not allow for, so we’re likely to see developers and designers put more emphasis on building for these new modalities.

Another big opportunity lies at the intersection of voice + hearables. Foursquare’s Marsbot is one of the first true hearables + voice assistant applications, and likely marks the beginning of experiences built specifically with this intersection of technology in mind. Enabling voice assistant access in the ears of the hundreds of millions of owners of AirPods and other hearables allows for GPS-enabled content (Marsbot) and experiences based on the user’s proximity. This intersection of voice and hearables is what I’ve been anticipating from the jump, and I cannot wait to see how it materializes over the next few years.

Ultimately, the world of voice technology is becoming much broader than Alexa, Google Assistant, Siri, and smart speakers. These things all represent the core of most people’s experience today, but the technology is advancing into new territory. While the technology is still in its infancy, this conversation with Bret helped to illuminate a wide variety of ways that conversational voice assistants are maturing and how they might serve as a magnet for many of the use cases that we currently rely on legacy technology for. Good news for hearing aids and hearables, as they represent one of the primary homes for this burgeoning world of voice assistants.

-Thanks for Reading-
Dave

Subscribe to the podcast: Apple Podcasts, Spotify, Anchor, Google Player

Transcription

Dave:

All right. We are here. We’ve made it to the first milestone of the podcast, episode 50. Thank you to everybody who has joined me along the way on Future Ear Radio. And to celebrate episode 50, I thought, I want to bring on a great guest. I want to bring on somebody special. So who better than Mr. Bret Kinsella? Bret, how are you doing today?

Bret Kinsella:

I’m doing well, very honored that you introduced me that way. I remember seeing some of your early interviews like David Cannington and other folks. And it’s really been useful for me. I love to dip in. I don’t listen to every episode, but I definitely think, oh yeah, that’s someone I want to hear what they have to say and get caught up, because you are deeper in this space than almost anybody else out there. I mean, there’s clearly a couple of other people, like, I think you’ve talked to Nick and some other folks, but you’re really deep in this and I really value your perspective.

Dave:

Well, thanks, Bret. Well, for the folks listening, I’m going to give Bret a chance to formally introduce himself, but Bret has been a really good sort of North star in terms of content creation and setting the tone of the consistency, I think, that you got to have. I’m at episode 50. He’s done, I think, like over 175. So he is the master when it comes to the consistent podcast creation and just really building a narrative that way. So Bret, before we get started just go ahead and introduce to the audience who you are, a little bit of background around Voicebot and what you do with it.

Bret Kinsella:

Okay. Sure. I guess the quick shoutout is Bret Kinsella. You can find me on the Twitter @bretkinsella. So that’s an easy way to connect with me if you’re in the community or just have questions and want to check out some of the conversations people are having in the industry. Also, on LinkedIn. So if you’re in the industry, I’ll connect with you on LinkedIn. There’s a lot of knuckleheads that I don’t connect with on LinkedIn, but anybody in the industry I try to connect with.

Bret Kinsella:

But probably more meaningfully for most folks, they’re going to be familiar with Voicebot.ai, where we run 60 to 70 articles a month just on voice and AI technologies. We also have Voicebot Podcast, which you talked about. So I’ve got, I think, over 200 hours of interviews on there now. So yeah, like 175, but a lot of those are an hour and a half plus. You’ve been on, and we were definitely well over an hour when you and Andy were on with me to talk about hearables.

Bret Kinsella:

So Voicebot Podcast has really been great, really fun. We’ve done it every week for over three years now. We just keep putting it out, because it’s just a great way to learn, I think, for me, and to pass that learning along to the listeners, because what I do is I just ask questions and I let people talk. And I think it’s been really useful for so many folks.

Bret Kinsella:

And then we have Voicebot Research, which is a newer enterprise, which we’ve been putting out research for a couple of years now, since early 2018. We did start in 2017, but really formalized it in 2018. And we launched a formal research service so we could do more. And we did that earlier this year. So those are these long, in-depth profiles and reports, consumer adoption, technology analysis, market analysis. So that’s been good.

Bret Kinsella:

And then the last thing, which I think we might talk about a little bit, is Voice Insider, which is sort of fun. That’s something that people were asking me to do three years ago. They were like, “Oh, why don’t you do a newsletter?” I was like, “Well, we write every day on Voicebot.ai.” But what I figured out at some point is that the news wasn’t a great place to do thought pieces or to engage in topics that were just for discussion, or these little vignettes or knowledge bombs that are useful to people in the industry. So that was the genesis of Voice Insider. We’ve now had that for two years, over 100 … Or at 100 issues as of this week. By the time people listen to this, it’ll probably be 101, because we do that every week as well.

Bret Kinsella:

And that’s been really fun as well, because it’s just like a palette every week to say, “Hey, what’s important this week? What’s something that maybe you don’t know that would be really useful to know?” And just to share a little bit of industry insight as well. So if we think about it, we’ve got the podcast, as you mentioned. We’ve got Voicebot.ai for the news, we’ve got the Research, and then we’ve got Voice Insider, which is just another way to engage.

Bret Kinsella:

And I don’t really think about it as content. I think about it as a discussion and education. What we’re in the business of is helping people make better decisions. And a lot of that is just awareness of what the options are, but then it’s also taking the time, whether it’s in Voice Insider or in the Research, to step back and to say, “Okay, we have all these data points. What do those mean in the broad scheme of things?” How might that influence the way you think not just about what you’re going to do this week, but over the next quarter, year or two years?

Dave:

Yeah. I mean, when I first came around to the voice space, it was all sort of predicated on this idea that who knows, maybe one day hearing aids will be a great home for voice assistants. And lo and behold, I met you. I met just about every voice guest that I’ve had on the podcast through one of these different in-person shows. And I think a lot of people will attribute a lot of the way that they keep track of what’s going on to the research that you do. A lot of those charts pop up everywhere. You see them all the time.

Dave:

But I really do like the podcast and I liked the Voice Insider so much because I do like … I like the news, but I like hearing your own thoughts on them because as somebody that is doing this on a consistent basis, I think you’ve formed this knowledge tree that has so many different branches that they all interrelate in some way, shape or form. And I love hearing a conversation that you have say with like Jeff Adams and then three months later, I hear you reference that episode.

Dave:

I just think that when you can follow somebody’s thought process in terms of all of the knowledge that they’ve been acquiring, and then, almost in real time, connect the dots with them, that’s just really, really valuable. And I guess the first way to kick things off here is, you … I think the first time I saw you speak, you talked about phase one. And I believe you said phase one of voice assistants is more or less around proliferation, which I think you can attest has been an ongoing thing across all of the different devices.

Dave:

And correct me if I’m wrong about phase one. And then phase two was habituation and specialization, which I believe you said that in like 2019. So I’m curious, do you consider where we’re at with voice technology today to sort of be in the same phase? Or are we entering into a new phase and what categorizes that new phase?

Bret Kinsella:

Okay. That’s interesting. I’ve been thinking about this a little bit, because I think it was actually the beginning of Voice Insider that I talked about entering phase two. Now, I’ve had a couple different discussions about this, also at conferences as well. So I probably characterize it slightly different ways, depending on the context. But as we’re talking now, let me tell you how I think about it. I think about the first phase was really the introduction. It was essentially a narrowcasting type of concept. It was a voice assistant within a context. And that context was defined by the device it was available on and the things that people are doing on those devices.

Bret Kinsella:

We had the narrow context of Siri on the smartphone. We had the narrow context of Alexa on a smart speaker, and then where those things went. And when I first started talking about phase two, which I still think is right, it was really about expanding distribution across surfaces. So this idea that voice assistants would have universal access. And distribution is the key term, because we have a lot of technologies that come along that seem very promising, but then they don’t get distribution. Not enough people either have access to them, or seek them out, or use them when they do have access. And so they just die on the vine.

Bret Kinsella:

And that’s really one of the things I was most interested in very early on: would there be enough entry points for people to use voice assistants so that they could actually bring them in? That’s the second phase I was really talking about two years ago, when I was saying, “Hey, we’re seeing the rise of smart displays. We’re seeing other types of smart home devices. We’re going to see expansion in the car. We’re going to see more use on the phone and other places.”

Bret Kinsella:

And I think that’s largely been correct, that’s happened. Now, when I think about this idea of habituation and the deeper adoption, that’s really where we are. I would say we’re in phase one or phase 0.5 of that. And what it’s largely been is habits that people had in other areas of their life and they’ve transferred it to a voice assistant.

Bret Kinsella:

The number one example is music. I know Nick talks about that, [Vifor 00:09:50] talks about that a lot. He really ties a lot of the things that are going on to music. And it’s a useful rubric. I don’t like buy into that 100%, but there’s absolutely a significant correlation there with music because we see that on smart speakers. But I think more broadly, what we’re going to see is just the benefit of this throughout all media.

Bret Kinsella:

So if you look at other surfaces, I’ve talked in the past about having the Xfinity, that’s Comcast, it’s the cable provider for your overseas listeners, voice remote. And it’s been around for a number of years. It’s amazing. I had Jeanine Heck, who led that project on my podcast a few weeks ago. And it is such a superior way to interact with your cable television that it’s hard to even conceive of going back, if you ever had to. I don’t look at the guide anymore. It’s just my thing.

Bret Kinsella:

And I think about that from a music standpoint. If you’ve ever used Amazon Music Unlimited or YouTube Music on maybe a Google device, you’ll also see that it’s a similar experience for music. So no more tapping, typing. It’s not just like Spotify where you want a playlist because it’s set it and forget it. You don’t really have to be locked into that. That’s fine too. You can call up a playlist. You can switch playlists by voice. Or you can just ask for that thing you just want right now that’s not on a playlist.

Bret Kinsella:

A lot of people underestimated, I think how complex media search is, especially since you’ve got a lot of common words, you’ve got uncommon words in it. You’ve got a lot of recurring titles and you’ve got a lot of complexity like artists, band, genre, song, all these other things. And then when you go to visual media, it’s even more complex, although there’s not as much of it. So I think that it’s sort of like, I’ve gotten a little bit off on a tangent here, but I think if we think about the habituation, people have shifted music, at least on smart speakers, over to voice control.

Bret Kinsella:

And actually, I just did some analysis on this. Consumers that are using music on smart speakers using Amazon are much more likely to be using the Amazon Alexa app on mobile, because they’re using that to get to the music, or they’re using the voice assistant Alexa through their Amazon Music Unlimited app. And so when you start using it in one place, you use it in the other, because it just becomes a habit. And I could even say that a member of my family does that. I mean, someone who never had streaming music before, and then all of a sudden had it on the smart speaker and said, “Oh, I want this on my phone too.” And the interaction on the phone then became what was on the smart speaker before that. So I’d say that that’s one of them.

Bret Kinsella:

Another use case that people did is, at least some of those quick questions, that you’d always go to your phone or sit down and type it in, now people are asking much more frequently. So timers and alarms here, if people had those, they have on their … Microwave, they have them on their phone, whatever. So people are using those more by voice either through the phone or through a smart speaker, something like that. So I’d say that this phase really has been the shifting of existing use cases predominantly to voice and we haven’t expanded that much into new use cases.

Dave:

Yeah. Yeah, no, I think that’s really well said. And it’s like I know we’ve talked before about Clayton Christensen and Jobs to Be Done. And I feel like you’re right, where it’s sort of at a phase right now where we’re seeing … And I think this is a delineation between where Alexa does well and where Google does well in particular, in terms of where they’re sort of picking off use cases. With Google, if you’re in the Google ecosystem, I think you might be inclined to start defaulting to using Google Assistant to just use like as the navigation interface for your phone itself. A lot of search as well.

Dave:

I love what you said about TV. I think TV is such an interesting use case that isn’t talked about enough, because to your point, it is the training wheel effect: if for nothing else, if that becomes the default way that you search for everything on the Amazon Fire Stick that you use, you’re still building that habit. That’s the habituation piece. But the one that I really want to explore, because we haven’t even gotten into specialization, is the media piece.

Dave:

I know you were going down a little bit of a tangent, but media to me, and I think I’ve heard you mention this before too, is at least one of the killer use cases in the short term for voice assistants. That makes so much sense in my mind, because even in the more or less primitive state that voice is in, in a lot of ways it’s actually superior to the legacy ways in which you retrieve bits and pieces of content.

Dave:

For example, the conversation that you had about Beeb, that’s really, really interesting to me because if we are thinking about media being cataloged in individual repositories, but it being narrow and deep and it being just way more accessible in these sort of, I guess, custom voice assistants tied to the media. I think Spotify is another really good example of this, of where this is going is if you’re able to retrieve information in a much more, not only a quick way, but I think just like a more robust way where you can query through a little bit more of a conversational interface, it seems like in conjunction with the fact that media already sort of seems to be the use case that people are gravitating toward with smart speakers in general, starting with audio …

Dave:

Or I’m sorry, starting with music, but I think we could see that the next progression being podcasts, and then you just sort of start to climb that ambient ladder that’s being built as it goes. That is a thread that I really want to pull on with you and just get your thoughts on this, because I know you’ve had some conversations with folks like at the BBC. And what’s your current thought process around this whole notion of media as a big entry point for people to build that habit?

Bret Kinsella:

Yeah. I think we can look at media as the example of where we’re going to have profound change. Not to say that that tells you immediately other areas of our daily lives, but it is, if we find something that looks and acts like media, in some way you can say, “Oh, there’s probably a really good application here.” Now, a lot of people know that we founded Voicebot in 2016. So that’s about four years ago, just over four years ago, at this recording date.

Bret Kinsella:

But I’ve been working in this space for at least two, three and a half years before that or something like that. So 2013. And we were working with media and it was voice interactive advertising actually on mobile. So I was really deep in that media space before I ever got into thinking about the industry as a whole, because we were doing voice interactivity, people … An ad comes up and when you’re listening to a streaming music service and you can just say, “Play video. Open map or give me directions to the store. Download now. Send me the coupon.” Whatever it’s going to be, really simple, interactive voice capabilities, but new to the world. So not something you could do before.

Bret Kinsella:

And in that case, it was just more than 10x more efficient than doing it in a visual mode on the screen, because we found in our research that somewhere between 79 and 83% of people were listening without the screen visible. So I step back and I say, “Okay, well that was sort of one thing from an advertising standpoint within a specific context.” But when we get to media, what I really liked that you pointed out was that there are actually these new things that you can do. So there’s new features.

Bret Kinsella:

To a certain extent, I love the Xfinity Voice remote because I can just say the station I want to go to or I can say the program I want to listen to and it already pulls it up. Now, I could do that on the remote before. I could type it in. I could go to the guide. If I remember the number, I could punch it in, but it’s not as efficient. It’s much more efficient.

Dave:

The 10x.

Bret Kinsella:

Yeah, it’s definitely 10x better in that way. And it’s 10x better because there are things you couldn’t do before. So for example, I could say, “Show me Clint Eastwood movies.” And it would just show me all of the ones. And it would not only show me all of the Clint Eastwood movies, but it would show me which ones are free, which ones you can pay for, when you could watch it, which ones have ads, if it’s a third-party streaming service you have. So that’s really intriguing, that you can do that.

Bret Kinsella:

And it will also recommend other things for you. So if you say you want to watch golf or something like that, it might say, “Hey, here’s golf. And here’s other things that are related to golf that you might want to see.” There’s not that much related to golf. So maybe that’s not a great example, but there’s other golf, I guess. So if you wanted to watch rugby maybe it’ll show you Australian rules football. That may be a better example.

Bret Kinsella:

So that’s one of the things I really like about that, but let’s look at just music. The music experience, and I don’t think a lot of people have woken up to this, is a lot better with a voice assistant. And it’s because they’ve never tried it, because they have these existing habits that they would need to displace or at least ignore for a little bit while they try it. But this idea that you could just say, “Play something by,” at any time. “Play this song. Play the song with these lyrics. Play the song that goes 99 problems and the …” Whatever. It knows it immediately and starts playing it.

Bret Kinsella:

And the other things, even if you could do them through a visual interface, were very difficult, and in most cases you could not do them at all. And we always had this benefit of like the systems, like hearing what we like and sort of … Not always, but for the last 15 years creating these new streams of music based on what you like. But now you can just pick something and that’s your starting point. It just goes and you can change it at any time. I think that’s, from a media standpoint, a lot of it has to do with search.

Bret Kinsella:

So you’re looking at this complex catalog. And then this idea that if you’re going to search, you might actually know a little bit more than traditional search. A lot of traditional search, what you’re doing is you’re saying, “Hey, I don’t know about something. Tell me what the options are.” There’s a different type of search, which is, I already know what I like or I know what I want, but I don’t know how to get to it very quickly. And when I think about media, that’s actually one of the really big benefits of voice and this conversational UI.

Bret Kinsella:

Now, there’s other ones that are really amazing, which we can talk about later, but that’s about this idea that you can have this iterative multi-turn conversation. So if it doesn’t understand, like in this case in music, it might say, “Do you want the live version or do you want the studio version?” Maybe that’s a point of clarification. That’s not usually what it’s going to ask me, but it could, to better serve your need. But for a lot of other things, it can actually do clarifications where it doesn’t have enough information.

Bret Kinsella:

And I just look at this and I just say, “Listen, if we look at conversational interfaces and we look at media, I can already tell you that is a killer app for voice, because it is so much better.” I believe there are many others out there. We’re not seeing them necessarily yet.

Dave:

Yeah. Well, I think that it’s important though to constantly reiterate though that we’re in this habituation stage. Okay. So first came the proliferation and then came the question of like, well, what do we do with these devices? All of these smart speakers that we have, that we’ve been outfitting our homes with, our offices with, wherever. And what’s interesting to me is that when you start to … And you factor in the multimodal component of this. So it’s like, yeah, maybe my primary mechanism for where I’m building my habit is through my TV. That, as your research has indicated, a lot of people, their habituation point starts with their phone and then they actually might carry that over to their smart speaker or vice versa.

Dave:

But I think that, this point around the 10x increase is really important to point out because I think there are a lot of really exciting use cases that this … Built around media in particular down the line. I still go back to the conversation that you had with Amir Hirsh at Audioburst as being one of the more mindblowing episodes and one of the times where I really kind of started to see the bigger picture here. But that’s a little bit down the line, and we can get into that throughout the episode.

Dave:

But I think that even in today’s day, it’s these things that are, they seem sort of small in isolation, I think are really important in aggregation because they help to, just again, it’s not just building the habit, but it’s getting people comfortable with this idea of, if I go into my phone 30 to 40 times a day to adjust my music or adjust my podcasts and I start to shift that over to my voice, it’s just like you’ve suddenly moved a significant chunk of your computing to an entirely different paradigm. And that’s what can be built upon, I think, into the future.

Dave:

And I just think that’s worth pointing out, because if we’re in phases of these technologies, it’s not probably going to be something where just like overnight, suddenly everybody starts using voice assistants for everything. It’s going to be a gradual, year-long process or multi-year process of just gradually moving more and more things over there.

Bret Kinsella:

Yeah. I think that’s right. And if we talk about this idea of distribution to habituation, I would say let’s revise and extend my remarks there. There’s an adoption phase in between there. So maybe it’s distribution, I can use it. Adoption, I do use it. Habituation, I use it regularly. I think what a lot of people forget is between distribution and adoption is this concept of applications. So there needs to be some innovation there that people say either, “This is a much better way of doing what I’m doing today.” Or, “It’s something that’s new that I haven’t done before.” So there’s innovation, there’s awareness, there’s this development of applications.

Bret Kinsella:

For voice assistants as a market to be successful, for the technology to take hold, and for people to value it, they really only need one thing to stick. Because then that becomes the anchor: people become used to using voice assistants, and then it becomes an option to use them in all these other places, and then it becomes a lot easier for everybody else. So if I’m already using a voice assistant for music and then I add it to games, people are going to be like, “Oh, okay, let me use it for my game as well.” Because they’re already aware that it’s a thing and they’ve built that cognitive trust that, oh, a voice assistant actually enhances the experience.

Bret Kinsella:

So I think everyone building an application now, because we have in particular, I put that at a higher level than like timers and alarms, which are also very popular, or navigation, which is another one that’s really popular. Probably the most used of the voice assistant applications on smartphones today, aside from the phone and some of the other phone utilities, calls and texts. People have seen those things. Now that’s like, oh, okay, that’s much easier.

Bret Kinsella:

So if I’m coming up with this new idea that you can’t do today on a smartphone through a touch interface, for example, and it just happens to leverage a voice assistant as the UI, as well as, call it the do engine if we go back to the Siri terminology, so it does something for us, then I’m already there. Then I don’t have to teach people, “Oh, there’s this voice assistant. And if you have a voice assistant, you can do this.” You can just say, “Hey, here’s a new application. You can do this.” “Oh, how do I access it?” “Use it with your voice assistant.” “Okay. I know what that is. You mean the same thing I use for music or on my smart speaker?” “Yes.”

Dave :

Yeah, no, I think that’s really well said. What about the specialty piece? I mean, do you think, so if this is habituation and specialization, would you categorize this as what you sort of just described there? Or how would you describe that aspect of this phase?

Bret Kinsella:

Well, to expand beyond some of these general purpose solutions, it is specialization, and we’re already seeing this. There’s a number of banking apps out now. I can think of three off the top of my head that have voice assistants that can hold a conversational interaction, maybe single turn, potentially multi-turn, but not that complex. If you look at US Bank, I think Richard Weeks told me they had over 300 features available within their mobile app. You can’t expose 300 features through visual navigation.

Bret Kinsella:

Now, you don’t necessarily know how to teach people that this is there, but at least if they start getting used to asking questions, they can ask for something that maybe your development team didn’t anticipate they might want to do. And then as long as your NLU does the matching properly, you can say, “Oh yes, I have a feature for that,” and go ahead and start to execute the request. So when we think about that specialization within the mobile app, a lot of people used to tell me … This is less common than it used to be. They’d say, “Oh, it’s going to be all Siri or all Google Assistant.” Fewer people said all Amazon, but they said that was a possibility. It’s going to be this one assistant to rule them all.

Bret Kinsella:

And I’ve always said, “No, I think they’re going to be more like applications.” We use multiple applications and it’s okay that they have different UIs; they’re specialists, they do something really well. Because these general purpose solutions ultimately can’t specialize enough-

Dave :

Exactly.

Bret Kinsella:

… to fulfill every feature I might have, like in banking. Or if they did, they would do sort of the generic, maybe Pareto 80% set. And then there’s this 20%, which maybe Capital One really differentiates on, and they can’t just wait for Apple or Google to release that feature, because then it just commoditizes what they do, which might be what the general purpose players want to do. But they know more about their business. If you look at Suki and Saykara in the healthcare space, which I think you’re familiar with, they know more about what doctors are saying and doing with patients, or Orbita in health, and those types of things.

Bret Kinsella:

If you think about these broad and shallow solutions that the general purpose assistants are, they’re really valuable. A lot of people are going to be using those. But they can only dig a few deep wells. We see this in every application category. Every time someone comes out with a one-stop shop solution, someone else comes out with a bolt-on, which is a specialist. So I think that’s where we are. I think we’re starting to see that a little bit for development. That’s really where a lot of the action is.

Bret Kinsella:

People thought they were going to do it as Alexa Skills or Google Actions. Some of them tried it. Some of them were dissatisfied. Some of them tried it and were like, oh, okay. I haven’t seen anybody who said, “Hey, this is just the most amazing thing.” A lot of people said, “Hey, this is good.” Some people said, “Not good.” But I’ve yet to come across someone who’s built their own and said, “Oh, this was a waste of time.” They’re generally thinking, hey, this is really valuable.

Dave :

I think, not to pump your tires too much, but you were pretty prescient; you called your shot. You said that it was going to be habituation and specialization. You’ve had a big emphasis, I would say, over the last six months on the podcast, largely around this theme of custom assistants. And to your point, I think it’s become pretty apparent that we are entering into a world where there’s a multitude of voice assistants. And I think that’s actually a really good thing. Going off of what you said about Eno and some of these different banking apps, I’m not sure it would ever really fly for me to ask Alexa about my banking information, to go retrieve it from a voice … You know what I mean?

Dave :

I feel like there are so many complications with just having one assistant. So in addition to being able to have specialty assistants that have these deeper wells of very specific contexts, one of the big learnings I’ve had in speaking with people like Jeff Adams and some of the different groups that Cobalt’s worked with is that they have very, very specific lexicons of terms. When we’re talking about NLU engines, when I was interviewing Bret … Or I’m sorry, Bruce Rasa with AgVoice, he’s got a lexicon that’s all built around the scientific terms for different molds that grow on berries.

Dave :

And you can apply this to the world of banking, the world of hospitality, you name it: all these specialty niches that have their own jargons, medical being another big one. I think that’s a really important piece to this too. If you’re building your own custom voice assistant, you have the ability to build your own NLU engine running on top of it with its own lexicon, and therefore you can create these highly, highly accurate systems.

Dave :

So my question is about these custom voice assistant podcast conversations that you have had, in light of what you said about where we are maybe moving. And I’ll be curious to get your thoughts on whether you still consider there to be echelons of assistants. So the general purpose ones, do they truly sit on a level above, and are they kind of like “master assistants”? But first, before we get into that, with the custom assistants, what have been some of the things that you’ve seen that have been really interesting to you across these interviews that you’ve been focusing on lately?

Bret Kinsella:

Okay. Yeah. More than 15 episodes this year have been on that. We do somewhere between 50 and 60 episodes a year. At this point it’s probably three quarters of our episodes … Or no, it shouldn’t be three quarters. More than a quarter of our episodes. So probably 25% to 35%, probably a third I would say at this point, because we still have a number of episodes left this year. In any event, the reason I did that is because I thought we needed more attention on this topic. And the other reason I did it this year and didn’t do it in previous years is that there are just more examples.

Bret Kinsella:

In the past, I couldn’t have put together … A year ago I don’t think I could have put together 10, or I think I did 14 straight or something like that. I don’t think I could have done that. I might have gotten seven or eight, or in order to get to that level, I would have had to dip into things that weren’t really fully baked. I try to bring people on sometimes with early research, but I try to make sure that they’re out there and have something already working so that we can look at it and see if it’s real and not just an idea.

Bret Kinsella:

But if I think about what’s interesting, first of all, I’d say that … You read Voice Insider, so you might remember when I put out the GOWN framework, and that would have been more than a year and a half ago. I think it was early ’19, might have been even ’18. What I was trying to do was basically open people’s eyes to say, “Hey, we talk a lot about Alexa and Google Assistant, and you know, Siri, but they’re not the only game in town. There’s other things that are happening.”

Bret Kinsella:

So then in that GOWN framework, I’ve got this general purpose category, which is Alexa, Google Assistant, Siri, Samsung Bixby, et cetera. The O is owned. So that’s like Eno, or Erica from Bank of America. We’ve got white label, which is all the tools that allow anyone to add voice assistance to their solution. That could be SoundHound, Aiqudo, Microsoft Bot Framework, all those things. And then we have niche, which are specific to a task. So it’s almost like how Uber is specific to getting a car. There are voice assistants designed specifically for your back office as a small business, or for recording transcripts from your meetings.

Bret Kinsella:

I sort of laid that out and said, “Hey, this is a much bigger world than a lot of people are thinking about.” It might be more obvious to people now, but back then, everyone only wanted to talk to me about the big tech companies and their voice assistants. I said, “Yeah, I don’t think that’s where we’re going to end up. I think that’s a place that’s been very important because it’s focused innovation. It legitimated the market and made it a lot easier for all these things to come underneath it.”

Bret Kinsella:

So this year we’ve had a great time talking to a lot of different folks. I just had a great conversation with Dr. Monica Lam from Stanford. They have Almond, which is the open-source voice assistant. I had Alan Nichol from Rasa, another open-source voice assistant, a couple months back. He’s the CTO over there at Rasa. It’s really fascinating, a lot further along than something like Almond, and really widely used. I think people are going to be surprised about Rasa, because it’s so widely used that people don’t even recognize how big that company is and how much momentum they have. I think those things are very interesting from a tooling perspective.

Bret Kinsella:

The healthcare ones are certainly interesting. We just talked about Suki and Saykara. Those sort of build on a much longer heritage of voice interactivity in healthcare that was really pioneered by Nuance. We’ve had Orbita on as well. I think that space is worth pointing out. But I will say that the media ones in some ways are the most interesting to me, because they’re the ones that are so obvious. We just talked about Comcast, or Xfinity Voice, for TV search and control. We had the BBC one with Andy Webb. And Andy Webb was one of my favorite conversations.

Dave :

Yeah. That one was really good.

Bret Kinsella:

And it’s not necessarily because of any … Yeah, I don’t know that it was anything in particular he said, but what I liked about it was he was articulating a way forward and how you think about using a voice assistant. And they’ve built an architecture that goes beyond just the single point solution deployment. So the Beeb, their voice assistant, will create commonality no matter what platform you’re on, whether you’re accessing it through Alexa or through their own mobile app or through a smart TV or those types of things.

Bret Kinsella:

So the Beeb is really this capability in the backend. If you’re in a BBC property, the front end will be the Beeb, the voice assistant you know, but if you’re using Google Assistant, it’ll essentially be the same thing. You’re just going to hear a different voice, because it’s the engine behind it that’s going to be the same.

Bret Kinsella:

What I liked about that is they looked at voice as more of a strategic capability. And they said, “Hey, we could build this backend once and then we can plug in all these different front ends.” And for a media company like theirs, with such a rich and deep product catalog, it makes a ton of sense to do that. And I contrast that with someone like US Bank that has a lot of customer content. In that case, it’s not the same thing; they’re not just trying to surface amazing content from their back catalog. They’re trying to make a more efficient process for people to execute their transactions.

Bret Kinsella:

In some ways it doesn’t have to be as robust. It just has to let you get the job done. It just happens that getting the job done with banking is a different expected or desired experience for the customer than when you’re trying to interact with, find, discover or engage with media.

Dave :

Yeah. The Andy Webb one, and the Beeb in general, was one of the moments over the last few years that really stands out in my mind. Another was, and a lot of it’s the plumbing that I think is really interesting, if you recall, with NPR, they made it so that with any smart speaker, and it took a lot of really clever coding, you could just say, “Hey Alexa, play NPR,” and it would register your location and feed you your local affiliate NPR station. Again, these are small little details, but I think they’re really meaningful because it goes back to the whole habituation and specialization piece. Suddenly, if I like to listen to NPR and I had traditionally been listening to it through my radio in my house or in my car, now I’m accessing it through my voice. It’s just one more thing.

Dave :

And the other thing with the Beeb that I found really interesting is that it’s such a robust library of news and content. There’s so much that it lends itself really well to the idea of being able to query for things. So rather than just saying, “Launch BBC,” you can actually have that multi-turn conversation to get to what you’re actually looking for. And again, it goes to the 10x thing, where the traditional way of navigating through all of that content that’s constantly being replenished starts to become kind of daunting.

Dave :

Again, from my own perspective, I’m looking for where those 10x experiences are right now that continue to build those habits. Because just in the same way that the proliferation enabled the habituation, I feel like the habituation enables, like you said, this idea that, okay, maybe a voice modality on any given game suddenly makes more sense. So it’s like the chicken and the egg thing in my mind.

Dave :

And what’s fascinating about the custom voice assistants to me is that we’re seeing that basically any company that has some sort of existence can have its own voice assistant. And one thing I’ve been thinking a lot about is that I want to see this open up to the masses. I don’t know how this will happen, but we had an interesting Twitter exchange not long ago where I was basically advocating that we need to have a Google ask moment.

Dave :

You had some really interesting caveats to that, but more or less I said that I look at companies like Voiceflow and some of their predecessors, like VoiceXP, whatever, with this idea of, okay, if we have these engines that are building the websites, or the voice sites, for any given company, any restaurant, any service-based company, any company like mine, Oaktree Products, how do we manifest in this new economy of the web?

Dave :

And what I’m curious about is something you said: this automated interactive presence. You said you thought that was where things were going to start to get interesting in this regard. And I just want to flesh this out a little bit, because if we have the tools, and right now they’re just limited to Alexa Skills and Google Actions, I do kind of think that this is going to become a broader ecosystem. I don’t know how all this is going to work, but I think it’s really important when you have no-code type tools that any company can deploy, or that agencies can get ahold of and make part of their suite.

Dave :

But I think this idea of empowering people with the ability to have a semblance of their online existence in a voice modality is really, really intriguing to me. And I feel like the first movers here are the big companies that have the capacity to build their own voice assistant. Even if they’re not super robust, they’re still their own entity. How do you see this expanding into more of the general population, if you will, like anybody having their own voice presence? That might be a really loaded question.

Bret Kinsella:

You said a lot there, so. All right. So let’s see. First thing I want to talk about is adoption.

Dave :

Okay.

Bret Kinsella:

Okay. So I like history. I like to look at the way things have happened in the past. I don’t believe that every historical precedent is useful or is going to be replicated in exactly the same way, but it’s a good frame of reference. If I think about the early days of the web and the early days of mobile, it was not corporate content that got people there. The corporate content, the brand content, tended to be the long tail. Very few people used it, but when they did use it, they liked it. So it was good to be there.

Bret Kinsella:

So, if I recall that Twitter exchange, there were a couple of things around the analogy that I didn’t think were quite right, but the thing I was really thinking about was: does it matter if Nike has a voice presence or not? Does it matter if Goodyear does? Or pick any company: Chase, or maybe something more fun, Under Armour, the CrossFit Games, Spartan races, whatever you want to say. Music artists.

Bret Kinsella:

What I think we would see is you say, “Yes, we want that,” because breadth of content winds up being a really big benefit. But we do potentially get into a situation that we had back when we were building the first internet marketplaces in the 1990s. And that was that when you have two-sided markets with high fragmentation on both sides, it’s hard to get critical mass because nobody really needs to be there. And this is the same problem you have with someone trying to create, let’s say, a rival to Facebook.

Bret Kinsella:

Because you can provide it. You can put the utility out there, but then, what’s the rationale for people to join? And I think that’s the first thing that I start with. I say, okay, if you’re looking at adoption around a new technology like this, what are the reasons that everybody’s going to go there first? And then while they’re there, can they benefit from all these different things?

Bret Kinsella:

Because, and this is very common, let’s say you’ve got a number of features in an application and everyone uses three of those features. So let’s say you’ve got 100 people, and all 100 people use three features. Then the fourth most popular feature, like 50 people use, and the fifth most popular, 27 people use. And then you wind up getting out to the 10th feature and like three people use it. And for features 10 through 150, it’s like three people using each of them.

Bret Kinsella:

So if we just step back and talk about this adoption and where things are going, you want to see a lot of people coming in for a couple of things: come for the music, stay for everything else. Or come for smart home, stay for everything else. Maybe that’s the thing. So there’s multiple ways that people get in. So that’s the first thing I would say … And I think someone was talking about this idea online too: we just need all these little things. And I say … Because I think they had said something like a Pokemon Go moment or something like that. Or maybe I was repeating something David Beisel had said in the past, but something like that is useful.

Dave :

Yeah.

Bret Kinsella:

Because it just focuses everyone’s attention. And then once they’re there, they start dabbling in other things. And the interesting thing is, and we can take this back to what we were talking about earlier, with augmented reality, Pokemon Go, the game a lot of people will be familiar with, was the first introduction to augmented reality for many people. It was highly popular and normalized AR for a lot of people. We have a frame of reference now, which is great.

Bret Kinsella:

The problem with AR is it doesn’t have distribution. You can have adoption where it started with specialization. That game had distribution because it was on smartphones. But AR as a concept doesn’t have distribution. So AR, to me, is a feature of other things, as opposed to its own thing. There’s no concept of a general purpose AR, at least that I can think of. There’s capabilities, like location.

Bret Kinsella:

I was just talking to Dennis Crowley of Foursquare about this. So he’s going to be on my-

Dave :

Nice.

Bret Kinsella:

… my guest. He’s coming up shortly.

Dave :

I wanted to ask him. This is a good sign.

Bret Kinsella:

We can talk about that, but I’d say that we need some of these things that capture people’s imagination. And then we have this evolution. Okay. So that was one thing: you talked about adoption. There were a couple other questions baked in there, but I’ll just stay on specialization. There’s this whole idea of tribes when you think about adoption. There’s one thing, this general purpose thing, that cuts across tribes. Tribes are the affiliations we have in life. But there’s this other thing that happens, which is that you have lots of different tribes who adopt something because it serves a specific need. It might serve a different need for each of them, but eventually they’ve all adopted it.

Bret Kinsella:

And if you think about voice interactivity as a whole, this idea of interactive engagement, voice interactive engagement or conversational engagement, is ultimately, I think … We had this moment with Alexa, then with Google Assistant. We had a previous moment with Siri, but it wound up being very confined. Then we had this new moment with Alexa and Google Assistant. We’re seeing this movement around media and music. So all those things are consolidating.

Bret Kinsella:

And that’s created the space for someone to create their own custom-domain, specialty assistant without the friction of people saying, “What is this? I don’t understand this. It’s mind blowing. I’m afraid of this.” They say, “No. Okay. This is just like using Alexa or Siri. I can use it here.” I wrote about this about a year and a half ago, and I like to bring this one up because I was off by two orders of magnitude. I said, “Alexa and Google Assistant will be known for launching 1,000 assistants.”

Bret Kinsella:

And it might be three orders of magnitude, but I think it’s at least hundreds of thousands of assistants, probably millions of assistants, because of what we’re thinking about, and this goes back to … And I’ll tie this all up now, for those of you looking for how this thread ties back to the original question. We say, “Okay, I’ve got all these different people who are using voice assistants. And I’ve got all these different people now adopting because of specialization. Where did that start?”

Bret Kinsella:

Well, they looked at Amazon. They said, “Well, Apple can do this. Google can do this because they already have this installed base.” Oh wait, what did Amazon just do? That’s crazy. They just created a new product category. And this engagement that they’re getting with their customers is something that I would like. Now, it might’ve seemed out of reach, since they’ve got 10,000 people working on it and all these different things. But it turns out that there’s a lot of people building tool sets. And so it’s not just Voiceflow to build for Alexa or Google, but it’s these other people, whether it’s SoundHound or Aiqudo or Spokestack, or any of these folks who are building tools to allow anybody to deploy their own assistant in a reasonable timeframe with a reasonable level of complexity.

Bret Kinsella:

What happened is they saw that voice interactivity was really valuable. They’re like, oh, I should be able to have voice or conversational interactivity with my customers, or I should be able to use that to extend the feature set of my service offering. How can I do that? I would really like to do that. And then it just happened that there were people coming up to try to make that better. I don’t know if I tied that all together yet, but hopefully-

Dave :

No, I think you did. I asked a poorly worded, very long question, but the gist of it builds off of what you’ve been describing here, which is this phase that we’re in. As the whole process of using your voice to compute becomes more habituated through different entry points, I think that incentivizes the broader ecosystem. And then the question becomes, what is the broader ecosystem? Is it confined to Alexa Skills or Google Actions, or are those just, call them, mini ecosystems within something that’s bigger than that?

Dave :

And that’s why I think Marsbot is really interesting to me. And I’m sure you’re going to have a great conversation with Dennis, because again, it goes back to how does the restaurant that’s on a busy street, what does their parallel to a website look like in this world? And what I think is interesting about Marsbot is you kind of are looking at a way that in one capacity you can have that manifestation.

Dave :

Maybe it’s just that I’m going to have the specials of the day read to you by the chef, or it’s more or less user-generated content about that particular restaurant. But again, this is a world where not only is there distribution in terms of voice assistants and using your voice, but you have all the different modalities that go along with it, whether you’re driving in your car and Marsbot’s activated through Apple CarPlay, or it’s through my AirPods, whatever it might be.

Dave :

I guess what I was getting at initially was how do you see the small guys, like the SMBs of the world, entering into this space in a meaningful way where it does sort of become its own internet or something that resembles the internet in terms of everybody has a website kind of moment?

Bret Kinsella:

The first thing I’d say is not everybody needs to have a website equivalent, like a voice site.

Dave :

That was a great example. Because in the caveat, you talked about the shift from analog to digital. It was necessary to create a website back then. Now a lot of this information already exists on the web.

Bret Kinsella:

That’s right. Now there’s a couple of different modalities that we could use for sort of everyday companies, particularly restaurants. So restaurants are a really interesting example. When’s the last time you went onto a restaurant website and you’re like, this is a really good website?

Dave :

Very, very rarely.

Bret Kinsella:

I don’t think I ever have. I’m sure I have, but I don’t think I ever have.

Dave :

But I do like to look at the menu.

Bret Kinsella:

Correct. That’s what you do. So let’s just start with the fact that websites have terrible … Or excuse me, restaurants have terrible websites. I think that’s generally true. And it can be hard to get to the menu. The menu usually comes up as a PDF. If they have it set up so you can actually order or search or those types of things, it’s usually too hard to see everything; the trade-offs are a series of terrible outcomes. Okay.

Bret Kinsella:

So what does that mean? I’d say that in this era, a lot of people are thinking about ordering takeout. Okay. So there’s two different ways that winds up happening. If it’s a favorite restaurant, they could have a website or a mobile app that you go to with a voice assistant inside, like Domino’s has with Dom. And it can do multi-turn interactions with you, tell you what the specials are, what’s available. It can fill in the blanks if you don’t give it all the information, like what size pizza do you want, not just what toppings you want, those types of things.

Bret Kinsella:

So that’s pretty straightforward to do inside your app. And I think anybody with an app that’s trying to drive people to it, whether it’s Starbucks or Domino’s or the deli on the corner, should figure out a way to deploy some sort of voice assistant capability, because it’ll help them; their sales would go up. I think a lot of things would be good for them.

Bret Kinsella:

Now, there’s this other angle, which is having the general purpose assistants, which probably makes more sense for most of these organizations, whether it’s Google Assistant or Siri, let’s say, since those are going to be the most popular. Allow them to deep link into your solution. Right now they’re somewhat limited. Actually, you can use Siri for banking. It’s one of the domains they support. If you think about RBC, you can’t do a lot. You can find out things like your balance. I don’t even think you can transfer money with it.

Bret Kinsella:

If they had their own assistant, they could do all these different things, but you can’t go from outside into RBC and do that. I’m really interested in Google Duplex on the web, because for anybody who has a website, it would potentially enable you to use Google Assistant to pull information back in this conversational way and then also fill in a form, which is essentially what Duplex on the web would do.

Bret Kinsella:

So that might be something … I think we’re a couple of years away from that being anything that restaurants could use, but it would seem like a really good option. So we’ve got this one option: if you’re Domino’s and people are using it a lot, okay, your path is really straightforward. And then we have these other cases where you want people to be able to use their general purpose assistant to get in there. I think that’s a little further off, but I do believe we’ll get there, with some basic features able to be done.

Bret Kinsella:

The one thing I think is important to point out about restaurants is … Two things. One is if you’re going to order, ordering from a restaurant is actually really complex, because there’s additions, substitutions, subtractions, and that’s hard to manage. There’s also this idea of comprehensiveness: you have to be able to go through all these different steps. The other reason people interact with a restaurant is they just want to find out information, like what’s on the menu. And that’s not a great listening experience.

Bret Kinsella:

So then we’re thinking it may be voice request and visual response, unless you have a single-shot, specific request, like, do they have this dish? Is Osso Buco on the menu tonight? That’s something where it could say, “Yes, it is. It’s available.” And that would be really … But I think this general browsing idea is different. Audio is not good for browsing. There’s this idea of an audio browser, but it hasn’t really taken off. I’d say the closest to that is Audioburst, which you mentioned earlier, but that’s not really just browsing, because it gives you complete content as well. It’s just this idea that you can go deeper; that’s the browsing concept of it.

Bret Kinsella:

So that’s it. Well, I should let you jump in, but I will just say, on Marsbot, I think it’s really interesting, because we haven’t really talked about hearables.

Dave :

Yeah. For the hearables podcast.

Bret Kinsella:

But that’s an application that really can’t exist without AirPods.

Dave :

Yeah. Yeah. I mean, what’s fascinating in my mind about the whole Marsbot thing is that, similar to smart speakers, you had the proliferation, and now it’s like, now what? And this is what I’ve been waiting for: you have 100 million AirPods out in circulation, and it’s just climbing. I think we might actually see 100 million sold in 2021 alone. So the question is, well, what gets built on top of this?

Dave :

And I think that we’re … Foursquare’s a really interesting company that’s kind of made a comeback from irrelevance to … Because when I was in college, from like 2008 to 2012, it was actually a really cool, hip app: checking into bars, being the mayor of a place, and all that. Then it kind of fizzled out, but now it’s actually relevant again because we’re all … Geolocation, I think, is to hearables what the camera was to the smartphone. It unlocked tons of new use cases that only that modality could enable.

Dave :

I think, in a way, having audio fed to you via GPS based on your proximity opens a lot of really interesting doors, especially if they’re going to open the door to UGC (user-generated content). Again, I just think about the generation below me running wild with it. I have a lot of different thoughts on this, but the common theme I keep coming back to is that, with all these things, the proliferation has now happened. Since the time I’ve met you, we’ve only seen it go straight up and to the right in smart speakers, smart displays, hearables, all of the modalities where the voice assistant can reside.

Dave :

And now all of the questions and thoughts I have in my head pertain to this idea of, what do we do with this? The footprint exists to build on top of. And I kind of feel like Alexa … My own feeling is that Alexa was a little bit of a head fake. I think that ultimately Jeff Bezos just wanted to get us to buy more things through Amazon and through Prime. And especially given that you can tie Alexa to Amazon Pay, that seems like it’s probably the number one incentive Amazon has around Alexa.

Dave :

Google Assistant is like the next phase of the interface you use. I mean, that’s the delineation I see shaping up between Android and iOS: the reason to own an Android phone over an iPhone seems to be, more and more, Google Assistant. So those two things … And there are ecosystems that can be built around it. I think you’re really spot on about Duplex and how that might be the solution for the conversational element for all the different businesses out there.

Dave :

But what are your thoughts on this sentiment … I mean, we’ve already talked through a lot of this, but as we wrap up, I feel like this phase is maturing now. People are becoming more comfortable using their voice for different things. And as we’re seeing with the custom assistants, some companies are now charting their own path. They’re saying, we’re going to have this independent conversational experience that you can have through your AirPods or your smart speaker or whatever.

Dave :

But I wonder, if Alexa and Google are the Helen of Troy, and they launched 100,000 or a million voice assistants, does it just become a world where that’s the evolution of the app economy, where you assume that just about every mobile app has a conversational assistant layer? Because I think you started off the conversation by saying that they’re more like applications. So what are your thoughts on, say, the next three years? I know we’re probably still partly in this habituation and specialization phase, but how do you see this maturing, given what’s transpired over this last year or so?

Bret Kinsella:

Well, I think it’s interesting. A lot of people like to look at calendar decades. If you look at calendar decades, you would say the decade between 2010 and 2020 was the mobile decade. But actually, mobile was a revolution before then. So you might say it was the smartphone decade, because in many ways 2000 to 2010 was the mobile decade; it just happened to conclude with an explosion of smartphones.

Bret Kinsella:

But look at the iconic smartphone that was introduced, the first iPhone. If you put it into a decade, if you’re willing to use that metric, you have 2007 to 2017. Let me see, when did the first hearables start to come out? ’13, ’14?

Dave :

’14 was when Bragi launched the Kickstarter campaign. According to Nick Hunn, that’s when the hearables era began, in 2014. But AirPods came out at Christmas of ’16 or ’17. So they’ve really only been around en masse for about four to five years.

Bret Kinsella:

Right. Okay. What I’m thinking is, we said the Amazon Echo came out in ’14, sort of a limited release, really ’15. And then we had Google Home in ’16. So we’re talking about the same period. What I think people aren’t talking about is that we’re in the middle of the audio decade. There’s this idea of mobile screens and all these things, but look at that period, too: I think it was maybe ’12 or ’13 when Spotify first hit 20 million subscribers, something like that.

Bret Kinsella:

Streaming music had been around for a decade, but it had been largely confined to the desktop and was just starting to become a mobile, all-day type of thing. So if you look at the last five years or so, whether you want to say five or three or whatever, with the rise of streaming music, the rise of podcasts, which came along at the same time on the content side, and then the rise of the devices, hearables and smart speakers, we’re in the midst of the audio decade.

Dave :

Yeah. For sure.

Bret Kinsella:

And it’s probably not lost on people in your industry, because they’re like, of course, but maybe they’d say it’s been the audio quarter-century, as far as they’re concerned. I don’t know. But I would just say that if you look at broad consumer trends, particularly around technology, we’re in the middle of that. So I expect that to continue. And I don’t think we’re anywhere near where we’re going to be. If you look at something like Marsbot, I don’t know if they’ll be successful, but they actually have all the ingredients to make this work if it’s something consumers like.

Bret Kinsella:

It’s something you couldn’t have done before. Without hearables, it really wouldn’t have been practical. They’ve got a really interesting set of policies and UI formats, user experience formats, that they’re trying to lay out for that. And it’s going to go beyond just telling you something about the thing you walk by. It’s going to suggest things around the corner that you might not normally see, those types of things. And if you think about it, that’s the type of solution that needs a hearable to be successful. It needs that audio to be successful.

Bret Kinsella:

But once you add the ability for the user to interact, to have a conversation, to ask more, it becomes really interesting. Right now they limit it to five seconds of information; this is what you get. But if that intrigues you: What are the hours? When did they open? Do they have a reservation available for tomorrow at 6:00? What’s the next closest restaurant like this? The same thing. All these other things are within the capability of someone like Foursquare to actually fulfill.

Bret Kinsella:

And this is one example where voice takes things to another level. It not only creates a broader capability, but enables them to add a service that makes it far more meaningful, because I think over time you might get kind of annoyed with this thing if it just tells you, and tells you, and tells you, and then you have to pull out your phone to do what you want. Instead, you just say, “Oh, tell me more about that.” So that’s where I’m expecting things to go over the next several years.

Bret Kinsella:

Voice as a capability, conversational AI as a capability: this interactive capability is something providers want to be able to offer and consumers want to be able to have. Because, I mean, static web pages look a little quaint at this point. You can only do what they’ve provided you a link to do. Same thing in mobile apps: those companies, media organizations, whatever, have other things they could give to you, but they don’t have a way to expose them to you.

Bret Kinsella:

With conversation, everything becomes immediately available because it flattens out your UI significantly. So that’s really where I think it’s going to happen. If you look at that seminal moment, Alexa really put this out there for us. Siri set the table, and then Amazon started serving the meal in many ways.

Dave :

Yeah, no, I couldn’t agree more. Before we wrap up, I do want to ask you one other question about something we haven’t gotten into at all, and I haven’t heard your thoughts on it lately: Siri. You just mentioned Siri, so I want to dive into this really quickly. What are your thoughts? I know there’s a lot of frustration among the developer and design communities about where the third-party ecosystem is. However, I do think we can’t dismiss Siri, because you have AirPods, you have the Apple Watch, you have the iPhone. So they have a lot of the pieces there.

Dave :

From a lot of what I’ve read, and it’s not as if I’m tied into any insider circles or anything like that, John Giannandrea is the real deal, and he’s in the process of implementing his own take on Siri. So I feel like maybe we’re going to see something come WWDC 2021. It’s been kind of an ongoing joke that everybody says, “Oh, are we going to see something come out?” And nothing ever happens. So maybe I’m just optimistic to a fault, but what are your thoughts on Siri? Do you think there’s anything happening there that’s interesting? Or do you think Apple is just inherently not that interested in Siri and the larger ambitions it could have as a voice assistant for the iOS ecosystem?

Bret Kinsella:

I think Apple can be serious but have different ambitions. If you look at what Amazon or Google have done, Alexa and Google Assistant are actually the service. That’s the end point. But Siri has never been the endgame for Apple. They’re really focused on it as a feature that helps them sell more devices. That’s how they launched it with the iPhone 4S, and that’s the way they’ve looked at it. And there have been other things they thought were more valuable in the interim.

Bret Kinsella:

Now, ultimately they may have decided that it’s something that could hurt them long-term if they’re not more competitive, or that it fits with their new services-oriented strategy, because it would seem to align really well with that. So I do believe we will continue to see more from Apple along those lines, but I wouldn’t expect to see the same type of thing we’ve seen from Amazon or Google. That’s really the first thing I would say.

Bret Kinsella:

The second thing is that I do believe we will see something in 2021. I told people well before WWDC in 2020 that they would not see anything this year. And they didn’t. Well, we saw some UI updates to Siri, but that was about it, so it was more cosmetic. It takes a while to integrate some of these technologies. I think people underestimated how much work needed to be done to re-platform Siri from the legacy architecture that was built in 2009. It’s been scaled up over the years, and it really wasn’t designed, even initially, to handle the type of load they put on it, let alone the breadth and depth of activity you would have today with voice assistants.

Bret Kinsella:

From conversations I’ve had, I believe they are making significant updates to it. It’ll look different and be more capable going forward. It will still be different from what we’re seeing from Amazon and Google, just because Apple’s ambitions are far more narrow, at least today. Eventually that might change, but from what I can see, they tend to be okay being incrementalists. I will say I just finished up this report, so people can go to research.voicebot.ai and check it out.

Bret Kinsella:

One of the things I laid out is, “Hey, if you look at smartphone-based voice assistants, there are four things to keep in mind.” There are the hardware elements: in this case we have the smartphone, and then we have all the peripherals; hearables or a watch would be in that category, with glasses coming up. Then we have the services layer, which would be the general-purpose assistants. And then we have custom assistants.

Bret Kinsella:

So if you think about those four quadrants as really being the holistic view of voice on smartphones, Apple is a killer in three of those areas. They’ve got a heavily used voice assistant. They have the alternative smartphone platform in the world, and the most lucrative one, with anywhere from 15 to 25% market share depending on where you’re looking, and about half, 50% market share, in the US. And then they have the most dominant peripherals, the Watch and AirPods. That really gives them a significant structural advantage.

Bret Kinsella:

Now, where they’re lacking is in this capability for integration into iOS apps. They do have some. We talked about RBC earlier. SiriKit does have a few ways that you can order a taxi or an Uber, or check your bank account with RBC, those types of things. But they actually have the same gap that Google has. And Google, with Google Assistant App Actions, is now really a step up from where SiriKit domains are. I think that’s going to really expand Android integration with Google Assistant.

Bret Kinsella:

But if you look at Google, you’d say, “Oh, well, they’ve got the great … the OS leadership. They’ve got Google Assistant,” which is excellent. They’re okay in the peripherals, sort of number two or number three in those categories, at least among Android-compatible devices. And they’re not yet there on the application side. So in some ways I think Apple’s been okay, and if they think of their core franchise as being mobile, they haven’t really given that much up in the interim, because Google hasn’t run way out in front of them to date.

Bret Kinsella:

So I do expect us to see a lot more voice in apps, but even with that, going back to what we were talking about earlier, Google Assistant and Siri going into apps is not going to be that full, rich experience anytime soon. I think ultimately it’s going to be custom assistants that get built. The one thing I will say that would really make things different is Google Assistant App Actions: they allow you to deep link into the app, which is really underestimated in terms of how valuable it is, because then you can use Google Assistant to actually go to tasks, not just open an app like you were talking about earlier.

Bret Kinsella:

So that’s a really nice thing, but once you’re there, you can’t do multi-turn conversations. You can’t access other features after you get into it. There you’d need to start using your custom assistant. The PullString acquisition may well be because Apple wants to allow Siri not just to deep link into an app to execute a transaction, like you would with a shortcut, for example, but also, once you’re there, to keep going with these multi-turn capabilities. There may be some way they’re segmenting off Siri so that you can customize it, maybe with a different voice, maybe with different logic flows or workflow builders, those types of things, so that you can actually have that custom assistant. Then you can create this end-to-end solution where you’re in Siri at the native platform level, you go into an app, it’s seamless, and you can go deep as well as go broad.

Dave :

Yeah, no, really well said. I always kind of forget about the PullString acquisition. And I agree with you, it could end up being a really big deal down the line, particularly, like you said, in terms of accessing the apps and siloing off Siri in certain portions of time. The other acquisition that happened right around that time was Dark Sky. And I do kind of wonder if these were congruent. Maybe they were thinking, well, weather is a really big application, and what better app than Dark Sky, where you can get a really awesome experience through Siri.

Dave :

So part of me often wonders if those two were maybe John Giannandrea-inspired and will manifest down the line. But regardless, I agree with you that it’s a little too early to tell. Like you said, from a structural standpoint, with all of the infrastructure in place, it certainly seems like all it would take is a really awesome update, and suddenly Siri is really, really relevant to this whole space.

Dave :

And maybe what that means is that, as developers, we’re back to developing mobile apps, but with more of a conversational element to them, or something along those lines. Or maybe it’s the custom assistant piece that Siri works in tandem with, and the next iteration of the app economy becomes sort of the voice assistant economy.

Bret Kinsella:

Yeah, absolutely. And I think that’s likely going to be the way it builds out. But keep in mind that Siri is specific to the iOS ecosystem. So if you build for that, you’re only building for that ecosystem. If you want to build for Android, you’re going to have to do it again, and then you’re going to have inconsistency. And so this is one of the hypotheses I have: some people will rush ahead and do this and deploy within iOS apps. Let’s say Siri does it; I’ve sort of been thinking 2022 will be the really big step up. There’ll be a lot of interest, and there will be some very interesting early things in 2021, but it’ll be 2022 when they have the new NLU from Voysis integrated and all of these other types of things.

Bret Kinsella:

So that might be too late, because people may have already made some other decisions. But I think ultimately, if you look at what Capital One has done, they use the same assistant engine in the back end across web, mobile, and chat. Well, I guess they differentiate between chat and SMS. So it’s just one thing. It’s very similar to this idea of BBC having one engine behind everything.

Bret Kinsella:

So I think that most of those organizations, once they realize how great this is in their iOS app, if they haven’t already thought about it, are going to be like, oh, I need this everywhere. And then it’s like, oh, can Siri go everywhere? No, it can’t. So the really big thing would be if Siri could then go everywhere, across all these platforms. The first thing is, could you actually make it robust within the apps and make it really easy for people? Okay, great. So you make Siri better. Did you make Siri better by just making all the apps that … Siri is not that important. Apple, the iPhone, is not that important. It’s the apps that are really important. And you make all those better. And that would be really cool.

Bret Kinsella:

But then if they spread it across surfaces, wow, then it … Like, wow. That would be really big. I don’t really expect that. I expect the former, but if it did the latter, that would be really, really significant.

Dave :

Yeah. No, I couldn’t agree more. Well, Bret, this has been awesome. Thank you so much for sitting down with me, 50th episode. Really-

Bret Kinsella:

Excellent. Congratulations.

Dave :

Thanks, man. I appreciate it. It’s just really cool to have a conversation like this. It’s like I’ve been learning so much about the voice technology space, kind of through osmosis. And again, why is this all relevant to Future Ear and all the stuff I cover? I really do think there’s going to be a marriage between these technologies. I think the voice assistant economy will be really, really relevant to hearing aids, hearables, all the different things we’re wearing in our ears.

Dave :

And for the voice enthusiasts out there, I always try to get people to think about designing and developing for that modality. I think that’s what got me so excited about Marsbot: it’s something so hearable-centric that it’s like, whoa, this is now starting to get really interesting, where these two worlds are really starting to intersect.

Dave :

It’s been a lot of fun chatting with you, catching up, just getting your thoughts on where things are in the voice tech space. Like you said, check out Voicebot, Voice Insider, Voicebot Podcast, Voicebot Research at Bret Kinsella.

Bret Kinsella:

There you go.

Dave :

You don’t even have to say anything. I’ll just list them all out for you.

Bret Kinsella:

Perfect.

Dave :

Thank you. Thank you so much, Bret, for joining me. Thanks to everybody who tuned in here to the end, and we will chat with you next time. Cheers.

Bret Kinsella:

Thanks a lot.
