Alexa, Aural Attention Economy, Daily Updates, Future Ear Radio, Google Assistant, Hearables, Hearing Aids, Podcasts, VoiceFirst

079 – Carl Robinson – Rumble Studio: Creating Better Audio Production Tools

This week on the Future Ear Radio podcast, I’m joined by the great Carl Robinson, CEO/Founder of Rumble Studio and Host of the Voice Tech Podcast. Carl sat down with me to share the backstory of creating Rumble Studio and the vision of what Carl and his team are trying to build. Our conversation spanned a variety of macro trends regarding the state of audio, including the current innovation transpiring within the tools enabling audio production, like Rumble Studio.

As Carl and I discuss, audio is in its infancy as a medium and, like any form of media, it is undergoing a significant transformation as it matures. Look back at previous forms of media, such as TV or film, and consider the transformation over time as the media, the methods of consumption (e.g., color TVs), and the tools of production incrementally matured and improved.

I think there’s an argument to make that we’re still in the “black and white” phase of audio, as we’re only now seeing an emergence of entirely new formats that challenge the assertion that audio is limited to podcasts. Take social audio for example, such as Clubhouse or Twitter Spaces – would you consider that type of audio to be “podcasts?” It feels distinctly different.

That’s what makes Rumble Studio so intriguing: it enables asynchronous podcast production, yet another format of audio. With Rumble, the host of the conversation is able to issue a series of questions, sent as audio files, with prompts for the guest(s) to respond on their own time. The host then assembles the audio clips into a single audio file.

This opens the door to a series of interesting possibilities. For example, as Carl mentioned during our chat, they’ll soon be releasing the ability to issue a public link, allowing for multiple people to respond, enabling a format that very well may resemble something like a Reddit ask-me-anything (AMA). I think creators would jump at the opportunity to experiment with the ability to string together a series of individual audio clips, from a variety of people, into one singular file.

As people increasingly outfit themselves with in-the-ear devices that allow them to wirelessly and seamlessly dip in-and-out of the audio internet, the demand for new types of audio content stands to grow. So, keep an eye on Rumble Studio and what Carl’s team is building as it very well may be home to a new, popular form of audio.

-Thanks for Reading-
Dave

PS – Future Ear got a new glow up 😎!

EPISODE TRANSCRIPT

Dave Kemp:

Hi, I’m your host, Dave Kemp, and this is Future Ear Radio. Each episode we’re breaking down one new thing, one cool new finding that’s happening in the world of hearables, the world of voice technology, how are these worlds starting to intersect, how are these worlds starting to collide, what cool things are going to come from this intersection of technology. Without further ado, let’s get on with the show.

Dave Kemp:

Okay, we are back. The podcast is back. I’m very pumped to introduce my guest today. I think this is going to be a great conversation. Joined here by Carl Robinson. So Carl, tell us a little bit about who you are and what you do.

Carl Robinson:

Hey Dave. Thanks very much for the invite. Yeah, it’s wonderful to be here. For those of you who don’t recognize my voice, my name is Carl Robinson. I’m the host of the Voice Tech Podcast, on which I’ve interviewed about 100 experts in voice technology and conversational AI. We talk about things like Alexa, Google Assistant, talking to machines in general. I’m also the co-founder and CEO of Rumble Studio, which is a new startup. We’re building a SaaS platform that allows companies to produce spoken word audio easily, affordably, and consistently, which is the main word, so that they can extend their brand to audio channels, which many companies haven’t done yet, and ultimately acquire more leads, convert more leads.

Dave Kemp:

Love it. That’s awesome. This is very fitting too, because I’m speaking with you. You’re based in Paris, yes?

Carl Robinson:

That’s right. Yeah, on the-

Dave Kemp:

Or you’re just outside of it.

Carl Robinson:

… outskirts. The edge is only 20 minutes on the train. I live right next to Disneyland, if anyone’s been there.

Dave Kemp:

Okay. Very nice. Well, the reason that Future Ear was on a bit of a hiatus was I actually spent some time traveling around France. I was there for about two weeks. So, I got to enjoy Carl’s backyard and all the fun that is France. I guess my first question to you is, how do you live over there and not get really fat from eating a ton of bread and cheese, and drinking wine? Because that’s all I did for like two weeks, and it was amazing.

Carl Robinson:

It’s one of the mysteries of France. Why are the French not fatter? I have no idea. Yeah, that’s seriously sugary, fatty and everything else, but yeah. I guess maybe portion sizes that are a little bit maybe more controlled.

Dave Kemp:

Yeah.

Carl Robinson:

Yeah, it’s a mystery.

Dave Kemp:

They’re not eating an entire baguette a day and eating like a block of cheese.

Carl Robinson:

Some of them are probably.

Dave Kemp:

So the coolest thing, I loved the … My favorite thing was you’d go into one of these grocery stores or their markets and there’s just like cheese and charcuterie for a euro, one euro for like a whole thing of amazing prosciutto or something like that. So, we would just load up every single day, just go and get a bunch of cheese, a bunch of charcuterie, some baguettes, some wine, and it was like okay, I’m set.

Carl Robinson:

I know.

Dave Kemp:

All right. Well, it’s so good to have you here though because we’ve been talking a little bit. I’ve known you for a little while, particularly from the voice space. I think we’ve attended some of the same shows and I’ve gotten to meet you in person. I always have really admired your ability to iterate. I think you’re a really great iterator. Every single time I meet you it seems like you’re working on whether it’s the project that you had been working on previously and you’ve just taken it to a whole new level, or you’re doing something different.

Dave Kemp:

This whole Rumble Studio, the startup that you founded is fascinating to me. I want to eventually get into that, but I think what would be best for the listeners is to start from your kind of background, and maybe you could walk us through from the start with your background in IT and how it eventually sort of culminated into migrating into the voice space, and then eventually founding Rumble Studio.

Carl Robinson:

Yeah, pleasure. So, I haven’t always been into the voice. I’ve always been into music and definitely listen to the radio. Remember making mix tapes as a kid and things like that. So, I’ve always been into audio, that kind of thing, but definitely not always into podcasts.

Carl Robinson:

I started my academic career in Manchester in the UK. Did a computer science degree. Did three years working in management consulting, IT consultancy in London and in the north, and then actually moved to China. My friend had gone over there and convinced me that was a great idea, and it was super cheap to live, and was like, “Yeah, come over, start a business.” So I was like right, consulting is not for me, and off I went. I didn’t really know what I was expecting, and it was definitely one of the best decisions I ever made in my life, to move to Asia, because it gave me the freedom, financial freedom and also just the time to go to explore and try different things in my 20s, which working in London in a fast-paced environment just would never do.

Carl Robinson:

So, over there I launched a couple of startups. We first started with this translation marketplace, which turned out to be a huge big and hairy project that went on for over four years, because it was taking so long and it was probably a bit too big for us as naïve entrepreneurs. We actually started a second project, a much smaller one, which is this iPhone app called Munch 5-a-Day, which is a healthy eating app which allowed you to log your fruit and veg. We hired a programmer, a designer, and that went really well. It was a much more manageable project. We actually launched that on the Apple iTunes store, as it was called. It got a lot of downloads, and then we got contacted by this American company that actually bought it from us for a substantial sum. So, that gave us the money to basically live in China for the next few years. We probably plowed most of that back into the Hyperlingo translation marketplace, which ended up going nowhere, but it taught us a huge amount about how not to run a startup.

Carl Robinson:

Then eventually I got a job in an American startup over there, still in Beijing, which was a conversational startup. So, we actually developed a chatbot for educating diabetics. It was actually a remote care platform. So, it had health coach and diabetic communication, but also an education component. That was my first introduction to chatbots and this kind of asynchronous communication.

Carl Robinson:

The one problem with that app that we built there, which turned out not to be as effective at changing behavior, which was the aim of the app, was that we thought because it was text and image based only, wouldn’t it be great if it had voice, and wouldn’t it be great if this text and image based educational chatbot would mimic more what the health coaches were doing live, which was basically taking people through a multi-turn conversation, and listening to them, and responding to them, which turned out to have really good results. So, we were saying, “Wouldn’t it be great to have some voice technology in this to be able to do what they do?” And that’s where my attention first turned to voice tech.

Dave Kemp:

Yeah, I love that whole background. I mean, I think that it’s really interesting when you hear people that they spent this time doing things that seemed like they were sort of abstract from what they’re currently working on, but it sounds like with you, you spent all this time sort of learning how to even develop apps, and then that kind of then eventually cascaded into chatbots, which then led to voice.

Dave Kemp:

So, I think it’s really cool to hear when people have these kind of deep rooted backgrounds in all these disparate areas that all kind of culminate together into something that is like I think a pretty beautiful thing. When you get to the point to where you’re like, “Okay, I have all this experience, and I’ve kind of identified some of the different deficiencies that these existing experiences that I’ve had have led to.” And you now realize okay, then maybe voice is the optimal way to solve kind of these different issues that I’ve been presented with along the way.

Dave Kemp:

So, go on. So then you get into the voice space, and then how did everything unfold once you started to then turn your attention that way?

Carl Robinson:

Sure. So, that experience at least gave me the intuition that voice would be an effective solution to that problem. I then moved back to Europe. I met my girlfriend over there who is French, and as I was saying before the show, I actually have dual British French nationalities. So, for me moving to France was super easy, especially given the chaos of Brexit and everything. So, I dodged all that and moved to France, did a master’s degree in data science, which is a two year course.

Carl Robinson:

Then for the internship at the end of that I worked in a lab called IRCAM, which is actually a music and speech research lab in central Paris. In that lab I was working on voice emotion transformation. So, I decided okay, I’m going to work on voice. This is around 2018 time when Alexa and everything else was really going well. I thought this is a great field to work in. It’s got a lot of practical uses. I’m sure there’s going to be a lot more job opportunities in this field. Plus, it’s a field that I’m really interested in. Got this experience working with chatbots in the past, so I’d love to work on voice.

Carl Robinson:

While working in the lab on building machine learning models with voice data, it also occurred to me oh, maybe I could launch a podcast to sort of boost my profile and have an excuse to talk to all these experts that are around me but are heads down on the code all the time. Being a researcher, or an intern or whatever is a bit of a lonely task. You’re there 99% of the time on your own, just you and the code, but as soon as you launch a podcast then you’ve got an excuse to go and talk to everyone, and you can deliver some value straight off the bat.

Carl Robinson:

So, for me that was a huge thing. The podcast quickly became popular, but then also quickly added a lot of work to my plate.

Dave Kemp:

Yeah.

Carl Robinson:

As I’m sure you’re familiar, podcasts are a lot of work, and I was spending my entire weekends editing, and promoting, and publishing and doing everything else around the show. It occurred to me at some point maybe I can combine all this cool conversational AI tech that I’m talking about with these experts every day, and it’s clearly getting better week on week, with a solution to scaling my podcasting output. Maybe I can clone myself to some degree. Maybe I can find a tool, first of all, that can help me capture audio with less effort. So, I looked online and there really was nothing. A lot of the tools in podcasting are quite antiquated. The stack is pretty set. People have been doing the same thing for a few years. At the time there was Anchor, which was, as you know, it’s kind of a B2C podcasting platform for free, and then Descript came out a little bit later. These were the two, the real innovators in the space, but apart from that there really wasn’t that much around the recording side. So, that’s where the initial idea for Rumble Studio came from, to solve a problem that I had firsthand.

Dave Kemp:

Which is like I think usually the best solutions come from those who actually were experiencing the problem. This is so similar to, I’m not sure if you’ve ever heard the All-In Podcast.

Carl Robinson:

Yeah, yeah.

Dave Kemp:

So, David Sacks who is on that, he’s the founder of Yammer, which sold to Microsoft. He was part of the PayPal mafia. So, he is a really famous SaaS entrepreneur. Over the course of doing that podcast he realized that there were quite a bit of deficiencies around the production side and kind of like the consumption side as well. That really struck me with what you were saying there, because he basically said the same thing. He was like, “I am a podcaster and I experienced all these issues, and I wanted to solve them.” Now he’s going about a totally different track with his company Callin, but I think what you’re building with Rumble, it’s a totally different thing from the standpoint of solving different kinds of issues, because I think that one of the recurring themes that always goes through my mind is this idea that we’re really in the infancy of audio consumption. Audio consumption tends to sort of take the form of podcasts. For those of us that are in the voice space that were doing flash briefings, some people still are. I think that’s another example of the kind of audio production that has sort of emerged over the last few years with microcasting.

Dave Kemp:

I often feel this sense of we’re at a really early, early innings of this, in the same sense that if you think about podcasting or audio production as an actual medium, kind of like an art medium, if you will, it’s not that far off to compare it to something like film or television. You look at the way in which television was produced in the ’40s, and then the ’50s, and the ’60s, it just incrementally changed. Some of that was brought about by technological advancements and it enabled new ways of production, but a lot of it was just the ingenuity of the creators themselves to look at it as a form of art. So, that kind of leads me into Rumble.

Dave Kemp:

I want to give you a chance to really explain, as you were saying there. So, you had been doing the Voice Tech Podcast and you had been doing a whole lot of putting all this time and energy into it. You realize like man, I’m kind of burning myself out here to a certain extent. I’m spending too much, a disproportional amount of time on the production side. So, walk us through kind of what you learned from that period of time and what you’re trying to solve with Rumble.

Carl Robinson:

Yeah, for sure. So, I mean, you summed it up nicely. I just felt like I was spending way too much time on the nuts and bolts of producing the audio and not enough time on the things that I felt made the show really great, which was researching the guests, coming up with some interesting questions, an interesting angle and that kind of a thing. Voice Tech Podcast is a live podcast, I should say. Asynchronous wasn’t really an option at that point, and I absolutely love these live conversations. What we’re doing at Rumble isn’t designed to replace that in any way, it’s designed to supplement it or be an option for people for whom that isn’t an option.

Carl Robinson:

So, I like the idea of being able to scale things, be able to use the power of the machine to be able to scale myself and be able to improve productivity, which is basically what machines are there to do. With all this technology coming out, the conversational AI, the dialogue management, the text to speech, the synthetic voices, the new ways to consume audio, like microcasts and just a general interest level in voice technology and audio content. I thought this is a fantastic opportunity to actually bring something new into the market and solve some of the problems that I’m currently experiencing.

Carl Robinson:

So, my first thought was if I could just build a tool that allowed me to capture audio efficiently from some of my guests, I could scale that. I could maybe do some opinion canvassing, for example, and send a few questions to multiple guests, put those on my show and then ad lib around it, and add my comment. That would be a scale up. But I also was thinking about it with all these technologies that are coming out, why not go a step further and have this audio Dropbox turn into more of a remote conversation tool where it can be a multi-turn conversation, and the machine is actually listening to what the guest is saying and responding accordingly. That’s the thing that really inspires me. What we talk about on the show all the time, this is the vision that a lot of voice tech enthusiasts have, is the Star Trek computer type thing, or even Data in Star Trek. Wouldn’t it be great to just have a conversation with a machine about anything you want for as long as you like, and have that conversation be super engaging and personalized to you? Wouldn’t that get the best out of you? I think it absolutely would, and it wouldn’t just be entertaining, it would be useful in a business context, in a health and therapy context. I mean, the applications of that would be limitless.

Carl Robinson:

So, that’s the vision that inspires me and inspires a lot of people. I think as I was thinking about it more and more, I thought well, today chatbots and voice assistants, they’re generally there to answer questions. It’s quite transactional, for a start. So, they’re generally not multi-turn conversations anyway, and the single transaction you have with them is that you ask a question and then it comes back to you with an answer, or you ask it to do something and then it comes back to you with an action.

Carl Robinson:

So, we flip that on its head and say well, why not train the chatbot to actually ask interesting questions and then follow up with more questions where the human is there to speak as much as possible, and then simply record that conversation. I’m sure depending on the quality of the questions and the guest, interesting stuff will happen and we’ll be able to use that as a form of audio content for either purpose. So, that’s really the thinking behind Rumble.

Dave Kemp:

I mean, so I was kind of wrapping my head around it. I was looking at your website, rumble.studio, and I know you have this new podcast. I thought it was so cool that part of this podcast, it looks like you have a handful of interns.

Carl Robinson:

Yeah.

Dave Kemp:

Or young professionals.

Carl Robinson:

Absolutely.

Dave Kemp:

All different backgrounds. So I was listening to the one where it was the sixth episode of Audio Leads, that’s this podcast that you have within Rumble, and it was kind of like what did you learn. What really dawned on me as I was listening to that was, and I think it’s important to really frame what it is that Carl is building with Rumble. So, with a podcast like this I’m sitting here live with Carl and we’re having a back and forth conversation, but that doesn’t necessarily need to be the only way in which you can sort of glean that information. This asynchronous format is a totally different way in which we do actually communicate, if you think about an email correspondence, that’s asynchronous. You wait until you get the response back from the person that you’re corresponding with and they do that on their own time and they have this thoughtful response. You think about okay, so how do you apply that to audio? That’s really what Rumble is doing, is they’re taking this Q&A style and they’re making it so that it’s very, very easy to turn an interview into something that can be done back and forth asynchronously, and I think that really lends itself to I think, again, it’s this idea that we’re in this black and white period it feels like with podcasting. We’re in the very beginning stages of it.

Dave Kemp:

So, to think that Joe Rogan in a three and a half hour podcast format, long form is going to be the definitive form of podcasting, or Future Ear Radio, where we do these live one hour conversations, or it’s a 20 minute kind of like news poppy, here’s what happened in today’s news. I think it’s extremely limiting in terms of the creativity that’s actually going to emerge. That’s what I found to be so cool about Rumble, is it’s like I look at this as you’re enabling a totally different kind of creator.

Dave Kemp:

One of the interns that you had during that episode said something really interesting, which is like I don’t have the kind of personality that lends itself to being able to on the fly ask these things. I want to have time to process what they said, and then ask a more engaging question. I think that the same can be said for certain guests, where they maybe get a little nervous, and it’s a little bit challenging to really articulate what you’re trying to say, especially if you don’t have a lot of experience in public speaking or podcasting and you’re not really a polished speaker quite yet. So, you kind of repeat yourself a lot, you say a lot of filler words like um. So, by giving somebody a new format where you say, “Look, we can still have a very similar conversation.”

Dave Kemp:

Now, granted, there are some downsides to this and I think we can talk about the advantages of live versus asynchronous, but I do think that having this as a totally separate type of format that exists and making it really, really easy and turnkey for people so that they can just go and they can basically issue an interview, and it’s just kind of a back and forth thing where I record something. So, I record my questions and then Carl responds with his answer. That prompts me then to ask my next question. Before you know it you have a 20, 30, 40 minute conversation on your hands that was all done kind of in this turn-style manner.

Dave Kemp:

I just think that it’s going to expand this whole ecosystem dramatically because it’s going to bring in people that are for whatever reason maybe hesitant or they just don’t find the current formats that exist in audio production to be that appealing to them, and I think that the more diversified the formats are that exist, the larger this ecosystem is going to be, and I think that’s going to be a huge net positive because the more diversified that you have of creators in any given format, the more beautiful the art ultimately becomes.

Carl Robinson:

Very well said. I really appreciate that. Thank you first of all for having listened to our episode and gone through it and really absorbed it. We learned a huge amount from that first series of Audio Leads. So, I want to give a shout-out to Joey, and Nived, and Laetitia, and Emily who’ve all contributed and really drove the production of that show. The idea of that was it was a proof of concept to build a fully asynchronous podcast. The last episode you listened to was live, it was kind of a light relief to be able to shoot all the ideas and the feedback amongst each other. Yeah, it was an asynchronous podcast. We did five asynchronous episodes. We did it in the style of trying to model a live podcast, which is actually one of the things we really learned that has its pros and cons for sure, because just so the listeners know, it’s actually the way Rumble works is that you write some questions upfront. So, you write your batch of seed questions at least, might be eight to 10 questions. You invite one or multiple guests and then the guests click a link in their inbox and in their own time they go in and they can record one or all of the answers to the questions there.

Carl Robinson:

The questions could be text only, they could actually be audio if you’ve taken the time to record them upfront. It can either be synthetic voices, like a text to speech. Then when the host or the editor gets that audio back they can reorder them, they can delete the bits they don’t like, they can see the transcripts, they don’t have to listen through the whole thing, add some jingles, and then click merge and that exports the audio episode that you can then publish.

Carl Robinson:

So, the way that we did it with Audio Leads was to send the questions, wait for the answers to come back, listen to the answers and then add like an informed comment at the end of it, and then lead into the next question. So you would get this question, answer, followup comment, question, answer, followup comment type order.

Carl Robinson:

That’s fine, and I’ve actually done that with some success on the Voice Tech Podcast. If you listen to the last five or six episodes, like with Christian Mahnke from EarReality for example, it works really well, but I’ve recorded 100 podcast episodes, unlike most of our users and the interns that are producing the show. So, when they came to record their followup comments sometimes the audio environment was different, so there was more background noise. Sometimes they felt kind of fake in having to respond in a realistic manner. They were like, “Oh, that’s such a great point.” But they’re in a room laughing on their own to themselves, it’s just weird. So, these are the things you only really learn by doing it. On paper, it sounds great.

Carl Robinson:

So, for this season two we’re actually experimenting with more of a narrative format, where we send the questions, get the answers, but then once we look at the answers we actually kind of arrange them in more of a story flow and then add a third-party narration, speaking directly to the listener as opposed to the guest, adding sound clips and stuff to support. So, in a way we kind of make a point, and then the guest audio supports that with evidence and then we move on. So far it’s working well, but it’s yet to be seen if it adds more work, because our mission at Rumble really is to make things 10 times quicker, easier, cheaper for the user.

Dave Kemp:

But I think this is, as I said earlier in the conversation, I’ve always admired the fact that you’re a big iterator. This is such a good example of that, where it’s like I think you’re really onto something with this different kind of format. To your point, I think it’s already been identified by you and your team that a lot of what makes a live podcast appealing just can’t really be replicated, and I think once you sort of understand that these are kind of like two entirely different formats, so you don’t have the serendipity, you don’t have the live candor, you don’t have the laughing that spontaneously occurs throughout a conversation. However, it does lend itself to totally other advantageous things. So, I love that idea that you just mentioned there with adding this narrator, right? Where you kind of have this entirely separate entity, if you will, that’s playing kind of like god’s eye view over the conversation that, again, it works in that kind of format. Something like that would be so alien to this kind of format. So, I just find that to be really interesting.

Dave Kemp:

Another thought I had in my head, and it might be something that you would be open to in future seasons of your podcast, is the other thing this really could open the door to is a much more expansive episode that involves a lot of different perspectives.

Carl Robinson:

Yes.

Dave Kemp:

So, I see this as like okay, so you have this broad theme. Let’s just say we’ll use voice tech because a lot of the people listening are familiar with that. So, let’s say that we’re talking about something really broad like the NLP or something like that. So, I think that what could be fascinating is it allows for you to curate a conversation almost where so first you solicit Kane Simms’ opinion on something, right? And then the next person that you have, let’s say it’s a Heidi Culbertson or Amy Stapleton, somebody like that where they then come in, they can see what Kane said, and then they can build off of that. So, ultimately you get something that is a curated sort of opinion that is being built upon by multiple people that would be highly, highly challenging to do in any other format because that would basically entail that you would have to get all these people on a podcast where you’d have to record them separately.

Carl Robinson:

As I was saying [inaudible 00:26:18].

Dave Kemp:

And it would be hard for them to … So, I think that there is a lot of power to this too, where it gives you the opportunity to kind of expose people I think to an ongoing conversation and then give them the opportunity to kind of build upon that. I think that could lead to another really fascinating format, which again doesn’t quite exist today.

Carl Robinson:

Completely agree, yeah. I mean, it’s so flexible. So, you described one of these formats, which is kind of an iterative format. You send out the questions and you get guest after guest playing off each other. We did that once with Voicemod. The CEO and then the product manager, I believe, came in and replied and could hear the CEO’s responses. So, in a way it was like they were in the same room together, but they’re recording separately. But we’re really excited about a new feature that we’re actually going to release in September, which is this public link. So, instead of having to invite specific people with their email addresses, you just get a link that you can put on Twitter or Facebook and have as many people as you’d like join, and those can either join the same conversation, so they can play off each other, or they could each get their own copy of the conversation with the same question audio but then give their own answers and allow you to capture all this audio super quickly with very little effort.

Carl Robinson:

In an AMA for example, you could have … So a celebrity could be like ask me anything, drop me questions, or flip it like a Quora and say, “This is the question of the day. Give me your answers, I’ll pick the best ones.” I’ll merge them into an episode and I’ll put them straight back out onto social media, and then that way you can complete the loop and you can engage your fans, allow them to participate, but then give them really timely and well-informed opinions as well, your curated selection of the community opinion, which, again, would just basically be impossible with the tools of today.

Dave Kemp:

I love the AMA. I mean, again, I think that this is what’s so exciting to me, is I’ve long held … I wrote a blog post, a two part blog post for VoiceBot. I think it was at the beginning of 2019, that was like the Cambrian explosion of audio. My thesis of that was basically like if you actually look at both on the production side and the consumption side, the building blocks that are now in place, what we’re looking at is like a recipe for just a massive explosion of adoption. So, on the consumption side that’s kind of obvious. You just look around any busy place that you’re in now that the world is opening up a little bit again, everybody has true wireless headphones in. Hearables really are kind of everywhere now. That wasn’t a thing even four or five years ago. It’s really exploded in popularity over the last few years, largely ushered in by AirPods. We’ve talked about this ad nauseam on this podcast, so I don’t need to rehash that.

Carl Robinson:

Yeah.

Dave Kemp:

But it transcends hearables too, it’s connected everything, from smart speakers, to cars. You’d be hard-pressed to get into a car that’s even 10 years old that doesn’t have at least the ability to port into Apple CarPlay or Android Auto. So, it’s so easy now to just stream whatever audio you want, whether it be all of your Spotify or whatever subscription service you use. So, the consumption side is pretty obvious, and I think the production side wasn’t as obvious, and I think it’s becoming more obvious, which is like again, it goes off this whole idea that once the tools become available, it’s very easy to actually think of this as analogous to what happened with video.

Dave Kemp:

You look at when it was in the early 2000s, very, very hard to create video. Here we are in 2021, you can literally put up a TikTok in 10 seconds, and it can have all these video effects and all this different stuff, stuff that was kind of unimaginable even 10, 15 years ago. It’s just become so dramatically democratized, and the same thing’s happening with audio production too, I think, is that we’re seeing so many new tools come into the fold, whether it be Anchor or something like what you’re building with Rumble, where ultimately we’re kind of trying to get to the same thing, I think, which is making it so that it’s super, super easy to facilitate entirely new forms of production.

Dave Kemp:

I think that it’s fascinating to kind of see what’s worked well in the legacy web, if you will. Reddit AMA is highly popular, and to think like what does that look like in an audio format. I think that it’s really easy to get caught in this fallacy of well, it will just be a podcast. It’s like, but how does that actually come to be? And I think it has to be innovation that’s done on the production side to enable that, and it’s something like exactly what you just described, where you have this sort of asynchronous platform, and that platform iterates, iterates, iterates to the point to where you now have these public links where you can then have maybe a host, and they put it out and they solicit all of these responses to their question, and then it merges together into one giant audio file.

Dave Kemp:

That’s what’s so exciting to me, is that as somebody that’s coming at this from the angle of hearables and from the world of Bluetooth connected hearing aids, my big passion has always been what do you do with a world where everybody is wearing these devices? What are they going to actually use those devices for? And it’s kind of almost like cheating to just say, “Well, podcasting.” But you need to sort of break down what that actually entails. I think this is a perfect example of that, which is to say yes, you’ll have the option to stream Joe Rogan, but it’s going to go way past that. It’s going to go into all these new formats and I think that that’s what keeps going through my mind when I’m hearing you talk, is like you’re on the forefront of one of these different kinds of formats, and I think that it opens the door to a lot of what we’ve already seen work in sort of like the legacy web and it’s going to audiofy it in a way. I think that it’s going to be eye-opening to a lot of people when they start to really recognize that podcasting is actually pigeonholing the medium. It’s much bigger than that. It’s audio, and what audio can be I think we have just barely tapped into how expansive this world is going to be.

Carl Robinson:

I mean, podcasting is such a loaded word, right? When people say podcast they think of usually two people in a basement talking to each other.

Dave Kemp:

Yeah.

Carl Robinson:

It’s been like that for a while, but really it was because that’s what existed. A podcast, as people traditionally think of it, it’s just on-demand audio over the internet, right? But now you can stream it. Now there’s microcasts. There are just so many of these different formats, these varieties of podcasts. The word podcast doesn’t really hold any meaning anymore. Is it live? Is it on demand? How long is it? How many people? Is it a human even? What is a podcast? Is it read, is it narrations, does it come from a blog post? Does it come from … Who knows where it comes from.

Carl Robinson:

So, I think it’s the only word we’ve got right now to describe it, but I think as like microcasts for example, these new words come along, people will start to see the audio space with a much finer grain of detail and not just lump everything into the podcast basket. A lot of it comes from the tools, like you say. The more tools that come out, the more creativity can be unleashed.

Carl Robinson:

Video is a great example. When TikTok comes out, and there’s Instagram Reels and now YouTube Shorts, suddenly there’s this whole different type of video that people can create, and the videos they create on those are very different from a standard YouTube video, which is also very different from the type of video that you would watch at a movie theater or on your regular TV.

Dave Kemp:

100%.

Carl Robinson:

I mean, the number of types of video that exist just dwarfs the number of types of audio at the moment, and that’s because it’s been around for longer and there are more tools, there’s been more investment in it, but people are really waking up to audio now, businesses, and the consumers, and the creators. We’re just going to see a huge explosion in that. Like you said, there are just so many ways to listen to it, it’s why it’s more worthwhile. It’s more worth spending time and money creating audio because there are just more potential listeners and more niches, and all around the world as well. This is something that I’m really excited about at Rumble, is the ability to allow people to create content more cheaply and easily, to serve niches that it wouldn’t usually or hasn’t been up until now worth serving because the audience sizes are too small. The same goes for different languages and communities around the world.

Carl Robinson:

I love the idea of people all around the world creating podcasts in all different languages, not just English, talking about really niche subjects where maybe they only get 100 listeners, but that doesn’t matter because it’s super quick and easy to do this, right? If you can reduce the cost of production down to just an hour or two a week, it’s worth it, right? It should be as simple as making a quick TikTok video or something. Then you can serve the audience that is interested in listening to what you’ve got to say.

Dave Kemp:

Yeah, that’s brilliant. I want to stick on businesses. I think this is a really interesting one, because I think it’s easy for us to imagine the consumer side of things, like how the end user eventually kind of lines up with this piece of content. I’m curious to get your thoughts on businesses, both kind of like how you envision brands coming into the fold, whether it be branded content that’s directed at consumers. The other really interesting avenue that I’m really curious to get your thoughts on, that’s kind of also running through my head right now, is how do you see something like Rumble being used by corporations for internal communications, because I see there being a lot of opportunity as to how businesses actually communicate internally. I’ve been in big corporations before where we have all-hands meetings and you really only get to hear the CEO or kind of like the subordinates right underneath them, his lieutenants that are kind of like you don’t hear from them that often. Usually I feel like a big reason is that it’s top down communication that’s hindered by the size of the company. I see, again, kind of building off what we were just talking about with this idea of asynchronous and being able to kind of add in additional voices and opinions. I feel like that could really lend itself to some businesses.

Dave Kemp:

So just overall, I’m curious to get your thoughts on that particular facet of the audio market.

Carl Robinson:

Yeah, I’ve got plenty to say on that. So, Rumble Studio is aimed at businesses, small to medium sized businesses, SaaS startups, midsize SaaS startups, and hopefully eventually the larger companies as well, the ones that are more likely to have an internal podcast as well, which is what we can talk about.

Carl Robinson:

The first concept I wanted to perhaps introduce some of your listeners to is the audio first content funnel. This is what a lot of podcasting agencies talk about, is that audio being the most information rich medium can serve as the ground truth for all the other derivative forms of content you create. So, a lot of companies these days are into content marketing. Content marketing is growing at I think 17% CAGR. It’s billions and billions of dollars I think, the latest figures are like 470 billion going to all forms of content marketing next year. So, it’s enormous and growing fast, but that tends to be social media, emails, video, these kinds of things that we’re typically doing. Audio really is just getting started in that spend, but it serves as a fantastic ground truth, like I say.

Carl Robinson:

If you record the audio first, if you conduct these guest interviews with thought leaders or even your own internal staff, then you can write a blog post from it, then you can create short videos for social media from it. You can create longer pieces, you can mix and match and repurpose to create even more audio. There’s all sorts of stuff you can do once you’ve got the information from the guest. So, then the question is, how do I most efficiently capture that audio and have these conversations on a consistent basis?

Carl Robinson:

Audio in the traditional way is quite time-consuming to create, and what we find with podcasts is there’s a significant pod fade. So, although there are supposedly two million podcasts on the podcast app store, on the podcast directories, half of those are inactive, meaning they haven’t released an episode in the last 30 days, and about half of those have seven or fewer episodes. So, there’s massive attrition. There’s a huge bias towards people starting podcasts with all the enthusiasm and gusto and then realizing this is actually quite a lot of work. Are we getting the value for it right now? Well, it’s not made a lift on our sales this week, so maybe we should just put it on the back-burner and come back to it later, which they never do. Which is a shame, right? Because we know if you continue with podcasting or releasing audio content it has a massive effect on sales, on educating your potential clients, building trust and authenticity with people that have just come across you, and it can serve at the top of the funnel and at the middle of the funnel as well. So, it’s very, very useful for businesses. I would say if you can implement an audio first content funnel, then you can get the best bang for your buck from the audio that you create and then justify continuing producing it essentially, so you don’t quit.

Carl Robinson:

You mentioned internal podcasts as well. Internal podcasts are not something that we’ve looked at immediately, but a number of customers are doing internal podcasts, basically helping larger enterprises produce private podcasts, so you need a password to access the feed. For that, usually they use a custom app, which they roll out to their employees. The benefit of that is you can easily onboard people and then you can easily off-board them as well. So, if they leave the company you can close off their access to what can be sensitive information on the podcast. Those internal podcasts can be massively helpful for things like sales enablement.

Carl Robinson:

When these big companies are launching new products, they need to be able to educate the workforce on what the new products are. They need to be able to convince the sales force to start selling them, because the sales force have got quotas to hit and they tend to stick with the products that they know and love and that they know they can hit their quota with. So, you have to give them the incentive to actually sell these new products, and then you have to give them a way so they can listen to this content while they’re on the go, because they’re sales reps, often they’re out on the road. They can’t be reading long PDFs or watching videos. So giving them a podcast is an easy way to get them up to speed and convince them to sell this new stuff.

Carl Robinson:

So, I would say if, or even indeed when we do internal podcasts, we’d probably want to either develop or likely partner with one of these established private podcasting firms like Storyboard, for example, or FieldCast that I’ve just discovered because they’ve got the whole infrastructure to be able to serve the listeners within an organization, and we’re focusing on the content creation side.

Dave Kemp:

Yeah, I love that. I mean, when I think about it, you said something really interesting at the very beginning there. You said it’s sort of the ground truth. I had actually sent you over that Alex Danco piece that really always stands out in my mind. I’ll link this in the show notes. It’s called The Audio Revolution, but what he really kind of hones in on is this whole Marshall McLuhan school of thought, which is about hot and cool media.

Dave Kemp:

He talks about how audio, especially when it’s in your ears, like when you’re listening through headphones, it represents really hot media, which basically means that it’s extremely information rich, whereas something like Twitter would be cool media, meaning that you’ve got to kind of fill in the gaps a little bit, and so you kind of have to do like context clues because you’re limited by 280 characters. So, with hot media it’s so dense with information that it kind of informs every single thing that the person is saying.

Dave Kemp:

So, a good example would be like an email versus a sound clip. In an email, a lot of the actual sort of nuance of language can get lost. It’s very hard to convey sarcasm, right? It’s very hard to convey all of the subtleties of language. What I find to be really interesting about this idea of internal communications is I think that what we’re going to see are more and more companies gravitate toward these hotter forms of media, because I think that it’s that ground truth that you said. I think that it gives people the opportunity to really communicate with their teams, especially in larger corporations, without any context being lost. So, maybe it’s a weekly instead of having an actual live meeting, it’s more of an asynchronous thing where it’s still enabling them and maybe some of their kind of right hand men or women to interject and say things as well, but I just find this to be interesting where I see the future of communications, broadly speaking, finding a lot of appeal in audio. Again, because I think that, and maybe video too, but I think video the production and I think there’s more of a consumer tax, like basically a user tax. You’ve got to pull out a video and you’ve got to watch it, whereas with an audio clip you could just be walking and listening to it, hands-free, eyes free.

Dave Kemp:

So, I think there’s a lot to be said for that, which is like I really could see a future where a lot of internal communications migrate in this direction because I think that it could be found that it’s just a more information rich context and things don’t get lost in the shuffle.

Carl Robinson:

Completely.

Dave Kemp:

You can listen to it on your own time. To your point, then it sort of becomes the centerpiece which the derivatives flow from. So, I wanted to just kind of throw that out there because I think that, again, it’s all about as soon as you start to open your mind to everything being potentially made into audio, and again I think it’s not necessarily like it’s superior in every manner, I just think that it serves as an alternative that I think is going to appeal for a lot of different situations that don’t seem super obvious right now.

Carl Robinson:

It’s totally true. I mean, I was going to say that it’s happening in the meeting space as well. You mentioned meetings. There are many startups, the one that comes to mind is YakChat, who do asynchronous audio meetings. Just to take the pressure off people constantly having to attend Zoom calls. Again, that Zoom burnout. There are certain meetings that can take place asynchronously. The good thing about recording audio asynchronously, because it’s a zero pressure environment, you’ve got a chance to redo the audio as much as you want. You’ve got a chance to think about what you’re going to say, I think as you said before, is that it increases the certainty or decreases the fear in getting something wrong, and especially in a corporate environment. You’re worried about losing your job or embarrassing your boss or anything. If you can actually sort of take your time to get that audio just right before you then broadcast it, whether that’s on an internal podcast informing your employees, your workforce, or whether it’s on an external podcast asking questions to guests, then you’ve got a lot more certainty. You’re not so fearful, and therefore you’re more likely to do it and keep doing it and get the value from it without risking anything. I think that’s a big benefit of asynchronous that resonates with people in business in particular.

Dave Kemp:

And it fits into this new paradigm of remote work. I mean, I think that the Yak, those guys, their whole sort of business thesis has really been I think more pronounced throughout the pandemic. It’s like this makes a lot of sense. You hear people that are constantly talking about Zoom fatigue and being burnt out with the constant live meetings. Again, I think there is a lot to be said about an asynchronous world. What does that asynchronous evolution look like? And I think audio is a really integral part of it, and it’s going to be interesting to watch unfold.

Dave Kemp:

So, as we kind of come to the last portion of this conversation, I’m curious, as somebody that has been at the forefront of audio innovation, very tied into the voice tech space, what’s exciting you right now? What are some of the things that you find to be really intriguing, exciting, maybe on the horizon that we can come to expect, whether it be in the voice space, the audio space, or the intersection of the two?

Carl Robinson:

Yeah. So, I’m focused more on the content world now. So anything that enables me to create content in an automated fashion and to a high degree of quality or to make it more interesting, that really interests me. So, some of the dialogue models coming out like Google LaMDA, for example. When I see it having such realistic conversations with a human, that really excites me, not just operating in a transactional manner but actually emoting and cracking little jokes and things like that. I just get super excited about that stuff. I know they’re working on the successor to GPT-3. So, who knows what that’s going to be saying?

Dave Kemp:

Yeah.

Carl Robinson:

Facebook Blender just came out, and they open-sourced it. So, all of these projects just get me really excited. I mean, they look super impressive. To be honest, we’re not going to try and compete by building a model of that size, but the fact that they’re open-sourcing it. In fact, there are so many of them as well, which partly leads to the more open-sourcing, I think. It means that we as a startup will probably get our hands on one of them and be able to fine-tune it for our own purposes as well. So, for example Facebook Blender, if we can fine-tune that to only ask intelligent follow-up questions, for example, or take elements of it to make a conversation more interesting, that is super exciting to me.

Carl Robinson:

What goes along with that is the quality of the text to speech voices these days. You’ve got Sonantic, DeepZen, LOVO, VocaliD, Resemble, just so many. Whenever I hear them I’m like, “Wow, that’s even better than the last one I heard.” I just can’t believe how good they sound.

Dave Kemp:

Right.

Carl Robinson:

So we’re definitely looking to incorporate that into content creation as well, because I really do think that the day will be upon us very soon where every brand will have its own custom voice or set of voices even, one personalized to each market segment and region around the world, which will be instantly recognizable and be like that is the McDonald’s voice. Right now we’ve got the McDonald’s jingle, but I don’t know if I could recognize the McDonald’s voice, but soon, because they’re releasing all these voice system based menu systems and all that, that you’ll want to hear McDonald’s, right? Or at least the McDonald’s personalized to you. I think the same, that voice will then carry through to all elements of marketing, including the McDonald’s podcast, assuming they’ve got one.

Dave Kemp:

Right.

Carl Robinson:

You know what I mean? So, those two things work hand in hand, and take us one step towards just being able to have a natural conversation with a machine that you actually want to have and listen to at a later date as well.

Dave Kemp:

I like the mention of TTS. So, text to speech and speech to text I think are like another one of these things that’s kind of flying under the radar a little bit but I think is going to be monumentally important to the future of the internet because I think that this whole idea of having the ability to make everything audible and then make everything that’s audible into text is it’s what you’re really kind of implying there is that you’re talking about two different kinds of language almost. So, you have the computer language, which is text, and then you have the … And what I mean by that is obviously it’s a phonetic language but that the computer understands it.

Dave Kemp:

So, you’re taking, if you think of that as the computer speech, and then you think of the audio speech as the human speech, what you’re really saying is it’s making it so that computers are better at understanding human speech and converting that into their language and then taking their language and converting it into human language, and making it so that it’s harder and harder to distinguish the actual genesis of it. Was this initially a piece of text that was then translated into audio, or was this a piece of audio that was translated into text? The end result ultimately is that you have two parallel Internets more or less, one that is entirely audio and one that is entirely text.

Dave Kemp:

So, the obvious example of that is every article that you want to read you could then listen to, and every article or every type of piece of audio that exists could be transcribed and you could read it. I think okay, so it’s like okay, so I could just read that, that would be great. But I think if you take it a step further what that means is you then have a piece of audio that’s transcribed that then could be searched upon dynamically.

Carl Robinson:

Yes, exactly.

Dave Kemp:

Then suddenly you have the whole spoken web can be made into being dynamically searched in certain ways, and that opens the door to then you can make … That I think is the key to solving some of the discoverability stuff. Having a much more robust dynamic search overlaid on top of all this. I think all of that’s going to be predicated on the innovation that’s happening within text to speech and speech to text.

Carl Robinson:

Yeah, it’s really interesting you mention search actually. This is definitely something that I’m really excited about as well. I just had a really interesting conversation with the guys over at Sounder.fm, which is a very forward-thinking kind of podcast host/audio search company. Also the guys at Audioburst, and there’s a number that are working on this kind of stuff.

Carl Robinson:

What you said about being able to search and solve the discoverability problem really resonates, because right now when you do a Google search often you’ll get a snippet, a featured snippet or a portion from the middle of a blog post that directly answers your question. I can just imagine the same thing happening in the audio world. Then internally as well at Rumble we were discussing well, why even listen to a whole podcast episode? Why does everything have to be an episode at a time, when maybe the bit at the beginning is just bumf, it’s just the origin story, you don’t care about that. You want to get right to the nuts and bolts, and wouldn’t it be better to listen to five two-minute clips just talking about the question that you’ve got right now and get that range of opinions and then you’re done in 10 minutes? Rather than have to listen to five hour-long podcast episodes and try and pick out the bits that really interest you. We were discussing the idea of dynamically created RSS feeds, for example, based on different topics that are just constantly refreshed with the snippets that address these things across all podcasts. When those actually come out, and I’m absolutely sure they will, that completely changes the way that you think about content.

Dave Kemp:

Exactly.

Carl Robinson:

Because instead of making one episode that you know will be consumed in a standalone fashion, now you’re thinking about it both in terms of a standalone episode but also as a series of segments that will be consumed individually. That really affects the way that you record the audio, and this is something that I think the asynchronous can also help with, is that when you record this way it is pretty segmented and you can enable text to speech and all these other kind of programmatic solutions that bring those to the fore.

Dave Kemp:

Exactly. I mean, I know that’s a lot of what Audioburst is trying to solve. I think that it’s really interesting to think about it that way, where again it goes back, I think this is kind of a perfect bow to put on this conversation, taking it full circle, is like again, there’s so much innovation to be had because so much of this gets lost. You have one great podcast episode, and it’s a challenge to even get the right people to listen to it, let alone listen to the part that’s really tailored to their interest. So, there’s a lot of jobs to solve here. We have to solve a lot of work here, and there’s got to be a lot of innovation that’s done, but at a certain point in the same way that the web has evolved, look at TikTok. TikTok is the perfect example of the culmination of 15, 20 years of mobile video innovation. You now have this feed that is so incredibly curated that it’s like a drug. People can’t even put it down because it’s so insanely curated to your interest.

Dave Kemp:

I think it stands to reason that we very well may get there with audio as well, but you’re going to have to have incremental improvements that lead to that, and I think that’s going to be a lot of what’s done at the underlying technology level with things like text to speech and speech to text, and then the ability to dynamically search on top of that and then start to curate those things so that if I just want to basically go and listen to the news on the new Apple iPhone that’s going to be coming out, I should just be able to hit a button more or less where it just feeds me the most widely popular clips from the past 10 most popular podcast episodes specifically talking about the iPhone. So, it’s like I think we’re going to get-

Carl Robinson:

It sounds so simple, right? [crosstalk 00:55:02].

Dave Kemp:

It sounds so simple.

Carl Robinson:

Yeah.

Dave Kemp:

Yeah. So anyway, I think this has been a fantastic conversation. So interesting to get your thoughts. I think you’re one of the brightest thinkers and innovators in this space.

Carl Robinson:

Thank you. Appreciate it.

Dave Kemp:

As we wrap, where should people go to connect with you, learn more about Rumble? I know I’m going to personally definitely experiment with it and see if I can fit it into everything that I’m doing. I think there’s definitely a place for some asynchronous content to sit alongside my live content. So, where can people go to connect with you and learn more about what you’re building?

Carl Robinson:

Yeah, absolutely. So, if you’re interested in checking out the tool, it’s rumble.studio. Sign up to the newsletter, rumble.studio/newsletter, and you’ll find the podcast there, rumble.studio/podcast, it’s called Audio Leads, if you search for it in the directories. You can also listen to my podcast, which is Voice Tech Podcast, so voicetechpodcast.com.

Dave Kemp:

Awesome. Carl, thank you so much for joining here today. Thanks to everybody who tuned in here till the end and we will chat with you next time. Cheers.

Dave Kemp:

Thanks for tuning in today. I hope you enjoyed this episode of Future Ear Radio. For more content like this just head over to futureear.co where you can read all the articles that I’ve been writing in these past few years on the worlds of voice technology and hearables and how the two are beginning to intersect. Thanks for tuning in and I’ll chat with you next time.
