Chief Content Officer,
Techstrong Group


In this Digital CxO Leadership Insights Series interview, Mike Vizard talks to Capsule CEO Champ Bennett about how generative artificial intelligence (AI) will democratize the creation of video content.



Mike Vizard: Hello, and welcome to the latest edition of the Digital CxO Leadership Insights series. I’m your host, Mike Vizard. Today we’re talking with Champ Bennett, co-founder and CEO of Capsule. And we’re talking about large language models and video, and how maybe the way we think about content is going to be fundamentally changed. Champ, welcome to the show.

Champ Bennett: Hey, thanks for having me, Mike. Great to meet you.

Mike Vizard: I think a lot of people are already used to the idea of large language models and text. People are creating articles, but video is a whole other venue and a whole other thing. So maybe you might want to walk people through how these large language models and AI are going to transform the way we think about creating and consuming video?

Champ Bennett: Yeah, we’re figuring that out right now. I mean, that’s the beauty of the time that we’re in: all this stuff is happening very quickly, and it’s very exciting. I’ve worked in the video and creator tool space for a very long time, always working towards this idea of making content creation simpler and easier, and there really just hasn’t been a more interesting time in the decade that I’ve been in this space than now. So first and foremost, you know, we’re figuring it out. Some of the things that we’ve learned – one of the big challenges of video as a storytelling format is that it’s very complex to make. If you talk to anybody in business these days, whether it’s marketing teams, or comms teams, or sales or success teams, they’ll all tell you that they want to use more video. It’s the most engaging format they’re publishing. And then they’ll also tell you that they don’t get to use it all that often because of how complex it is to create and how expensive it is to produce. And so the thing that we’re learning is that, you know, when you’re making a video, and you want to tell a rich story with video, there are sort of two parts to that. The first part is just collecting all of the assets to put that video together, right? So by assets, I mean the footage. So what we’re doing right now – we’re shooting some footage – that’s one aspect. There’s the editing aspect, and kind of running that against B-roll; B-roll is essentially additional images or videos that supplement the main storyline. So where do you produce that? Do you make it? Do you pull it from stock photography? Do you generate it? Then there are sound effects, you know, motion graphics; there are all of these different things.
And now captions, and then resizing for different platforms; all of these different things that you have to be very, very good at in order to produce high-quality, professional-looking video. And where we think AI is going to eliminate a lot of that friction is in the creation of those assets. We’re seeing that already with, you know, Stable Diffusion and DALL-E in terms of generating images, and some new models coming out now that generate videos at 30 or 60 frames per second. And so we think about it as leveraging these diffusion models and large language models to basically create the assets, and then help you very quickly put those assets together to tell a story. So it’s eliminating all of the creating part, and just allowing people to tell a story. And the outcome of that will be that a lot more people can do it, because it’s just a lot easier to do.

Mike Vizard: Walk me through an example of how that would work. Let’s say I wanted to create a short film or an ad about a particular product. And I wanted to create the narrative for that; do I just type in some text describing what it is that I’m looking for? And the language model will create that content? Or how does that work?

Champ Bennett: Yeah, so it depends. If you don’t know how to make a video, a lot of people just don’t know where to start. So the first step is kind of coming up with the idea. And you know, one of the things that we’re exploring is using large language models to even just generate ideas for videos, scripts for them, or, you know, chapters or sections for them, or even just brainstorming the concept. So that’s one area that’s really interesting to explore. And it could be very text and prompt based. The next area is sort of generating the timeline for that. A timeline is essentially the canvas of a video, so to speak – sort of start to finish, how everything in your video is laid out, from, as I said, footage to adding audio to adding sound effects or layering in graphics, etc. So the next step is just kind of giving you a structure for how that video can be laid out from start to finish. And then the next step is actually generating those assets, right? So you’ll want to go in, and you’ll want to customize those. Again, we might use, you know, diffusion models that generate some images for you, that might be B-roll based on the topic of your video. Rather than going and searching for stock photos or stock videos, I should say, we could generate those for you, all the way down to intro and outro graphics and lower-third graphics – all this stuff that typically would require being a creative professional, essentially, can be eliminated with generative AI today. And then there’s also using AI to kind of suggest how to improve your video, right? So imagine training an AI on all the best practices for video today. How long should videos be? Should there be a pre-roll here? Or do you want to cut right into the content to make it more engaging? These are all of the things that professionals know how to do that non-professionals don’t.
And an AI that’s trained on that information, that knowledge, can make suggestions and act as your copilot, so to speak, when you’re creating video content.

Mike Vizard: There’s a small army of people who work on that stuff today in all kinds of agencies and studios. Those people, over time, what do they evolve into? Are they all going to become storytellers? Or how do you perceive that the role will evolve?

Champ Bennett: I see it going kind of two ways. You know, when we talk to video-making teams, these are creative professionals. The things that we hear from them – it’s like, they’ll say, “I went to school for film and I’m a commercial filmmaker,” or “I’m a short filmmaker; I make movies,” and then there’s what they’re actually inundated with day in and day out. And this is in an enterprise context, which is the area that we’re most interested in – not small creators, but businesses. The thing that we hear from those creative professionals is that they are just overwhelmed with the amount of content requests from the different parts of the organization. So typically, the structure you’ll see in a large org, an enterprise company like a Google or Snowflake or Salesforce or something like that, is they’ll have, basically, creative professional teams that are servicing large parts of the organization. These parts of the organization are constantly telling them, “I need a video for this, I need a video for that.” And typically, those videos don’t really demand real creative expertise. They’re typically short form; maybe they’re only published internally. Maybe it’s going to be on some ephemeral platform that is only going to be up for 24 hours, and then never seen again. And so the thing that excites me is that these teams of creative professionals, they’re not going to get replaced. Actually, they’re going to be augmented; they’re going to be scaled, essentially. And what they’re gonna be able to do is put tools in the hands of everybody else in the organization that are very, very easy to use, so that they can focus on the more premium stuff, the stuff that is just really hard to do because of the storytelling complexity. That requires a lot more thought, a lot more planning; maybe they’re shooting with high-end studio cameras on location somewhere, or in a studio somewhere.
And so I see that sort of divergence of content, and sort of enabling a whole new wave of creators to create, while simultaneously making creative professionals more efficient in their workflows.

Mike Vizard: So there are certain videos that are just kind of basically rote, and then there are others that are unique, and they’re one-off, and they’re multifaceted projects. And those require a little more care and feeding. And some need attention from a human, as it were. Do you think, therefore, that more companies will be able to afford to use the medium for messaging and for whatever else they’re going to do? Because a lot of times with video, the cost is so high that only large companies really delve into it. So can this get down to the average small business?

Champ Bennett: Absolutely, yes. But I think there’s also an opportunity in large businesses too. I mean, if you look at the way these large companies are making videos, it is typically, as I said, a centralized team who is expert at video, and they’re getting inundated with requests. Every single team that we talk to is completely overwhelmed. There’s just not enough time in the day to make as many videos as they want to make. And then the second way is that they work with outside agencies. So with both of those processes, if you’re on the requesting end, if you’re trying to get a video made in an organization, what we hear from marketing teams, for example, is they’ll just not make the video. They’ll just say, you know, this is too complex. It’s too expensive. It’s too slow. It’s not timely enough. We’re just not going to make it. And so, yes, I absolutely see the opportunities across the board for a variety of different profit centers and ICPs, whether it’s small businesses or large or even consumers. I mean, the world of consumer video is pretty well established with tools like TikTok and Instagram and others. But of course, generative AI is making that more interesting and more efficient and more compelling.

Mike Vizard: As we go along, do you think that we might be in danger of reaching a point where we have more video than we can possibly consume? There’s kind of this issue that goes along with it: just how much video can we all watch? I mean, I can’t watch all the videos on the internet today; how am I going to watch any more?

Champ Bennett: I think we’re nearing that point on the consumer side of video. And that’s why, you know, recommendation algorithms have become the thing. There’s just so much content out in the world that it’s impossible for any human editorial system to bubble up the stuff that’s worth watching. And so we’ve started to build algorithms that make it easier to surface that content and create really engaging platforms. On the business side, however, I don’t think that’s true. I actually think that there is way more demand for content than there is supply. And that’s largely because of everything we just talked about: how complex and expensive it is to make.

Mike Vizard: What’s your sense, then? How soon does all this come to be, in your mind? I mean, is this one of those situations where it’s already here, but it’s just unevenly distributed? Or is it, you know, something we’re looking for in the next six to 12 months or two years?

Champ Bennett: I think it’s probably six to 12 months. I think the technology for generating assets is pretty close, but still needs some refinement. It’s also very slow still, right? So there’s a lot of optimization of these models that’s going to be happening over the next six to 12 months that makes these APIs, these models, actually more production ready. So I think we’re still in the seed stage of all this stuff. But I think we’re very quickly going to move into the Series A and Series B stage over the next year or so. And then I think, you know, particularly for video – I mean, obviously, we have companies like Jasper and even Restlet, who are doing really interesting things already with the large language models, amongst many other companies. But I think on the video side, it’s kind of day zero. And over the next year, we’ll see some really compelling companies, Capsule being one of them.

Mike Vizard: What’s your sense of how any copyright issues might work? We’ve already seen some lawsuits from Getty Images. And so, how do we navigate that?

Champ Bennett: I have no idea. That’s my honest answer. I think it’s a very complex issue. And, you know, as a creative professional myself, I certainly understand the value of work that’s being made and produced by creators. The other side of the argument is that even creators are constantly kind of tapping into inspiration from people who created before them, whether it’s your favorite artist or your favorite musician. And so I can see the arguments on both sides. And I think it’s just gonna get sorted out. I don’t personally know how, whether it’s some form of regulation, or society or culture kind of just naturally agrees on how we move forward. I don’t know. But I think it’ll be really interesting to watch over the next year.

Mike Vizard: You may not have an answer for this, but a lot of folks are concerned that we’ll be using these capabilities to create replicas of people, and it will lead to all kinds of fraud and cybersecurity issues. Because, you know, we won’t know that it’s me talking to you, or that you’re actually talking to me, because it could be, you know, some sort of avatar. So, how do we have faith? Or am I going to need to, like, call somebody to make sure that’s you on the other end of this phone call?

Champ Bennett: Yeah, I think technology will sort this out. But I think what’s actually very interesting is what’s happening already. This is something I’ve recognized about myself, just from paying attention to stuff very closely: it’s now my default assumption that all content is fake until proven real, right? And I sort of see that as progress, in a way, because I think previously, it was almost the opposite. It was like anything you read on the internet was meant to be believed, unverified. And so I’m really interested in this idea of verified content, and how that actually gets created and distributed. I’m not sure yet. I think there are probably some interesting technology solutions to that problem. But I do think that it’s worth mentioning that, you know, that problem has existed for a long time. I mean, the idea of fake news is not a new idea. And so for everyone to just kind of assume that everything is fake until proven real, I think it’s an interesting way forward. I don’t know exactly how it plays out. But I think it’s an interesting development.

Mike Vizard: What’s that one thing you think folks are underestimating right now, as we look at all of this?

Champ Bennett: Oh, man, I think folks are underestimating the opportunity to take something that was previously only able to be performed by people who spent years of their life creating a craft, whether that’s writing music or creating videos or writing books or stories, and you bring the cost of that to near zero, and you enable 1,000 times, 2,000 times, 10,000 times more people to do it. It’s hard to imagine what the impact of that is going to be. But we know that it’s going to be huge. And I think that some people are underestimating that.

Mike Vizard: All right, Champ, thanks for being on the show and sharing your insights, because it is a brave new world and nobody knows exactly what’s going to happen next.

Champ Bennett: Excellent. Thanks for your time.

Mike Vizard: All right, folks. Thanks for watching the latest episode of the Digital CxO Leadership Insights series. I’m your host, Mike Vizard. You can find this one and other episodes on the website. And once again, thanks for watching.