AI voice generator from a voice sample in a podcast

Transistor Team


With all of the excitement around artificial intelligence voice cloning, text-to-speech, and GPT, many podcasters are asking:

Can you use an AI to create a podcast episode?

I wanted to put that question to the test, so I generated audio with a few popular AI voice tools.

My first experiment: a college kid cloned my voice

I couldn’t believe it when a college kid in my town synthesized my voice using AI (with just 30 seconds of source material from YouTube).

He built a Discord bot that, when prompted, would use my voice and ChatGPT to generate audio content. I prompted it to "create a podcast episode for a podcast about extreme rollerblading. Talk about the rollerblading scene in Stony Plain, Alberta" (my hometown). Here's the result:

The output was astonishingly close to my voice, capturing my cadence, pauses, and tone.

Welp that’s equally exciting and terrifying. – Andy Claremont

But the question remained: Is it entertaining enough for an entire episode?

Other AI podcast experiments

Joe Rogan and Steve Jobs AI episode

Podcast.ai produced this viral episode where AI-generated voices of Joe Rogan and Steve Jobs converse. Here's the clip:

And here's another audio sample from Podcast.ai, this time featuring an AI version of Zach Galifianakis talking about movies with Quentin Tarantino:

While the voice synthesis in both examples is impressive, the conversations still feel unnatural and stilted.

Mind Meets Machine podcast

I also found the Mind Meets Machine podcast, a unique experiment where a human co-host (Rob) interacts with an AI co-host (Ruby). In this example, they play a word-association game:

AI Bill Gates and Socrates

Another popular online clip was this conversation between AI versions of Bill Gates and Socrates:

What do you think? Would you listen to a full-length podcast episode with AI-generated content like this?

Review: AI voice generator tools

Genny

The first tool I tested was Genny from Lovo.ai, an AI-powered platform that specializes in generating human-like voices.

This clip contains two voice samples I created using the service:

Play.ht

The second generator I tested was Play.ht, which states on its website that its AI voices are "indistinguishable from humans." Here's the clip:

The verdict: can text-to-speech produce full episodes?

While the technology behind AI voice generation is undeniably impressive, the current output lacks the human touch that makes podcasts so compelling.

Listeners come to podcasts for human connection, stories, drama, and news. The nuances, emotions, and authenticity that humans bring to the table are irreplaceable.

The most promising technology is probably ElevenLabs, the tool the college student used to clone my voice. Its synthetic voices reproduce human tone and cadence in a way that sounds natural. (If you want to try cloning your own voice, you can use their free tool.)

Sure, it's not exactly you, but damn it's good enough. And it's so early. Incredible. – Jason Fried

I feel that, while the synthesized voices were remarkable, they still lacked the warmth and connection that human voices offer.

However, in these YouTube comments, folks like Arvid Kahl disagreed:

Honestly, I think we're almost there.

The example you showed with your voice is so incredibly close, and this is really just about having enough data to sufficiently emulate not just a voice, but also a style and an approach.

"Voice" has many meanings when it comes to text, and I think this is where the breakthrough will be. If ChatGPT gets better at mimicking my tone and audio AI gets better at synthesizing my sound, then I am quite afraid for my "job." The interesting part will be proof of humanity.

Already, more and more people are starting podcasts with AI-generated content. However, I haven't found many folks who listen to these shows.

The future of podcasting in an AI world

While the technology is advancing quickly, AI voice generators are not ready to take over the podcasting world yet.

At its core, podcasting is about human connection, storytelling, and the nuances of emotion that (for now) only a real voice can convey. It's about the slight inflections, the spontaneous laughter, the pauses filled with anticipation, and the genuine passion that resonates with listeners on a personal level.

AI-generated voices might be able to mimic human speech patterns, but they can't replicate the raw emotions, personal experiences, or unique perspectives that individual podcasters bring to the table. Many listeners are drawn to the personalities behind the microphone and often say they feel a connection with the host.

Moreover, while AI can produce content, it doesn't have the innate drive to create something truly compelling. Podcasts are more than just "content"; they reflect the creator's worldview, experiences, and emotions. And that's something no AI can truly replicate, no matter how advanced the technology becomes.

Share your thoughts

I'd love to hear your thoughts: do you believe AI-generated voices are the future of podcasting? You can leave a comment on:

You can also listen to this as a podcast episode.