Best AI transcription software for podcasts
We reviewed the top AI tools for generating show notes, transcripts, timestamps, title suggestions, and more
Transistor Team
Many new AI tools will automatically transcribe your podcast episodes, generate show notes, suggest titles, and more. The promise is that these tools will save you time: instead of having to write your titles and show notes manually, these tools will produce them for you.
I wanted to test these services and see if they deliver on that promise. In the past, I've tried using ChatGPT to do this work but found it difficult, mostly because character limits force you to split your transcript across several prompts. A tool built specifically for podcasters is an interesting idea.
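If you do want to try the general-purpose chatbot route, the splitting step can be scripted. Here is a minimal Python sketch, assuming a plain-text transcript with one speaker turn per line. The function name `split_transcript` and the 4,000-character default are illustrative assumptions, not a documented limit of ChatGPT or any other tool.

```python
def split_transcript(text: str, max_chars: int = 4000) -> list[str]:
    """Split a transcript into chunks of at most max_chars characters.

    Chunks break on line boundaries where possible, so a speaker turn
    is only cut in half when a single line exceeds the limit.
    """
    chunks: list[str] = []
    current = ""
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # A single line longer than the limit has to be hard-split.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The line does not fit, so close this chunk and start a new one.
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "\n".join(["Speaker A: " + "x" * 50] * 10)
    for number, chunk in enumerate(split_transcript(sample, max_chars=200), start=1):
        print(f"Chunk {number}: {len(chunk)} characters")
```

Each chunk can then be pasted into its own prompt. You still have to stitch the answers together yourself, which is the tedious part these dedicated tools aim to remove.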
I used this MP3 and this MP3 as the samples for these tests. You can see all the outputs in this GitHub repo.
Here are the 5 best AI podcast transcription tools I tested:
1. Castmagic
Castmagic had the best overall interface and user experience and produced the most accurate transcript. In particular, their ability to do speaker identification (diarisation) was excellent.
Its transcript also required the least editing of all the tools we tested. Here's a sample of their transcript output:
Castmagic also provides other AI-generated content like titles, keywords, speaker bios, introduction, and timestamps. Sometimes the initial suggestions required tuning, but overall I found these useful.
For example, here are the sample timestamps for this James Clear interview:
Ratings:
Transcription speed: ★★★★☆ (4 minutes and 42 seconds)
Transcription accuracy: ★★★★☆
Speaker diarisation: ★★★★★
User experience: ★★★★★
Usefulness of other generated output: ★★★☆☆
Cost: 200 minutes per month is $39/month (free trial available)
2. Podium
Podium's pitch is that they'll be your "AI copywriter for podcast show notes, articles, social posts, and more."
Unlike other services on this list, Podium produces a downloadable package of text files and doesn't have a UI for navigating and editing your transcript. This means you must manually modify the speaker names in the text file. (They also have an API for integrations, which may be more of a focus for them).
Overall, the initial transcript they produced was fairly accurate and required less editing than other services. However, as you'll see below, it occasionally cuts off the end of one speaker's line, and attributes it to the next speaker:
The highlights text they generated for my first episode wasn't especially useful, but it was better in my second test.
However, the show notes they generated were excellent. This included title suggestions and timestamp/chapter suggestions:
These were some of the few outputs that I felt might actually save me time when producing a new episode. It helped me quickly grab a great title, summary, and timestamps.
Podium also had the lowest monthly price of the paid plans we tested.
Ratings:
Transcription speed: ★★★★☆ (4 minutes and 26 seconds)
Transcription accuracy: ★★★★★
Speaker diarisation: ★★★★☆
User experience: ★★★☆☆
Usefulness of other generated output: ★★★★☆
Cost: 180 minutes per month is $9/month
3. Podsqueeze
Of all the options, Podsqueeze generated the most useful outputs for Show Notes, Timestamps, Titles, Mentions, Sample Blog Posts, and Key Quotes.
Podsqueeze also had one of the fastest transcription speeds (it only took 2 minutes and 36 seconds before we could access the transcript).
However, their generated transcript had numerous errors, including attributing big blocks of content to the wrong speaker. For example, here's a comparison of Podsqueeze's transcription and speaker identification with Castmagic's:
Ratings:
Transcription speed: ★★★★★ (2 minutes and 36 seconds)
Transcription accuracy: ★★★☆☆
Speaker diarisation: ★★★☆☆
User experience: ★★★★☆
Usefulness of other generated output: ★★★★☆
Cost: free plan for 50 minutes per month, 150 minutes per month is $15/month
4. Swell AI
Swell AI promises to "write detailed summaries, time-stamps, key topics, and more so you spend less time doing podcast SEO."
Swell's user interface wasn't as intuitive as other options we reviewed, and it had some problems identifying who was speaking at different times. It also took the longest to process the transcription.
However, it did provide some useful outputs for show notes.
Another cool feature of Swell is that it provides an embeddable AI chatbot where listeners can ask questions about the episode:
Ratings:
Transcription speed: ★★★☆☆ (8 minutes and 43 seconds)
Transcription accuracy: ★★★☆☆
Speaker diarisation: ★★★☆☆
User experience: ★★★☆☆
Usefulness of other generated output: ★★★★☆
Cost: 300 minutes per month is $29/month
5. Descript
Descript isn't quite in the same category as these other tools, in that it doesn't generate automatic show notes, quotes, titles, and timestamps.
However, of all the tools we tested, Descript was the fastest (1 minute and 34 seconds) to produce a fairly accurate transcript.
Descript's software also makes it easy to detect speakers (it gives you an audio preview, and gets you to label each speaker).
While its speaker diarisation is fairly good, it often splits a speaker's sentences, cutting them off abruptly and attributing words to the next speaker:
And while it won't automatically generate show notes for you, it's a fully featured podcast editing tool, with the ability to share its interactive player or create video audiogram clips.
For folks just starting a podcast, Descript's built-in editing and transcription might be a good place to start.
Ratings:
Transcription speed: ★★★★★ (1 minute and 34 seconds)
Transcription accuracy: ★★★★☆
Speaker diarisation: ★★★★☆
User experience: ★★★★★
Usefulness of other generated output: ★★★★☆
Cost: 600 minutes per month is $12/month
Do AI tools save you time generating podcast transcripts?
The biggest disadvantages of human transcription are that it takes longer and costs more. Generally, the fastest turnaround is a single day (12 hours), which will cost you $2.50 per minute. If you're OK waiting a week, the cost goes down to $1.50 per minute.
The advantage of human transcription is that it's generally more accurate.
AI tools, on the other hand, are much faster (a transcript can be generated in under 10 minutes), and much more affordable. For example, Descript will allow you to transcribe 10 hours of audio a month, for $12 (that's $0.02 per minute).
However, AI-generated transcripts generally contain more errors and still require some editing. It feels like the transcripts are "75% of the way there," and a human still has to fix that last 25% (which can be time-consuming!).
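To make the price comparison concrete, here is a small Python sketch that works out the per-minute cost of each plan listed in this review and sets it against the human transcription rates quoted above. It assumes you use every included minute each month; if you use fewer, the effective per-minute cost is higher. The 60-minute episode length and the helper name `cost_per_minute` are illustrative assumptions.

```python
# Monthly price (USD) and included minutes, as listed in this review.
PLANS = {
    "Castmagic": (39.0, 200),
    "Podium": (9.0, 180),
    "Podsqueeze": (15.0, 150),
    "Swell AI": (29.0, 300),
    "Descript": (12.0, 600),
}

# Human transcription rates quoted above (USD per minute).
HUMAN_RUSH_RATE = 2.50
HUMAN_WEEK_RATE = 1.50


def cost_per_minute(monthly_price: float, included_minutes: int) -> float:
    """Per-minute cost, assuming every included minute is used."""
    return monthly_price / included_minutes


if __name__ == "__main__":
    episode_minutes = 60  # hypothetical episode length

    # Sort plans from cheapest to most expensive per minute.
    ranked = sorted(PLANS.items(), key=lambda item: cost_per_minute(*item[1]))
    for name, (price, minutes) in ranked:
        rate = cost_per_minute(price, minutes)
        print(f"{name:<11} ${rate:.3f}/min  (${rate * episode_minutes:.2f} per episode)")

    print(f"Human, 12-hour turnaround: ${HUMAN_RUSH_RATE * episode_minutes:.2f} per episode")
    print(f"Human, one-week turnaround: ${HUMAN_WEEK_RATE * episode_minutes:.2f} per episode")
```

Per-minute cost is only one way to compare plans. If you publish a single short episode a month, the lowest monthly price may matter more than the lowest per-minute rate.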
Can AI tools save you time preparing podcast show notes, timestamps, and posts?
I know how it feels to finish editing an episode and then want to rush to publish it.
Once you have your finished MP3, the biggest timesuck is figuring out episode titles, writing show notes, finding timestamps, and writing social media posts.
I did find that these tools were helpful in generating initial drafts of my titles, show notes, and content for Twitter, LinkedIn, my blog, etc.
It's nice (especially when you're tired) to have a service that makes recommendations, which you can edit and tweak. It does make the publishing process faster.
That being said, it still requires effort to choose the relevant content and edit it. We're not at the stage where we can have all of this on auto-pilot (have a machine automatically edit our episode, generate a title and show notes, and publish it for us).