Three ways to sync transcripts with DAI podcasts

The introduction of the podcast:transcript tag has been a massive leap forward, allowing podcast apps to display transcripts in sync with the audio. However, shows using dynamic ad insertion (DAI) have not been able to benefit from this feature. With SiriusXM facing a lawsuit for not offering transcripts for its podcasts, all publishers need to prioritize transcripts, and podcast apps should strive for a best-in-class experience for their users.

Why can’t DAI podcasts use the transcript tag?

With each host implementing DAI in its own way, there are many flavors of dynamic ad insertion. To offer audience targeting or programmatic advertising, enterprise-grade hosts must decide which ads to serve after the request has been received but before the episode is sent to the listener.¹

Two people standing beside each other could download the same episode and receive different ad placements and episode durations. If you served a transcript with the same timestamps to both, it would quickly become apparent that the timestamps and audio drift further out of sync with every ad break. To avoid this, the podcast would need to provide a unique transcript for each request, resynced to account for the placement and duration of the ads.
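To make the drift concrete, here is a minimal sketch of the resync step. It assumes the host knows where each ad break was stitched in and how long it ran for a given request; the `Cue`, `AdBreak`, and `resync` names are hypothetical and only illustrate the arithmetic, not any host's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds, measured in the ad-free episode
    end: float
    text: str


@dataclass(frozen=True)
class AdBreak:
    position: float  # insertion point, in ad-free episode time (seconds)
    duration: float  # total length of the ads stitched in there (seconds)


def resync(cues, ad_breaks):
    """Shift cue timestamps to match the audio one listener actually received."""
    shifted = []
    for cue in cues:
        # Ads inserted at or before the cue's start push the whole cue later.
        start_offset = sum(b.duration for b in ad_breaks if b.position <= cue.start)
        # Ads inserted before the cue's end (including mid-cue) delay its end.
        end_offset = sum(b.duration for b in ad_breaks if b.position < cue.end)
        shifted.append(Cue(cue.start + start_offset, cue.end + end_offset, cue.text))
    return shifted


# Same episode, two listeners, two different ad loads.
cues = [Cue(0.0, 4.0, "Welcome to the show"), Cue(60.0, 65.0, "And we're back")]
listener_a = resync(cues, [AdBreak(0.0, 30.0), AdBreak(50.0, 45.0)])
listener_b = resync(cues, [AdBreak(50.0, 15.0)])
```

With these made-up numbers, the second cue starts at 135 seconds for listener A and at 75 seconds for listener B, which is why a single static transcript file cannot stay in sync for both.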

1. Client-side syncing

Recently, Snipd announced that they were using AI to detect dynamic ads and resync the transcripts with the audio. It is especially impressive that they do this on-device every time a listener streams an episode. Snipd is transcribing all English podcasts themselves, but they aren’t respecting existing transcripts in the RSS feed. That’s a real shame if their automated transcription has any mistranscriptions that the podcasters could have addressed.

2. ID3 Tags

Any DAI podcast with chapters already adjusts the chapter markers in its ID3 tags to account for ad insertion. The ID3 spec also includes a way to embed synchronized lyrics/text in MP3 files (the SYLT frame), so podcasts could provide a resynced transcript with each response. However, without podcast apps willing to display them, there hasn’t been an incentive for DAI podcasts to adopt this method.
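As a rough illustration of what an embedded transcript could look like, the sketch below hand-builds a single synchronized lyrics/text (SYLT) frame following the ID3v2.4 layout: text encoding, language, timestamp format, content type, descriptor, then text/timestamp pairs. The function names are hypothetical, and a real pipeline would more likely rely on a tagging library and write the frame into a complete ID3 tag rather than produce raw bytes.

```python
import struct


def synchsafe(n: int) -> bytes:
    """Encode n as an ID3v2.4 synchsafe integer (4 bytes, 7 bits each)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("size out of range for a synchsafe integer")
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def build_sylt_frame(cues, language="eng", descriptor="transcript"):
    """Build one SYLT frame from (text, start_ms) pairs.

    The timestamps are expected to be already resynced for this response.
    """
    if len(language) != 3:
        raise ValueError("language must be a three-letter code")
    body = bytearray()
    body += b"\x03"                   # text encoding: UTF-8
    body += language.encode("ascii")  # three-letter language code
    body += b"\x02"                   # timestamp format: milliseconds
    body += b"\x02"                   # content type: text transcription
    body += descriptor.encode("utf-8") + b"\x00"
    for text, start_ms in cues:
        body += text.encode("utf-8") + b"\x00"  # null-terminated text
        body += struct.pack(">I", start_ms)     # 32-bit big-endian timestamp
    # Frame header: ID, synchsafe size of the body, two flag bytes.
    return b"SYLT" + synchsafe(len(body)) + b"\x00\x00" + bytes(body)


frame = build_sylt_frame([("Hello", 0), ("world", 1500)])
```

Because the frame travels inside the audio file, the timestamps have to be recomputed whenever the ad load changes, just like the chapter markers.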

3. Link headers

Recently, Chris Quamme Rhoden (Co-founder & CTO of RadioPublic, acquired by Acast) suggested a new approach: Link headers. When an audio file is requested, the response could carry one or more Link headers pointing to a transcript tailored specifically for that request. Compared with embedding transcripts in the MP3, this works with any media codec, avoids dealing with ID3 tags, and supports multiple transcript formats or translations per request. Publishers would need to serve dynamically generated transcripts for each request, and podcast apps would need to cache them alongside the downloaded audio. This is the most promising solution, but it has the furthest to go to achieve widespread adoption.
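On the app side, picking up such links could look roughly like the sketch below. It is an illustration under stated assumptions: the `rel="transcript"` relation name, the example URLs, and the function names are placeholders rather than part of any published proposal, and the parser covers only the common shape of a Link header, not every corner of the Web Linking syntax.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+(?:=(?:"[^"]*"|[^;,\s]+))?)*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)(?:=(?:"([^"]*)"|([^;,\s]+)))?')


def parse_link_header(value):
    """Split one Link header value into (target, params) pairs."""
    links = []
    for match in _LINK_RE.finditer(value):
        params = {}
        for p in _PARAM_RE.finditer(match.group(2)):
            quoted, bare = p.group(2), p.group(3)
            params[p.group(1).lower()] = quoted if quoted is not None else (bare or "")
        links.append((match.group(1), params))
    return links


def transcript_links(headers, rel="transcript"):
    """Collect transcript links from (name, value) response header pairs.

    `rel` is a hypothetical relation name used for illustration only.
    """
    found = []
    for name, value in headers:
        if name.lower() != "link":
            continue
        for target, params in parse_link_header(value):
            if rel in params.get("rel", "").lower().split():
                found.append({
                    "url": target,
                    "type": params.get("type"),
                    "hreflang": params.get("hreflang"),
                })
    return found


# Headers as they might arrive with one listener's audio response.
response_headers = [
    ("Content-Type", "audio/mpeg"),
    ("Link", '<https://example.com/t/abc123.vtt>; rel="transcript"; type="text/vtt", '
             '<https://example.com/t/abc123.es.vtt>; rel="transcript"; '
             'type="text/vtt"; hreflang="es"'),
]
links = transcript_links(response_headers)
```

An app would then fetch the URL whose type and language it prefers and store the result next to the downloaded audio, since a later request for the same episode may return different ads and therefore a different transcript.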

Looking ahead

I hope to see the podcasting industry fully embrace synced transcripts to provide an accessible experience for all types of shows, regardless of how they’re monetized. So many great features are unlocked if we widely adopt high-quality transcripts, and it feels like that future’s in sight.

  1. Hosts like Buzzsprout, Transistor, and Captivate can support the transcript tag specifically because they don’t support Time of Download Decisioning in their flavor of DAI.