Media Extended

Transcript & Subtitle

How transcripts make media searchable, navigable, and quotable inside Obsidian

A transcript is a text file linked to a piece of media. It turns spoken words into text you can search, navigate by clicking, and quote in your notes.

What a transcript is

Media Extended treats subtitle files (.srt, .vtt, .ass, .ssa) as transcripts. Each file contains a list of cues: short text segments tied to a time range in the media. A cue might be a single sentence, a phrase, or a caption.
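For instance, a minimal .srt file is a list of numbered cues, each with a time range and its text (a hypothetical example):

```
1
00:00:01,000 --> 00:00:04,000
Welcome to the lecture.

2
00:00:04,500 --> 00:00:08,000
Today we cover transcripts.
```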

When a transcript is linked to media, the plugin opens it in a transcript view, an interactive panel where you can read along, jump to any moment, and pull quotes into your notes.

Where transcripts come from

Transcripts reach the plugin in two ways: the plugin detects files sitting next to the media, or you link them explicitly in a media note's frontmatter. Hosting services like YouTube, Bilibili, and Coursera also provide subtitles when available. These appear alongside local tracks without any setup.

Sibling files

If a subtitle file sits in the same folder as a media file and shares its base name, the plugin pairs them automatically. For example, lecture.mp4 matches any of these:

| Subtitle file | Detected language |
| --- | --- |
| lecture.srt | (none) |
| lecture.en.srt | en |
| lecture.fr.vtt | fr |
| lecture.zh.ass | zh |

The plugin checks the last dot-separated segment before the extension for a two-letter ISO 639-1 language code. Codes like en and fr are detected automatically. Longer locale strings like en-US are not inferred from filenames. Set those with a lang hash parameter in a frontmatter link instead.

You don't need to configure anything. The plugin scans the media file's folder each time a transcript view opens.
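The pairing and language-detection rule above can be sketched as follows. This is an illustration of the documented behavior, not the plugin's actual code, and the function name is invented:

```python
from pathlib import Path

SUBTITLE_EXTS = {".srt", ".vtt", ".ass", ".ssa"}

def detect_sibling(subtitle: str, media: str):
    """Return (is_sibling, language) for a subtitle path next to a media file."""
    sub, med = Path(subtitle), Path(media)
    if sub.suffix.lower() not in SUBTITLE_EXTS:
        return False, None
    base = sub.stem  # "lecture.en.srt" -> "lecture.en"
    lang = None
    # The last dot-separated segment before the extension may be a
    # two-letter ISO 639-1 code; longer strings like "en-US" are ignored.
    name, dot, maybe_lang = base.rpartition(".")
    if dot and len(maybe_lang) == 2 and maybe_lang.isalpha():
        base, lang = name, maybe_lang.lower()
    return base == med.stem, lang
```

With this sketch, `detect_sibling("lecture.en.srt", "lecture.mp4")` yields `(True, "en")`, while `lecture.srt` matches with no detected language.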

Linked in frontmatter

Media notes can list subtitle and caption tracks in their frontmatter using two fields:

| Field | Track kind |
| --- | --- |
| subtitles | Subtitles (translation of dialogue) |
| captions | Captions (transcription of all audio, including sound effects) |

Each entry is a wiki-link to a vault file or a URL. You can add hash parameters for language, label, and default status:

```yaml
subtitles:
  - "[[lecture.en.srt#lang=en&label=English&default]]"
  - "[[lecture.fr.srt#lang=fr&label=French]]"
```

The field name determines the track's kind. When no lang parameter is set, the plugin infers language from the filename. See the Subtitle Track Properties reference for all available hash parameters.
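The hash parameters use a query-string-like syntax where bare keys such as `default` act as flags. A sketch of how such a link could be split apart (illustrative only, not the plugin's parser):

```python
def parse_track_link(link: str) -> dict:
    """Split a target like 'lecture.en.srt#lang=en&label=English&default'
    into the file path and its hash parameters."""
    target = link.strip("[]")  # drop wiki-link brackets if present
    path, _, fragment = target.partition("#")
    params = {}
    for part in fragment.split("&") if fragment else []:
        key, eq, value = part.partition("=")
        params[key] = value if eq else True  # bare flags become True
    return {"path": path, **params}
```

For example, `parse_track_link("[[lecture.en.srt#lang=en&label=English&default]]")` yields `{"path": "lecture.en.srt", "lang": "en", "label": "English", "default": True}`.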

You don't have to write these links by hand. The player's pane menu has an import dialog that builds the link for you — see Add a Subtitle to Media. The YouTube subtitle download also saves tracks directly into these frontmatter fields.

Subtitle resolution

When a transcript view opens, the plugin searches for subtitle files in this order:

  1. Frontmatter links in the media note
  2. Sibling files in the same folder with a matching name and a subtitle extension
  3. Subtitles from the hosting service (YouTube, Bilibili, Coursera)

If multiple files match, they all appear as available tracks. You can switch the active track within the transcript view.
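The lookup order above can be sketched as a merge that preserves priority while keeping every match (helper shape is hypothetical):

```python
def resolve_tracks(frontmatter_links, sibling_files, remote_tracks):
    """Collect subtitle tracks in the documented priority order:
    frontmatter links, then sibling files, then hosting-service tracks.
    All matches are kept; order affects listing, not exclusion."""
    tracks = []
    for source in (frontmatter_links, sibling_files, remote_tracks):
        for track in source:
            if track not in tracks:  # don't list the same file twice
                tracks.append(track)
    return tracks
```

So `resolve_tracks(["lecture.en.srt"], ["lecture.en.srt", "lecture.fr.vtt"], ["yt:en"])` lists the frontmatter track first, then the remaining sibling and remote tracks.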

Viewing modes

The transcript view has two layout modes. Each view remembers its mode independently.

Transcript mode (the default) groups cues into flowing paragraphs. Longer transcripts are easier to read this way: natural blocks of text instead of one line per cue. Timestamps appear in the left margin for each paragraph, and auto-scroll follows paragraphs rather than individual lines.

Subtitle mode shows one cue per line, each with its own timestamp. This layout works well for seeing exact timing boundaries or working with short, dense captions like song lyrics.

You can switch modes from the pane menu. The default is configurable in Settings > Transcript > Default view for subtitle tabs.
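One plausible way to group cues into paragraphs, as transcript mode does, is to start a new paragraph whenever the gap between cues exceeds a threshold. The 2-second cutoff here is an assumption for illustration, not the plugin's actual rule:

```python
def group_cues(cues, max_gap=2.0):
    """Group (start, end, text) cues into paragraphs, breaking on
    silences longer than max_gap seconds."""
    paragraphs, current, prev_end = [], [], None
    for start, end, text in cues:
        if current and start - prev_end > max_gap:
            paragraphs.append(current)
            current = []
        current.append((start, end, text))
        prev_end = end
    if current:
        paragraphs.append(current)
    return paragraphs

cues = [(0.0, 2.0, "Hello."), (2.5, 4.0, "Welcome."), (10.0, 12.0, "New topic.")]
# group_cues(cues) -> two paragraphs: the first two cues, then the last one
```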

Per-view controls

Each transcript view has three toggles in the pane menu:

| Toggle | What it does |
| --- | --- |
| Transcript mode | Switch between paragraph and one-cue-per-line layout |
| Show timestamps | Show or hide the timestamp gutter |
| Highlight current word | Highlight the word being spoken during playback |

These settings persist per view across sessions.

How the transcript view connects to the player

The transcript view stays connected to a media player and responds to playback in real time. For this to work, it first needs to figure out which player it belongs to.

Finding the right player

When a transcript view opens, the plugin works backwards from the subtitle file to the media. It checks which media files reference this transcript (through frontmatter links or sibling file matching), then looks for an open pane playing one of those files.

If several panes have the same media open, the plugin picks the one you interacted with most recently. When you switch between panes, it re-evaluates and follows the newly focused player. You can turn sync off from the pane menu if you want the transcript to stay still.
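The "most recently interacted" choice can be sketched like this, with hypothetical data shapes standing in for Obsidian's pane objects:

```python
def pick_player(panes, media_paths):
    """From open panes, pick the one playing a matching media file
    that the user interacted with most recently."""
    candidates = [p for p in panes if p["media"] in media_paths]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p["last_active"])

panes = [
    {"media": "lecture.mp4", "last_active": 100},
    {"media": "lecture.mp4", "last_active": 250},
    {"media": "other.mp4", "last_active": 300},
]
# pick_player(panes, {"lecture.mp4"}) -> the pane with last_active == 250
```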

Interacting with the transcript

Click any timestamp or cue to seek. The connected player jumps to that moment without opening a second copy. If no player is open yet, the plugin opens the linked media and seeks to the clicked position after loading.

While media plays, the transcript highlights the active cue and scrolls to keep it visible. This works with both local players and the web viewer. The sync toggle in the pane menu controls whether auto-scroll is active.

The transcript view supports in-view search (Ctrl/Cmd+F or Find... in the pane menu). While search is active, auto-scroll pauses so the view stays on the search results instead of following playback. Closing the search bar returns to the current playback position and resumes auto-scroll.

Text copied from the transcript carries metadata about the track and time range it came from. You can paste it into a note alongside a timestamp to create a quote that links back to the source moment. The Read Along with a Transcript tutorial walks through this workflow.
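A rough sketch of turning a copied cue into a timestamped quote. The link syntax shown is illustrative only; the plugin's actual timestamp-link format may differ:

```python
def format_quote(text, media, start_seconds):
    """Render a copied cue as a blockquote with an hh:mm:ss timestamp link."""
    h, rem = divmod(int(start_seconds), 3600)
    m, s = divmod(rem, 60)
    stamp = f"{h:02d}:{m:02d}:{s:02d}"
    return f"> {text}\n> [[{media}#t={stamp}]]"

# format_quote("Welcome to the lecture.", "lecture.mp4", 75)
# produces a quote linking to 00:01:15 in lecture.mp4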
