feat: add WebVTT converter #14
Refinement pass pushed in Summary:
Verified locally:
Thanks for the PR! I like the goal of making transcripts easier to read, but I’m not fully convinced this should land in core as-is. My main concern is that WebVTT is already a plain-text format, and the most important semantic information in captions is the timing. This converter seems to turn it into a nicer transcript, but in doing so it drops or weakens some of that structure:
Since users/agents can already read or parse raw VTT on demand with normal text tooling, I’m trying to understand the core value of converting it to Markdown if the conversion is lossy. Could you explain the intended use case a bit more? In particular, do you see this as:
If we keep this in core, I’d lean toward preserving the caption timing more explicitly, for example by including both start and end timestamps in the timestamped section, and by treating any deduped plain transcript as a secondary convenience rather than the canonical output.
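To make the suggestion about keeping both timestamps concrete, here is a minimal Python sketch, not the PR's actual implementation. The names `parse_cues` and `to_timestamped_markdown` and the bullet-list output layout are assumptions made for illustration only; the sketch also ignores NOTE and STYLE blocks and does no validation.

```python
import re

# Matches a cue timing line such as "00:00:02.000 --> 00:00:04.500 align:start".
# The hours component is optional in WebVTT timestamps.
CUE_TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)
# Inline markup such as <c>, </c>, or <00:00:01.000> word-timing tags.
INLINE_TAG = re.compile(r"<[^>]+>")


def parse_cues(vtt_text):
    """Return a list of (start, end, text) tuples, one per cue (hypothetical helper)."""
    cues = []
    current = None  # [start, end, [text lines]] while inside a cue
    # The extra empty string flushes the final cue if the file lacks a trailing blank line.
    for raw in vtt_text.splitlines() + [""]:
        line = raw.strip()
        match = CUE_TIMING.match(line)
        if match:
            current = [match.group(1), match.group(2), []]
        elif not line:
            if current is not None:
                cues.append((current[0], current[1], " ".join(current[2])))
            current = None
        elif current is not None:
            text = INLINE_TAG.sub("", line).strip()
            if text:
                current[2].append(text)
    return cues


def to_timestamped_markdown(cues):
    """Render each non-empty cue as a bullet that keeps both start and end times."""
    return "\n".join(
        f"- `{start} --> {end}` {text}" for start, end, text in cues if text
    )
```

Keeping the start and end times next to each cue means the Markdown can still be mapped back to the video, which a deduplicated plain transcript alone cannot do.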
I'm using this to "archive" YouTube videos I find interesting. I could probably store the VTT itself, but since my openclaw memory is already in Markdown, I'd like to keep everything in one format.
Hey there, human here!
I wanted my openclaw to build itself a skill to download YouTube video transcripts, and it did so by downloading a VTT file from YouTube and adding a small Python script to convert that to Markdown.
I thought VTT-to-Markdown conversion might be a great addition to markit, including deduplication of the rolling subtitles that YouTube uses.
So I vibe coded this PR. If you have any remarks or complaints, send them my way and I'll fix it! :)
Example
Input
sample.vtt:
Run:
Output:
YouTube-style rolling captions are deduplicated, so cumulative cue fragments become one readable transcript instead of repeated text.
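The deduplication idea can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the converter's actual code: `cue_lines` and `dedupe_rolling` are hypothetical names, and the rule used here (drop a line that repeats the previously kept line, and let a line that extends the previous one replace it) is one plausible heuristic for rolling captions.

```python
import re

# Cue timing line, e.g. "00:00:02.010 --> 00:00:04.000"; hours are optional.
TIMING_RE = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)
# Inline markup such as <c>, </c>, or <00:00:02.500> word-timing tags.
TAG_RE = re.compile(r"<[^>]+>")


def cue_lines(vtt_text):
    """Yield the text lines of every cue in order, with inline tags removed."""
    in_cue = False
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if TIMING_RE.match(line):
            in_cue = True
        elif not line:
            in_cue = False  # a blank line ends the current cue
        elif in_cue:
            text = TAG_RE.sub("", line).strip()
            if text:
                yield text


def dedupe_rolling(lines):
    """Collapse repeated and cumulative caption lines into unique lines."""
    kept = []
    for text in lines:
        if kept and text == kept[-1]:
            continue  # exact repeat of the line carried over from the previous cue
        if kept and text.startswith(kept[-1]):
            kept[-1] = text  # cumulative fragment: keep only the longer version
            continue
        kept.append(text)
    return kept
```

Usage would look like `" ".join(dedupe_rolling(cue_lines(vtt_text)))`. One caveat of the prefix rule: genuinely separate lines where the second happens to begin with the first (for example "no" followed by "no way") are merged as well, so a real converter may want a stricter check.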