YouTube Video Transcript Extraction

How to Get YouTube Transcripts

  1. Install yt-dlp via Homebrew: brew install yt-dlp
  2. Download auto-generated subtitles (yt-dlp appends the language and format, so this writes /tmp/video-name.en.vtt):
    yt-dlp --write-auto-sub --sub-lang en --skip-download --sub-format vtt -o "/tmp/video-name" "https://www.youtube.com/watch?v=VIDEO_ID" 2>&1
  3. Clean the VTT file to readable text with Python:
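    python3 -c "
    import re
    with open('/tmp/video-name.en.vtt', encoding='utf-8') as f:
        lines = f.readlines()
    text_lines = []
    seen = set()  # caption lines already emitted
    for line in lines:
        line = line.strip()
        # Skip blank lines and the WEBVTT header block
        if not line or line.startswith(('WEBVTT', 'Kind:', 'Language:')):
            continue
        # Skip cue timing lines such as 00:00:01.000 --> 00:00:03.000
        if re.match(r'^\d{2}:\d{2}', line) or '-->' in line:
            continue
        # Strip inline tags such as <c> and per-word timestamps
        clean = re.sub(r'<[^>]+>', '', line)
        # Auto-generated captions repeat each line across cues; keep the first copy only
        if clean and clean not in seen:
            seen.add(clean)
            text_lines.append(clean)
    print(' '.join(text_lines))
    "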

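The cleaning in step 3 can also be sketched as a reusable function, which is easier to test than a one-liner. This is a minimal sketch, not a full WebVTT parser. The names vtt_to_text and SAMPLE are hypothetical, and two choices differ from the one-liner and are assumptions of this sketch: only consecutive repeats are dropped (so a phrase spoken again later in the video survives), and HTML entities such as &amp; are decoded.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # inline tags: <c>, </c>, <00:00:01.000>
HEADER_PREFIXES = ("WEBVTT", "Kind:", "Language:")


def vtt_to_text(vtt: str) -> str:
    """Reduce VTT caption text to a single line of plain text."""
    out = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blanks, the header block, and cue timing lines.
        if not line or line.startswith(HEADER_PREFIXES) or "-->" in line:
            continue
        clean = html.unescape(TAG.sub("", line)).strip()
        # Assumption: rolling captions repeat a line in adjacent cues,
        # so comparing with the previous kept line is enough.
        if clean and (not out or out[-1] != clean):
            out.append(clean)
    return " ".join(out)


# Hypothetical sample in the shape of an auto-generated caption file.
SAMPLE = """WEBVTT
Kind: captions
Language: en

00:00:00.160 --> 00:00:02.389 align:start position:0%

hello<00:00:00.480><c> everyone</c>

00:00:02.389 --> 00:00:02.399 align:start position:0%
hello everyone

00:00:02.399 --> 00:00:04.500 align:start position:0%
hello everyone
welcome<00:00:03.000><c> back</c>
"""

print(vtt_to_text(SAMPLE))  # prints "hello everyone welcome back"
```

To run it on the file from step 2, read /tmp/video-name.en.vtt with encoding='utf-8' and pass the contents to vtt_to_text. If a particular file still shows repeated lines, the global seen-set approach from step 3 is the stricter alternative.
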
Notes

  • yt-dlp is installed at /usr/local/bin/yt-dlp (Homebrew)
  • Works with YouTube short URLs (youtu.be/...) and full URLs
  • Auto-generated captions are available on most videos; run yt-dlp --list-subs with the video URL to check which languages exist
  • Use --sub-lang en for English (change the language code for other languages, e.g. es for Spanish)
  • The oEmbed API can get video title/metadata without downloading: https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=VIDEO_ID&format=json
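
The oEmbed lookup in the last note can be sketched in Python using only the standard library. This is an illustrative sketch: the helper names oembed_url and fetch_metadata are hypothetical, and percent-encoding the video URL in the query string is a choice made here for safety rather than something the note requires.

```python
import json
import urllib.parse
import urllib.request

OEMBED_ENDPOINT = "https://www.youtube.com/oembed"  # endpoint from the note above


def oembed_url(video_url: str) -> str:
    """Build the oEmbed request URL, percent-encoding the video URL."""
    query = urllib.parse.urlencode({"url": video_url, "format": "json"})
    return f"{OEMBED_ENDPOINT}?{query}"


def fetch_metadata(video_url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the oEmbed JSON response (needs network access)."""
    with urllib.request.urlopen(oembed_url(video_url), timeout=timeout) as resp:
        return json.load(resp)


# Building the URL needs no network; VIDEO_ID is the placeholder from the notes.
print(oembed_url("https://www.youtube.com/watch?v=VIDEO_ID"))
```

Calling fetch_metadata with a real video URL should return a dictionary whose title key holds the video title. A request for a video that does not exist or cannot be embedded is expected to raise urllib.error.HTTPError, so callers may want to catch that.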