YouTube Transcript Extractor — Get Video Transcripts for AI, RAG & Content Repurposing

Extract transcripts, subtitles, and captions from YouTube videos that have caption tracks. Supports playlists, channels, 100+ languages, and both manual and auto-generated captions. Perfect for AI training data, RAG pipelines, content repurposing, and SEO analysis. No YouTube API key required.

What It Does

flowchart LR
    A["YouTube URLs<br/>Videos, Playlists, Channels"] --> B["InnerTube API<br/>Android / iOS / TV clients"]
    B --> C["Caption Tracks<br/>100+ Languages"]
    C --> D["Structured Output<br/>Full Text + Timestamps"]
    D --> E["Your Pipeline<br/>RAG / AI / Content"]

    style A fill:#ff0000,color:#fff,stroke:none
    style B fill:#1a1a2e,color:#fff,stroke:none
    style C fill:#0f3460,color:#fff,stroke:none
    style D fill:#533483,color:#fff,stroke:none
    style E fill:#34a853,color:#fff,stroke:none

This extractor uses YouTube's internal InnerTube API to fetch caption tracks directly, so it needs no YouTube Data API key or OAuth setup and is not subject to Data API quotas. It tries multiple client types (Android, iOS, TV) in turn to maximize the success rate, including for music videos and VEVO content.
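The client fallback described above can be pictured as a simple loop: try each client type in order and keep the first one that returns caption tracks. The sketch below is illustrative only and is not the actor's actual code. `fetch_tracks` is a hypothetical callable standing in for whatever request the actor makes, and the client names are just labels.

```python
from typing import Callable, Iterable, Optional

# Hypothetical labels for the client types named above.
CLIENTS = ("android", "ios", "tv")


def first_client_with_captions(
    video_id: str,
    fetch_tracks: Callable[[str, str], list],
    clients: Iterable[str] = CLIENTS,
) -> Optional[tuple]:
    """Return (client, tracks) for the first client that yields caption tracks.

    fetch_tracks(video_id, client) is an assumed interface: it returns a
    list of caption tracks (possibly empty) or raises on a failed request.
    """
    for client in clients:
        try:
            tracks = fetch_tracks(video_id, client)
        except Exception:
            continue  # this client failed; move on to the next one
        if tracks:
            return client, tracks
    return None  # no client produced any caption tracks


# Stand-in fetcher for demonstration: only the "tv" client succeeds.
def fake_fetch(video_id: str, client: str) -> list:
    if client == "android":
        raise RuntimeError("request rejected")
    return [{"code": "en"}] if client == "tv" else []


result = first_client_with_captions("dQw4w9WgXcQ", fake_fetch)
print(result)
```

Passing the fetcher in as an argument keeps the retry logic testable without any network access.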

What Data You Get

{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channelName": "Rick Astley",
  "viewCount": 1500000000,
  "publishDate": "2009-10-25",
  "language": "en",
  "isAutoGenerated": true,
  "hasTranscript": true,
  "transcriptText": "We're no strangers to love. You know the rules and so do I...",
  "wordCount": 284,
  "segments": [
    {
      "text": "We're no strangers to love",
      "start": 18.0,
      "duration": 3.5,
      "startFormatted": "0:18"
    }
  ],
  "availableLanguages": [
    { "code": "en", "name": "English", "isAutoGenerated": true },
    { "code": "es", "name": "Spanish", "isAutoGenerated": true }
  ]
}
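Because each segment carries `start` and `duration` in seconds, the output can be turned into a subtitle file with a few lines of code. The following is a minimal sketch, assuming the field names shown in the sample above; `to_srt_time` and `segments_to_srt` are helper names invented for this example and are not part of the actor.

```python
def to_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Build SRT text from segments that have text, start, and duration."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]  # end time is derived, not provided
        blocks.append(
            f"{index}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"


sample_segments = [
    {"text": "We're no strangers to love", "start": 18.0, "duration": 3.5},
]
srt_text = segments_to_srt(sample_segments)
print(srt_text)
```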

Quick Start

cURL

curl "https://api.apify.com/v2/acts/george.the.developer~youtube-transcript-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -X POST \
  -d '{
    "urls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "language": "en",
    "outputFormat": "both",
    "includeMetadata": true
  }' \
  -H 'Content-Type: application/json'

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('george.the.developer/youtube-transcript-scraper').call({
    urls: [
        'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
        'https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf',
    ],
    language: 'en',
    outputFormat: 'full-text',
    includeMetadata: true,
    maxVideos: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
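// Skip videos that came back without captions (hasTranscript: false)
items.filter(video => video.hasTranscript).forEach(video => {
    console.log(`${video.title} (${video.wordCount} words)`);
    console.log(video.transcriptText.substring(0, 200) + '...');
});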

Python — Build a RAG Pipeline

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Extract transcripts from an entire playlist
run = client.actor("george.the.developer/youtube-transcript-scraper").call(run_input={
    "urls": ["https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"],
    "language": "en",
    "outputFormat": "full-text",
    "includeMetadata": True,
    "maxVideos": 100,
})

# Build documents for RAG/vector store
documents = []
for video in client.dataset(run["defaultDatasetId"]).iterate_items():
    if video.get("hasTranscript"):
        documents.append({
            "text": video["transcriptText"],
            "metadata": {
                "source": video["videoUrl"],
                "title": video["title"],
                "channel": video["channelName"],
                "date": video.get("publishDate", ""),
            }
        })

print(f"Built {len(documents)} documents for RAG pipeline")

# Feed into your vector store (Pinecone, Weaviate, Chroma, etc.)
# for doc in documents:
#     vector_store.add(doc["text"], metadata=doc["metadata"])
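Full transcripts are often too long to embed as a single document, so a common next step is to split each one into overlapping chunks before indexing. Below is a minimal sketch of word-based chunking that works on the `documents` structure built above. The function names, chunk size, and overlap are illustrative assumptions, not settings of the actor.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def chunk_documents(documents: list, chunk_size: int = 200, overlap: int = 40) -> list:
    """Expand each document into one entry per chunk, keeping its metadata."""
    chunked = []
    for doc in documents:
        for position, piece in enumerate(chunk_words(doc["text"], chunk_size, overlap)):
            chunked.append({
                "text": piece,
                "metadata": {**doc["metadata"], "chunk": position},
            })
    return chunked


example_docs = [{"text": "a b c d e f g", "metadata": {"title": "demo"}}]
example_chunks = chunk_documents(example_docs, chunk_size=3, overlap=1)
print([c["text"] for c in example_chunks])
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of some duplicated text in the index.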

Use Cases

AI Training Data — Build datasets from YouTube transcripts for LLM fine-tuning or NLP research
RAG Pipelines — Index video content in vector databases for retrieval-augmented generation
Content Repurposing — Turn videos into blog posts, newsletters, social media threads
SEO Analysis — Analyze competitor video content and keyword usage
Accessibility — Generate text versions of video content
Research — Analyze speeches, lectures, interviews, and educational content at scale
Podcast Notes — Auto-generate show notes from video podcasts

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | required | YouTube video, playlist, or channel URLs |
| `language` | string | `en` | Preferred language code |
| `includeTimestamps` | boolean | true | Include start time per segment |
| `outputFormat` | string | `both` | `full-text`, `segments`, or `both` |
| `maxVideos` | number | 50 | Max videos to process (1-5000) |
| `includeMetadata` | boolean | true | Include title, channel, views, etc. |
| `maxConcurrency` | number | 5 | Concurrent requests |
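When building the input programmatically, it can help to apply the defaults and check the documented ranges before starting a run. The sketch below mirrors the table above; `build_run_input` is a hypothetical helper, and the client-side checks are an assumption about what is worth validating locally, not a description of how the actor itself validates input.

```python
# Defaults and allowed values copied from the parameter table above.
DEFAULTS = {
    "language": "en",
    "includeTimestamps": True,
    "outputFormat": "both",
    "maxVideos": 50,
    "includeMetadata": True,
    "maxConcurrency": 5,
}
OUTPUT_FORMATS = {"full-text", "segments", "both"}


def build_run_input(urls, **overrides) -> dict:
    """Merge overrides into the documented defaults and sanity-check them."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {"urls": list(urls), **DEFAULTS, **overrides}
    if run_input["outputFormat"] not in OUTPUT_FORMATS:
        raise ValueError("outputFormat must be full-text, segments, or both")
    if not 1 <= run_input["maxVideos"] <= 5000:
        raise ValueError("maxVideos must be between 1 and 5000")
    return run_input


run_input = build_run_input(
    ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    outputFormat="segments",
    maxVideos=10,
)
print(run_input)
```

The resulting dictionary can be passed as `run_input` to the `call` method shown in the Python quick start.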

Supported URL Formats

https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
https://www.youtube.com/playlist?list=PLAYLIST_ID
https://www.youtube.com/@ChannelName
https://www.youtube.com/channel/CHANNEL_ID
https://www.youtube.com/shorts/VIDEO_ID
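To pre-sort a mixed list of links before submitting it, the formats above can be told apart with the standard library alone. This is a rough sketch covering only the six formats listed; `classify_youtube_url` is a name made up for this example, and the actor's own URL handling may accept more variants (for instance other subdomains) than this does.

```python
from urllib.parse import urlparse, parse_qs


def classify_youtube_url(url: str):
    """Return (kind, identifier) for a listed URL format, or None otherwise."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    query = parse_qs(parsed.query)

    if host == "youtu.be" and parts:
        return "video", parts[0]
    if host != "youtube.com" or not parts:
        return None
    if parts == ["watch"] and "v" in query:
        return "video", query["v"][0]
    if parts == ["playlist"] and "list" in query:
        return "playlist", query["list"][0]
    if len(parts) == 2 and parts[0] == "shorts":
        return "video", parts[1]
    if len(parts) == 2 and parts[0] == "channel":
        return "channel", parts[1]
    if parts[0].startswith("@"):
        return "channel", parts[0]  # handle-style channel URL
    return None


for link in (
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/@ChannelName",
):
    print(link, classify_youtube_url(link))
```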

Run on Apify

Run this actor on Apify — extract transcripts from hundreds of videos in minutes.

Also Available on RapidAPI

Prefer a standard REST API? This extractor is also available on RapidAPI with simple API key authentication:

Free tier: 30 requests/month
Pro: $19/month (500 requests)
Ultra: $49/month (2,000 requests)
Mega: $129/month (10,000 requests)
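For a rough comparison of the tiers, the listed prices work out to about $0.038 per request on Pro, $0.0245 on Ultra, and $0.0129 on Mega. The sketch below picks the cheapest tier that covers a given monthly volume. It assumes the monthly allowances above are hard caps with no overage pricing, which the listing does not state either way.

```python
# (name, monthly price in USD, included requests), taken from the list above.
TIERS = [
    ("Free", 0, 30),
    ("Pro", 19, 500),
    ("Ultra", 49, 2000),
    ("Mega", 129, 10000),
]


def cheapest_tier(requests_per_month: int):
    """Return the name of the cheapest tier covering the volume, or None."""
    eligible = [tier for tier in TIERS if tier[2] >= requests_per_month]
    if not eligible:
        return None  # volume exceeds every listed tier
    return min(eligible, key=lambda tier: tier[1])[0]


def cost_per_request(name: str) -> float:
    """Monthly price divided by included requests for the named tier."""
    for tier_name, price, included in TIERS:
        if tier_name == name:
            return price / included
    raise KeyError(name)


for volume in (25, 400, 1500, 8000):
    print(volume, cheapest_tier(volume))
```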

Limitations

Not all YouTube videos have captions/transcripts. The extractor reports hasTranscript: false for videos without available captions.
Auto-generated captions may contain errors (especially for technical jargon or non-English content).
This tool does not bypass age restrictions or geo-blocking. Routing requests through a proxy in a permitted region can help with geo-restricted videos.
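Given these limitations, it is worth checking a finished run for videos that returned no transcript and for transcripts that are auto-generated and may need review. A minimal sketch, assuming items shaped like the sample output earlier in this README; `summarize_results` is a hypothetical helper name.

```python
def summarize_results(items: list) -> dict:
    """Separate usable transcripts from missing ones and flag auto captions."""
    usable = [item for item in items if item.get("hasTranscript")]
    missing_ids = [
        item.get("videoId") for item in items if not item.get("hasTranscript")
    ]
    auto_ids = [
        item.get("videoId") for item in usable if item.get("isAutoGenerated")
    ]
    return {
        "usable": len(usable),
        "missing": missing_ids,        # no captions available for these videos
        "autoGenerated": auto_ids,     # may contain recognition errors
    }


sample_items = [
    {"videoId": "aaa", "hasTranscript": True, "isAutoGenerated": True},
    {"videoId": "bbb", "hasTranscript": True, "isAutoGenerated": False},
    {"videoId": "ccc", "hasTranscript": False},
]
summary = summarize_results(sample_items)
print(summary)
```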

Related Tools

Google News Scraper — Monitor brand mentions across news sources
LinkedIn Employee Scraper — Extract employee data from any company
Website Contact Scraper — Find emails & contacts from any website
US Tariff Lookup — Look up import duty rates & HS codes

License

ISC License. See LICENSE for details.

Built by george.the.developer on Apify.
