Extract transcripts, subtitles, and captions from any YouTube video. Supports playlists, channels, 100+ languages, and both manual and auto-generated captions. Perfect for AI training data, RAG pipelines, content repurposing, and SEO analysis. No YouTube API key required.
flowchart LR
A["YouTube URLs<br/>Videos, Playlists, Channels"] --> B["InnerTube API<br/>Android / iOS / TV clients"]
B --> C["Caption Tracks<br/>100+ Languages"]
C --> D["Structured Output<br/>Full Text + Timestamps"]
D --> E["Your Pipeline<br/>RAG / AI / Content"]
style A fill:#ff0000,color:#fff,stroke:none
style B fill:#1a1a2e,color:#fff,stroke:none
style C fill:#0f3460,color:#fff,stroke:none
style D fill:#533483,color:#fff,stroke:none
style E fill:#34a853,color:#fff,stroke:none
This extractor fetches caption tracks directly from YouTube's internal InnerTube API, so it needs no YouTube Data API key and no OAuth, and it is not subject to Data API quotas. It tries several client types in turn (Android, iOS, TV), which improves the success rate, including for music videos and VEVO content.
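The multi-client fallback described above can be pictured roughly as follows. This is a minimal sketch, not the actor's actual code: the client order, the `fetch_with_fallback` helper, and the `fake_fetch` stub are all assumptions made for illustration, and no real InnerTube request is performed.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed order; the README names the client types but not their priority.
CLIENT_ORDER = ("ANDROID", "IOS", "TV")


def fetch_with_fallback(
    video_id: str,
    fetch: Callable[[str, str], Optional[List[dict]]],
    clients: Sequence[str] = CLIENT_ORDER,
) -> Tuple[Optional[str], List[dict]]:
    """Try each client in turn; return (client, tracks) for the first non-empty result."""
    for client in clients:
        try:
            tracks = fetch(video_id, client)
        except Exception:
            continue  # a failed request is treated like an empty result
        if tracks:
            return client, tracks
    return None, []


def fake_fetch(video_id: str, client: str) -> Optional[List[dict]]:
    """Hypothetical stand-in for a real request: only the TV client returns tracks."""
    if client == "ANDROID":
        raise RuntimeError("simulated request failure")
    if client == "IOS":
        return []
    return [{"code": "en", "name": "English", "isAutoGenerated": True}]


client_used, tracks = fetch_with_fallback("dQw4w9WgXcQ", fake_fetch)
print(client_used, len(tracks))
```

Passing the fetch function in as an argument keeps the retry logic separate from the request details, so the same loop works with any client list.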
{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up",
  "channelName": "Rick Astley",
  "viewCount": 1500000000,
  "publishDate": "2009-10-25",
  "language": "en",
  "isAutoGenerated": true,
  "hasTranscript": true,
  "transcriptText": "We're no strangers to love. You know the rules and so do I...",
  "wordCount": 284,
  "segments": [
    {
      "text": "We're no strangers to love",
      "start": 18.0,
      "duration": 3.5,
      "startFormatted": "0:18"
    }
  ],
  "availableLanguages": [
    { "code": "en", "name": "English", "isAutoGenerated": true },
    { "code": "es", "name": "Spanish", "isAutoGenerated": true }
  ]
}

curl "https://api.apify.com/v2/acts/george.the.developer~youtube-transcript-scraper/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -X POST \
  -d '{
    "urls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "language": "en",
    "outputFormat": "both",
    "includeMetadata": true
  }' \
  -H 'Content-Type: application/json'

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('george.the.developer/youtube-transcript-scraper').call({
  urls: [
    'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'https://www.youtube.com/playlist?list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf',
  ],
  language: 'en',
  outputFormat: 'full-text',
  includeMetadata: true,
  maxVideos: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(video => {
  // Videos without captions have no transcript text to print
  if (!video.hasTranscript) return;
  console.log(`${video.title} (${video.wordCount} words)`);
  console.log(video.transcriptText.substring(0, 200) + '...');
});

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Extract transcripts from an entire playlist
run = client.actor("george.the.developer/youtube-transcript-scraper").call(run_input={
    "urls": ["https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"],
    "language": "en",
    "outputFormat": "full-text",
    "includeMetadata": True,
    "maxVideos": 100,
})

# Build documents for RAG/vector store
documents = []
for video in client.dataset(run["defaultDatasetId"]).iterate_items():
    if video.get("hasTranscript"):
        documents.append({
            "text": video["transcriptText"],
            "metadata": {
                "source": video["videoUrl"],
                "title": video["title"],
                "channel": video["channelName"],
                "date": video.get("publishDate", ""),
            }
        })

print(f"Built {len(documents)} documents for RAG pipeline")

# Feed into your vector store (Pinecone, Weaviate, Chroma, etc.)
# for doc in documents:
#     vector_store.add(doc["text"], metadata=doc["metadata"])

- AI Training Data: Build datasets from YouTube transcripts for LLM fine-tuning or NLP research
- RAG Pipelines: Index video content in vector databases for retrieval-augmented generation
- Content Repurposing: Turn videos into blog posts, newsletters, and social media threads
- SEO Analysis: Analyze competitor video content and keyword usage
- Accessibility: Generate text versions of video content
- Research: Analyze speeches, lectures, interviews, and educational content at scale
- Podcast Notes: Auto-generate show notes from video podcasts
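For the RAG use case, long transcripts usually need to be split before indexing. The sketch below groups `segments` into chunks of at most `max_words` words and keeps the start time of each chunk so a search hit can link back to the right moment in the video. The `chunk_segments` and `format_start` helpers and the sample segments are hypothetical; only the `text`, `start`, and `startFormatted` field names come from the sample output, and the `H:MM:SS` form for videos longer than an hour is an assumption.

```python
def format_start(seconds: float) -> str:
    """Render seconds as M:SS, or H:MM:SS past one hour (assumed format)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def make_chunk(texts, start):
    """Join segment texts into one chunk that remembers where it starts."""
    return {
        "text": " ".join(texts),
        "start": start,
        "startFormatted": format_start(start),
    }


def chunk_segments(segments, max_words=200):
    """Group consecutive segments into chunks of at most max_words words."""
    chunks, texts, count, start = [], [], 0, None
    for seg in segments:
        n_words = len(seg["text"].split())
        # Close the current chunk if this segment would push it over the limit
        if texts and count + n_words > max_words:
            chunks.append(make_chunk(texts, start))
            texts, count, start = [], 0, None
        if start is None:
            start = seg["start"]
        texts.append(seg["text"])
        count += n_words
    if texts:
        chunks.append(make_chunk(texts, start))
    return chunks


# Illustrative segments, not real actor output
segments = [
    {"text": "welcome back to the channel", "start": 0.0, "duration": 2.5},
    {"text": "today we look at vector search", "start": 2.5, "duration": 3.0},
    {"text": "and how to index transcripts", "start": 65.0, "duration": 3.0},
]

chunks = chunk_segments(segments, max_words=8)
for chunk in chunks:
    print(chunk["startFormatted"], chunk["text"])
```

A segment is never split, so a single segment longer than `max_words` becomes its own oversized chunk.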
| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | required | YouTube video, playlist, or channel URLs |
| `language` | string | `en` | Preferred language code |
| `includeTimestamps` | boolean | true | Include start time per segment |
| `outputFormat` | string | `both` | `full-text`, `segments`, or `both` |
| `maxVideos` | number | 50 | Max videos to process (1-5000) |
| `includeMetadata` | boolean | true | Include title, channel, views, etc. |
| `maxConcurrency` | number | 5 | Concurrent requests |
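The parameters above can be checked on the client side before starting a run. The following is a rough sketch: `build_input` is a hypothetical helper, and the defaults and limits are copied from the table rather than from the actor's input schema, so the actor itself remains the authority on what it accepts.

```python
ALLOWED_FORMATS = {"full-text", "segments", "both"}

# Defaults as listed in the parameter table
DEFAULTS = {
    "language": "en",
    "includeTimestamps": True,
    "outputFormat": "both",
    "maxVideos": 50,
    "includeMetadata": True,
    "maxConcurrency": 5,
}


def build_input(urls, **overrides):
    """Merge overrides into the documented defaults and validate the result."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    run_input = {"urls": list(urls), **DEFAULTS, **overrides}

    if run_input["outputFormat"] not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= run_input["maxVideos"] <= 5000:
        raise ValueError("maxVideos must be between 1 and 5000")
    return run_input


run_input = build_input(
    ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    outputFormat="segments",
    maxVideos=100,
)
print(run_input)
```

The resulting dictionary has the same shape as the `run_input` passed to the Apify client in the Python example.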
- `https://www.youtube.com/watch?v=VIDEO_ID`
- `https://youtu.be/VIDEO_ID`
- `https://www.youtube.com/playlist?list=PLAYLIST_ID`
- `https://www.youtube.com/@ChannelName`
- `https://www.youtube.com/channel/CHANNEL_ID`
- `https://www.youtube.com/shorts/VIDEO_ID`
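When URLs come from mixed sources, it can help to sort them by type before submitting a run. The sketch below recognizes only the six URL shapes listed above, using the standard library. `classify_url` is a hypothetical helper and says nothing about how the actor parses URLs internally; other hosts such as `m.youtube.com` are reported as unknown here.

```python
from urllib.parse import parse_qs, urlparse


def classify_url(url: str):
    """Return (kind, identifier) for the URL shapes listed in this README."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [p for p in parsed.path.split("/") if p]
    query = parse_qs(parsed.query)

    if host == "youtu.be":
        return ("video", parts[0]) if parts else ("unknown", None)
    if host != "youtube.com":
        return ("unknown", None)

    if parts[:1] == ["watch"] and "v" in query:
        return ("video", query["v"][0])
    if parts[:1] == ["playlist"] and "list" in query:
        return ("playlist", query["list"][0])
    if len(parts) >= 2 and parts[0] == "shorts":
        return ("short", parts[1])
    if len(parts) >= 2 and parts[0] == "channel":
        return ("channel", parts[1])
    if parts and parts[0].startswith("@"):
        return ("channel", parts[0])
    return ("unknown", None)


for url in [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "https://www.youtube.com/playlist?list=PLAYLIST_ID",
    "https://www.youtube.com/@ChannelName",
]:
    print(classify_url(url))
```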
Run this actor on Apify — extract transcripts from hundreds of videos in minutes.
Prefer a standard REST API? This extractor is also available on RapidAPI with simple API key authentication:
- Free tier: 30 requests/month
- Pro: $19/month (500 requests)
- Ultra: $49/month (2,000 requests)
- Mega: $129/month (10,000 requests)
- Not all YouTube videos have captions or transcripts. The extractor reports `hasTranscript: false` for videos without available captions.
- Auto-generated captions may contain errors (especially for technical jargon or non-English content).
- This tool does not bypass age restrictions or geo-blocked content. Using a proxy can help with geo-restrictions.
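Given these limitations, a pipeline may want to sort results before using them: videos with no transcript can be skipped, and auto-generated captions can be flagged for review. A minimal sketch follows. `triage` is a hypothetical helper, the sample items and their IDs are made up, and it assumes that items without a transcript still carry a `videoId`.

```python
def triage(items):
    """Sort dataset items into manual, auto-generated, and missing transcripts."""
    buckets = {"manual": [], "auto": [], "missing": []}
    for item in items:
        video_id = item.get("videoId")  # assumed present even without captions
        if not item.get("hasTranscript"):
            buckets["missing"].append(video_id)
        elif item.get("isAutoGenerated"):
            buckets["auto"].append(video_id)
        else:
            buckets["manual"].append(video_id)
    return buckets


# Illustrative items using the field names from the sample output
items = [
    {"videoId": "vid_manual", "hasTranscript": True, "isAutoGenerated": False},
    {"videoId": "vid_auto", "hasTranscript": True, "isAutoGenerated": True},
    {"videoId": "vid_none", "hasTranscript": False},
]

buckets = triage(items)
print(buckets)
```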
- Google News Scraper — Monitor brand mentions across news sources
- LinkedIn Employee Scraper — Extract employee data from any company
- Website Contact Scraper — Find emails & contacts from any website
- US Tariff Lookup — Look up import duty rates & HS codes
ISC License. See LICENSE for details.
Built by george.the.developer on Apify.