YouTube IP V6 is the current presentation-ready, Streamlit-deployable release. It carries forward the V5 product surface (seven sidebar destinations, public-only Channel Insights, bundled CSV benchmarking, and live YouTube / Gemini / OpenAI workflows) while refreshing the Streamlit entrypoint (`streamlit_app.py` → `dashboard/app.py`), adopting built-in multipage navigation (`st.navigation`), hardening dependencies (for example the NumPy, PyArrow, and Google API client stacks on Windows and Cloud), and polishing the UI.
Live app (V6): ip-youtube-creator-insight-2026.streamlit.app
Earlier V5 deployment (historical): youtube-ip-v5.streamlit.app
Quick jump:
| If you want to understand... | Start here |
|---|---|
| current runtime and page flows | Architecture |
| deployment targets, secrets, and version mapping | Deployment And Versions |
| the full project story from V1 through V5 and into V6 | Project Brief |
| how to explain Outlier Finder in a demo or presentation | Outlier Finder Presentation Notes |
| a GCP-friendly env template | .env.gcp.example |
| Metric | Current Count / State |
|---|---|
| Deployed versions documented here | 6 (V1–V6) |
| Live Streamlit app links (including historical) | 6 |
| Current V6 sidebar destinations | 7 |
| Current primary data paths | 2 |
| Current Channel Insights topic modes | 2 |
| Current live provider families | 3 (YouTube, Gemini, OpenAI) |
| Optional model-artifact path | 1 (BERTopic beta) |
- V6 keeps the same two core data paths as V5: bundled GitHub CSVs for benchmarking and live API pulls for channel, outlier, tool, and thumbnail workflows.
- `Channel Insights` remains public-only with two topic modes: default heuristics and optional BERTopic beta.
- `Thumbnails`, `Ytuber`, and `Tools` continue the creative and AI-assisted workflows from V5.
- V6 focuses on reliable deployment: an explicit `run()` entry from `streamlit_app.py`, Streamlit's `st.navigation` / `pg.run()` router, clearer dependency pins (`pyarrow`, `google-api-python-client`, `google-auth-httplib2`), and layout/CSS fixes for the global hero header.
- V5 already removed the V4 `Assistant` and Google OAuth owner overlays; V6 does not bring them back.
The original goal of the project was simple: help small-to-mid-sized YouTube creators make better content decisions using richer intelligence than YouTube Studio alone provides. The early versions focused on public metadata, cross-channel benchmarking, semantic topic modeling, and AI-assisted recommendations so the team could answer questions like:
- What content themes actually perform across comparable channels?
- Which topics, formats, and publishing patterns correlate with stronger performance?
- How can a creator move from raw channel data to a usable action plan for titles, thumbnails, and next videos?
That core question stayed constant across every version. What changed was the way the app packaged the answer.
A short history comes first; the deeper comparison tables follow.
```mermaid
flowchart LR
V1["V1<br/>analytics prototype"] --> V2["V2<br/>creator-suite expansion"]
V2 --> V3["V3<br/>multi-page productization"]
V3 --> V4["V4<br/>deepest intelligence layer"]
V4 --> V5["V5<br/>cleanup + presentation-ready shell"]
V5 --> V6["V6<br/>deploy + router refresh"]
V1 --- A1["Public metadata, BERTopic framing, recommendation concept"]
V2 --- A2["Ytuber suite, workflow expansion, AI generation, deploy practicality"]
V3 --- A3["Clear page architecture, stronger runtime docs, outlier workflow"]
V4 --- A4["Channel Insights, Assistant, Google OAuth, owner analytics, BERTopic beta"]
V5 --- A5["Assistant removed, OAuth removed, public-only insights, retained AI suite"]
V6 --- A6["st.navigation multipage, entry run loop, deps + header UX for Cloud/local"]
```
| Version | Live App | Main Goal | Headline Additions | Major Simplifications / Later Changes | Status |
|---|---|---|---|---|---|
| V1 | youtube-stats-ip.streamlit.app | prove the analytics + recommendation concept | public YouTube analytics framing, BERTopic modeling direction, initial Streamlit dashboard, thumbnail dashboard | later versions expanded beyond the original compact dashboard | historical prototype |
| V2 | youtube-stats-ip-v2.streamlit.app | expand into a creator operating system | creator workflow framing, richer app shell, advanced Ytuber suite, deploy guidance | some modules were later split, simplified, or removed for maintainability | historical expansion |
| V3 | youtube-ip-v3.streamlit.app | turn the project into a clearer product | five-page product shell, strong runtime architecture, outlier workflow, better repo map | later versions added deeper insights and then simplified again | historical productization |
| V4 | youtube-ip-v4.streamlit.app | deepen intelligence and tracked-channel analysis | Channel Insights, sidebar Assistant, Google OAuth owner analytics, optional BERTopic beta | V5 removed the Assistant and Google OAuth to reduce deployment complexity | historical deep-intelligence release |
| V5 | youtube-ip-v5.streamlit.app | keep the best parts and document the journey | public-only Channel Insights, retained AI suite pages, Thumbnails rename, consolidated docs, optional BERTopic beta | lighter than V4, easier to reason about, presentation-ready documentation | historical release |
| V6 | ip-youtube-creator-insight-2026.streamlit.app | ship a dependable Cloud + local runtime on top of V5 | Streamlit `st.navigation` router, `streamlit_app.py` → `run()` entry, explicit PyArrow / Google client deps, sidebar + hero layout fixes | product scope matches V5; focus is operational reliability | current release |
| Area | V1 | V2 | V3 | V4 | V5 | V6 |
|---|---|---|---|---|---|---|
| Core problem | creator strategy from public data | same | same | same | same | same |
| Bundled dataset benchmarking | present | present | strong | present | present | present |
| Creator workspace | early | expanded heavily | strong | present | present | present |
| Outlier research | emerging | present in suite | strong standalone page | strong | strong | strong |
| Thumbnail generation | present | expanded | present | present | present | present |
| Channel Insights tracked snapshots | absent | absent | absent | added | retained | retained |
| Sidebar Assistant | absent | absent | absent | added | removed | removed |
| Google OAuth / owner analytics | absent | absent | absent | added | removed | removed |
| Optional BERTopic model path | conceptual | conceptual | documented stack direction | added to runtime | retained | retained |
| Presentation-quality consolidated docs | basic | practical | stronger | strong | strongest | strongest |
| Streamlit multipage router (`st.navigation`) | n/a | n/a | n/a | n/a | custom / evolving | built-in |
- Public-data-first creator intelligence stayed central from V1 onward.
- Bundled CSV benchmarking survived as `Channel Analysis`.
- Live public-channel analysis survived and matured into `Channel Insights`.
- AI-assisted creative generation survived through `Thumbnails` and `Ytuber`.
- Outlier research survived and remains a distinct workflow.
- Optional BERTopic topic modeling survived, but as a guarded beta path rather than a required dependency.
- Streamlit deployment remained the delivery surface for every released version.
| Capability | Highest-Version Form | Why It Was Valuable | Why It Was Reduced Or Removed In V5 / Unchanged In V6 |
|---|---|---|---|
| Sidebar Assistant | V4 | gave global help and retrieval-driven guidance across pages | added surface area, more maintenance, and more cognitive load than the lighter shell needed; still absent in V6 |
| Google OAuth + owner analytics overlay | V4 | enabled private metrics such as impressions, CTR, watch time, and retention | raised deploy complexity, secrets burden, and public-vs-owner workflow branching; still absent in V6 |
| Heavier mixed recommendation UI | V3 / V4 | combined strategy guidance with creative tooling | overlapped with Channel Analysis and Channel Insights; V5 narrows page 3 to Thumbnails (unchanged in V6) |
| Broader “everything page” behavior in earlier suites | V2 | useful during exploration and experimentation | harder to document and reason about than clearly separated workflows |
| Always-growing feature surface | V2-V4 | helped the team test many ideas quickly | eventually created duplication, deployment weight, and documentation drift |
The current V6 sidebar order is:
1. `Channel Analysis`
2. `Channel Insights`
3. `Thumbnails`
4. `Outlier Finder`
5. `Ytuber`
6. `Tools`
7. `Deployment`
This is the high-level page map. The detailed runtime handoffs for each page live in Architecture.
| Page | What Problem It Solves | Main Inputs | Main Outputs | Runtime Type |
|---|---|---|---|---|
| Channel Analysis | benchmark bundled datasets and compare portfolio-level performance | committed CSV data in `data/youtube api data/` | KPI cards, trend charts, channel/video rankings | dataset-backed |
| Channel Insights | analyze one tracked public channel over time | live YouTube Data API pulls, snapshot history, optional BERTopic | topic trends, format patterns, outliers, next-topic ideas | mixed |
| Thumbnails | generate and export thumbnails without mixing in broader strategy UI | Gemini/OpenAI image calls, public thumbnail URLs | generated concepts, preview cards, downloads | mixed |
| Outlier Finder | find breakout videos in a niche | live YouTube API scans and outlier scoring | scored outlier tables, breakout snapshots, AI research | mixed |
| Ytuber | run a creator-focused live workspace | live channel data, AI generation, creator tools | audits, planner outputs, AI Studio results | mixed |
| Tools | inspect and export public YouTube assets | YouTube metadata, transcripts, yt-dlp, ffmpeg | previews, transcript exports, audio/video/thumbnail downloads | API-backed |
| Deployment | show deployment/setup guidance inside the app | static in-app instructions | repo, branch, secrets, deployment notes | static |
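`Thumbnails` (Download From URL) and `Tools` both start from a YouTube video URL or ID. The stdlib sketch below illustrates that first step: normalize common URL forms to an 11-character video ID, then build candidate public thumbnail URLs from YouTube's widely used `i.ytimg.com/vi/<id>/<size>.jpg` pattern. The helper names `extract_video_id` and `thumbnail_urls` are hypothetical; this is not the code in `thumbnail_hub_service.py` or `youtube_tools.py`.

```python
import re
from typing import List, Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters: letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url_or_id: str) -> Optional[str]:
    """Return the video ID from a bare ID or a common YouTube URL form, else None."""
    candidate = url_or_id.strip()
    if VIDEO_ID_RE.fullmatch(candidate):
        return candidate  # already a bare ID

    # Allow scheme-less input such as "youtube.com/watch?v=...".
    parsed = urlparse(candidate if "://" in candidate else "https://" + candidate)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]

    if host == "youtu.be" and parts:
        found = parts[0]  # https://youtu.be/<id>
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parts[:1] == ["watch"]:
            found = parse_qs(parsed.query).get("v", [""])[0]  # /watch?v=<id>
        elif len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
            found = parts[1]  # /shorts/<id>, /embed/<id>, /live/<id>
        else:
            return None  # channel pages, playlists, etc.
    else:
        return None  # not a YouTube host

    return found if VIDEO_ID_RE.fullmatch(found) else None


def thumbnail_urls(video_id: str) -> List[str]:
    """Public thumbnail URL candidates, highest resolution first."""
    # maxresdefault does not exist for every video, so a downloader should fall back.
    sizes = ["maxresdefault", "sddefault", "hqdefault", "mqdefault", "default"]
    return [f"https://i.ytimg.com/vi/{video_id}/{size}.jpg" for size in sizes]


video_id = extract_video_id("https://youtu.be/dQw4w9WgXcQ?t=42")
print(video_id)                     # dQw4w9WgXcQ
print(thumbnail_urls(video_id)[2])  # https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg
```

Returning an ordered list of candidates, rather than a single URL, leaves the "try the best size, fall back on a 404" decision to the download step.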
This section is about the live app today, not the historical versions.
| Page | User Goal | Main Inputs | Main Services Used | Main Outputs | Runtime Type |
|---|---|---|---|---|---|
| Channel Analysis | compare bundled channel/video benchmarks | committed CSV files, filters, date ranges | pandas transforms, visualization helpers | KPI cards, monthly trends, top channels, top videos | dataset-backed |
| Channel Insights | analyze a tracked public channel over time | channel URL/handle/ID, optional beta topic mode, snapshot refresh actions | `public_channel_service`, `channel_insights_service`, `channel_snapshot_store`, `topic_model_runtime`, `model_artifact_service` | topic trends, format metrics, outliers, next-topic ideas, history | mixed |
| Thumbnails | generate new thumbnail concepts or export a public one | title/context/style prompts, provider/model choice, YouTube video URL or ID | `thumbnail_generator.py`, `thumbnail_hub_service.py` | generated images, preview cards, prepared downloads | mixed |
| Outlier Finder | surface overperforming videos in a niche | niche query, filters, optional AI research trigger | `outliers_finder.py`, `outlier_ai.py`, provider-key helpers | outlier cards, scored result tables, breakout charts, AI insight cards | mixed |
| Ytuber | open a live creator workspace for one channel | channel query, live refresh toggle, segmented workspace module selection | YouTube API loaders, keyword/title scoring helpers, thumbnail generator, outlier handoff logic | audit views, keyword tables, AI Studio outputs, planner and benchmark results | mixed |
| Tools | inspect public YouTube assets and prepare downloads | single URL, batch URLs, playlist URL, operation choice | `youtube_tools.py`, `transcript_service.py`, yt-dlp, ffmpeg | metadata previews, transcripts, audio/video/thumbnail artifacts, batch/playlist results | API-backed |
| Deployment | understand how to run and deploy the app | none; in-app reference content | app shell guidance in `dashboard/app.py` | deployment instructions, repo/branch/secrets notes | static |
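`Channel Analysis` turns the committed CSVs into KPI cards and monthly trends with pandas. The sketch below mirrors the idea with only the standard library so it runs anywhere. The column names `channel_title`, `published_at`, and `view_count` are assumptions, not the confirmed schema of the files in `data/youtube api data/`, and the three sample rows are toy input.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def monthly_views(rows: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Sum views per publication month (YYYY-MM), sorted by month."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        month = row["published_at"][:7]  # ISO-8601 timestamps start with YYYY-MM
        totals[month] += int(row["view_count"])
    return dict(sorted(totals.items()))


def top_channels(rows: Iterable[Dict[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Rank channels by total views, highest first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["channel_title"]] += int(row["view_count"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy rows with the assumed columns; real data would come from the bundled CSVs.
SAMPLE = """channel_title,published_at,view_count
Alpha,2024-01-05T10:00:00Z,1200
Alpha,2024-02-11T09:30:00Z,800
Beta,2024-01-20T18:00:00Z,3000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(monthly_views(rows))  # {'2024-01': 4200, '2024-02': 800}
print(top_channels(rows))   # [('Beta', 3000), ('Alpha', 2000)]
```

In pandas the same monthly trend is a single `groupby` on the month prefix followed by `sum()`; the stdlib version just makes each aggregation step explicit.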
```mermaid
flowchart TD
A["Bundled CSV data"] --> B["Channel Analysis"]
A --> C["Historical benchmark context"]
S["Streamlit secrets / env"] --> K["src/utils/api_keys.py"]
K --> Y["YouTube Data API"]
K --> G["Gemini / OpenAI"]
Y --> D["Channel Insights"]
Y --> E["Outlier Finder"]
Y --> F["Ytuber"]
Y --> H["Tools"]
Y --> I["Thumbnails URL export"]
G --> J["Thumbnails generation"]
G --> L["Ytuber AI Studio"]
G --> M["Outlier AI research"]
D --> N["Snapshots + topic metrics + recommendations"]
E --> O["Outlier results + breakout charts"]
F --> P["Creator workspace modules"]
H --> Q["Prepared public asset downloads"]
J --> R["Generated thumbnail images"]
I --> T["Prepared thumbnail download"]
```
In practice, V6 works as three layers:
- Data: bundled CSV benchmarking plus live provider/API calls
- Service: normalization, scoring, topic assignment, artifact prep
- UI: Streamlit cards, charts, tabs, downloads, and guided workflows
For the full section-by-section mechanics, see Architecture.
These are the main interactive surfaces a user actually navigates once a page is open.
| Page | Surface Type | Count | Current Surfaces | What They Do |
|---|---|---|---|---|
| Channel Analysis | main analytics canvas | 1 | dataset filters + charts/tables | benchmark bundled CSV data |
| Channel Insights | tabs | 6 | Overview, Topic Trends, Formats & Patterns, Outliers, Next Topics, History | turn tracked public-channel snapshots into interpretable strategy signals |
| Thumbnails | tabs | 2 | Generate, Download From URL | create new thumbnails or export a public one |
| Outlier Finder | post-search sections | 4 | Top Outliers In This Scan, Breakout Snapshot, AI Research, How This Works | score breakout videos first, then interpret them |
| Ytuber | segmented modules | 8 | AI Studio, Overview, Channel Audit, Keyword Intel, Outliers Finder, Title & SEO Lab, Competitor Benchmark, Content Planner | open a live creator workspace around one channel |
| Tools | tabs | 3 | Single, Batch, Playlist | inspect and prepare public YouTube asset downloads |
| Deployment | in-app reference view | 1 | deployment/setup guidance | explain how to run and deploy the app |
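Outlier Finder scores breakout videos before interpreting them. The actual formula in `outliers_finder.py` is not documented in this README, so the sketch below is an illustrative stand-in for one common approach: divide each video's views by the median views of its own channel, so a large ratio flags a breakout regardless of channel size. The function name `score_outliers`, the `threshold` default, and the input dict shape are all hypothetical, and the sample videos are toy data.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List


def score_outliers(videos: List[Dict], threshold: float = 3.0) -> List[Dict]:
    """Attach a views / channel-median ratio and keep videos at or above threshold."""
    by_channel: Dict[str, List[int]] = defaultdict(list)
    for v in videos:
        by_channel[v["channel_id"]].append(v["views"])
    # Median is less sensitive than the mean to the very breakouts being searched for.
    baselines = {ch: median(views) for ch, views in by_channel.items()}

    scored = []
    for v in videos:
        base = baselines[v["channel_id"]]
        ratio = v["views"] / base if base > 0 else 0.0
        if ratio >= threshold:
            scored.append({**v, "outlier_score": round(ratio, 2)})
    return sorted(scored, key=lambda v: v["outlier_score"], reverse=True)


videos = [
    {"video_id": "a1", "channel_id": "A", "views": 1000},
    {"video_id": "a2", "channel_id": "A", "views": 1200},
    {"video_id": "a3", "channel_id": "A", "views": 900},
    {"video_id": "a4", "channel_id": "A", "views": 15000},
    {"video_id": "b1", "channel_id": "B", "views": 50000},
    {"video_id": "b2", "channel_id": "B", "views": 60000},
    {"video_id": "b3", "channel_id": "B", "views": 55000},
]
print(score_outliers(videos))
# [{'video_id': 'a4', 'channel_id': 'A', 'views': 15000, 'outlier_score': 13.64}]
```

Channel B's videos all have far more raw views than `a4`, yet none is flagged, which is the point of a channel-relative baseline.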
For the detailed tab-by-tab and module-by-module behavior, see:
- Architecture: Channel Insights
- Architecture: Thumbnails
- Architecture: Outlier Finder
- Outlier Finder Presentation Notes
- Architecture: Ytuber
- Architecture: Tools
```mermaid
flowchart TD
A["Bundled GitHub CSVs<br/>data/youtube api data/*.csv"] --> B["streamlit_app.py"]
U["User actions in the Streamlit UI"] --> B
B --> C["dashboard/app.py run + st.navigation"]
C --> D["sidebar navigation"]
D --> E["7 V6 page views"]
S["Streamlit secrets / env vars"] --> F["src/utils/api_keys.py"]
F --> G["YouTube Data API v3"]
F --> H["Gemini / OpenAI"]
A --> I["Dataset-backed analysis path"]
G --> J["Live public-channel and niche-research path"]
H --> K["AI generation path"]
I --> L["pandas transforms + visualization helpers"]
J --> L
K --> L
J --> M["Channel Insights service path"]
M --> M1["public workspace + feature frame"]
M1 --> M2["heuristic topics or BERTopic beta"]
M2 --> M3["metrics + outliers + recommendations + snapshots"]
L --> N["tables, charts, cards, downloads"]
M3 --> N
```
The architecture resolves into three repeatable patterns:
- Dataset path: GitHub CSVs -> pandas transforms -> benchmark visuals
- Live API path: secrets/env -> provider clients -> normalized payloads -> interactive pages
- Model path: Channel Insights feature frame -> heuristic or BERTopic topics -> downstream metrics and recommendations
For the deeper page-by-page breakdown, see Architecture.
```mermaid
flowchart LR
A["Streamlit secrets / env"] --> B["src/utils/api_keys.py"]
B --> C["Select / rotate provider key"]
C --> D["YouTube Data API"]
C --> E["Gemini"]
C --> F["OpenAI"]
D --> G["Channel Insights / Outlier Finder / Ytuber / Tools / Thumbnail URL flow"]
E --> H["Thumbnails / Ytuber AI Studio / Outlier AI"]
F --> H
G --> I["service-layer normalization"]
H --> I
I --> J["pandas frames / scored payloads / artifact prep"]
J --> K["Rendered Streamlit UI"]
```
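The diagram shows `src/utils/api_keys.py` selecting or rotating a provider key before each call. That module's real interface is not documented here, so the sketch below only illustrates the general idea of round-robin key rotation with failover when a key runs out of quota. `KeyRotator` and `QuotaExceeded` are hypothetical names; a real provider client would raise its own error type, which the caller would map onto this one.

```python
from itertools import cycle
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical signal that the current key has hit its quota."""


class KeyRotator:
    def __init__(self, keys: Iterable[str]):
        self._keys = [k for k in keys if k]  # ignore empty entries
        if not self._keys:
            raise ValueError("no API keys configured")
        self._cycle = cycle(self._keys)  # round-robin across calls

    def call(self, fn: Callable[[str], T]) -> T:
        """Try fn with each key at most once, moving on when a key is exhausted."""
        for _ in range(len(self._keys)):
            key = next(self._cycle)
            try:
                return fn(key)
            except QuotaExceeded:
                continue  # try the next key
        raise QuotaExceeded("all configured keys are exhausted")


def fake_youtube_call(key: str) -> str:
    if key == "key-1":
        raise QuotaExceeded(key)  # pretend the first key is out of quota
    return f"response via {key}"


rotator = KeyRotator(["key-1", "", "key-2"])
print(rotator.call(fake_youtube_call))  # response via key-2
```

Keeping the cycle position between calls spreads load across keys instead of always hammering the first one.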
This is the live V6 runtime path today:
- bundled CSVs power `Channel Analysis`
- live YouTube API calls power `Channel Insights`, `Outlier Finder`, `Ytuber`, `Tools`, and thumbnail URL export
- Gemini/OpenAI power thumbnail generation, AI Studio, and Outlier AI research
- the same secret names work in Streamlit secrets or GCP-style injected environment variables
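Because the same secret names work in Streamlit secrets and in injected environment variables, a lookup needs only one precedence rule: Streamlit secrets first, then the environment. The sketch below shows that rule with a hypothetical `get_secret` helper; a plain mapping stands in for `st.secrets` so it runs without Streamlit, and `YOUTUBE_API_KEY` is an example name rather than a confirmed key from this repo.

```python
import os
from typing import Mapping, Optional


def get_secret(
    name: str,
    secrets: Optional[Mapping[str, str]] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return a secret from Streamlit-style secrets first, then environment variables."""
    if secrets is not None:
        value = secrets.get(name)
        if value:
            return str(value)
    value = environ.get(name)
    return value or None  # treat empty strings as "not set"


print(get_secret(
    "YOUTUBE_API_KEY",
    secrets={"YOUTUBE_API_KEY": "from-secrets"},
    environ={"YOUTUBE_API_KEY": "from-env"},
))  # from-secrets
```

Inside the app, `st.secrets` would be passed as `secrets`. Accessing `st.secrets` can raise when no secrets file is present (for example on a GCP deployment that relies only on environment variables), so real code should guard that access before falling through to the environment.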
For the deeper API/service explanation, see Architecture.
Channel Insights is where the most advanced modeling work lands in V5/V6. Every refresh starts with the same public-channel workspace, then branches into one of two topic assignment modes:
- `Heuristic Topics`
  - default mode
  - built from title, tags, and description tokenization
  - always available
- `Model-Backed Topics (Beta)`
  - optional BERTopic semantic grouping
  - activated only when beta mode is selected and the model manifest/artifact path is configured
  - falls back to heuristics if anything fails
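The heuristic mode builds topics from title, tags, and description tokens. The real `assign_topic_labels(...)` logic is not shown in this README; the sketch below is a simplified stand-in that counts non-stopword tokens across those fields, weights titles and tags above descriptions, and returns the top tokens as `topic_labels` with the first as `primary_topic`. The stopword list, field weights, and `heuristic_topics` name are assumptions.

```python
import re
from collections import Counter
from typing import Any, Dict, List

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "with", "how", "my", "is"}
FIELD_WEIGHTS = {"title": 3, "tags": 2, "description": 1}  # assumed weighting


def extract_tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens, minus stopwords and very short words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS and len(t) > 2]


def heuristic_topics(video: Dict[str, Any], top_k: int = 3) -> Dict[str, Any]:
    counts: Counter = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        value = video.get(field) or ""
        text = " ".join(value) if isinstance(value, list) else value  # tags are a list
        for token in extract_tokens(text):
            counts[token] += weight
    labels = [token for token, _ in counts.most_common(top_k)]
    return {
        "primary_topic": labels[0] if labels else "uncategorized",
        "topic_labels": labels,
        "topic_source": "heuristic",
    }


video = {
    "title": "Budget Travel Guide to Japan",
    "tags": ["travel", "japan", "budget travel"],
    "description": "Cheap travel tips for Japan.",
}
print(heuristic_topics(video))
# {'primary_topic': 'travel', 'topic_labels': ['travel', 'japan', 'budget'], 'topic_source': 'heuristic'}
```

Because it needs no model artifact, a labeler of this kind can always run, which is what makes it a safe default and fallback.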
```mermaid
flowchart TD
A["Channel Insights UI"] --> B["refresh_channel_insights(...)"]
B --> C["load_public_channel_workspace(...)"]
C --> D["ensure_public_channel_frame(...)"]
D --> E["add_channel_video_features(...)"]
E --> F["_apply_requested_topic_mode(...)"]
F --> G["assign_topic_labels(...)"]
F --> H["apply_optional_topic_model(...)"]
H -->|artifact missing / invalid / transform failure| G
G --> I["primary_topic + topic_labels + topic_source='heuristic'"]
H --> J["model_topic_id + model_topic_label_raw + model_topic_label"]
J --> K["primary_topic + topic_labels + topic_source='bertopic_global'"]
I --> L["shared scoring + metrics + outliers + snapshots"]
K --> L
```
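The branch in the diagram reduces to a "try the model, fall back to heuristics" pattern that always produces the same output columns. A minimal sketch, assuming a hypothetical `model_transform` callable that returns one label per title (standing in for `apply_optional_topic_model(...)`) and a caller-supplied `heuristic` labeler (standing in for `assign_topic_labels(...)`); only the `topic_source` values come from the diagram.

```python
from typing import Callable, Dict, List, Optional


def apply_topic_mode(
    videos: List[Dict],
    beta_requested: bool,
    heuristic: Callable[[Dict], str],
    model_transform: Optional[Callable[[List[str]], List[str]]] = None,
) -> List[Dict]:
    """Label videos with model topics when possible, otherwise with the heuristic."""
    if beta_requested and model_transform is not None:
        try:
            labels = model_transform([v["title"] for v in videos])
            if len(labels) != len(videos):
                raise ValueError("model returned a label count that does not match the input")
            return [
                {**v, "primary_topic": label, "topic_source": "bertopic_global"}
                for v, label in zip(videos, labels)
            ]
        except Exception:
            pass  # missing/invalid artifact or transform failure; real code would log why
    return [{**v, "primary_topic": heuristic(v), "topic_source": "heuristic"} for v in videos]


def first_word(v: Dict) -> str:
    return v["title"].split()[0].lower()  # toy heuristic for the example


def broken_model(titles: List[str]) -> List[str]:
    raise RuntimeError("artifact missing")  # simulates a failed model load


videos = [{"title": "Japan travel"}, {"title": "Cooking pasta"}]
print([v["topic_source"] for v in apply_topic_mode(videos, True, first_word, broken_model)])
# ['heuristic', 'heuristic']
```

Because both branches emit `primary_topic` and `topic_source`, downstream scoring, metrics, and snapshots never need to know which path ran.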
For the full topic-mode branch, tab flow, and artifact-state details, see Architecture: Channel Insights.
| Item | Value |
|---|---|
| Original repo | matt-foor/purdue-youtube-ip |
| V6 development branch | youtube-ip-v6 |
| V5 historical branch | youtube-ip-v5 |
| Public V6 live app | ip-youtube-creator-insight-2026.streamlit.app |
| Earlier documented deploy (V5) | royayushkr/Youtube-IP-V5 on main (see Deployment And Versions) |
| Main file path (Streamlit Cloud) | streamlit_app.py |
| Required secret families | YOUTUBE, GEMINI, OPENAI |
| Optional secret family | MODEL_ARTIFACTS_* for BERTopic beta |
```mermaid
flowchart LR
A["GitHub repo"] --> B["V6 branch / fork"]
B --> C["Streamlit or GCP environment"]
C --> D["Secrets / env vars"]
D --> E["streamlit_app.py"]
E --> F["dashboard/app.py run + navigation"]
F --> G["7-page V6 app shell"]
```
Deployment notes in one screen:
- V6 stays public-only for `Channel Insights` (same as V5)
- the app reads `st.secrets` first, then environment variables
- BERTopic beta only activates when `MODEL_ARTIFACTS_ENABLED=true` and the manifest URL is configured
- the GCP-oriented env template is `.env.gcp.example`, which is meant to be copied into Cloud Run, GCE, or Secret Manager-backed environment configuration rather than committed as a real secret file
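The BERTopic beta gate in these notes depends on two conditions holding together. The sketch below expresses that check over any mapping (merged secrets, `os.environ`, or a test dict). Only `MODEL_ARTIFACTS_ENABLED` and the `MODEL_ARTIFACTS_*` family are named in this README; `MODEL_ARTIFACTS_MANIFEST_URL` and the `bertopic_beta_enabled` name are hypothetical.

```python
from typing import Any, Mapping


def bertopic_beta_enabled(config: Mapping[str, Any]) -> bool:
    """True only when the beta flag is on and a manifest URL is configured."""
    # str() also covers TOML booleans from Streamlit secrets: str(True).lower() == "true".
    flag_on = str(config.get("MODEL_ARTIFACTS_ENABLED", "")).strip().lower() == "true"
    # Hypothetical key name; the real manifest setting may be named differently.
    manifest_url = str(config.get("MODEL_ARTIFACTS_MANIFEST_URL", "") or "").strip()
    return flag_on and bool(manifest_url)


print(bertopic_beta_enabled({
    "MODEL_ARTIFACTS_ENABLED": "true",
    "MODEL_ARTIFACTS_MANIFEST_URL": "https://example.com/manifest.json",
}))  # True
print(bertopic_beta_enabled({"MODEL_ARTIFACTS_ENABLED": "true"}))  # False
```

Requiring both values means a half-configured deployment stays on heuristic topics instead of failing at load time.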
For the full deployment matrix and secrets history, see Deployment And Versions.
The short version of the project story is:
- V1 proved the public-data analytics and recommendation concept
- V2 expanded into a broader creator operating system
- V3 clarified the product shell and runtime structure
- V4 added the deepest intelligence layer with `Channel Insights`, Assistant, Google OAuth, and BERTopic beta
- V5 kept the strongest workflows, removed the heaviest operational complexity, and documented the system clearly for presentation and deployment
- V6 is the current Streamlit release: the same V5 capabilities, with a hardened router, dependencies, and deployment path (live app)
What V5 removed on purpose (still true in V6):
- sidebar `Assistant`
- Google OAuth and owner-only analytics overlays
- heavier mixed recommendation behavior on page 3
What V5 kept on purpose (still true in V6):
- dataset benchmarking
- public tracked-channel insights
- thumbnail generation and export
- outlier research
- the live `Ytuber` workspace
- optional BERTopic beta modeling
What V6 adds on top of V5:
- dependable local and Cloud runs (correct Python wheels, PyArrow, Google client stack)
- Streamlit-native multipage navigation and a single `run()` entry from `streamlit_app.py`, so every rerun executes the router
For the full narrative brief and retrospective, see Project Brief.
- Architecture for the full runtime pipeline, page map, and topic-model integration details
- Deployment And Versions for branch targets, secrets evolution, and version/deployment comparisons
- Project Brief for the narrative project story, original goals, what changed, and how V5 led to V6
The original V1 principles still apply in V6:
- use public data responsibly
- respect provider terms of service
- avoid exposing personal data
- prefer explainable insights over black-box claims
- make AI-generated outputs additive to analysis, not a replacement for it
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run streamlit_app.py
```