feat(for-you): re-introduce /users/{id}/feed/for-you with lean 3-source pipeline #817

Open: dylanjeffers wants to merge 2 commits into main from claude/eager-wilson-e22923

@dylanjeffers
Contributor

Summary

Brings back GET /v1/users/{id}/feed/for-you (removed in #807) with a leaner pipeline that drops the source that was causing the prod timeouts.

The killer in the original was the similar_artists CTE — a 1-hop saves-graph self-join that produced a 301M-row merge for power users. Dropped entirely. The lean pipeline keeps the other three sources:

| Source | What it pulls | Cap |
| --- | --- | --- |
| in_network | Tracks uploaded in the last 14 days by users I follow | 200 |
| trending | Top week-trending tracks from track_trending_scores | 100 |
| underground | Week-trending tracks whose owner has follower and following counts under 1500 | 50 |

Same ranking formula as before:

recency_score    = exp(-ln(2) * age_hours / 48)
engagement_score = ln(1 + 3*saves + 2*reposts + 1*plays) / 12
social_boost     = 1.0 + least(affinity/4, 1.0)
source_weight    = {in_network: 1.20, trending: 1.00, underground: 0.95}
final_score = (0.55*recency + 0.45*engagement) * social_boost * source_weight

Same Go-side diversity pass (per-artist ROW_NUMBER() cap + 5-position lookahead to break consecutive-same-artist runs).
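
One way such a diversity pass could look is sketched below. It assumes the input is already sorted by final_score, that the per-artist cap keeps the first N tracks per artist, and that a run is broken by swapping in the nearest different-artist track within the lookahead window. The `track` type, `diversify`, and the cap value are hypothetical; the real pass may differ in all of these details.

```go
package main

import "fmt"

// track is a hypothetical minimal row; the real handler carries more fields.
type track struct {
	ID       int
	ArtistID int
}

// diversify caps tracks per artist, then breaks consecutive same-artist runs
// by swapping in a different-artist track found within `lookahead` positions.
func diversify(ranked []track, perArtistCap, lookahead int) []track {
	// Step 1: per-artist cap, preserving rank order.
	seen := map[int]int{}
	kept := make([]track, 0, len(ranked))
	for _, t := range ranked {
		if seen[t.ArtistID] >= perArtistCap {
			continue
		}
		seen[t.ArtistID]++
		kept = append(kept, t)
	}

	// Step 2: bounded lookahead to separate adjacent same-artist tracks.
	for i := 1; i < len(kept); i++ {
		if kept[i].ArtistID != kept[i-1].ArtistID {
			continue
		}
		end := i + lookahead
		if end > len(kept)-1 {
			end = len(kept) - 1
		}
		for j := i + 1; j <= end; j++ {
			if kept[j].ArtistID != kept[i-1].ArtistID {
				kept[i], kept[j] = kept[j], kept[i]
				break
			}
		}
	}
	return kept
}

func main() {
	ranked := []track{{1, 1}, {2, 1}, {3, 1}, {4, 2}, {5, 3}}
	for _, t := range diversify(ranked, 2, 5) {
		fmt.Println(t.ID, t.ArtistID)
	}
}
```

If no different-artist track exists within the window, the run is left in place, so the pass never drops tracks beyond what the cap removes.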

Perf guardrails

Retained from #805 / #806:

  • follow_set capped at 500 most-recently-followed.
  • my_artist_affinity sub-selects capped (200 saves / 200 reposts / 500 plays, all by recency).

Additional:

  • my_artist_affinity inner JOIN tracks is now further restricted to tracks created in the last 90 days — old uploads can't pull the CTE wide.
  • New partial index idx_track_trending_scores_for_you on track_trending_scores (score DESC, track_id) covering the TRACKS / pnagD / week / null-genre slice. Without it, EXPLAIN showed a fixed ~12s scan of track_trending_scores for every request, regardless of caller.

Files

| File | What |
| --- | --- |
| api/v1_users_feed_for_you.go | Handler + the 3-source candidate-pool SQL (no similar_artists) |
| api/v1_users_feed_for_you_test.go | Basic tests: valid user_id required, empty feed for new user, pagination |
| api/server.go | Route re-registration |
| api/auth_middleware.go | Re-add the /feed/for-you exemption (query user_id is advisory; path :userId controls personalization) |
| api/swagger/swagger-v1.yaml | Re-add the endpoint spec |
| ddl/migrations/0198_track_trending_scores_for_you_idx.sql | The missing partial index |

Test plan

  • go build ./api/... clean
  • go vet ./api/... clean
  • go test -c ./api/... compiles
  • After deploy: hit /v1/users/{id}/feed/for-you?user_id={id}&limit=5 for a power-user account that previously timed out — should now return 200 in < 2s.
  • Eyeball the mix of in-network vs trending vs underground on a real account.

🤖 Generated with Claude Code

dylanjeffers and others added 2 commits May 15, 2026 12:47
…uthors

The previous shadow-ban filter on the contest discovery list used
`aggregate_user.score < 0` (AAO output). Two problems:

  1. `aggregate_user` has no index covering `score`, so the CTE forced a
     full seq scan on every cold call. /v1/events/remix-contests?status=all
     was hanging ~22s cold-cache (warm: ~100ms).
  2. The AAO signal is a separate moderation lane from the community
     karma-reports system that already governs comment visibility.
     The two can drift.

Fix: align the contest filter with the comment-visibility filter. A host
is shadow-banned from contest discovery if they authored a comment that
crossed the same `high_karma_reporters` threshold (sum of reporters'
follower_count >= karmaCommentCountThreshold) that hides the comment
itself on v1_track_comments / v1_event_comments. The new CTE
`karma_reported_authors` lifts the comment-level signal up to user_id.
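
The logic of that CTE can be illustrated in memory with a short Go sketch. The actual change is SQL; the `report` type, the map-based inputs, and the threshold value used here are hypothetical, and the sketch assumes each (comment, reporter) pair appears at most once.

```go
package main

import "fmt"

// report is a hypothetical in-memory stand-in for a comment_reports row.
type report struct {
	CommentID  int
	ReporterID int
}

// karmaReportedAuthors returns the set of users who authored at least one
// comment whose reporters' combined follower_count meets the threshold.
func karmaReportedAuthors(
	reports []report,
	followerCount map[int]int, // reporter user_id -> follower_count
	commentAuthor map[int]int, // comment_id -> author user_id
	threshold int,
) map[int]bool {
	// Sum reporter karma per comment.
	karma := map[int]int{}
	for _, r := range reports {
		karma[r.CommentID] += followerCount[r.ReporterID]
	}
	// Lift the comment-level signal up to the authoring user.
	authors := map[int]bool{}
	for commentID, sum := range karma {
		if sum < threshold {
			continue
		}
		if author, ok := commentAuthor[commentID]; ok {
			authors[author] = true
		}
	}
	return authors
}

func main() {
	reports := []report{{10, 1}, {10, 2}, {11, 3}}
	followers := map[int]int{1: 600, 2: 500, 3: 5}
	authorsByComment := map[int]int{10: 100, 11: 101}
	// The threshold of 1000 is illustrative, not the real karmaCommentCountThreshold.
	fmt.Println(karmaReportedAuthors(reports, followers, authorsByComment, 1000))
}
```

The work done is proportional to the number of reports, which matches the cost argument in the commit message: nothing here touches the full user table.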

`muted_by_karma` is unchanged — still filters hosts muted by high-karma
users.

`comment_reports` is a small table indexed on `comment_id`, and the new
CTE only adds a hash-join on comments (PK lookup per hkr row), so the
cost is bounded by report volume rather than user-table size — no
sequential scan over millions of aggregate_user rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…urce pipeline

Brings back the For You feed endpoint that was removed in #807. The
killer in the original was the similar_artists CTE — a 1-hop saves-graph
self-join that produced a 301M-row merge for power users. Dropped
entirely; the lean pipeline keeps the other three sources:

  - in_network  (followed-creator uploads, last 14 days, LIMIT 200)
  - trending    (track_trending_scores week, LIMIT 100)
  - underground (week-trending, sub-1500 follower/following, LIMIT 50)

Same ranking formula and same Go-side diversity pass. Perf guardrails
from #805/#806 are retained (follow_set LIMIT 500, affinity sub-selects
capped at 200/200/500), and my_artist_affinity additionally filters its
tracks join to the last 90 days so old uploads can't pull the CTE wide.

A partial index on track_trending_scores covering
(type='TRACKS', version='pnagD', time_range='week', genre IS NULL) is
added in 0198 — without it the trending/underground sources cost a
fixed ~12s table scan per request regardless of caller.

Auth middleware exemption for /feed/for-you (from #804) is re-added:
the route's query user_id is a viewer hint for response decoration only;
path :userId controls personalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>