Index any codebase for persistent Claude context — minimal token overhead between sessions (~500 tokens for CLAUDE.md boot).
Every new Claude session starts blank. You re-explain your architecture, your conventions, your stack — burning hundreds of tokens just to get Claude up to speed. For large codebases, this context tax is constant and expensive.
repo-indexer runs a structured 6-phase analysis of your codebase and writes the results into a tiered memory system that scales across sessions with near-zero overhead:
L0: Claude Native Memory → repo roster, patterns (~100–300 tokens, always present)
L1: CLAUDE.md → boot loader only (<500 tokens, auto-loaded every session)
L2: .claude/memory/*.md → deep context files (loaded on-demand, only when needed)
L3: Conversation History → full analysis output (searchable, costs 0 tokens until used)
Claude loads L0 + L1 automatically. L2 and L3 are retrieved only when the task demands it. Files are pointers, not stores.
# Install the plugin
/plugin marketplace add jyshnkr/repo-indexer
# Install the skill
/plugin install repo-indexer
Then in any project directory:
index this repo
Pulls latest from release > main > master to ensure analysis is current.
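The priority order can be sketched as follows. This is a minimal illustration and not the contents of scripts/git-sync.sh; the function name `pick_sync_branch` and the idea of passing in a list of available branch names are assumptions made for the example.

```python
from typing import Iterable, Optional

# Priority order stated above: release first, then main, then master.
BRANCH_PRIORITY = ("release", "main", "master")


def pick_sync_branch(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority branch present in `available`, or None."""
    present = set(available)
    for name in BRANCH_PRIORITY:
        if name in present:
            return name
    return None
```

A caller would presumably build `available` from the repository's branch list and skip the sync step when the function returns `None`.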
Automatically classifies the codebase:
- Monorepo: pnpm-workspace.yaml, turbo.json, packages/, apps/
- Microservices: multiple Dockerfiles, docker-compose with 3+ services
- Single App: default when no strong signals are present
- Library: pyproject.toml, Cargo.toml, setup.py, go.mod, or src/-only layout (no apps/)

Heuristic note: a single weak signal may still default to Single App.
Analyzes 9 areas systematically:
- Config files (package.json, pyproject.toml, Cargo.toml, go.mod)
- Entry points (main, CLI, server bootstrap)
- Directory structure (depth 3)
- Core modules (business logic, services, models)
- API surface (routes, endpoints, schemas)
- Data layer (models, migrations, ORM)
- External dependencies (third-party integrations)
- Build/deploy (Dockerfile, CI/CD, Makefile)
- Tests (structure, fixtures, patterns)
- Full analysis written to conversation (L3) with ### SEARCH KEYWORDS for retrieval
- Minimal .claude/ file tree created at repo root (L2)
- CLAUDE.md created as a <500 token boot loader (L1)
python3 skills/repo-indexer/scripts/estimate-tokens.py
Validates budgets using a heuristic token estimate (CLAUDE.md must be under 500 tokens).
python3 skills/repo-indexer/scripts/generate-memory-update.py
Suggests 2–3 lines to add to Claude's native memory so the next session starts with repo awareness, with no CLAUDE.md load required.
| Layer | Budget | When Loaded |
|---|---|---|
| L0: Native Memory | ~100–300 tokens | Always (free) |
| L1: CLAUDE.md | < 500 tokens | Every session start |
| L2: memory/*.md | < 10,000 tokens total | On-demand only |
| L3: Conversation History | 0 tokens | When searched |
Total auto-loaded per session: < 800 tokens. Everything else costs nothing until you need it. Token counts are estimated via a bytes-per-token heuristic; treat these as guardrails, not exact model counts.
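To make the heuristic concrete, here is a minimal sketch of a bytes-per-token estimate. The ratio of 4 bytes per token and the function names are assumptions for illustration; estimate-tokens.py may use a different ratio or rounding.

```python
import math

# Assumed ratio for illustration; the real script may use another value.
BYTES_PER_TOKEN = 4
CLAUDE_MD_BUDGET = 500  # L1 budget from the table above


def estimate_tokens(text: str, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Estimate the token count from the UTF-8 byte length of `text`."""
    return math.ceil(len(text.encode("utf-8")) / bytes_per_token)


def within_budget(text: str, budget: int = CLAUDE_MD_BUDGET) -> bool:
    """Return True when the estimate is strictly under `budget`."""
    return estimate_tokens(text) < budget
```

Because the estimate depends only on byte length, it can drift from a real tokenizer's count, which is why the budgets are best read as guardrails.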
"Index this repo" → Full 6-phase workflow. Claude knows your project before you ask your first question.
"Set up Claude context for this project" → Same workflow. Optimized for team onboarding — every developer gets instant Claude context.
"Help me understand this codebase"
→ Checks existing Claude memory and past conversations first. If prior indexing is found, it is reused. If .claude/ exists, it is compared with the current codebase, inconsistencies are flagged, and the files are updated incrementally.
After indexing, your repo gets:
your-project/
├── CLAUDE.md                  # <500 token boot loader (L1)
└── .claude/
    ├── memory/
    │   ├── architecture.md    # System design, diagrams, key flows (L2)
    │   ├── conventions.md     # Naming, patterns, git workflow (L2)
    │   └── glossary.md        # Domain terms, acronyms (L2)
    ├── plans/                 # Empty, user-managed
    └── checkpoints/           # Empty, user-managed
The <!-- USER --> marker in each file preserves your own notes through re-indexing.
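One way such a marker could work is sketched below. It assumes that everything after the first marker in an existing file is user-owned and is carried over verbatim when the generated part is rewritten; the function name and this exact semantics are assumptions, not a description of how repo-indexer implements it.

```python
USER_MARKER = "<!-- USER -->"


def merge_preserving_user_notes(existing: str, generated: str) -> str:
    """Combine freshly generated content with user notes from the old file.

    Assumption: text after the first marker in `existing` belongs to the user.
    """
    # partition() yields an empty tail when the marker is absent,
    # so a file without a marker simply has no notes to preserve.
    _, _, user_notes = existing.partition(USER_MARKER)
    # Ignore any marker section that the generated text itself contains.
    generated_body = generated.partition(USER_MARKER)[0].rstrip()
    return f"{generated_body}\n\n{USER_MARKER}{user_notes}"
```

For example, merging an old architecture.md that ends with a marker and a personal note against a regenerated body would keep the note and replace only the text above the marker.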
All scripts live under skills/repo-indexer/. Paths in the table below are relative to that directory.
| Script | Purpose |
|---|---|
| scripts/git-sync.sh | Deterministic branch sync (release > main > master) |
| scripts/detect-repo-type.py | Classify repo as monorepo/microservices/single_app/library |
| scripts/estimate-tokens.py | Validate token budgets for all .claude/ files |
| scripts/generate-memory-update.py | Generate native memory update suggestions |
The Python scripts use only the standard library; no external dependencies are required.
See skills/repo-indexer/references/repo-types.md for type-specific indexing strategies and CLAUDE.md templates.
| Type | Detection Signals |
|---|---|
| Monorepo | pnpm-workspace.yaml, turbo.json, nx.json, lerna.json, packages/ dir |
| Microservices | 3+ build: entries in docker-compose, multiple Dockerfiles |
| Single App | Default when no strong signals are present |
| Library | setup.py, pyproject.toml, Cargo.toml, go.mod, or src/-only layout |
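The signals in the table can be turned into a small classifier, sketched below. This is not the logic of scripts/detect-repo-type.py: the function name, the check order (monorepo, then microservices, then library), and the regex used to count build: entries are all assumptions for illustration.

```python
import re
from typing import Iterable

MONOREPO_FILES = {"pnpm-workspace.yaml", "turbo.json", "nx.json", "lerna.json"}
LIBRARY_FILES = {"setup.py", "pyproject.toml", "Cargo.toml", "go.mod"}


def classify_repo(paths: Iterable[str], compose_text: str = "") -> str:
    """Classify a repo from root-relative file paths and docker-compose text."""
    paths = set(paths)
    # Top-level directory names, taken from paths that contain a slash.
    top_dirs = {p.split("/", 1)[0] for p in paths if "/" in p}

    if MONOREPO_FILES & paths or "packages" in top_dirs:
        return "monorepo"

    # Count lines that start a build: entry, and Dockerfiles anywhere in the tree.
    build_entries = len(re.findall(r"^[ \t]*build:", compose_text, flags=re.MULTILINE))
    dockerfiles = sum(1 for p in paths if p.rsplit("/", 1)[-1] == "Dockerfile")
    if build_entries >= 3 or dockerfiles > 1:
        return "microservices"

    if LIBRARY_FILES & paths or top_dirs == {"src"}:
        return "library"

    # Default when no strong signal is present.
    return "single_app"
```

Passing the file list and compose text in as arguments keeps the sketch free of filesystem access, so it can be exercised with plain lists of path strings.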
See skills/repo-indexer/references/troubleshooting.md for solutions to:
- Skill not triggering
- Git sync failures
- Token budget exceeded
- Memory not persisting across sessions
- Conversation search not finding prior context
See CONTRIBUTING.md.
Issues and PRs welcome. When opening an issue, please include:
- Your repo type (monorepo/microservices/single_app/library)
- The phase where the problem occurred
- Output from the relevant script
MIT — see LICENSE.