PatrickSys
diff --git a/README.md b/README.md
Lines changed: 6 additions & 2 deletions
diff --git a/docs/benchmark.md b/docs/benchmark.md
Lines changed: 90 additions & 0 deletions
diff --git a/docs/capabilities.md b/docs/capabilities.md
Lines changed: 2 additions & 0 deletions
diff --git a/docs/cli.md b/docs/cli.md
Lines changed: 37 additions & 0 deletions
diff --git a/docs/comparison-table.md b/docs/comparison-table.md
Lines changed: 33 additions & 0 deletions
diff --git a/docs/demo.md b/docs/demo.md
Lines changed: 120 additions & 0 deletions
diff --git a/docs/registry-sync-checklist.md b/docs/registry-sync-checklist.md
Lines changed: 42 additions & 0 deletions
@@ -1,6 +1,6 @@
 # codebase-context
 
-## Local-first second brain for AI agents working on your codebase
+## Stop paying for AI agents to explore your codebase. codebase-context pre-maps the architecture, conventions, and team memory so they don't have to.
 
 [![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
 
@@ -20,6 +20,8 @@ Here's what codebase-context does:
 
 One tool call returns all of it. Local-first - your code never leaves your machine by default.
 
+See the [v2.0.0 benchmark](./docs/benchmark.md) for the discovery suite results and current gate truth.
+
 ### What it looks like
 
 Real CLI output against `angular-spotify`, the repo used for the launch screenshots.
@@ -36,7 +38,7 @@ This is the part most tools miss: what the team is doing now, what it is moving
 
 When the agent searches with edit intent, it gets a compact decision card: confidence, whether it's safe to proceed, which patterns apply, the best example, and which files are likely to be affected.
 
-More CLI examples in [`docs/cli.md`](./docs/cli.md).
+More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [`docs/demo.md`](./docs/demo.md).
 
 ## Quick Start
 
@@ -222,6 +224,8 @@ These are the behaviors that make the most difference day-to-day. Copy, trim wha
 
 ## Links
 
+- [Benchmark](./docs/benchmark.md) — v2.0.0 discovery suite results and gate truth
+- [Demo](./docs/demo.md) — real CLI walkthrough
 - [Client Setup](./docs/client-setup.md) — per-client config, HTTP setup, local build testing
 - [Capabilities Reference](./docs/capabilities.md) — tool API, retrieval pipeline, decision card schema
 - [CLI Gallery](./docs/cli.md) — formatted command output examples
 
@@ -0,0 +1,90 @@
+# Discovery Benchmark
+
+This page documents the current public proof slice for `v2.0.0`.
+It is a discovery benchmark, not an implementation-quality benchmark.
+
+## Scope
+
+- Frozen fixtures:
+  - `tests/fixtures/discovery-angular-spotify.json`
+  - `tests/fixtures/discovery-excalidraw.json`
+  - `tests/fixtures/discovery-benchmark-protocol.json`
+- Frozen repos used in the current proof run:
+  - `repos/angular-spotify`
+  - `repos/excalidraw`
+- Current gate artifact:
+  - `results/gate-evaluation.json`
+- Comparator evidence:
+  - `results/comparator-evidence.json`
+
+## How To Reproduce
+
+Run the repo-local proof artifacts from the current `master` checkout:
+
+```bash
+node scripts/run-eval.mjs repos/angular-spotify --mode=discovery --fixture-a=tests/fixtures/discovery-angular-spotify.json --skip-reindex --output=results/codebase-context-angular-spotify.json
+node scripts/run-eval.mjs repos/excalidraw --mode=discovery --fixture-a=tests/fixtures/discovery-excalidraw.json --skip-reindex --output=results/codebase-context-excalidraw.json
+node scripts/benchmark-comparators.mjs --repos repos/angular-spotify,repos/excalidraw --output results/comparator-evidence.json
+node scripts/run-eval.mjs repos/angular-spotify repos/excalidraw --mode=discovery --fixture-a=tests/fixtures/discovery-angular-spotify.json --fixture-b=tests/fixtures/discovery-excalidraw.json --competitor-results=results/comparator-evidence.json --skip-reindex --output=results/gate-evaluation.json
+```
+
+## Current Result
+
+From `results/gate-evaluation.json`:
+
+- `status`: `pending_evidence`
+- `suiteStatus`: `complete`
+- `claimAllowed`: `false`
+- `totalTasks`: `24`
+- `averageUsefulness`: `0.75`
+- `averageEstimatedTokens`: `903.7083333333334`
+- `bestExampleUsefulnessRate`: `0.125`
+
+Repo-level outputs from the same rerun:
+
+| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
+| --- | ---: | ---: | ---: | ---: |
+| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
+| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |
+
+## Gate Truth
+
+The gate is intentionally still blocked.
+
+- The combined suite now covers both public repos.
+- The release claim is still disallowed because comparator evidence remains incomplete.
+- Missing evidence currently includes:
+  - raw Claude Code baseline metrics
+  - GrepAI metrics
+  - jCodeMunch metrics
+  - codebase-memory-mcp metrics
+  - CodeGraphContext metrics
+
+## Comparator Reality
+
+The current comparator artifact records setup failures, not benchmark wins.
+
+| Comparator | Status | Current reason |
+| --- | --- | --- |
+| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
+| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
+| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
+| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
+| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |
+
+`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
+
+## Important Limitations
+
+- This benchmark measures discovery usefulness and payload cost only.
+- It does not measure implementation correctness, patch quality, or end-to-end task completion.
+- Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
+- The reranker cache is currently corrupted on this machine. During the proof rerun, search hit `Protobuf parsing failed` and fell back to its original ordering; the harness still ran to completion.
+- `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
+
+## What This Proof Can Support
+
+- It can support claims about the shipped discovery surfaces and their current measured outputs on the frozen public tasks.
+- It can support claims that the proof gate is still blocked by comparator evidence.
+- It cannot support claims that `codebase-context` beats the named comparators today.
+- It cannot support claims about edit success, code quality, or implementation speed.
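Reviewer sketch (not part of the diff): the combined figures under "Current Result" can be cross-checked against the per-repo table. The snippet below assumes the gate aggregates per-repo averages as a task-weighted mean, which the benchmark page does not state; the object keys (`avgUsefulness` and so on) are hypothetical names chosen for this example, and the numbers are copied from the table.

```javascript
// Per-repo rows copied from the table in docs/benchmark.md.
// Key names are hypothetical; only the numbers come from the doc.
const repoRows = [
  { repo: 'angular-spotify', tasks: 12, avgUsefulness: 0.8333, avgEstimatedTokens: 1080.6667, bestExampleUsefulness: 0.25 },
  { repo: 'excalidraw', tasks: 12, avgUsefulness: 0.6667, avgEstimatedTokens: 726.75, bestExampleUsefulness: 0 },
];

// Task-weighted mean of one metric across repos (assumed aggregation rule).
function weightedMean(rows, key) {
  const totalTasks = rows.reduce((sum, row) => sum + row.tasks, 0);
  const weightedSum = rows.reduce((sum, row) => sum + row[key] * row.tasks, 0);
  return weightedSum / totalTasks;
}

const combined = {
  totalTasks: repoRows.reduce((sum, row) => sum + row.tasks, 0),
  averageUsefulness: weightedMean(repoRows, 'avgUsefulness'),
  averageEstimatedTokens: weightedMean(repoRows, 'avgEstimatedTokens'),
  bestExampleUsefulnessRate: weightedMean(repoRows, 'bestExampleUsefulness'),
};

console.log(combined);
```

Under that assumption the recomputed values agree with the reported `0.75`, `903.7083...`, and `0.125` up to the rounding of the table entries, so the two tables are at least internally consistent.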
@@ -280,6 +280,8 @@ Notes:
 
 ## Evaluation Harness
 
+Current public proof bundle: [`docs/benchmark.md`](./benchmark.md) and [`docs/comparison-table.md`](./comparison-table.md).
+
 Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/reporting code.
 
 - **Command:** `npm run eval -- <codebaseA> [codebaseB] --mode retrieval|discovery [--competitor-results <path>]` (builds first, then runs `scripts/run-eval.mjs`)
 
@@ -2,6 +2,7 @@
 
 `codebase-context` exposes its tools as a local CLI so humans can:
 
+- Get the conventions map before exploring or editing (`map`)
 - Onboard themselves onto an unfamiliar repo
 - Debug what the MCP server is doing
 - Use outputs in CI/scripts (via `--json`)
@@ -30,6 +31,7 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
 
 ## Commands
 
+- `map` — conventions map: architecture layers, patterns, golden files
 - `metadata` — tech stack overview
 - `patterns` — team conventions + adoption/trends
 - `search --query <q>` — ranked results; add `--intent edit` for a preflight card
@@ -42,6 +44,41 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
 
 ---
 
+## `map`
+
+```bash
+npx -y codebase-context map
+```
+
+The conventions map — run this first on an unfamiliar repo. Shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call.
+
+Example output (truncated):
+
+```text
+┌─ Codebase Map ── angular-spotify ────────────────────────────────────┐
+│                                                                      │
+│ Architecture: feature-based · 3 layers                               │
+│ 47 files · 6 patterns · 3 golden files                               │
+│                                                                      │
+│ LAYERS                                                               │
+│   core/      – shared services + DI                                  │
+│   features/  – domain modules                                        │
+│   shared/    – reusable components                                   │
+│                                                                      │
+│ TOP PATTERNS                                                         │
+│      Angular standalone components   92%  ↑ Rising                  │
+│      RxJS reactive patterns          78%                             │
+│   ↓  NgModules                        8%  Declining                  │
+│                                                                      │
+│ GOLDEN FILES                                                         │
+│   src/features/player/player.component.ts                            │
+│   src/core/auth/auth.service.ts                                      │
+│                                                                      │
+└──────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
 ## `metadata`
 
 ```bash
 
@@ -0,0 +1,33 @@
+# Comparator Summary
+
+This table summarizes the current comparator evidence from `results/comparator-evidence.json`.
+It is a setup-status table first, not a marketing scoreboard.
+
+| Comparator | Intended role in gate | Current status | Evidence summary |
+| --- | --- | --- | --- |
+| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | `setup_failed` | The local `claude` CLI baseline is unavailable in this environment, so the gate records missing baseline metrics. |
+| `GrepAI` | Named MCP comparator | `setup_failed` | Requires the GrepAI binary plus a local Ollama embedding setup that is not present in this proof environment. |
+| `jCodeMunch` | Named MCP comparator | `setup_failed` | The MCP server still closes on startup during the current rerun, so no comparable discovery metrics were produced. |
+| `codebase-memory-mcp` | Named MCP comparator | `setup_failed` | The documented install path still depends on the external shell installer instead of a working local benchmark path. |
+| `CodeGraphContext` | Graph-native comparator in the relaunch frame | `setup_failed` | The MCP server still closes on startup during the current rerun, so this lane remains missing evidence. |
+
+## Reading This Table
+
+- `setup_failed` means the lane was attempted and did not reach a credible metric-producing state.
+- A missing metric is not treated as a win for `codebase-context`.
+- The combined gate in `results/gate-evaluation.json` remains `pending_evidence` until these lanes produce real metrics.
+
+## Current codebase-context result
+
+For reference, the current combined discovery output across `angular-spotify` and `excalidraw` is:
+
+| Metric | codebase-context |
+| --- | ---: |
+| `totalTasks` | 24 |
+| `averageUsefulness` | 0.75 |
+| `averagePayloadBytes` | 3613.6667 |
+| `averageEstimatedTokens` | 903.7083 |
+| `bestExampleUsefulnessRate` | 0.125 |
+| `gate.status` | `pending_evidence` |
+
+Those numbers are not compared here as head-to-head wins because the comparator lanes above did not produce matching metrics.
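Reviewer sketch (not part of the diff): the rule in "Reading This Table" (a missing metric is never a win) can be written as a small check. The lane array below is a hypothetical shape, not the actual schema of `results/comparator-evidence.json`; the names and statuses are copied from the table, and the `metrics` field is an assumption made for illustration.

```javascript
// Lane names and statuses copied from docs/comparison-table.md.
// The { name, status, metrics } shape is hypothetical.
const lanes = [
  { name: 'raw Claude Code', status: 'setup_failed', metrics: null },
  { name: 'GrepAI', status: 'setup_failed', metrics: null },
  { name: 'jCodeMunch', status: 'setup_failed', metrics: null },
  { name: 'codebase-memory-mcp', status: 'setup_failed', metrics: null },
  { name: 'CodeGraphContext', status: 'setup_failed', metrics: null },
];

// A lane counts as evidence only if setup succeeded AND it produced metrics.
function lanesMissingEvidence(laneList) {
  return laneList
    .filter((lane) => lane.status === 'setup_failed' || lane.metrics == null)
    .map((lane) => lane.name);
}

// Head-to-head claims are allowed only when no lane is missing evidence.
function headToHeadAllowed(laneList) {
  return lanesMissingEvidence(laneList).length === 0;
}

console.log(lanesMissingEvidence(lanes));
console.log(headToHeadAllowed(lanes));
```

With the statuses currently in the table, every lane is reported as missing and no head-to-head claim is allowed, which matches the `pending_evidence` gate status.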
@@ -0,0 +1,120 @@
+# Demo Script
+
+This walkthrough uses real CLI output captured against `repos/angular-spotify` during the Phase 10 proof rerun.
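+Run it from the repo root with `CODEBASE_ROOT` pointed at the frozen sample repo. The commands below use PowerShell syntax and the path from the capture machine; substitute your own checkout path.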
+
+## 1. Start With The Conventions Map
+
+```powershell
+$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
+node dist/index.js map --json
+```
+
+Captured output excerpt:
+
+```json
+{
+  "project": "angular-spotify",
+  "architecture": {
+    "layers": [
+      { "name": "libs", "fileCount": 252 },
+      { "name": "apps", "fileCount": 6 }
+    ]
+  },
+  "activePatterns": [
+    { "name": "Effect", "adoption": "100%", "trend": "Rising" },
+    { "name": "Standalone", "adoption": "100%", "trend": "Rising" },
+    { "name": "RxJS", "adoption": "98%", "trend": "Rising" }
+  ],
+  "bestExamples": [
+    { "file": "src/lib/card.component.ts", "score": 4, "reason": "Angular TestBed" }
+  ]
+}
+```
+
+What this shows:
+
+- The first call gives a compact conventions map instead of raw grep output.
+- The response already includes architecture layers, active patterns, and a concrete best example.
+
+## 2. Search With Edit Intent
+
+```powershell
+$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
+node dist/index.js search --query "auth headers" --intent edit --limit 3 --json
+```
+
+Captured output excerpt:
+
+```json
+{
+  "status": "success",
+  "searchQuality": { "status": "ok", "confidence": 1 },
+  "preflight": {
+    "ready": true,
+    "warnings": [
+      "Index is aging (>24h) — results may not reflect recent changes"
+    ],
+    "patterns": {
+      "do": [
+        "Constructor injection — 85% adoption",
+        "Standalone — 100% adoption",
+        "RxJS — 98% adoption"
+      ]
+    },
+    "bestExample": "src/lib/card.component.ts"
+  },
+  "results": [
+    {
+      "file": "C:\\Users\\bitaz\\Repos\\codebase-context\\repos\\angular-spotify\\libs\\web\\auth\\util\\src\\lib\\interceptors\\auth.interceptor.ts:10-42",
+      "type": "interceptor:core"
+    }
+  ]
+}
+```
+
+What this shows:
+
+- Search remains the second step after the map.
+- `intent=edit` adds preflight evidence instead of forcing a separate call.
+- The response stays compact while still surfacing a best example and impact hints.
+
+## 3. Check A Team Pattern Directly
+
+```powershell
+$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
+node dist/index.js patterns --category state --json
+```
+
+Captured output excerpt:
+
+```json
+{
+  "patterns": {
+    "stateManagement": {
+      "primary": {
+        "name": "RxJS",
+        "frequency": "98%",
+        "trend": "Rising"
+      },
+      "alsoDetected": [
+        {
+          "name": "Signals",
+          "frequency": "2%",
+          "trend": "Rising"
+        }
+      ]
+    }
+  }
+}
+```
+
+What this shows:
+
+- The tool distinguishes dominant patterns from emerging ones.
+- The map/search story is backed by direct pattern evidence rather than generic prose.
+
+## Caveats
+
+- These excerpts were captured from the current local proof run and will change if the frozen sample repo or index state changes.
+- The benchmark gate is still `pending_evidence`, so this walkthrough demonstrates shipped behavior, not a released performance claim.
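Reviewer sketch (not part of the diff): one way a script or agent might consume the edit-intent response from step 2. The field names (`status`, `searchQuality.confidence`, `preflight.ready`, `preflight.warnings`, `preflight.bestExample`) come from the captured excerpt; the `decideEdit` helper and its `minConfidence` threshold are hypothetical and not part of `codebase-context`.

```javascript
// Trimmed copy of the captured search excerpt (warning text abbreviated).
const searchResponse = {
  status: 'success',
  searchQuality: { status: 'ok', confidence: 1 },
  preflight: {
    ready: true,
    warnings: ['Index is aging (>24h)'],
    bestExample: 'src/lib/card.component.ts',
  },
};

// Hypothetical helper: proceed only when the search succeeded, preflight is
// ready, and confidence meets a caller-chosen threshold. Warnings are passed
// through so the caller can surface them instead of silently ignoring them.
function decideEdit(response, minConfidence = 0.8) {
  const confidence = response.searchQuality?.confidence ?? 0;
  const ready = response.preflight?.ready === true;
  return {
    proceed: response.status === 'success' && ready && confidence >= minConfidence,
    warnings: response.preflight?.warnings ?? [],
    bestExample: response.preflight?.bestExample ?? null,
  };
}

console.log(decideEdit(searchResponse));
```

For the captured excerpt this yields `proceed: true` together with the stale-index warning, so a caller could still choose to reindex before editing.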
@@ -0,0 +1,42 @@
+# Registry Sync Checklist
+
+Use this checklist before publishing any Phase 10-facing metadata or registry copy.
+The purpose is to keep the public surface aligned with the current proof bundle.
+
+## Required Artifacts
+
+- `results/gate-evaluation.json` exists and still reports the current gate truth.
+- `results/comparator-evidence.json` exists and still records every failed lane honestly.
+- `docs/benchmark.md` matches the current gate numbers and limitations.
+- `docs/comparison-table.md` matches the current comparator statuses, including `CodeGraphContext`.
+- `docs/demo.md` uses real CLI output, not invented snippets.
+
+## Public Surfaces To Sync
+
+- `README.md`
+- `package.json`
+- `docs/capabilities.md`
+- `docs/client-setup.md`
+- `docs/cli.md`
+- npm package description and keywords derived from `package.json`
+
+## Required Truth Checks
+
+- If the gate is `pending_evidence`, say so explicitly.
+- If any comparator lane is `setup_failed`, say so explicitly.
+- Do not claim benchmark wins against `raw Claude Code`, `GrepAI`, `jCodeMunch`, `codebase-memory-mcp`, or `CodeGraphContext` without real metrics in `results/comparator-evidence.json`.
+- Do not claim implementation quality from this discovery benchmark.
+- Do not omit the current reranker fallback limitation if the proof run still shows `Protobuf parsing failed`.
+
+## Before Registry Or README Updates
+
+- Re-run the four proof commands from `docs/benchmark.md` if the evidence artifacts look stale.
+- Reconfirm that `results/gate-evaluation.json` still reports `claimAllowed: false` before writing relaunch copy.
+- Reconfirm that `results/gate-evaluation.json` still reports `suiteStatus: complete`.
+- Reconfirm that `CodeGraphContext` remains represented in the comparison table even if the lane still fails.
+
+## Release Stop Conditions
+
+- Stop if the proof docs drift from the JSON artifacts.
+- Stop if public copy implies a pass while the gate still says `pending_evidence`.
+- Stop if registry metadata still uses broader positioning than the proof bundle can support.
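Reviewer sketch (not part of the diff): the first stop condition (docs drifting from the JSON artifacts) can be checked mechanically. The example assumes the keys listed in `docs/benchmark.md` sit at the top level of `results/gate-evaluation.json`, which this PR does not confirm; `findDrift` is a hypothetical helper, and the `artifact` object stands in for the parsed file.

```javascript
// Values currently stated in docs/benchmark.md.
const documented = {
  status: 'pending_evidence',
  suiteStatus: 'complete',
  claimAllowed: false,
  totalTasks: 24,
  averageUsefulness: 0.75,
  averageEstimatedTokens: 903.7083,
  bestExampleUsefulnessRate: 0.125,
};

// Hypothetical helper: list every key whose documented value no longer
// matches the artifact. Numbers are compared with a small tolerance so that
// rounded doc values (903.7083) still match full-precision artifact values.
function findDrift(documentedValues, artifactValues, tolerance = 1e-3) {
  const drift = [];
  for (const [key, expected] of Object.entries(documentedValues)) {
    const actual = artifactValues[key];
    const same =
      typeof expected === 'number' && typeof actual === 'number'
        ? Math.abs(expected - actual) <= tolerance
        : expected === actual;
    if (!same) drift.push({ key, documented: expected, actual });
  }
  return drift;
}

// Stand-in for the parsed artifact, using the full-precision token average.
const artifact = { ...documented, averageEstimatedTokens: 903.7083333333334 };

console.log(findDrift(documented, artifact));
console.log(findDrift(documented, { ...artifact, claimAllowed: true }));
```

In a real check the `artifact` object would come from parsing `results/gate-evaluation.json`; any non-empty result is a reason to stop and resync the docs before publishing.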