PatrickSys
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 19 additions & 0 deletions
diff --git a/docs/benchmark.md b/docs/benchmark.md
Lines changed: 20 additions & 18 deletions
diff --git a/package.json b/package.json
Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,25 @@
 
 ## Unreleased
 
+## [2.1.0](https://github.com/PatrickSys/codebase-context/compare/v2.0.0...v2.1.0) (2026-04-13)
+
+### Features
+
+- **search:** surface chunk intelligence directly in `search_codebase` results, including symbol identity, scope, signature preview, and compact/full response budgeting
+- **map:** upgrade the conventions map with structural skeleton sections and add `map --export` so the compact map can be written to `CODEBASE_MAP.md`
+
+### Bug Fixes
+
+- **metadata:** require real dependency evidence plus multiple framework indicators before labeling a repo as Next.js or another specialized framework
+- **reranker:** auto-heal corrupted cross-encoder cache entries and surface degraded reranker state in `searchQuality.rerankerStatus`
+- **benchmarks:** harden comparator lanes for cross-platform execution and keep setup failures explicit instead of silently turning them into claims
+
+### Documentation
+
+- publish the v2.1.0 discovery benchmark rerun with the current gate output: `pending_evidence`, `claimAllowed: false`, `24` frozen tasks, `0.75` average usefulness, and `1822.25` average estimated tokens
+- document the current comparator truth instead of stale assumptions: the public proof still has no real comparator lane data on this host, so benchmark win claims remain blocked
+- note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the pre-advisory response payload and warnings only appear above the 4K-token threshold
+
 ### Features
 
 - **mcp:** rework multi-project routing so one MCP server can serve multiple projects instead of one hardcoded server entry per repo
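
The `searchQuality.tokenEstimate` note in the changelog entry above describes two rules: the estimate is computed from the response before the advisory is attached, and a warning appears only above the 4K-token threshold. A minimal sketch of that contract might look like the following. Every name here (`withTokenEstimate`, `estimateTokens`, the `SearchResponse` shape), the 4-characters-per-token ratio, and the exact threshold value of `4000` are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical sketch: every name and constant below is an assumption for
// illustration, not the project's actual implementation.
const ASSUMED_CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer
const WARNING_THRESHOLD_TOKENS = 4000; // assumed reading of the "4K-token threshold"

interface TokenEstimate {
  estimatedTokens: number;
  warning?: string;
}

interface SearchResponse {
  results: string[];
  searchQuality?: { tokenEstimate?: TokenEstimate };
}

// Estimate tokens from the serialized payload length. String length counts
// UTF-16 code units, which equals bytes only for ASCII-only payloads.
function estimateTokens(payload: unknown): number {
  return Math.ceil(JSON.stringify(payload).length / ASSUMED_CHARS_PER_TOKEN);
}

// Compute the estimate from the response as it exists BEFORE the advisory is
// attached, so the advisory does not inflate its own number.
function withTokenEstimate(response: SearchResponse): SearchResponse {
  const estimatedTokens = estimateTokens(response);
  const tokenEstimate: TokenEstimate = { estimatedTokens };
  if (estimatedTokens > WARNING_THRESHOLD_TOKENS) {
    tokenEstimate.warning =
      `Response is roughly ${estimatedTokens} tokens; consider a compact response.`;
  }
  return {
    ...response,
    searchQuality: { ...response.searchQuality, tokenEstimate },
  };
}
```

Estimating before attaching keeps the number stable: re-running the estimate on the returned object would give a slightly larger value, because the advisory itself adds characters to the payload.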
 
@@ -1,6 +1,6 @@
 # Discovery Benchmark
 
-This page documents the current public proof slice for `v2.0.0`.
+This page documents the current public proof slice for `v2.1.0`.
 It is a discovery benchmark, not an implementation-quality benchmark.
 
 ## Scope
@@ -37,49 +37,51 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averageEstimatedTokens`: `903.7083333333334`
+- `averagePayloadBytes`: `7287.625`
+- `averageEstimatedTokens`: `1822.25`
+- `averageFirstRelevantHit`: `null`
 - `bestExampleUsefulnessRate`: `0.125`
 
 Repo-level outputs from the same rerun:
 
-| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
-| --- | ---: | ---: | ---: | ---: |
-| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
-| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |
+| Repo | Tasks | Avg usefulness | Avg payload bytes | Avg estimated tokens | Best-example usefulness |
+| --- | ---: | ---: | ---: | ---: | ---: |
+| `angular-spotify` | 12 | 0.8333 | 8553 | 2138 | 0.25 |
+| `excalidraw` | 12 | 0.6667 | 6023 | 1506 | 0 |
 
 ## Gate Truth
 
 The gate is intentionally still blocked.
 
-- The combined suite now covers both public repos.
+- The combined suite covers both frozen public repos.
 - The release claim is still disallowed because comparator evidence remains incomplete.
 - Missing evidence currently includes:
   - raw Claude Code baseline metrics
-  - GrepAI metrics
-  - jCodeMunch metrics
-  - codebase-memory-mcp metrics
-  - CodeGraphContext metrics
+  - GrepAI comparator metrics
+  - jCodeMunch comparator metrics
+  - codebase-memory-mcp comparator metrics
+  - CodeGraphContext comparator metrics
 
 ## Comparator Reality
 
 The current comparator artifact records setup failures, not benchmark wins.
 
 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
-| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
-| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
-| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
-| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |
+| `codebase-memory-mcp` | `ok` | The lane now executes on this host, but the captured outputs are near-empty (`19` bytes / `5` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
+| `jCodeMunch` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`) |
+| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path are not present |
+| `CodeGraphContext` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`); database prerequisite remains unresolved |
+| `raw Claude Code` | `ok` | The baseline now runs, but the captured outputs remain non-useful (`66.08` bytes / `17.17` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
 
-`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
+`CodeGraphContext` remains part of the comparison frame. It is not removed from the public story just because the lane still fails to start.
 
 ## Important Limitations
 
 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
 - Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
-- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
+- Current search payload costs are higher than the older v2.0.0 proof slice because the v2.1.0 surface now includes richer map structure and `searchQuality.tokenEstimate` advisories.
 - `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
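
As a consistency check on the numbers in this file, the repo-level rows can be recombined into the gate-level averages with a task-weighted mean. The sketch below is an illustration only: the `RepoRow` shape and the `weightedMean` helper are hypothetical, and because the table values are rounded, the recomputed payload and token averages match `results/gate-evaluation.json` only to within rounding.

```typescript
// Hypothetical helper types; the row values are copied from the repo-level
// table in docs/benchmark.md, which reports rounded figures.
interface RepoRow {
  repo: string;
  tasks: number;
  avgUsefulness: number;
  avgPayloadBytes: number;
  avgEstimatedTokens: number;
  bestExampleUsefulness: number;
}

const rows: RepoRow[] = [
  { repo: "angular-spotify", tasks: 12, avgUsefulness: 0.8333, avgPayloadBytes: 8553, avgEstimatedTokens: 2138, bestExampleUsefulness: 0.25 },
  { repo: "excalidraw", tasks: 12, avgUsefulness: 0.6667, avgPayloadBytes: 6023, avgEstimatedTokens: 1506, bestExampleUsefulness: 0 },
];

// Task-weighted mean of one column across all repos.
function weightedMean(data: RepoRow[], pick: (row: RepoRow) => number): number {
  const totalTasks = data.reduce((sum, row) => sum + row.tasks, 0);
  const weighted = data.reduce((sum, row) => sum + pick(row) * row.tasks, 0);
  return weighted / totalTasks;
}

const totalTasks = rows.reduce((sum, row) => sum + row.tasks, 0); // 24
const usefulness = weightedMean(rows, (r) => r.avgUsefulness); // about 0.75
const payloadBytes = weightedMean(rows, (r) => r.avgPayloadBytes); // 7288, gate reports 7287.625
const estimatedTokens = weightedMean(rows, (r) => r.avgEstimatedTokens); // 1822, gate reports 1822.25
const bestExample = weightedMean(rows, (r) => r.bestExampleUsefulness); // 0.125

console.log({ totalTasks, usefulness, payloadBytes, estimatedTokens, bestExample });
```

Both repos contribute 12 tasks, so the weighted mean reduces to a plain average here; the weighting only matters if the frozen task counts ever diverge.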
 
 ## What This Proof Can Support
 
@@ -1,6 +1,6 @@
 {
   "name": "codebase-context",
-  "version": "1.9.0",
+  "version": "2.1.0",
   "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.",
   "type": "module",
   "main": "./dist/lib.js",