Commit 440ef7a

chore(release): prepare v2.1.0
1 parent 6c19628 commit 440ef7a

7 files changed

Lines changed: 1072 additions & 190 deletions

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -2,6 +2,25 @@
 
 ## Unreleased
 
+## [2.1.0](https://github.com/PatrickSys/codebase-context/compare/v2.0.0...v2.1.0) (2026-04-13)
+
+### Features
+
+- **search:** surface chunk intelligence directly in `search_codebase` results, including symbol identity, scope, signature preview, and compact/full response budgeting
+- **map:** upgrade the conventions map with structural skeleton sections and add `map --export` so the compact map can be written to `CODEBASE_MAP.md`
+
+### Bug Fixes
+
+- **metadata:** require real dependency evidence plus multiple framework indicators before labeling a repo as Next.js or another specialized framework
+- **reranker:** auto-heal corrupted cross-encoder cache entries and surface degraded reranker state in `searchQuality.rerankerStatus`
+- **benchmarks:** harden comparator lanes for cross-platform execution and keep setup failures explicit instead of silently turning them into claims
+
+### Documentation
+
+- publish the v2.1.0 discovery benchmark rerun with the current gate output: `pending_evidence`, `claimAllowed: false`, `24` frozen tasks, `0.75` average usefulness, and `1822.25` average estimated tokens
+- document the current comparator truth instead of stale assumptions: the public proof still has no real comparator lane data on this host, so benchmark win claims remain blocked
+- note the new `searchQuality.tokenEstimate` advisory contract: estimates are based on the pre-advisory response payload and warnings only appear above the 4K-token threshold
+
 ### Features
 
 - **mcp:** rework multi-project routing so one MCP server can serve multiple projects instead of one hardcoded server entry per repo
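
The `searchQuality.tokenEstimate` contract described in the changelog entry above (estimate from the pre-advisory payload, warn only above the 4K-token threshold) could look roughly like the sketch below. It is not the project's implementation. The 4-bytes-per-token heuristic, the reading of "4K" as 4000 tokens, the `SearchPayload` and `TokenEstimate` shapes, and the function names `estimateTokens` and `attachTokenEstimate` are all assumptions made for illustration.

```typescript
// Hypothetical shapes: only `searchQuality.tokenEstimate` is named in the changelog.
interface TokenEstimate {
  estimatedBytes: number;
  estimatedTokens: number;
  warning?: string;
}

interface SearchPayload {
  results: string[];
  searchQuality?: Record<string, unknown>;
}

// Assumption: a flat bytes-per-token heuristic.
const BYTES_PER_TOKEN = 4;
// Assumption: "4K" is read as 4000 tokens (it could equally mean 4096).
const WARN_THRESHOLD_TOKENS = 4000;

// Estimate the cost of a payload as it stands, before any advisory is attached.
function estimateTokens(payload: SearchPayload): TokenEstimate {
  const json = JSON.stringify(payload);
  const estimatedBytes = new TextEncoder().encode(json).length;
  const estimatedTokens = Math.ceil(estimatedBytes / BYTES_PER_TOKEN);
  const estimate: TokenEstimate = { estimatedBytes, estimatedTokens };
  // The warning is only present above the threshold.
  if (estimatedTokens > WARN_THRESHOLD_TOKENS) {
    estimate.warning =
      `Response is about ${estimatedTokens} tokens; consider a more compact mode.`;
  }
  return estimate;
}

// Attach the advisory without mutating the payload that was measured.
function attachTokenEstimate(payload: SearchPayload): SearchPayload {
  const tokenEstimate = estimateTokens(payload);
  return {
    ...payload,
    searchQuality: { ...payload.searchQuality, tokenEstimate },
  };
}
```

Because the estimate is computed before the advisory is merged in, the reported size excludes the advisory's own bytes, which is one way to read "based on the pre-advisory response payload".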

docs/benchmark.md

Lines changed: 20 additions & 18 deletions
@@ -1,6 +1,6 @@
 # Discovery Benchmark
 
-This page documents the current public proof slice for `v2.0.0`.
+This page documents the current public proof slice for `v2.1.0`.
 It is a discovery benchmark, not an implementation-quality benchmark.
 
 ## Scope
@@ -37,49 +37,51 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averageEstimatedTokens`: `903.7083333333334`
+- `averagePayloadBytes`: `7287.625`
+- `averageEstimatedTokens`: `1822.25`
+- `averageFirstRelevantHit`: `null`
 - `bestExampleUsefulnessRate`: `0.125`
 
 Repo-level outputs from the same rerun:
 
-| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
-| --- | ---: | ---: | ---: | ---: |
-| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
-| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |
+| Repo | Tasks | Avg usefulness | Avg payload bytes | Avg estimated tokens | Best-example usefulness |
+| --- | ---: | ---: | ---: | ---: | ---: |
+| `angular-spotify` | 12 | 0.8333 | 8553 | 2138 | 0.25 |
+| `excalidraw` | 12 | 0.6667 | 6023 | 1506 | 0 |
 
 ## Gate Truth
 
 The gate is intentionally still blocked.
 
-- The combined suite now covers both public repos.
+- The combined suite covers both frozen public repos.
 - The release claim is still disallowed because comparator evidence remains incomplete.
 - Missing evidence currently includes:
   - raw Claude Code baseline metrics
-  - GrepAI metrics
-  - jCodeMunch metrics
-  - codebase-memory-mcp metrics
-  - CodeGraphContext metrics
+  - GrepAI comparator metrics
+  - jCodeMunch comparator metrics
+  - codebase-memory-mcp comparator metrics
+  - CodeGraphContext comparator metrics
 
 ## Comparator Reality
 
 The current comparator artifact records setup failures, not benchmark wins.
 
 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
-| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
-| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
-| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
-| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |
+| `codebase-memory-mcp` | `ok` | The lane now executes on this host, but the captured outputs are near-empty (`19` bytes / `5` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
+| `jCodeMunch` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`) |
+| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path are not present |
+| `CodeGraphContext` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`); database prerequisite remains unresolved |
+| `raw Claude Code` | `ok` | The baseline now runs, but the captured outputs remain non-useful (`66.08` bytes / `17.17` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
 
-`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
+`CodeGraphContext` remains part of the comparison frame. It is not removed from the public story just because the lane still fails to start.
 
 ## Important Limitations
 
 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
 - Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
-- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
+- Current search payload costs are higher than the older v2.0.0 proof slice because the v2.1.0 surface now includes richer map structure and `searchQuality.tokenEstimate` advisories.
 - `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
 
 ## What This Proof Can Support
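
The gate behavior documented in `docs/benchmark.md` above (a lane that runs but yields `0` usefulness still counts as missing evidence, and the claim stays blocked until every lane has evidence) can be illustrated with a small sketch. The rule encoded here is an assumption inferred from the comparator table, not the project's actual gate logic. `ComparatorLane`, `GateResult`, `evaluateGate`, and the `"evidence_complete"` status are hypothetical names; only `pending_evidence`, `claimAllowed`, `ok`, and `setup_failed` appear in the source.

```typescript
type LaneStatus = "ok" | "setup_failed";

// Hypothetical lane record; field names are illustrative.
interface ComparatorLane {
  name: string;
  status: LaneStatus;
  averageUsefulness?: number; // absent when the lane never ran
}

interface GateResult {
  status: "pending_evidence" | "evidence_complete"; // second value is hypothetical
  claimAllowed: boolean;
  missingEvidence: string[];
}

// Assumed rule: a lane counts as evidence only if it ran AND produced useful output.
function hasEvidence(lane: ComparatorLane): boolean {
  return lane.status === "ok" && (lane.averageUsefulness ?? 0) > 0;
}

function evaluateGate(lanes: ComparatorLane[]): GateResult {
  const missingEvidence = lanes.filter((l) => !hasEvidence(l)).map((l) => l.name);
  // An empty lane list is treated as "no evidence", never as a pass.
  const claimAllowed = lanes.length > 0 && missingEvidence.length === 0;
  return {
    status: claimAllowed ? "evidence_complete" : "pending_evidence",
    claimAllowed,
    missingEvidence,
  };
}

// Lane states taken from the "Comparator Reality" table in the diff.
const currentLanes: ComparatorLane[] = [
  { name: "codebase-memory-mcp", status: "ok", averageUsefulness: 0 },
  { name: "jCodeMunch", status: "setup_failed" },
  { name: "GrepAI", status: "setup_failed" },
  { name: "CodeGraphContext", status: "setup_failed" },
  { name: "raw Claude Code", status: "ok", averageUsefulness: 0 },
];

const currentGate = evaluateGate(currentLanes);
console.log(currentGate.status, currentGate.claimAllowed, currentGate.missingEvidence.length);
```

Under this assumed rule, all five lanes are reported as missing evidence, which matches the "Missing evidence currently includes" list in the diff even though two lanes now report `ok`.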

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "codebase-context",
-  "version": "1.9.0",
+  "version": "2.1.0",
   "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.",
   "type": "module",
   "main": "./dist/lib.js",
