Commit 840670c

chore: trim release metadata from PR #98
1 parent 100d291 commit 840670c

2 files changed

Lines changed: 19 additions & 21 deletions

docs/benchmark.md

Lines changed: 18 additions & 20 deletions
@@ -1,6 +1,6 @@
 # Discovery Benchmark

-This page documents the current public proof slice for `v2.1.0`.
+This page documents the current public proof slice for `v2.0.0`.
 It is a discovery benchmark, not an implementation-quality benchmark.

 ## Scope
@@ -37,51 +37,49 @@ From `results/gate-evaluation.json`:
 - `claimAllowed`: `false`
 - `totalTasks`: `24`
 - `averageUsefulness`: `0.75`
-- `averagePayloadBytes`: `7287.625`
-- `averageEstimatedTokens`: `1822.25`
-- `averageFirstRelevantHit`: `null`
+- `averageEstimatedTokens`: `903.7083333333334`
 - `bestExampleUsefulnessRate`: `0.125`

 Repo-level outputs from the same rerun:

-| Repo | Tasks | Avg usefulness | Avg payload bytes | Avg estimated tokens | Best-example usefulness |
-| --- | ---: | ---: | ---: | ---: | ---: |
-| `angular-spotify` | 12 | 0.8333 | 8553 | 2138 | 0.25 |
-| `excalidraw` | 12 | 0.6667 | 6023 | 1506 | 0 |
+| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
+| --- | ---: | ---: | ---: | ---: |
+| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
+| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |

 ## Gate Truth

 The gate is intentionally still blocked.

-- The combined suite covers both frozen public repos.
+- The combined suite now covers both public repos.
 - The release claim is still disallowed because comparator evidence remains incomplete.
 - Missing evidence currently includes:
   - raw Claude Code baseline metrics
-  - GrepAI comparator metrics
-  - jCodeMunch comparator metrics
-  - codebase-memory-mcp comparator metrics
-  - CodeGraphContext comparator metrics
+  - GrepAI metrics
+  - jCodeMunch metrics
+  - codebase-memory-mcp metrics
+  - CodeGraphContext metrics

 ## Comparator Reality

 The current comparator artifact records setup failures, not benchmark wins.

 | Comparator | Status | Current reason |
 | --- | --- | --- |
-| `codebase-memory-mcp` | `ok` | The lane now executes on this host, but the captured outputs are near-empty (`19` bytes / `5` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
-| `jCodeMunch` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`) |
-| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path are not present |
-| `CodeGraphContext` | `setup_failed` | MCP handshake still closes during startup on this host (`MCP error -32000: Connection closed`); database prerequisite remains unresolved |
-| `raw Claude Code` | `ok` | The baseline now runs, but the captured outputs remain non-useful (`66.08` bytes / `17.17` tokens on average, `0` usefulness), so the gate still treats it as missing evidence |
+| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
+| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
+| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
+| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
+| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |

-`CodeGraphContext` remains part of the comparison frame. It is not removed from the public story just because the lane still fails to start.
+`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.

 ## Important Limitations

 - This benchmark measures discovery usefulness and payload cost only.
 - It does not measure implementation correctness, patch quality, or end-to-end task completion.
 - Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
-- Current search payload costs are higher than the older v2.0.0 proof slice because the v2.1.0 surface now includes richer map structure and `searchQuality.tokenEstimate` advisories.
+- The reranker cache is currently corrupted on this machine. During the proof rerun, search fell back to original ordering after `Protobuf parsing failed` while still completing the harness.
 - `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.

 ## What This Proof Can Support
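
Reviewer note on the `docs/benchmark.md` hunk above: the suite-level values on the `+` side can be cross-checked against the per-repo rows on the `+` side. The sketch below is a reader-side check, not code from this commit. It assumes the suite-level figures are task-weighted means of the repo rows; the names `REPO_ROWS` and `weighted_mean` are made up for illustration, and the numbers are copied from the updated table.

```python
# Reader-side consistency check (hypothetical helper, not part of the repo).
# Each row: (tasks, avg usefulness, avg estimated tokens, best-example usefulness),
# copied from the updated per-repo table in docs/benchmark.md.
REPO_ROWS = {
    "angular-spotify": (12, 0.8333, 1080.6667, 0.25),
    "excalidraw": (12, 0.6667, 726.75, 0.0),
}


def weighted_mean(rows, column):
    """Task-weighted mean of one metric column across repos.

    Assumption: suite-level averages weight each repo by its task count.
    """
    total = sum(row[0] for row in rows.values())
    return sum(row[0] * row[column] for row in rows.values()) / total


total_tasks = sum(row[0] for row in REPO_ROWS.values())
avg_usefulness = weighted_mean(REPO_ROWS, 1)
avg_tokens = weighted_mean(REPO_ROWS, 2)
best_example_rate = weighted_mean(REPO_ROWS, 3)

# Values reported in the gate summary on the "+" side of the diff.
reported = {
    "totalTasks": 24,
    "averageUsefulness": 0.75,
    "averageEstimatedTokens": 903.7083333333334,
    "bestExampleUsefulnessRate": 0.125,
}

# The table values are rounded to four decimals, so compare with a tolerance.
consistent = (
    total_tasks == reported["totalTasks"]
    and abs(avg_usefulness - reported["averageUsefulness"]) < 1e-3
    and abs(avg_tokens - reported["averageEstimatedTokens"]) < 1e-3
    and abs(best_example_rate - reported["bestExampleUsefulnessRate"]) < 1e-3
)
print("summary consistent with per-repo rows:", consistent)
```

Under that assumption the updated summary and the updated table agree to within table rounding. Because both repos contribute 12 tasks, a plain mean of the two rows gives the same result.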

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "codebase-context",
-  "version": "2.1.0",
+  "version": "1.9.0",
   "description": "Pre-maps your codebase architecture, conventions, and team memory so AI agents navigate with precision instead of exploring. Local-first MCP server with AST-backed hybrid search.",
   "type": "module",
   "main": "./dist/lib.js",
