Commit 3c91c12

chore(v2): publish proof bundle and sync public surface for v2.0.0 relaunch
Ships the Phase 10 proof artifacts: discovery benchmark doc, comparator evidence table, demo script, and registry sync checklist. Updates README hero to map-first framing, removes second-brain keyword from package.json, and refreshes proof result JSONs from the latest local run. Gate stays pending_evidence — all comparator lanes setup_failed, claimAllowed false.
1 parent ad5db8b commit 3c91c12

12 files changed

Lines changed: 395 additions & 67 deletions

README.md

Lines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# codebase-context
22

3-
## Local-first second brain for AI agents working on your codebase
3+
## Stop paying for AI agents to explore your codebase. codebase-context pre-maps the architecture, conventions, and team memory so they don't have to.
44

55
[![npm version](https://img.shields.io/npm/v/codebase-context)](https://www.npmjs.com/package/codebase-context) [![license](https://img.shields.io/npm/l/codebase-context)](./LICENSE) [![node](https://img.shields.io/node/v/codebase-context)](./package.json)
66

@@ -20,6 +20,8 @@ Here's what codebase-context does:
2020

2121
One tool call returns all of it. Local-first - your code never leaves your machine by default.
2222

23+
See the [v2.0.0 benchmark](./docs/benchmark.md) for the discovery suite results and current gate truth.
24+
2325
### What it looks like
2426

2527
Real CLI output against `angular-spotify`, the repo used for the launch screenshots.
@@ -36,7 +38,7 @@ This is the part most tools miss: what the team is doing now, what it is moving
3638

3739
When the agent searches with edit intent, it gets a compact decision card: confidence, whether it's safe to proceed, which patterns apply, the best example, and which files are likely to be affected.
3840

39-
More CLI examples in [`docs/cli.md`](./docs/cli.md).
41+
More CLI examples in [`docs/cli.md`](./docs/cli.md). Full walkthrough: [`docs/demo.md`](./docs/demo.md).
4042

4143
## Quick Start
4244

@@ -222,6 +224,8 @@ These are the behaviors that make the most difference day-to-day. Copy, trim wha
222224

223225
## Links
224226

227+
- [Benchmark](./docs/benchmark.md) — v2.0.0 discovery suite results and gate truth
228+
- [Demo](./docs/demo.md) — real CLI walkthrough
225229
- [Client Setup](./docs/client-setup.md) — per-client config, HTTP setup, local build testing
226230
- [Capabilities Reference](./docs/capabilities.md) — tool API, retrieval pipeline, decision card schema
227231
- [CLI Gallery](./docs/cli.md) — formatted command output examples

docs/benchmark.md

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
1+
# Discovery Benchmark
2+
3+
This page documents the current public proof slice for `v2.0.0`.
4+
It is a discovery benchmark, not an implementation-quality benchmark.
5+
6+
## Scope
7+
8+
- Frozen fixtures:
9+
- `tests/fixtures/discovery-angular-spotify.json`
10+
- `tests/fixtures/discovery-excalidraw.json`
11+
- `tests/fixtures/discovery-benchmark-protocol.json`
12+
- Frozen repos used in the current proof run:
13+
- `repos/angular-spotify`
14+
- `repos/excalidraw`
15+
- Current gate artifact:
16+
- `results/gate-evaluation.json`
17+
- Comparator evidence:
18+
- `results/comparator-evidence.json`
19+
20+
## How To Reproduce
21+
22+
Regenerate the repo-local proof artifacts from a current `master` checkout by running these four commands in order:
23+
24+
```bash
25+
node scripts/run-eval.mjs repos/angular-spotify --mode=discovery --fixture-a=tests/fixtures/discovery-angular-spotify.json --skip-reindex --output=results/codebase-context-angular-spotify.json
26+
node scripts/run-eval.mjs repos/excalidraw --mode=discovery --fixture-a=tests/fixtures/discovery-excalidraw.json --skip-reindex --output=results/codebase-context-excalidraw.json
27+
node scripts/benchmark-comparators.mjs --repos repos/angular-spotify,repos/excalidraw --output results/comparator-evidence.json
28+
node scripts/run-eval.mjs repos/angular-spotify repos/excalidraw --mode=discovery --fixture-a=tests/fixtures/discovery-angular-spotify.json --fixture-b=tests/fixtures/discovery-excalidraw.json --competitor-results=results/comparator-evidence.json --skip-reindex --output=results/gate-evaluation.json
29+
```
30+
31+
## Current Result
32+
33+
From `results/gate-evaluation.json`:
34+
35+
- `status`: `pending_evidence`
36+
- `suiteStatus`: `complete`
37+
- `claimAllowed`: `false`
38+
- `totalTasks`: `24`
39+
- `averageUsefulness`: `0.75`
40+
- `averageEstimatedTokens`: `903.7083333333334`
41+
- `bestExampleUsefulnessRate`: `0.125`
42+
43+
Repo-level outputs from the same rerun:
44+
45+
| Repo | Tasks | Avg usefulness | Avg estimated tokens | Best-example usefulness |
46+
| --- | ---: | ---: | ---: | ---: |
47+
| `angular-spotify` | 12 | 0.8333 | 1080.6667 | 0.25 |
48+
| `excalidraw` | 12 | 0.6667 | 726.75 | 0 |
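
The combined figures in `results/gate-evaluation.json` can be cross-checked against the repo-level rows. The sketch below assumes the combined metrics are task-weighted means of the per-repo values; that is an assumption, although it agrees with the published numbers. The `weightedMean` helper and the field names are illustrative and not part of the harness.

```javascript
// Per-repo rows copied from the table above.
const repoRows = [
  { repo: "angular-spotify", tasks: 12, usefulness: 0.8333, estimatedTokens: 1080.6667, bestExample: 0.25 },
  { repo: "excalidraw", tasks: 12, usefulness: 0.6667, estimatedTokens: 726.75, bestExample: 0 },
];

// Hypothetical helper: mean of one metric, weighted by each repo's task count.
function weightedMean(rows, metric) {
  const totalTasks = rows.reduce((sum, row) => sum + row.tasks, 0);
  const weightedSum = rows.reduce((sum, row) => sum + row[metric] * row.tasks, 0);
  return weightedSum / totalTasks;
}

const combined = {
  totalTasks: repoRows.reduce((sum, row) => sum + row.tasks, 0),
  averageUsefulness: weightedMean(repoRows, "usefulness"),
  averageEstimatedTokens: weightedMean(repoRows, "estimatedTokens"),
  bestExampleUsefulnessRate: weightedMean(repoRows, "bestExample"),
};

console.log(combined);
```

Within the rounding of the table values, this reproduces 24 tasks, a usefulness of 0.75, roughly 903.708 estimated tokens, and a best-example rate of 0.125.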
49+
50+
## Gate Truth
51+
52+
The gate is intentionally still blocked.
53+
54+
- The combined suite now covers both public repos.
55+
- The release claim is still disallowed because comparator evidence remains incomplete.
56+
- Missing evidence currently includes:
57+
- raw Claude Code baseline metrics
58+
- GrepAI metrics
59+
- jCodeMunch metrics
60+
- codebase-memory-mcp metrics
61+
- CodeGraphContext metrics
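
The blocking rule described above can be sketched as a small function. This is a minimal illustration, not the harness logic in `scripts/run-eval.mjs`: the lane record shape, the `"ok"` status, and the `"claim_ready"` status are placeholders, and the real `results/comparator-evidence.json` schema may differ.

```javascript
// Lane statuses copied from the Comparator Reality table.
// The record shape is an assumption; results/comparator-evidence.json may differ.
const comparatorLanes = [
  { name: "raw Claude Code", status: "setup_failed" },
  { name: "GrepAI", status: "setup_failed" },
  { name: "jCodeMunch", status: "setup_failed" },
  { name: "codebase-memory-mcp", status: "setup_failed" },
  { name: "CodeGraphContext", status: "setup_failed" },
];

// Hypothetical gate rule: a claim is allowed only when the suite is complete
// and every comparator lane produced metrics ("ok" is a placeholder status).
function evaluateGate(suiteStatus, lanes) {
  const missingEvidence = lanes
    .filter((lane) => lane.status !== "ok")
    .map((lane) => `${lane.name} metrics`);
  const claimAllowed = suiteStatus === "complete" && missingEvidence.length === 0;
  return {
    status: claimAllowed ? "claim_ready" : "pending_evidence", // "claim_ready" is a placeholder
    suiteStatus,
    claimAllowed,
    missingEvidence,
  };
}

const gate = evaluateGate("complete", comparatorLanes);
console.log(gate.status, gate.claimAllowed, gate.missingEvidence);
```

With all five lanes at `setup_failed`, the sketch yields `pending_evidence` with `claimAllowed` set to `false`, which matches the current gate artifact.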
62+
63+
## Comparator Reality
64+
65+
The current comparator artifact records setup failures, not benchmark wins.
66+
67+
| Comparator | Status | Current reason |
68+
| --- | --- | --- |
69+
| `codebase-memory-mcp` | `setup_failed` | Installer path still points to the external shell installer |
70+
| `jCodeMunch` | `setup_failed` | MCP server closes during startup |
71+
| `GrepAI` | `setup_failed` | Local Go binary and Ollama model path not present |
72+
| `CodeGraphContext` | `setup_failed` | MCP server closes during startup |
73+
| `raw Claude Code` | `setup_failed` | Local `claude` CLI baseline is not installed/authenticated in this environment |
74+
75+
`CodeGraphContext` is explicitly part of the frozen comparison frame. It is not omitted from the public story just because the lane still fails to start.
76+
77+
## Important Limitations
78+
79+
- This benchmark measures discovery usefulness and payload cost only.
80+
- It does not measure implementation correctness, patch quality, or end-to-end task completion.
81+
- Comparator setup is still environment-sensitive, so the gate remains `pending_evidence`.
82+
- The reranker cache is currently corrupted on this machine. During the proof rerun, search hit a `Protobuf parsing failed` error and fell back to the original result ordering; the harness still ran to completion.
83+
- `averageFirstRelevantHit` remains `null` in the current gate output because this compact response surface does not expose a comparable ranked-hit metric across the incomplete comparator set.
84+
85+
## What This Proof Can Support
86+
87+
- It can support claims about the shipped discovery surfaces and their current measured outputs on the frozen public tasks.
88+
- It can support claims that the proof gate is still blocked by comparator evidence.
89+
- It cannot support claims that `codebase-context` beats the named comparators today.
90+
- It cannot support claims about edit success, code quality, or implementation speed.

docs/capabilities.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -280,6 +280,8 @@ Notes:
280280

281281
## Evaluation Harness
282282

283+
Current public proof bundle: [`docs/benchmark.md`](./benchmark.md) and [`docs/comparison-table.md`](./comparison-table.md).
284+
283285
Reproducible evaluation is shipped as a CLI entrypoint backed by shared scoring/reporting code.
284286

285287
- **Command:** `npm run eval -- <codebaseA> [codebaseB] --mode retrieval|discovery [--competitor-results <path>]` (builds first, then runs `scripts/run-eval.mjs`)

docs/cli.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@
22

33
`codebase-context` exposes its tools as a local CLI so humans can:
44

5+
- Get the conventions map before exploring or editing (`map`)
56
- Onboard themselves onto an unfamiliar repo
67
- Debug what the MCP server is doing
78
- Use outputs in CI/scripts (via `--json`)
@@ -30,6 +31,7 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
3031

3132
## Commands
3233

34+
- `map` — conventions map: architecture layers, patterns, golden files
3335
- `metadata` — tech stack overview
3436
- `patterns` — team conventions + adoption/trends
3537
- `search --query <q>` — ranked results; add `--intent edit` for a preflight card
@@ -42,6 +44,41 @@ CODEBASE_CONTEXT_ASCII=1 npx -y codebase-context patterns
4244

4345
---
4446

47+
## `map`
48+
49+
```bash
50+
npx -y codebase-context map
51+
```
52+
53+
The conventions map — run this first on an unfamiliar repo. Shows architecture layers, active patterns with adoption rates and trend direction, and the golden files the team treats as the strongest examples. This is also what the MCP server delivers to AI agents via the `codebase://context` resource on first call.
54+
55+
Example output (truncated):
56+
57+
```text
58+
┌─ Codebase Map ── angular-spotify ────────────────────────────────────┐
59+
│ │
60+
│ Architecture: feature-based · 3 layers │
61+
│ 47 files · 6 patterns · 3 golden files │
62+
│ │
63+
│ LAYERS │
64+
│ core/ – shared services + DI │
65+
│ features/ – domain modules │
66+
│ shared/ – reusable components │
67+
│ │
68+
│ TOP PATTERNS │
69+
│ Angular standalone components 92% ↑ Rising │
70+
│ RxJS reactive patterns 78% │
71+
│ ↓ NgModules 8% Declining │
72+
│ │
73+
│ GOLDEN FILES │
74+
│ src/features/player/player.component.ts │
75+
│ src/core/auth/auth.service.ts │
76+
│ │
77+
└──────────────────────────────────────────────────────────────────────┘
78+
```
79+
80+
---
81+
4582
## `metadata`
4683

4784
```bash

docs/comparison-table.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# Comparator Summary
2+
3+
This table summarizes the current comparator evidence from `results/comparator-evidence.json`.
4+
It is a setup-status table first, not a marketing scoreboard.
5+
6+
| Comparator | Intended role in gate | Current status | Evidence summary |
7+
| --- | --- | --- | --- |
8+
| `raw Claude Code` | Baseline for payload cost and at least one usefulness comparison | `setup_failed` | The local `claude` CLI baseline is unavailable in this environment, so the gate records missing baseline metrics. |
9+
| `GrepAI` | Named MCP comparator | `setup_failed` | Requires the GrepAI binary plus a local Ollama embedding setup that is not present in this proof environment. |
10+
| `jCodeMunch` | Named MCP comparator | `setup_failed` | The MCP server still closes on startup during the current rerun, so no comparable discovery metrics were produced. |
11+
| `codebase-memory-mcp` | Named MCP comparator | `setup_failed` | The documented install path still depends on the external shell installer instead of a working local benchmark path. |
12+
| `CodeGraphContext` | Graph-native comparator in the relaunch frame | `setup_failed` | The MCP server still closes on startup during the current rerun, so this lane remains missing evidence. |
13+
14+
## Reading This Table
15+
16+
- `setup_failed` means the lane was attempted and did not reach a credible metric-producing state.
17+
- A missing metric is not treated as a win for `codebase-context`.
18+
- The combined gate in `results/gate-evaluation.json` remains `pending_evidence` until these lanes produce real metrics.
19+
20+
## Current codebase-context result
21+
22+
For reference, the current combined discovery output across `angular-spotify` and `excalidraw` is:
23+
24+
| Metric | codebase-context |
25+
| --- | ---: |
26+
| `totalTasks` | 24 |
27+
| `averageUsefulness` | 0.75 |
28+
| `averagePayloadBytes` | 3613.6667 |
29+
| `averageEstimatedTokens` | 903.7083 |
30+
| `bestExampleUsefulnessRate` | 0.125 |
31+
| `gate.status` | `pending_evidence` |
32+
33+
Those numbers are not compared here as head-to-head wins because the comparator lanes above did not produce matching metrics.
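
As a quick sanity check on the two payload metrics, the ratio between them can be computed directly. The source does not state how tokens are estimated; the result is merely consistent with a heuristic of roughly four bytes per token, and that reading is an assumption.

```javascript
// Combined metrics copied from the table above.
const combinedMetrics = {
  averagePayloadBytes: 3613.6667,
  averageEstimatedTokens: 903.7083,
};

// Ratio of average payload size to average estimated token count.
const bytesPerToken =
  combinedMetrics.averagePayloadBytes / combinedMetrics.averageEstimatedTokens;

console.log(bytesPerToken.toFixed(2)); // prints 4.00
```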

docs/demo.md

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
# Demo Script
2+
3+
This walkthrough uses real CLI output captured against `repos/angular-spotify` during the Phase 10 proof rerun.
4+
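Run it from the repo root with `CODEBASE_ROOT` pointed at the frozen sample repo. The commands below set the variable with PowerShell syntax and use the path from the capture machine; adjust both for your environment.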
5+
6+
## 1. Start With The Conventions Map
7+
8+
```powershell
9+
$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
10+
node dist/index.js map --json
11+
```
12+
13+
Captured output excerpt:
14+
15+
```json
16+
{
17+
"project": "angular-spotify",
18+
"architecture": {
19+
"layers": [
20+
{ "name": "libs", "fileCount": 252 },
21+
{ "name": "apps", "fileCount": 6 }
22+
]
23+
},
24+
"activePatterns": [
25+
{ "name": "Effect", "adoption": "100%", "trend": "Rising" },
26+
{ "name": "Standalone", "adoption": "100%", "trend": "Rising" },
27+
{ "name": "RxJS", "adoption": "98%", "trend": "Rising" }
28+
],
29+
"bestExamples": [
30+
{ "file": "src/lib/card.component.ts", "score": 4, "reason": "Angular TestBed" }
31+
]
32+
}
33+
```
34+
35+
What this shows:
36+
37+
- The first call gives a compact conventions map instead of raw grep output.
38+
- The response already includes architecture layers, active patterns, and a concrete best example.
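
A script or agent could consume that map along these lines. The sketch parses the excerpt above and builds a one-line summary; `summarizeMap` is a hypothetical helper, and the full `map --json` payload may carry fields that the excerpt omits.

```javascript
// Excerpt copied from the captured `map --json` output above.
const mapOutput = JSON.parse(`{
  "project": "angular-spotify",
  "architecture": {
    "layers": [
      { "name": "libs", "fileCount": 252 },
      { "name": "apps", "fileCount": 6 }
    ]
  },
  "activePatterns": [
    { "name": "Effect", "adoption": "100%", "trend": "Rising" },
    { "name": "Standalone", "adoption": "100%", "trend": "Rising" },
    { "name": "RxJS", "adoption": "98%", "trend": "Rising" }
  ],
  "bestExamples": [
    { "file": "src/lib/card.component.ts", "score": 4, "reason": "Angular TestBed" }
  ]
}`);

// Hypothetical helper: condense the map into a single line for logs or prompts.
function summarizeMap(map) {
  const largestLayer = map.architecture.layers.reduce((largest, layer) =>
    layer.fileCount > largest.fileCount ? layer : largest
  );
  const patterns = map.activePatterns
    .map((pattern) => `${pattern.name} ${pattern.adoption}`)
    .join(", ");
  const example = map.bestExamples.length > 0 ? map.bestExamples[0].file : "none";
  return `${map.project}: largest layer ${largestLayer.name} (${largestLayer.fileCount} files); patterns: ${patterns}; best example: ${example}`;
}

const mapSummary = summarizeMap(mapOutput);
console.log(mapSummary);
```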
39+
40+
## 2. Search With Edit Intent
41+
42+
```powershell
43+
$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
44+
node dist/index.js search --query "auth headers" --intent edit --limit 3 --json
45+
```
46+
47+
Captured output excerpt:
48+
49+
```json
50+
{
51+
"status": "success",
52+
"searchQuality": { "status": "ok", "confidence": 1 },
53+
"preflight": {
54+
"ready": true,
55+
"warnings": [
56+
"Index is aging (>24h) — results may not reflect recent changes"
57+
],
58+
"patterns": {
59+
"do": [
60+
"Constructor injection — 85% adoption",
61+
"Standalone — 100% adoption",
62+
"RxJS — 98% adoption"
63+
]
64+
},
65+
"bestExample": "src/lib/card.component.ts"
66+
},
67+
"results": [
68+
{
69+
"file": "C:\\Users\\bitaz\\Repos\\codebase-context\\repos\\angular-spotify\\libs\\web\\auth\\util\\src\\lib\\interceptors\\auth.interceptor.ts:10-42",
70+
"type": "interceptor:core"
71+
}
72+
]
73+
}
74+
```
75+
76+
What this shows:
77+
78+
- Search remains the second step after the map.
79+
- `--intent edit` adds preflight evidence to the search response instead of forcing a separate call.
80+
- The response stays compact while still surfacing a best example and impact hints.
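
One way a caller might act on the preflight card is sketched below, using only fields visible in the excerpt. The `decideEdit` helper and the `minConfidence` threshold of 0.8 are assumptions for illustration; the tool does not prescribe a threshold here.

```javascript
// Fields copied from the captured search excerpt above
// (the warning text is shortened, and `patterns` and `results` are omitted).
const searchResponse = {
  status: "success",
  searchQuality: { status: "ok", confidence: 1 },
  preflight: {
    ready: true,
    warnings: ["Index is aging (>24h)"],
    bestExample: "src/lib/card.component.ts",
  },
};

// Hypothetical policy: proceed only when the call succeeded, the card says
// ready, and search confidence clears a caller-chosen threshold.
function decideEdit(response, minConfidence = 0.8) {
  const confident =
    response.searchQuality.status === "ok" &&
    response.searchQuality.confidence >= minConfidence;
  const proceed =
    response.status === "success" && response.preflight.ready === true && confident;
  return {
    proceed,
    warnings: response.preflight.warnings ?? [],
    startFrom: response.preflight.bestExample ?? null,
  };
}

const decision = decideEdit(searchResponse);
console.log(decision);
```

Surfacing `warnings` alongside `proceed` keeps the stale-index notice visible even when the edit is allowed to go ahead.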
81+
82+
## 3. Check A Team Pattern Directly
83+
84+
```powershell
85+
$env:CODEBASE_ROOT='C:\Users\bitaz\Repos\codebase-context\repos\angular-spotify'
86+
node dist/index.js patterns --category state --json
87+
```
88+
89+
Captured output excerpt:
90+
91+
```json
92+
{
93+
"patterns": {
94+
"stateManagement": {
95+
"primary": {
96+
"name": "RxJS",
97+
"frequency": "98%",
98+
"trend": "Rising"
99+
},
100+
"alsoDetected": [
101+
{
102+
"name": "Signals",
103+
"frequency": "2%",
104+
"trend": "Rising"
105+
}
106+
]
107+
}
108+
}
109+
}
110+
```
111+
112+
What this shows:
113+
114+
- The tool distinguishes dominant patterns from emerging ones.
115+
- The map/search story is backed by direct pattern evidence rather than generic prose.
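
The dominant versus emerging distinction can be made explicit when post-processing the JSON. In this sketch, `classifyPatterns` and the 10% cutoff for "emerging" are hypothetical choices, not values defined by the tool.

```javascript
// Values copied from the captured `patterns --category state` excerpt above.
const stateManagement = {
  primary: { name: "RxJS", frequency: "98%", trend: "Rising" },
  alsoDetected: [{ name: "Signals", frequency: "2%", trend: "Rising" }],
};

// parseFloat reads the leading number and ignores the trailing "%".
const toPercent = (frequency) => Number.parseFloat(frequency);

// Hypothetical classification: the primary pattern is dominant; a secondary
// pattern counts as emerging when it is rising but still below the cutoff.
function classifyPatterns(category, emergingBelow = 10) {
  const dominant = {
    name: category.primary.name,
    role: "dominant",
    percent: toPercent(category.primary.frequency),
  };
  const others = category.alsoDetected.map((pattern) => {
    const percent = toPercent(pattern.frequency);
    const emerging = pattern.trend === "Rising" && percent < emergingBelow;
    return { name: pattern.name, role: emerging ? "emerging" : "minor", percent };
  });
  return [dominant, ...others];
}

const classified = classifyPatterns(stateManagement);
console.log(classified);
```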
116+
117+
## Caveats
118+
119+
- These excerpts were captured from the current local proof run and will change if the frozen sample repo or index state changes.
120+
- The benchmark gate is still `pending_evidence`, so this walkthrough demonstrates shipped behavior, not a released performance claim.

docs/registry-sync-checklist.md

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# Registry Sync Checklist
2+
3+
Use this checklist before publishing any Phase 10-facing metadata or registry copy.
4+
The purpose is to keep the public surface aligned with the current proof bundle.
5+
6+
## Required Artifacts
7+
8+
- `results/gate-evaluation.json` exists and still reports the current gate truth.
9+
- `results/comparator-evidence.json` exists and still records every failed lane honestly.
10+
- `docs/benchmark.md` matches the current gate numbers and limitations.
11+
- `docs/comparison-table.md` matches the current comparator statuses, including `CodeGraphContext`.
12+
- `docs/demo.md` uses real CLI output, not invented snippets.
13+
14+
## Public Surfaces To Sync
15+
16+
- `README.md`
17+
- `package.json`
18+
- `docs/capabilities.md`
19+
- `docs/client-setup.md`
20+
- `docs/cli.md`
21+
- npm package description and keywords derived from `package.json`
22+
23+
## Required Truth Checks
24+
25+
- If the gate is `pending_evidence`, say so explicitly.
26+
- If any comparator lane is `setup_failed`, say so explicitly.
27+
- Do not claim benchmark wins against `raw Claude Code`, `GrepAI`, `jCodeMunch`, `codebase-memory-mcp`, or `CodeGraphContext` without real metrics in `results/comparator-evidence.json`.
28+
- Do not claim implementation quality from this discovery benchmark.
29+
- Do not omit the current reranker fallback limitation if the proof run still shows `Protobuf parsing failed`.
30+
31+
## Before Registry Or README Updates
32+
33+
- Re-run the four proof commands from `docs/benchmark.md` if the evidence artifacts look stale.
34+
- Reconfirm that `results/gate-evaluation.json` still reports `claimAllowed: false` before writing relaunch copy.
35+
- Reconfirm that `results/gate-evaluation.json` still reports `suiteStatus: complete`.
36+
- Reconfirm that `CodeGraphContext` remains represented in the comparison table even if the lane still fails.
37+
38+
## Release Stop Conditions
39+
40+
- Stop if the proof docs drift from the JSON artifacts.
41+
- Stop if public copy implies a pass while the gate still says `pending_evidence`.
42+
- Stop if registry metadata still uses broader positioning than the proof bundle can support.
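
The first stop condition lends itself to an automated check. The sketch below is one possible approach, assuming the docs quote gate values in backticks as `docs/benchmark.md` does; `findDrift` is a hypothetical helper, and a real check would read the JSON and Markdown files from disk instead of using inline samples.

```javascript
// Gate values copied from the Current Result section of docs/benchmark.md.
const gateArtifact = {
  status: "pending_evidence",
  claimAllowed: false,
  totalTasks: 24,
  averageUsefulness: 0.75,
};

// Sample doc text in the same style as docs/benchmark.md.
const benchmarkDoc = [
  "- `status`: `pending_evidence`",
  "- `claimAllowed`: `false`",
  "- `totalTasks`: `24`",
  "- `averageUsefulness`: `0.75`",
].join("\n");

// Hypothetical check: every gate value must appear in the doc as inline code.
function findDrift(gate, docText) {
  return Object.entries(gate)
    .filter(([, value]) => !docText.includes("`" + String(value) + "`"))
    .map(([key, value]) => key + ": doc does not mention " + String(value));
}

const drift = findDrift(gateArtifact, benchmarkDoc);
console.log(drift.length === 0 ? "docs match gate artifact" : drift);
```

An empty result means the sampled values still agree; any entry in the list is a reason to stop and resync the docs before publishing.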
