Commit 6c11a53

greynewell and claude committed
docs(benchmark): remove three-file shards from blog post
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 4a62e66 commit 6c11a53

1 file changed

Lines changed: 9 additions & 11 deletions

benchmark/results/blog-post-draft.md

@@ -1,6 +1,6 @@
 # 50% cheaper. 4× faster. Same correct answer.
 
-We ran a test: give Claude Code the same task four ways — naked, with a hand-crafted prompt, with our auto-generated prompt, and with a different shard format. All had to make 8 failing tests pass in a 270k-line codebase. Same model. Same starting point.
+We ran a test: give Claude Code the same task three ways — naked, with a hand-crafted prompt, and with our auto-generated prompt. All had to make 8 failing tests pass in a 270k-line codebase. Same model. Same starting point.
 
 Here's what happened.
 
@@ -14,8 +14,6 @@ A **shard** is the card catalog entry for a source file. It's a tiny file that l
 
 `supermodel analyze` builds all the shards at once by scanning your repo and mapping out every function call and dependency. After that, every AI session starts with the map already drawn.
 
-The "Three-file shards" column in the results below tested an older format that split each shard across three files instead of one. It did worse — more files to open, more turns spent loading context.
-
 ---
 
 ## The setup
@@ -43,16 +41,16 @@ No plugins. No special AI tools. Just better context up front.
 
 ## Results
 
-| | Naked Claude | + Supermodel (crafted) | + Supermodel (auto) | Three-file shards |
-|---------------------|-------------|------------------------|---------------------|-------------------|
-| **Cost** | $0.22 | $0.13 | $0.11 | $0.25 |
-| **Turns** | 13 | 7 | 7 | 16 |
-| **Duration** | 95s | 24s | 30s | 72s |
-| **Tests passed** | ✓ YES | ✓ YES | ✓ YES | ✓ YES |
+| | Naked Claude | + Supermodel (crafted) | + Supermodel (auto) |
+|---------------------|-------------|------------------------|---------------------|
+| **Cost** | $0.22 | $0.13 | $0.11 |
+| **Turns** | 13 | 7 | 7 |
+| **Duration** | 95s | 24s | 30s |
+| **Tests passed** | ✓ YES | ✓ YES | ✓ YES |
 
 **40–50% cheaper. 3–4× faster. 46% fewer turns.**
 
-All four got the right answer. The only difference was how much digging each one had to do first.
+All three got the right answer. The only difference was how much digging each one had to do first.
 
 "Crafted" is a hand-written CLAUDE.md with Django-specific hints. "Auto" is what `supermodel skill` generates — a generic prompt that works on any repo. The auto prompt was actually *cheaper* than the hand-crafted one in this run, at $0.11 vs $0.13.
 
@@ -157,7 +155,7 @@ Run the analysis once. Save on every task after.
 ## Resources
 
 - **CLI:** [github.com/supermodeltools/cli](https://github.com/supermodeltools/cli)
-- **Raw benchmark logs:** [benchmark_results.zip](https://github.com/supermodeltools/cli/raw/main/benchmark/results/benchmark_results.zip) — full transcript for all four runs (naked, crafted, auto, three-file)
+- **Raw benchmark logs:** [benchmark_results.zip](https://github.com/supermodeltools/cli/raw/main/benchmark/results/benchmark_results.zip) — full transcript for all three runs (naked, crafted, auto)
 
 ---
 
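Reviewer note: the headline "40–50% cheaper. 3–4× faster. 46% fewer turns." can be checked against the three columns that remain in the Results table after this commit. The snippet below is a minimal sketch of that arithmetic, not part of the benchmark tooling. The dictionary keys and helper names (`percent_saved`, `speedup`, `summarize`) are hypothetical, and the figures are copied by hand from the table.

```python
# Figures copied by hand from the Results table in the diff
# (cost in USD, duration in seconds). Key names are illustrative only.
runs = {
    "naked":   {"cost": 0.22, "turns": 13, "duration": 95},
    "crafted": {"cost": 0.13, "turns": 7,  "duration": 24},
    "auto":    {"cost": 0.11, "turns": 7,  "duration": 30},
}


def percent_saved(baseline, value):
    """Percentage reduction relative to the baseline, rounded to a whole percent."""
    return round(100 * (baseline - value) / baseline)


def speedup(baseline, value):
    """How many times faster than the baseline, rounded to one decimal place."""
    return round(baseline / value, 1)


def summarize(runs, baseline_name="naked"):
    """Compare every non-baseline run against the baseline run."""
    base = runs[baseline_name]
    summary = {}
    for name, run in runs.items():
        if name == baseline_name:
            continue
        summary[name] = {
            "cost_saved_pct": percent_saved(base["cost"], run["cost"]),
            "turns_saved_pct": percent_saved(base["turns"], run["turns"]),
            "speedup": speedup(base["duration"], run["duration"]),
        }
    return summary


summary = summarize(runs)
for name, stats in summary.items():
    print(name, stats)
```

With the table's numbers this gives 41% and 50% cost savings, 46% fewer turns for both prompts, and speedups of 4.0× (crafted) and 3.2× (auto), which agrees with the headline ranges once rounded.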