benchmark/results/blog-post-draft.md (+9 −11 lines: 9 additions & 11 deletions)
@@ -1,6 +1,6 @@
 # 50% cheaper. 4× faster. Same correct answer.

-We ran a test: give Claude Code the same task four ways — naked, with a hand-crafted prompt, with our auto-generated prompt, and with a different shard format. All had to make 8 failing tests pass in a 270k-line codebase. Same model. Same starting point.
+We ran a test: give Claude Code the same task three ways — naked, with a hand-crafted prompt, and with our auto-generated prompt. All had to make 8 failing tests pass in a 270k-line codebase. Same model. Same starting point.

 Here's what happened.
@@ -14,8 +14,6 @@ A **shard** is the card catalog entry for a source file. It's a tiny file that l
 `supermodel analyze` builds all the shards at once by scanning your repo and mapping out every function call and dependency. After that, every AI session starts with the map already drawn.

-The "Three-file shards" column in the results below tested an older format that split each shard across three files instead of one. It did worse — more files to open, more turns spent loading context.
-
 ---

 ## The setup
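The hunk above describes `supermodel analyze` scanning a repo and mapping out every function call and dependency per file. As a rough illustration of that "card catalog entry" idea only, here is a minimal sketch in Python; the real shard format and analyzer are not shown in this diff, so the `build_shard`/`analyze` names and every output field are assumptions:

```python
import ast
from pathlib import Path

def build_shard(path: Path) -> dict:
    """Summarize one Python file: functions defined, names called, modules imported.

    A hypothetical 'shard' — the actual `supermodel analyze` output is not shown
    in the post, so all field names here are assumptions.
    """
    tree = ast.parse(path.read_text())
    defined = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    # Only plain-name calls (e.g. `print(x)`); attribute calls like `os.getcwd()`
    # would need extra handling in a real analyzer.
    called = sorted({
        n.func.id for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    })
    imports = sorted({
        alias.name for n in ast.walk(tree)
        if isinstance(n, ast.Import) for alias in n.names
    })
    return {"file": str(path), "defines": defined, "calls": called, "imports": imports}

def analyze(repo: Path) -> list[dict]:
    # One pass over the repo up front; later sessions read the map, not the code.
    return [build_shard(p) for p in sorted(repo.rglob("*.py"))]
```

The point of the one-time pass is the cost shape the post claims: the scan is paid once, and every later task starts from the precomputed map instead of re-reading source files.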
@@ -43,16 +41,16 @@ No plugins. No special AI tools. Just better context up front.
-All four got the right answer. The only difference was how much digging each one had to do first.
+All three got the right answer. The only difference was how much digging each one had to do first.

 "Crafted" is a hand-written CLAUDE.md with Django-specific hints. "Auto" is what `supermodel skill` generates — a generic prompt that works on any repo. The auto prompt was actually *cheaper* than the hand-crafted one in this run, at $0.11 vs $0.13.
@@ -157,7 +155,7 @@ Run the analysis once. Save on every task after.
-**Raw benchmark logs:** [benchmark_results.zip](https://github.com/supermodeltools/cli/raw/main/benchmark/results/benchmark_results.zip) — full transcript for all four runs (naked, crafted, auto, three-file)
+**Raw benchmark logs:** [benchmark_results.zip](https://github.com/supermodeltools/cli/raw/main/benchmark/results/benchmark_results.zip) — full transcript for all three runs (naked, crafted, auto)