Commit 6975522

unamedkr and claude committed
state: R5 TQ_NO_Q4 quality vs speed — inconsistent, keep opt-in
Cross-model A/B: TQ_NO_Q4=1 costs 7-26% decode speed across Qwen3/Phi-3/Llama. Quality win is prompt-dependent: clear improvement on one long prompt ("faraway land" → coherent village story) but no difference on short prompts. Not flipping default.

Notable side-finding: Llama-3.2-1B Q8_0 default path emits 'café' UTF-8 artifact; NO_Q4 path produces clean text. Tracked as separate follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 6727a74 commit 6975522

1 file changed

Lines changed: 25 additions & 0 deletions

.claude/state.md

@@ -3,6 +3,31 @@
**Last updated**: 2026-04-21 (Phase 1 refparity ★)
**Session HEAD**: Reference-parity framework (tools/refparity/) LANDED — HF vs engine per-layer diff, pos-aligned, post_norm-aware.

## Phase 1 R5 — TQ_NO_Q4=1 quality/speed tradeoff — NOT flipping default (2026-04-21)

Cross-model A/B on "Once upon a time" (short) vs "Once upon a time in a faraway land" (longer):

| Model | Default text | NO_Q4 text | Default t/s | NO_Q4 t/s | Δ speed |
|---|---|---|---:|---:|---:|
| Qwen3-0.6B Q4_K_M | math-genre "100 people…" | math-genre "100 people…" | 59.9 | 49.6 | **-17%** |
| Phi-3.5 Q4_K_M | identical | identical | 15.9 | 14.8 | -7% |
| Llama-3.2-1B Q8_0 | UTF-8 artifact "café" | clean "badger Bertha" | 53.4 | 45.6 | -15% |
| Qwen3.5-4B Q4_K_M | "young adventurer Alex" | "little animals" | 18.6 | 13.8 | **-26%** |

Earlier "faraway land" prompt on Qwen3-0.6B *did* show a real NO_Q4 win (disjoint → coherent Luminara village narrative). Prompt-dependent.
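
The Δ speed column can be re-derived from the two throughput columns. A minimal sketch, assuming "t/s" means decode tokens per second and that Δ speed is the relative change from the default path to the `TQ_NO_Q4=1` path, rounded to a whole percent; the helper name `speed_delta_pct` is illustrative, not something from the repository:

```python
# Decode throughput copied from the table above: (default, TQ_NO_Q4=1).
# Assumption: "t/s" is tokens per second.
RESULTS = {
    "Qwen3-0.6B Q4_K_M": (59.9, 49.6),
    "Phi-3.5 Q4_K_M": (15.9, 14.8),
    "Llama-3.2-1B Q8_0": (53.4, 45.6),
    "Qwen3.5-4B Q4_K_M": (18.6, 13.8),
}


def speed_delta_pct(default_tps: float, no_q4_tps: float) -> float:
    """Relative change in decode speed, in percent (negative = slower)."""
    return (no_q4_tps - default_tps) / default_tps * 100.0


# Round to whole percent, matching the precision used in the table.
deltas = {m: round(speed_delta_pct(d, n)) for m, (d, n) in RESULTS.items()}

for model, pct in deltas.items():
    print(f"{model}: {pct:+d}%")
```

Under those assumptions the computed values agree with the table (-17, -7, -15, -26), which is where the 7-26% range comes from.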

**Decision**: do NOT flip default. Evidence:
- Speed cost real (7-26%) across all models
- Quality win inconsistent: prompt-dependent, not universal
- The Llama UTF-8 artifact hints at an unrelated subtle bug worth follow-up

Keep `TQ_NO_Q4=1` as opt-in for "quality matters more than decode speed" scenarios. Document in README for users who care. Next-round candidate: investigate the Llama café encoding issue, a separate, concrete bug surfaced by this round.

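
The note does not say what form the Llama artifact takes, so any triage starts from assumptions. One possible first step, assuming the raw output bytes can be captured before they are decoded for display, is to check whether they are strictly valid UTF-8 and to list any non-ASCII characters. `utf8_report` is a hypothetical helper, not part of the engine:

```python
def utf8_report(raw: bytes) -> dict:
    """Summarise encoding red flags in raw model output bytes."""
    try:
        # Strict decoding raises on malformed or truncated sequences.
        text = raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        # Fall back to a lossy decode so the rest of the report still works.
        text = raw.decode("utf-8", errors="replace")
        valid = False
    return {
        "valid_utf8": valid,
        # Distinct non-ASCII characters, useful for spotting stray accents.
        "non_ascii": sorted({ch for ch in text if ord(ch) > 127}),
        # U+FFFD marks bytes that could not be decoded.
        "has_replacement_char": "\ufffd" in text,
    }


print(utf8_report("café".encode("utf-8")))
print(utf8_report(b"caf\xc3"))  # multi-byte sequence cut off mid-character
```

If the default-path output is valid UTF-8 and only contains an unexpected "é", the problem is more likely in token selection than in byte handling; invalid sequences would instead point at detokenization. Both readings are guesses until the follow-up is done.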
## Phase 1 R3 — FFN magnitude error correlates with activation magnitude (2026-04-21)

Extended diagnosis: the FFN magnitude drift **scales with input activation magnitude**.
