**Last updated**: 2026-04-21 (Phase 1 refparity ★)
**Session HEAD**: Reference-parity framework (`tools/refparity/`) LANDED: per-layer diff of HF vs engine, position-aligned, post_norm-aware.
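
The Session HEAD line describes the refparity comparison only in outline. Below is a minimal Python sketch of a per-layer, position-aligned diff. It assumes hidden states were dumped into nested dicts keyed by layer index and token position; that layout and the names `layer_diff` and `Dump` are hypothetical, not the actual `tools/refparity/` format. The post_norm handling is not modeled here. Presumably it concerns which hook point (before or after a layer's post-norm) gets dumped, so it is left to whatever produces the dumps.

```python
"""Hypothetical per-layer reference-parity diff (illustrative layout only)."""
import math
from typing import Dict, List

Vector = List[float]
Dump = Dict[int, Dict[int, Vector]]  # layer index -> token position -> hidden state


def l2(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def layer_diff(ref: Dump, eng: Dump) -> Dict[int, dict]:
    """Compare every layer present in both dumps, position by position."""
    report = {}
    for layer in sorted(ref.keys() & eng.keys()):
        # Position alignment: only compare positions recorded on both sides.
        positions = sorted(ref[layer].keys() & eng[layer].keys())
        max_abs = 0.0
        worst_rel = 0.0
        for pos in positions:
            r, e = ref[layer][pos], eng[layer][pos]
            if len(r) != len(e):
                raise ValueError(f"layer {layer} pos {pos}: hidden sizes differ")
            diff = [a - b for a, b in zip(r, e)]
            max_abs = max(max_abs, max((abs(d) for d in diff), default=0.0))
            # Relative L2 error, guarded against an all-zero reference vector.
            worst_rel = max(worst_rel, l2(diff) / max(l2(r), 1e-12))
        report[layer] = {
            "positions": len(positions),
            "max_abs": max_abs,
            "worst_rel_l2": worst_rel,
        }
    return report
```

Reporting both max-abs and relative L2 per layer makes it easy to spot the first layer where drift appears, and the relative metric stays comparable across layers whose activation scales differ.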

## Phase 1 R5 — `TQ_NO_Q4=1` quality/speed tradeoff: NOT flipping default (2026-04-21)

Cross-model A/B of default vs `TQ_NO_Q4=1`, using the prompts "Once upon a time" (short) and "Once upon a time in a faraway land" (longer):

| Model | Default text | NO_Q4 text | Default (tok/s) | NO_Q4 (tok/s) | Δ speed |
| --- | --- | --- | ---: | ---: | ---: |
| Qwen3-0.6B Q4_K_M | math-genre "100 people…" | math-genre "100 people…" | 59.9 | 49.6 | **-17%** |
| Phi-3.5 Q4_K_M | identical | identical | 15.9 | 14.8 | -7% |
| Llama-3.2-1B Q8_0 | UTF-8 artifact "café" | clean "badger Bertha" | 53.4 | 45.6 | -15% |
| Qwen3.5-4B Q4_K_M | "young adventurer Alex" | "little animals" | 18.6 | 13.8 | **-26%** |
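
The Δ speed column is the relative decode-throughput change, (NO_Q4 − default) / default. A small Python check that reproduces it from the tok/s figures in the table (the helper name `speed_delta_pct` is just for illustration):

```python
def speed_delta_pct(default_tps: float, no_q4_tps: float) -> float:
    """Relative decode-speed change of NO_Q4 vs default, in percent."""
    return (no_q4_tps - default_tps) / default_tps * 100.0


# (default tok/s, NO_Q4 tok/s) copied from the A/B table.
RESULTS = {
    "Qwen3-0.6B Q4_K_M": (59.9, 49.6),
    "Phi-3.5 Q4_K_M": (15.9, 14.8),
    "Llama-3.2-1B Q8_0": (53.4, 45.6),
    "Qwen3.5-4B Q4_K_M": (18.6, 13.8),
}

if __name__ == "__main__":
    for model, (base, alt) in RESULTS.items():
        print(f"{model}: {speed_delta_pct(base, alt):+.0f}%")
```

Rounded to whole percent this gives -17, -7, -15 and -26, matching the column and the 7-26% range cited in the decision below.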

An earlier run of the "faraway land" prompt on Qwen3-0.6B *did* show a real NO_Q4 win (disjoint output → coherent narrative about the village of Luminara), so the quality effect is prompt-dependent.

**Decision**: do NOT flip the default. Evidence:
- The speed cost is real (7-26%) across all four models.
- The quality win is inconsistent: prompt-dependent, not universal.
- The Llama UTF-8 artifact hints at a separate, subtle bug worth following up.

Keep `TQ_NO_Q4=1` as an opt-in for scenarios where quality matters more than decode speed, and document it in the README for users who care. Next-round candidate: investigate the Llama "café" encoding issue, a separate and concrete bug surfaced by this round.
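
One common source of stray characters in generated text, offered purely as a hypothesis to rule out (nothing in this round confirms it), is decoding each token's bytes independently. A multi-byte UTF-8 character such as "é" (bytes `C3 A9`) can be split across two tokens, and per-token decoding then emits replacement characters. A minimal Python sketch contrasting per-chunk decoding with an incremental decoder that buffers partial sequences; the byte chunks are made-up stand-ins for token pieces, not output from the engine:

```python
import codecs
from typing import Iterable, List


def decode_per_chunk(chunks: Iterable[bytes]) -> str:
    """Naive: decode each token's bytes on their own (breaks split characters)."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)


def decode_streaming(chunks: Iterable[bytes]) -> str:
    """Buffer incomplete UTF-8 sequences across chunk boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    out: List[str] = [decoder.decode(c) for c in chunks]
    out.append(decoder.decode(b"", final=True))  # flush any trailing partial bytes
    return "".join(out)


# "café" with the two bytes of "é" split across two hypothetical tokens.
pieces = [b"caf\xc3", b"\xa9"]
print(decode_per_chunk(pieces) == "caf\ufffd\ufffd")  # True: two U+FFFD characters
print(decode_streaming(pieces))  # café
```

If the engine's detokenizer already streams bytes this way, the artifact more likely comes from elsewhere (for example the tokenizer's byte mapping for Llama's vocabulary).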

## Phase 1 R3 — FFN magnitude error correlates with activation magnitude (2026-04-21)

Extended diagnosis: the FFN magnitude drift **scales with input activation magnitude**.
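
One way to quantify a claim like this is a Pearson correlation between per-token input activation norms and FFN output error norms from a parity run. The Python sketch below assumes those per-token vectors are available as plain lists; the function names and input layout are hypothetical:

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def magnitude_error_correlation(inputs, ffn_ref, ffn_eng) -> float:
    """Correlate ||x_t|| with ||FFN_eng(x_t) - FFN_ref(x_t)|| across tokens t."""
    act = [l2_norm(x) for x in inputs]
    err = [l2_norm([a - b for a, b in zip(e, r)]) for e, r in zip(ffn_eng, ffn_ref)]
    return pearson(act, err)
```

A coefficient near +1 would be consistent with drift that scales with activation magnitude. Pearson only measures linear association, so a rank correlation (Spearman) is a reasonable cross-check.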