Commit b6b5f09
pillar1.5(R1): restore QK-norm for pure Qwen3 (R40 was over-broad)
R40 disabled QK-norm for all "qwen" arch GGUFs. That was correct for
Qwen3.5/3.6 HYBRID (DeltaNet + self-attn, delta_n_heads > 0) — those
degrade when QK-norm is applied to their 10 self-attn layers.
But pure Qwen3 (0.6B..32B, self-attn only) REQUIRES q_norm/k_norm.
Without them, long-prompt attention (pos>=1) sees unnormalized Q·K
scores, the residual stream explodes at layer 2 (norm 5396 vs HF 10),
and the output is UTF-8 garbage.
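To illustrate why missing q_norm/k_norm matters, here is a minimal sketch of per-head RMS normalization applied to Q and K before the dot product. It is not the code from this commit: the function names, the `eps` value, the unit weights, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def rms_norm(vec, weight, eps=1e-6):
    """RMS-normalize one head vector, then scale by a per-dimension weight."""
    mean_sq = sum(x * x for x in vec) / len(vec)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [x * inv_rms * w for x, w in zip(vec, weight)]


def qk_score(q, k):
    """Scaled dot-product attention logit for one (query, key) pair."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# Toy head with large-magnitude projections (illustrative values only).
head_dim = 8
weight = [1.0] * head_dim  # assumption: unit norm weights
q_raw = [40.0] * head_dim
k_raw = [25.0] * head_dim

raw_score = qk_score(q_raw, k_raw)
normed_score = qk_score(rms_norm(q_raw, weight), rms_norm(k_raw, weight))
```

With these toy values the raw logit is roughly 2828, while the normalized logit is roughly 2.83. A logit that large saturates the softmax, which fits the blow-up described above.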
Found via HF reference diff methodology (tools/pillar1/diff_layers.py,
also added here):
- Layer-by-layer cosine/L2 at pos=1 with 144-token input
- Layer 0 cosine 0.98 at pos=0 but 0.74 at pos=1 → attention at pos>=1 broken
- h2 norm: ours 5396 vs HF 10 → catastrophic residual stream
- With TQ_FORCE_QK_NORM=1: h2 norm normalizes to ~11 (close to HF)
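The comparison listed above can be sketched as follows. This is a hedged illustration of the idea, not the contents of tools/pillar1/diff_layers.py: the function names, the input layout (one hidden-state vector per layer at a fixed position), and the 0.9 cosine threshold are assumptions.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = l2_norm(a), l2_norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def diff_layers(ours, ref):
    """Compare per-layer hidden states from two implementations.

    `ours` and `ref` are lists with one hidden-state vector per layer,
    all captured at the same token position.
    """
    rows = []
    for layer, (h_ours, h_ref) in enumerate(zip(ours, ref)):
        rows.append({
            "layer": layer,
            "cosine": cosine(h_ours, h_ref),
            "norm_ours": l2_norm(h_ours),
            "norm_ref": l2_norm(h_ref),
            "l2_diff": l2_norm([x - y for x, y in zip(h_ours, h_ref)]),
        })
    return rows


def first_divergence(rows, min_cosine=0.9):
    """Index of the first layer whose cosine falls below the threshold, or None."""
    for row in rows:
        if row["cosine"] < min_cosine:
            return row["layer"]
    return None
```

Running the same comparison at pos=0 and pos=1 is what separates a position-dependent attention bug from a per-token one such as a bad embedding or MLP.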
Fix: restrict the QK-norm disable to `delta_n_heads > 0` only. Drop
the over-broad GGUF-arch name match. Pure Qwen3 now applies q_norm/
k_norm per-head as HF does.
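The narrowed condition can be sketched as below. The real change lives in the engine source, which is not shown here, so this is only an illustration: `ModelConfig` and `should_apply_qk_norm` are hypothetical names, and the way TQ_FORCE_QK_NORM interacts with the hybrid check after the fix is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # > 0 marks a hybrid DeltaNet + self-attn model (Qwen3.5/3.6 style).
    delta_n_heads: int


def should_apply_qk_norm(cfg, env=None):
    """Decide whether q_norm/k_norm are applied in self-attention.

    The decision keys on delta_n_heads only; the GGUF arch name is not
    consulted. The env override is an assumed debugging escape hatch.
    """
    env = os.environ if env is None else env
    if env.get("TQ_FORCE_QK_NORM") == "1":
        return True
    return cfg.delta_n_heads == 0


pure_qwen3 = ModelConfig(delta_n_heads=0)
hybrid = ModelConfig(delta_n_heads=16)  # illustrative head count
```

Keying on a structural field rather than the arch string means any self-attn-only model that reports "qwen" keeps its QK-norm.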
Real-world output (per-token prefill, 50-word synthetic prompt):
BEFORE: "lenameuously... catch�Williamson" UTF-8 garbage
AFTER: " word11223: word3: Word length?" pattern-matching English
Regression: 15/15 test_models + 4/4 test_tokenizer PASS.
Known remaining issue: the batched prefill path (tq_forward_batch) is
still broken, independently of this fix. That path DOES apply QK-norm
unconditionally (line 3615) but still produces garbage; this is a
separate bug for follow-on R2+.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1adc6d2 commit b6b5f09
2 files changed
Lines changed: 70 additions & 6 deletions
[Diff 1 of 2, modified file: hunk at original lines 1201-1214 (new lines 1201-1219), 11 additions and 6 deletions; diff content not preserved.]
[Diff 2 of 2, new file: 59 lines added (lines 1-59); diff content not preserved.]