pillar1(R3) ★: BPE stale-entry bug — ROOT CAUSE of all Qwen3 drift
Single-line fix in src/engine/tq_tokenizer.c BPE heap merge loop:
if (top.gen != gen[top.pos]) continue;
+ if (tokens[top.pos] < 0) continue; <-- THIS LINE
int ri = next[top.pos];
if (ri >= n_tokens || tokens[ri] < 0) continue;
Bug: when position P dies as the RIGHT neighbor of some merge (its
tokens[P] set to -1), gen[P] was never bumped. Old heap entries at
position P slip through the gen check, resurrect the dead slot by
overwriting tokens[P], and corrupt the linked list — producing wrong
merged tokens with duplicated/lost characters.
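The failure mode above can be reproduced on a three-slot toy state. This is only a sketch, not the code in tq_tokenizer.c: the HeapEntry layout, the helper names, and the assumption that a merge bumps gen[] for the left slot only are all hypothetical. It shows that an entry queued at a position before that position dies still passes the gen check, and that the added liveness check rejects it.

```c
#include <assert.h>

enum { N_TOKENS = 3 };

/* Hypothetical heap entry: position of the left token plus the
   generation it was queued at. */
typedef struct { int pos; int gen; } HeapEntry;

static int tokens[N_TOKENS]; /* token id per slot, -1 = dead slot   */
static int gen[N_TOKENS];    /* generation counter per slot         */
static int next[N_TOKENS];   /* index of the next live slot         */

/* Three live tokens (ids 10, 11, 12) chained 0 -> 1 -> 2 -> end. */
static void reset_state(void) {
    for (int i = 0; i < N_TOKENS; i++) {
        tokens[i] = 10 + i;
        gen[i] = 0;
        next[i] = i + 1;
    }
}

/* Merge slot pos with its right neighbor. Assumption: only the left
   slot's generation is bumped; the right slot is marked dead and
   unlinked, but gen[ri] is left untouched (the bookkeeping gap). */
static void merge_at(int pos, int merged_id) {
    int ri = next[pos];
    assert(ri < N_TOKENS && tokens[ri] >= 0);
    tokens[pos] = merged_id;
    gen[pos]++;
    tokens[ri] = -1;
    next[pos] = next[ri];
}

/* Mirrors the checks at the top of the merge loop. Returns 1 if the
   entry would be processed, 0 if it would be skipped. */
static int entry_survives(HeapEntry top, int with_fix) {
    if (top.gen != gen[top.pos]) return 0;
    if (with_fix && tokens[top.pos] < 0) return 0; /* the added line */
    int ri = next[top.pos];
    if (ri >= N_TOKENS || tokens[ri] < 0) return 0;
    return 1;
}

/* Queue an entry for the pair at slots (1, 2), then let slot 1 die as
   the right neighbor of a merge at slot 0, and test the stale entry. */
static int stale_entry_survives(int with_fix) {
    reset_state();
    HeapEntry stale = { 1, gen[1] };
    merge_at(0, 99);
    return entry_survives(stale, with_fix);
}
```

In this toy state, stale_entry_survives(0) returns 1 (the dead slot would be processed and overwritten) and stale_entry_survives(1) returns 0. Bumping gen[] for the dying right slot would presumably close the same gap; the liveness check does it without touching the merge bookkeeping.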
Symptom: our engine encoded "Hello" as [32713='Hel', 654='ll'] =
literally "Helll" (5 chars: H,e,l,l,l — extra 'l', missing 'o').
HF encoded correctly as [9707='Hello']. This single tokenization
corruption was the structural root cause of:
- Qwen3-0.6B 1-word prompts producing UTF-8 garbage
- Qwen3.5/3.6 "quicck bbrrown fox" character doubling
- Qwen3.6-35B ≥40-word prompts → garbage (now coherent)
- Phi-3.5 "2+2?" hallucinating "answer" instead of math
- Dozens of rounds of transformer/MoE investigation (rounds 26-50)
After fix:
- Qwen3.6-35B "Once upon a time... young programmer" (40+ words)
→ coherent narrative "The idea intrigued him so much that he
decided to create his very own version... named it 'Hamster Run'"
- Qwen3.6-35B short programming prompt → perfect Python code
- Llama-3.2-3B 100-tok long-form → fully coherent
- Phi-3.5 "What is 2+2?" → "The sum of 2 and 2 is equal to four"
(actually correct math; the test previously matched the word
'answer' from the broken output)
Regression: 15/15 PASS. Phi-3.5 test updated "answer" → "sum" to
match the now-correct factual answer.
Methodology: HF reference diff (Pillar 1 R1-R2) revealed the token
mismatch. For debugging, added an env-gated per-layer hidden-state
dump (TQ_DUMP_HIDDEN=dir) in tq_forward; kept as debugging
infrastructure for future reference-diff work.
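An env-gated dump of this kind might look roughly like the sketch below. Only the variable name TQ_DUMP_HIDDEN comes from this commit; the helper names, the layer_NN.bin file naming, and the raw-float format are assumptions for illustration, not what tq_forward actually writes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "<dir>/layer_<NN>.bin" into buf.
   Returns 1 on success, 0 if dir is unset/empty or buf is too small. */
static int dump_path(const char *dir, char *buf, size_t cap, int layer) {
    if (dir == NULL || dir[0] == '\0') return 0;
    int n = snprintf(buf, cap, "%s/layer_%02d.bin", dir, layer);
    return n > 0 && (size_t)n < cap;
}

/* Hypothetical hook, meant to be called once per layer from the
   forward pass. It does nothing unless TQ_DUMP_HIDDEN names a
   directory, so normal runs pay only for one getenv call.
   Returns 1 if the full vector was written, 0 otherwise. */
static int dump_hidden(int layer, const float *hidden, size_t dim) {
    char path[512];
    if (!dump_path(getenv("TQ_DUMP_HIDDEN"), path, sizeof path, layer))
        return 0;
    FILE *f = fopen(path, "wb");
    if (f == NULL) return 0;
    size_t written = fwrite(hidden, sizeof(float), dim, f);
    fclose(f);
    return written == dim;
}
```

Raw float files like these can be loaded on the reference side and compared layer by layer to find the first layer where the two implementations diverge.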
quant.h (single-header) uses a naive O(n²) BPE merge and is not
affected by this bug. Only the split-source engine had the heap-based
regression.
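For contrast, a naive merge can be sketched as follows. This is not the quant.h code: the MergeRule table, the helper names, and the linear rule lookup are hypothetical, and the lookup adds a factor of n_rules on top of the O(n²) pair scans. Because every round rescans the live array and compacts it after each merge, there are no queued entries that could go stale.

```c
/* Hypothetical toy merge table: (left, right) -> merged id.
   A lower index in the table means higher merge priority. */
typedef struct { int left, right, merged; } MergeRule;

/* Returns the rule index for the pair (a, b), or -1 if none exists. */
static int find_rule(const MergeRule *rules, int n_rules, int a, int b) {
    for (int r = 0; r < n_rules; r++)
        if (rules[r].left == a && rules[r].right == b) return r;
    return -1;
}

/* Naive BPE merge: each round rescans every adjacent pair, applies the
   highest-priority merge (leftmost on ties), and shifts the tail left
   by one. Merges in place and returns the new token count. */
static int bpe_merge_naive(int *tokens, int n,
                           const MergeRule *rules, int n_rules) {
    for (;;) {
        int best_rule = -1, best_pos = -1;
        for (int i = 0; i + 1 < n; i++) {
            int r = find_rule(rules, n_rules, tokens[i], tokens[i + 1]);
            if (r >= 0 && (best_rule < 0 || r < best_rule)) {
                best_rule = r;
                best_pos = i;
            }
        }
        if (best_rule < 0) return n;          /* nothing left to merge */
        tokens[best_pos] = rules[best_rule].merged;
        for (int i = best_pos + 1; i + 1 < n; i++)
            tokens[i] = tokens[i + 1];        /* close the gap */
        n--;
    }
}
```

With toy ids H=0, e=1, l=2, o=3 and rules (l,l)→4, (H,e)→5, (5,4)→6, (6,3)→7, the input [0, 1, 2, 2, 3] collapses to the single token 7.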
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>