
Commit 1adc6d2

unamedkr and claude committed
pillar1(R8): isolate long-seq bug to batched prefill path (partial)
Extended TQ_DUMP_HIDDEN to support TQ_DUMP_POS={0|N|all} for per-position dumps
(infrastructure for the next-step per-layer diff).

A/B on Qwen3-0.6B with 144-token synthetic input:
- batched prefill: UTF-8 garbage (definitively broken)
- per-token prefill: ASCII but still wrong (secondary issue)
- KV fp32 (both): same patterns — KV compression NOT the cause

Primary follow-on target: tq_forward_batch non-MoE path. Per-token path has a
separate, milder issue (likely RoPE/attention at larger positions). Both need
the HF per-layer diff to pinpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent db4d87a commit 1adc6d2

2 files changed

Lines changed: 36 additions & 2 deletions


bench/results/2026-04-20_longseq_transformer_bug.md

Lines changed: 22 additions & 0 deletions
```diff
@@ -56,6 +56,28 @@ Confirmed: our Qwen3-0.6B produces token 9707 for "Hello" (matching HF).
 4. **Attention dispatch branch at seq_len > 128 / > 256**
    - Some kernels have special-case paths for long sequences.
 
+## R8 partial isolation: batched prefill is the primary offender
+
+A/B on Qwen3-0.6B with 50-synthetic-word prompt (144 tokens):
+
+| Path | Output |
+|---|---|
+| Batched (`tq_forward_batch`, default) | `alyticsанciea��...` (UTF-8 garbage) |
+| Per-token (`TQ_NO_BATCH_PREFILL=1`) | `" =, on up = a,="` (ASCII, broken but less) |
+| KV fp32 + batched | `���isonswana...` (still garbage, KV quant not root cause) |
+| KV fp32 + per-token | same as per-token above |
+
+Interpretation:
+- Batched path is definitively broken — produces pure UTF-8 byte garbage.
+- Per-token path is also producing wrong output on natural prose, but
+  with ASCII characters rather than byte-level chaos. Some subtle
+  accumulation issue separate from the batched bug.
+- KV compression is NOT the cause; fp32 KV shows identical pattern.
+
+Primary target for follow-on: `tq_forward_batch` for non-MoE models.
+Secondary: per-token path on natural prose (may be RoPE / attention
+accumulation at larger pos).
+
 ## Next steps (methodology)
 
 Apply Pillar 1 methodology to transformer forward:
```
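For context on the A/B rows above, here is a minimal sketch of how the `TQ_NO_BATCH_PREFILL=1` toggle could select between the two prefill paths. It is illustrative only and not part of this commit: `tq_forward_batch` and the env var come from the notes, while `tq_forward_token`, the dispatcher name, and the signatures are assumed.

```c
#include <stdlib.h>  /* getenv */

/* Illustrative sketch only (not from this commit): an A/B escape hatch
 * at the prefill dispatch site. TQ_NO_BATCH_PREFILL=1 forces the slow
 * per-token path so it can be compared against the batched path on the
 * same prompt. Function names/signatures here are assumptions. */
static void prefill_dispatch(tq_model_t* model, tq_state_t* s,
                             const int* tokens, int n_tokens) {
    const char* no_batch = getenv("TQ_NO_BATCH_PREFILL");
    if (no_batch && no_batch[0] == '1') {
        /* Per-token prefill: one forward pass per position. */
        for (int pos = 0; pos < n_tokens; pos++)
            tq_forward_token(model, s, tokens[pos], pos);  /* assumed entry point */
    } else {
        /* Default batched prefill: the path implicated above. */
        tq_forward_batch(model, s, tokens, n_tokens);      /* assumed signature */
    }
}
```

Whatever the real dispatch site looks like, the toggle gives the same A/B as the table above, so the follow-on work can start at the batched branch.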

src/engine/tq_transformer.c

Lines changed: 14 additions & 2 deletions
```diff
@@ -2489,11 +2489,23 @@ static void self_attn_forward(tq_model_t* model, tq_state_t* s, int l, int pos)
  * Dumps only at pos=0 (first token of prefill/generation) to avoid
  * overwriting across prefill tokens. */
 static void tq_dump_hidden(const char* name, const float* data, int n, int pos) {
-    if (pos != 0) return;
     const char* dir = getenv("TQ_DUMP_HIDDEN");
     if (!dir) return;
+    /* TQ_DUMP_POS=N selects a single position; default 0 preserves old
+     * behavior. TQ_DUMP_POS=all dumps every position (expensive: 28 × N
+     * files per forward pass). */
+    const char* pos_env = getenv("TQ_DUMP_POS");
+    if (pos_env && strcmp(pos_env, "all") == 0) {
+        /* dump all — append position to filename */
+    } else {
+        int target = pos_env ? atoi(pos_env) : 0;
+        if (pos != target) return;
+    }
     char path[512];
-    snprintf(path, sizeof(path), "%s/%s.bin", dir, name);
+    if (pos_env && strcmp(pos_env, "all") == 0)
+        snprintf(path, sizeof(path), "%s/%s_p%d.bin", dir, name, pos);
+    else
+        snprintf(path, sizeof(path), "%s/%s.bin", dir, name);
     FILE* f = fopen(path, "wb");
     if (!f) return;
     fwrite(data, sizeof(float), (size_t)n, f);
```
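A possible consumer for these dumps, toward the planned HF per-layer diff: a small standalone tool that reads one of our `_pN.bin` dumps plus a reference dump of the same length and reports max/mean absolute error. This is a sketch only; the only thing taken from this commit is the dump format (raw float32, `n` values written by `tq_dump_hidden`). File naming and how the HF reference gets exported are assumptions.

```c
/* Sketch: compare two TQ_DUMP_HIDDEN dumps (ours vs. a reference export).
 * Assumes both files are raw float32 arrays of equal length, as written
 * by tq_dump_hidden() above. Usage: ./dumpdiff ours.bin ref.bin */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

static long read_f32(const char* path, float** out) {
    FILE* f = fopen(path, "rb");
    if (!f) return -1;
    fseek(f, 0, SEEK_END);
    long n = ftell(f) / (long)sizeof(float);
    fseek(f, 0, SEEK_SET);
    *out = (float*)malloc((size_t)n * sizeof(float));
    if (!*out || fread(*out, sizeof(float), (size_t)n, f) != (size_t)n) {
        free(*out);
        fclose(f);
        return -1;
    }
    fclose(f);
    return n;
}

int main(int argc, char** argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s ours.bin ref.bin\n", argv[0]);
        return 1;
    }
    float *a = NULL, *b = NULL;
    long na = read_f32(argv[1], &a);
    long nb = read_f32(argv[2], &b);
    if (na <= 0 || nb != na) {
        fprintf(stderr, "bad or mismatched dumps (%ld vs %ld floats)\n", na, nb);
        return 1;
    }
    double max_abs = 0.0, sum_abs = 0.0;
    for (long i = 0; i < na; i++) {
        double d = fabs((double)a[i] - (double)b[i]);
        if (d > max_abs) max_abs = d;
        sum_abs += d;
    }
    printf("n=%ld  max|diff|=%g  mean|diff|=%g\n", na, max_abs, sum_abs / (double)na);
    free(a);
    free(b);
    return 0;
}
```

Running the engine with `TQ_DUMP_HIDDEN=<dir>` and `TQ_DUMP_POS=all`, then diffing each layer's dump against the matching HF hidden state, should localize the first diverging layer and position.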
