
Commit d1c6057

unamedkr and claude committed
debug(deltanet): TQ_DELTA_PROBE locates L0 as the outlier layer

Adds a per-layer state L2 norm dump env, TQ_DELTA_PROBE, which takes a
comma-separated list of layer-0 call counts. The call counter is
thread-local; when the env is unset the only cost is one getenv() per call.

Applied on Qwen3.6-35B IQ4_XS, prompt "Once upon a time in a faraway land":

  call=50   L0=127.0  L1=41.3  L2=17.9  L8=31.1  others 7-24
  call=100  L0=150.7  L1=42.4  L2=16.7  L8=31.5  others 7-24
  call=115  L0=154.9  L1=40.8  L2=16.6  L8=30.6  others 7-24  ← 117-tok loop start
  call=120  L0=154.5  L1=40.8  L2=16.7  L8=30.5  others 7-24

L0's DeltaNet recurrent state sits at roughly 3-20× every other layer's
norm (3-4× L1, up to ~20× the quietest layers). It grew 127→155 between
calls 50 and 115 (+22%) while the others stayed within ±10%.

R16 proved the 117-tok repetition loop IS state-driven. R17 localizes the
suspect to L0's recurrent state specifically: either the a_log decay param
is scaled differently at L0, or our implementation has an L0-specific bug.
Next round: inspect the a_log weight and compare our decay math to
refs/llama.cpp.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b061e7d commit d1c6057

2 files changed

Lines changed: 58 additions & 0 deletions


.claude/state.md

Lines changed: 32 additions & 0 deletions
@@ -3,6 +3,38 @@
 **Last updated**: 2026-04-21 (Phase 1 refparity ★)
 **Session HEAD**: Reference-parity framework (tools/refparity/) LANDED: HF vs engine per-layer diff, pos-aligned, post_norm-aware.

+## ★★ Phase 1 R17: L0 DeltaNet state is 10× the others (2026-04-21) ★★
+
+Added `TQ_DELTA_PROBE=pos1,pos2,...` env in `deltanet_forward` to dump
+the per-layer recurrent-state L2 norm at each listed layer-0 call count.
+
+Measurement on Qwen3.6-35B IQ4_XS, prompt "Once upon a time in a faraway land" (DeltaNet state L2 norm per layer, by layer-0 call count):
+
+| call | L0 | L1 | L2 | L8 | L14 | L26 | other layers (range) |
+|---:|---:|---:|---:|---:|---:|---:|---:|
+| 50 | **127.0** | 41.3 | 17.9 | 31.1 | 24.7 | 21.5 | 7-17 |
+| 100 | **150.7** | 42.4 | 16.7 | 31.5 | 24.1 | 21.3 | 7-18 |
+| 115 | **154.9** | 40.8 | 16.6 | 30.6 | 23.9 | 21.1 | 8-17 |
+| 118 | **154.7** | 42.3 | 17.7 | 30.8 | 23.9 | 21.0 | 8-17 |
+| 120 | **154.5** | 40.8 | 16.7 | 30.5 | 23.8 | 20.8 | 8-17 |
+
+**L0 is 3-20× every other layer** (3-4× L1, up to ~20× the quietest). Not
+a transient spike at the drift boundary: L0 sat at ~151-155 for tokens
+100-120, while the "It could do math!" loop kicked in at token 117. So
+L0's high steady state IS the chronic condition; most likely it interacts
+badly with attention's KV or with downstream layers.
+
+L0 grew 127→155 from call 50 to call 115 (+22%) while the others stayed
+within ±10%, so L0 also lacks proper decay relative to other layers. Growth
+mostly stalled after call 100 (150.7 → 154.9 → 154.5): a partial steady state.
+
+**Hypothesis**: L0's decay param (`a_log`) either has a different scale
+from the other layers', OR our implementation applies decay wrongly at L0
+specifically. Next round: dump L0's `a_log` vs L1's, and compare our
+decay math to refs/llama.cpp qwen3_next DeltaNet.
+
+`TQ_DELTA_PROBE` stays as a permanent diagnostic env.
+
 ## ★ Phase 1 R16: DeltaNet state CAUSALLY proves 35B drift (2026-04-21) ★
 
 Added `TQ_DELTA_RESET_EVERY=N` env ablation in `deltanet_forward`: zeroes
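The L0-to-other ratios and the +22% growth in the R17 notes can be recomputed straight from the table rows. A minimal sketch; the `probe_row` struct and the helper names are illustrative, not part of the engine, and the sketch treats the low end of each row's "other layers" range as the quietest layer:

```c
#include <stddef.h>

/* One row of the R17 table: layer-0 call count and DeltaNet state L2 norms.
 * rest_lo/rest_hi hold the "other layers" range for the unlisted layers. */
typedef struct {
    int call;
    double l0, l1, l2, l8, l14, l26;
    double rest_lo, rest_hi;
} probe_row;

/* Values copied from the table. */
const probe_row r17_rows[] = {
    {  50, 127.0, 41.3, 17.9, 31.1, 24.7, 21.5, 7.0, 17.0 },
    { 100, 150.7, 42.4, 16.7, 31.5, 24.1, 21.3, 7.0, 18.0 },
    { 115, 154.9, 40.8, 16.6, 30.6, 23.9, 21.1, 8.0, 17.0 },
    { 118, 154.7, 42.3, 17.7, 30.8, 23.9, 21.0, 8.0, 17.0 },
    { 120, 154.5, 40.8, 16.7, 30.5, 23.8, 20.8, 8.0, 17.0 },
};
const size_t r17_count = sizeof r17_rows / sizeof r17_rows[0];

/* Return the max (want_max != 0) or the min of n values. */
static double extreme(const double* v, size_t n, int want_max) {
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (want_max ? v[i] > m : v[i] < m) m = v[i];
    return m;
}

/* L0 relative to the loudest other layer: low end of the "N× others" range. */
double min_ratio(const probe_row* r) {
    const double others[6] = { r->l1, r->l2, r->l8, r->l14, r->l26, r->rest_hi };
    return r->l0 / extreme(others, 6, 1);
}

/* L0 relative to the quietest layer: high end of the range. */
double max_ratio(const probe_row* r) {
    const double others[6] = { r->l1, r->l2, r->l8, r->l14, r->l26, r->rest_lo };
    return r->l0 / extreme(others, 6, 0);
}

/* Percent change from `from` to `to`. */
double pct_change(double from, double to) {
    return 100.0 * (to - from) / from;
}
```

Across the five rows this puts L0 at roughly 3.1-3.8× the loudest other layer (L1) and 18-22× the quietest, with a +22.0% rise from call 50 to call 115.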

src/engine/tq_transformer.c

Lines changed: 26 additions & 0 deletions
@@ -633,6 +633,32 @@ static void deltanet_forward(tq_model_t* model, tq_state_t* s, int l) {
     static __thread int _delta_call_count = 0;
     const char* _rst = getenv("TQ_DELTA_RESET_EVERY");
     if (l == 0) _delta_call_count++;
+
+    /* Probe: TQ_DELTA_PROBE=pos1,pos2,... prints per-layer state L2 norm
+     * at the listed layer-0 call counts. Helps localize which layer's
+     * recurrent state explodes first ahead of the drift cliff. */
+    const char* _probe = getenv("TQ_DELTA_PROBE");
+    if (_probe) {
+        int match = 0;
+        const char* p = _probe;
+        while (*p) {
+            int v = atoi(p);                /* atoi stops at the next ',' */
+            if (v == _delta_call_count) { match = 1; break; }
+            while (*p && *p != ',') p++;    /* skip to the separator */
+            if (*p == ',') p++;             /* and step past it */
+        }
+        if (match) {
+            size_t layer_size = (size_t)dn * dk * dv;
+            double ss = 0.0;
+            for (size_t i = 0; i < layer_size; i++) {
+                float v = state[i];
+                ss += (double)v * v;        /* accumulate in double */
+            }
+            float nrm = (float)sqrt(ss);    /* L2 (Frobenius) norm */
+            fprintf(stderr, "[delta-probe] call=%d L%d state_norm=%.4f\n",
+                    _delta_call_count, l, nrm);
+        }
+    }
     if (_rst && l == 0) {
         int rst_n = atoi(_rst);
         if (rst_n > 0 && _delta_call_count > 1 && (_delta_call_count % rst_n) == 0) {
