debug(deltanet): TQ_DELTA_RESET_LAYER — per-layer reset ablation

unamedkr · claude · unamedkr · commit 65e4a2d2cf07 · 2026-04-21T16:41:11.000+09:00
Bisection result on Qwen3.6-35B 117-tok repetition loop:

  reset L0 only  @ call=120: STILL loops at 117 ("anything"→"math")
  reset L8 only  @ call=120: STILL loops at 117
  reset L20 only @ call=120: STILL loops at 117
  reset L38 only @ call=120: STILL loops at 117
  reset ALL      @ call=120: breaks loop (R16 baseline, different output)

Conclusion: none of the four layers tested (L0, L8, L20, L38) drives the
loop on its own; only resetting all 30 DeltaNet layers breaks it. The
drift appears distributed, not localized to one layer. No one-liner fix.

Keep the diagnostic envs for future reference-port work. Strategic
direction: either a 4B-class DeltaNet HF reference for full-parity diff,
or a reimplementation port from the PyTorch reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/.claude/state.md b/.claude/state.md
@@ -3,6 +3,38 @@
 **Last updated**: 2026-04-21 (Phase 1 refparity ★)
 **Session HEAD**: Reference-parity framework (tools/refparity/) LANDED — HF vs engine per-layer diff, pos-aligned, post_norm-aware.
 
+## Phase 1 R19 — Single-layer reset is not enough — drift is distributed (2026-04-21)
+
+Added `TQ_DELTA_RESET_LAYER=N` env to bisect which DeltaNet layer drives
+the 117-tok repetition loop. Combined with `TQ_DELTA_RESET_EVERY=120` to
+force reset right at the drift boundary.
+
+Tested on Qwen3.6-35B IQ4_XS "Once upon a time in a faraway land":
+
+| reset layer | post-117 text |
+|:---:|:---|
+| L0 only | "It could do math! It could do math! It could do anything! It could" → still loops at 117 |
+| L8 only | "It could do math! It could do math! It could do math!" → still loops at 117 |
+| L20 only | "It could do math!" ×3 → still loops at 117 |
+| L38 only | "It could do math!" ×3 → still loops at 117 |
+| ALL layers (R16 baseline) | "0 Comments \| Views: 4,986 views — 'The Great Adventure'" |
+
+R19 conclusion: **none of the four tested DeltaNet layers (L0, L8, L20,
+L38) carries the drift signal alone**: clearing any one of them leaves the
+repetition cliff intact. Only the full 30-layer reset breaks the "It could
+do math!" lock. So the 117-tok pathology looks like a **distributed
+multi-layer interaction**, not a single-layer bug with a one-liner fix.
+
+Diagnostic infrastructure stays: `TQ_DELTA_PROBE`, `TQ_DELTA_RESET_EVERY`,
+`TQ_DELTA_RESET_LAYER` — future rounds can reset *ranges* or chain
+ablations more surgically.
+
+**Strategic step-back**: 35B DeltaNet drift looks unlikely to yield to
+short ablation rounds. Bigger-hammer approaches (full refparity against a
+4B-class DeltaNet HF model, or reference reimplementation port) are the
+remaining paths. Leave the diagnostic envs in place and move on to other
+deliverables this session.
+
 ## Phase 1 R18 — False alarm on a_log double-transform (2026-04-21)
 
 Dug into `ssm_a` values to test whether our `-expf(delta_a_log)` was a
diff --git a/src/engine/tq_transformer.c b/src/engine/tq_transformer.c
@@ -662,12 +662,25 @@ static void deltanet_forward(tq_model_t* model, tq_state_t* s, int l) {
         if (_rst && l == 0) {
             int rst_n = atoi(_rst);
             if (rst_n > 0 && _delta_call_count > 1 && (_delta_call_count % rst_n) == 0) {
-                size_t total_delta = (size_t)c->n_layers * dn * dk * dv;
-                memset(s->delta_state, 0, total_delta * sizeof(float));
-                size_t total_conv = (size_t)c->n_layers * qkv_dim * conv_buf_len;
-                memset(s->conv_state, 0, total_conv * sizeof(float));
+                /* TQ_DELTA_RESET_LAYER=N (default -1 = all layers).
+                 * Clears just that layer's slice so we can bisect which
+                 * layer actually drives the drift. */
+                const char* _only = getenv("TQ_DELTA_RESET_LAYER");
+                int only_layer = _only ? atoi(_only) : -1;
+                if (only_layer >= 0 && only_layer < c->n_layers) {
+                    memset(s->delta_state + (size_t)only_layer * dn * dk * dv,
+                           0, (size_t)dn * dk * dv * sizeof(float));
+                    memset(s->conv_state + (size_t)only_layer * qkv_dim * conv_buf_len,
+                           0, (size_t)qkv_dim * conv_buf_len * sizeof(float));
+                } else {
+                    size_t total_delta = (size_t)c->n_layers * dn * dk * dv;
+                    memset(s->delta_state, 0, total_delta * sizeof(float));
+                    size_t total_conv = (size_t)c->n_layers * qkv_dim * conv_buf_len;
+                    memset(s->conv_state, 0, total_conv * sizeof(float));
+                }
                 if (getenv("TQ_DEBUG"))
-                    fprintf(stderr, "[delta-reset] call=%d\n", _delta_call_count);
+                    fprintf(stderr, "[delta-reset] call=%d layer=%d\n",
+                            _delta_call_count, only_layer);
             }
         }
     }
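
A possible follow-up to the single-layer reset in this diff, sketched under assumptions. The diff parses `TQ_DELTA_RESET_LAYER` with `atoi`, which yields 0 for non-numeric text, so a typo in the env value would clear layer 0 rather than fall back to all layers. The helpers below show stricter parsing with `strtol` and a contiguous range reset over a flat per-layer buffer. `parse_reset_layer` and `reset_layer_range` are hypothetical names, not part of the commit, and the buffer layout (layer-major, `per_layer` floats per layer) is assumed to match how the diff indexes `delta_state` and `conv_state`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the commit): parse a layer index.
 * Returns -1, meaning "all layers", for NULL, empty, non-numeric,
 * negative, or out-of-range input. */
static int parse_reset_layer(const char *text, int n_layers) {
    if (text == NULL || *text == '\0') return -1;
    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') return -1;
    if (v < 0 || v >= n_layers) return -1;
    return (int)v;
}

/* Hypothetical helper (not in the commit): zero layers first..last
 * (inclusive) of a flat buffer holding per_layer floats per layer.
 * The range is clamped to [0, n_layers - 1]; an empty range is a no-op. */
static void reset_layer_range(float *state, size_t per_layer,
                              int n_layers, int first, int last) {
    assert(state != NULL && n_layers > 0);
    if (first < 0) first = 0;
    if (last >= n_layers) last = n_layers - 1;
    if (first > last) return;
    memset(state + (size_t)first * per_layer, 0,
           (size_t)(last - first + 1) * per_layer * sizeof(float));
}
```

With the names used in the diff, `per_layer` would be `dn * dk * dv` for `delta_state` and `qkv_dim * conv_buf_len` for `conv_state`. A range reset would let a later round halve the 30 DeltaNet layers repeatedly instead of testing one layer at a time.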