
Commit a05d4e4

unamedkr and claude committed
debug(deltanet): R18 null — a_log formula IS correct for Unsloth GGUF
Tested removing our -expf(delta_a_log) on the hypothesis that the GGUF stores a pre-transformed -exp(A_log), as llama.cpp kimi-linear.cpp:142 suggests. Result: the "Paris" factual probe collapses to "T" (immediate garbage) on Qwen3.6-35B IQ4_XS. Rolled back.

Conclusion: this Unsloth UD-IQ4_XS GGUF stores RAW A_log (Unsloth's conversion differs from the convention convert_hf_to_gguf.py uses for kimi-linear). Our -expf(delta_a_log) is correct for THIS GGUF. L0's outlier DeltaNet state norm (~155) is by design: heads with large negative a_log intentionally have weak decay. Not a kernel bug. The cause of the 117-token loop is elsewhere in the DeltaNet path.

The TQ_DELTA_PROBE and TQ_DELTA_RESET_EVERY envs remain available for future ablations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent d1c6057 commit a05d4e4

1 file changed

Lines changed: 30 additions & 0 deletions


.claude/state.md

@@ -3,6 +3,36 @@
**Last updated**: 2026-04-21 (Phase 1 refparity ★)
**Session HEAD**: Reference-parity framework (tools/refparity/) LANDED — HF vs engine per-layer diff, pos-aligned, post_norm-aware.

## Phase 1 R18 — False alarm on a_log double-transform (2026-04-21)
Dug into the `ssm_a` values to test whether our `-expf(delta_a_log)` was a
double transform:

| layer | ssm_a range | mean |
|---|---|---:|
| L0 | [-72.33, -0.02] | -10.84 |
| L1 | [-27.03, -0.07] | -2.67 |

llama.cpp `qwen35moe.cpp:238` uses `ssm_a` directly, with the comment
"-A_log.exp() * softplus", and `kimi-linear.cpp:142` explicitly says "No need
to -exp(a_log) because it was done in convert_hf_to_gguf.py". Both suggest the
GGUF should already store the pre-transformed value.
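For reference, applying the engine's transform A = -exp(ssm_a) to the endpoints in the table gives the values below (the means do not map this way, since the mean of an exponential is not the exponential of the mean):

$$
\text{L0:}\quad -e^{-0.02} \approx -0.980, \qquad -e^{-72.33} \approx -3.9\times10^{-32}
$$
$$
\text{L1:}\quad -e^{-0.07} \approx -0.932, \qquad -e^{-27.03} \approx -1.8\times10^{-12}
$$

So under the engine's reading, A lies roughly in [-0.98, 0) for L0 and [-0.93, 0) for L1, whereas treating `ssm_a` as already pre-transformed would put A directly in the tabulated ranges.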

**Ablation**: removed our `-expf(a_log)`, so gate = softplus × a, with `a` used directly.
Result: the 35B output collapses to garbage immediately, and the "Paris" probe fails.

**Conclusion**: this Unsloth UD-IQ4_XS GGUF stores RAW A_log, NOT the
pre-transformed `-exp()`. Our engine's `-expf(delta_a_log)` is correct for
this GGUF. Rolled back the attempted "fix".

L0's steady-state DeltaNet state norm (~155) is by design: heads with large
negative a_log intentionally have weak decay. Not a kernel bug.

**Next hypothesis**: R16 proved that SOMETHING in the DeltaNet state matters for
the 117-token loop, but R17's L0-outlier signature is by design, not a bug.
Re-run the ablation with TQ_DELTA_RESET applied selectively per layer to find
the actual causal layer. Also check the KV cache pattern at the drift boundary.
## ★★ Phase 1 R17 — L0 DeltaNet state is 10× the others (2026-04-21) ★★

Added `TQ_DELTA_PROBE=pos1,pos2,...` env in `deltanet_forward` to dump
