Commit 65e4a2d
debug(deltanet): TQ_DELTA_RESET_LAYER — per-layer reset ablation
Bisection result on Qwen3.6-35B 117-tok repetition loop:
reset L0 only @ call=120: STILL loops at 117 ("anything"→"math")
reset L8 only @ call=120: STILL loops at 117
reset L20 only @ call=120: STILL loops at 117
reset L38 only @ call=120: STILL loops at 117
reset ALL @ call=120: breaks loop (R16 baseline, different output)
Conclusion: the drift pathology is distributed across all 30 DeltaNet
layers, not localized to any single layer. No one-liner fix.
Keep the diagnostic env vars for future reference-port work. Strategic
direction: either diff against a 4B-class DeltaNet HF reference for full
parity, or port a reimplementation from the PyTorch reference.
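
The per-layer reset ablation described in this message can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this commit: the helper names (`parse_reset_layer`, `maybe_reset_states`), the `"all"` value, the dict-of-lists state layout, and reading the reset call index from a parameter are all assumptions. Only the env var name `TQ_DELTA_RESET_LAYER` and the reset at call 120 come from the message.

```python
import os


def parse_reset_layer(value):
    """Interpret the env value: None (disabled), "all", or an int layer index.

    The accepted spellings are an assumption, not the commit's actual parser.
    """
    if value is None or value == "":
        return None
    if value.lower() == "all":
        return "all"
    return int(value)


def maybe_reset_states(states, call_idx, reset_at, target):
    """Zero the recurrent state of the targeted layer(s) at call `reset_at`.

    states: dict mapping layer index -> list of floats (hypothetical layout),
            modified in place.
    Returns the sorted list of layer indices that were actually reset.
    """
    if target is None or call_idx != reset_at:
        return []
    layers = sorted(states) if target == "all" else [target]
    reset = []
    for layer in layers:
        if layer in states:  # skip layers that carry no recurrent state
            states[layer][:] = [0.0] * len(states[layer])
            reset.append(layer)
    return reset


if __name__ == "__main__":
    target = parse_reset_layer(os.environ.get("TQ_DELTA_RESET_LAYER"))
    # Toy states for three layers; real states would be per-layer tensors.
    states = {0: [0.5, -0.2], 8: [1.0, 1.0], 20: [0.3, 0.7]}
    for call_idx in range(118, 122):
        done = maybe_reset_states(states, call_idx, reset_at=120, target=target)
        if done:
            print(f"call={call_idx}: reset layers {done}")
```

Running the script with `TQ_DELTA_RESET_LAYER=8` would zero only layer 8 at call 120, while `all` would zero every layer that has a state, mirroring the single-layer versus reset-ALL comparison in the bisection list.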
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a05d4e4 commit 65e4a2d
2 files changed
Lines changed: 50 additions & 5 deletions
File 1 (name not captured): 32 additions, 0 deletions
@@ -3,6 +3,38 @@
Lines 6-37 added after line 5. Diff content not captured.

File 2 (name not captured): 18 additions, 5 deletions
@@ -662,12 +662,25 @@
Original lines 665-668 replaced by new lines 665-680; original line 670 replaced by new lines 682-683. Diff content not captured.