Commit da825bf
fix(chat-cache): comprehensive audit — 7 hidden bugs eliminated
After PRs #48-#51 the chat KV cache reuse path was a complex multi-layer
system. Audited every code path for hidden bugs and fixed all of them.
## Bugs found and fixed
1. **Slow-path fallback corrupted KV state** [P0]
tq_generate_chat_text's overflow fallback called tq_generate_continue
on the SAME state that already had old KV at positions [0..prefix_pos).
New prefill would write [0..n_new) leaving stale [n_new..prefix_pos)
that subsequent generation might read. The fallback now returns -2 instead
and leaves recovery to the caller (the server returns HTTP 413; WASM
auto-resets the chat and shows a status message).
2. **WASM reset_chat partial cleanup** [P1]
wasm_reset_chat called quant_chat(NULL) but did not reset
g_output_pos / g_output[0] / g_stream_count, so the next generation
would append to stale text from the previous chat. Now resets all.
3. **wasm_generate (sync path) missed g_stream_count reset** [P1]
The async path zeroed it, the sync path did not. Aligned both.
4. **Wheel header _quant.h stale** [P0]
bindings/python/quantcpp/_quant.h is .gitignore'd and the next pip
build would have used quant.h from before PR #51 (no
tq_generate_chat_text). Synced to current quant.h.
5. **Overflow surface — WASM** [P1]
Added n == -2 detection in wasm_generate / wasm_generate_async.
Auto-reset chat and call js_on_status with a clear error message
so the JS side can show "Context full — chat reset".
6. **Overflow surface — server** [P1]
Added gen_rc == -2 detection in both streaming and non-streaming
handlers. Server resets the session's KV state + cached_text + tokens
and returns HTTP 413 with an OpenAI-compatible error JSON.
7. **tq_generate_continue cached_text drift documentation** [P2]
Added a header comment explaining that tq_generate_continue is the
lower-level API and doesn't track cached_text. Higher-level callers
must use tq_generate_chat_text for cached_text safety.
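The overflow contract from items 1, 5 and 6 can be illustrated with a toy model. This is a minimal sketch, not the project's code: `toy_chat_state`, `toy_append`, `toy_prefill_in_place`, `toy_count_stale` and `toy_http_status` are hypothetical names, the 8-slot context is made up, and the 500 mapping for other negative codes is an assumption. Only the -2 code and the HTTP 413 response come from the commit message.

```c
#include <assert.h>
#include <string.h>

#define TOY_CTX      8     /* hypothetical context length, in positions */
#define TOY_OVERFLOW (-2)  /* mirrors the -2 return code described above */

/* Toy stand-in for the chat state: each slot stores the turn that wrote it. */
typedef struct {
    int kv[TOY_CTX];  /* 0 = empty, otherwise the turn number */
    int n_cached;     /* number of valid positions, i.e. prefix_pos */
} toy_chat_state;

static void toy_reset(toy_chat_state *s) { memset(s, 0, sizeof *s); }

/* Count non-empty slots that were written by a turn other than `turn`. */
static int toy_count_stale(const toy_chat_state *s, int turn) {
    int stale = 0;
    for (int i = 0; i < TOY_CTX; i++)
        if (s->kv[i] != 0 && s->kv[i] != turn) stale++;
    return stale;
}

/* Old fallback (buggy): prefill from position 0 on the same state.
 * Slots [n_new..n_cached) keep the previous turn's entries. */
static void toy_prefill_in_place(toy_chat_state *s, int n_new, int turn) {
    for (int i = 0; i < n_new && i < TOY_CTX; i++) s->kv[i] = turn;
}

/* New contract: append after the cached prefix, or report overflow and
 * leave the state untouched. n_new is assumed to be non-negative. */
static int toy_append(toy_chat_state *s, int n_new, int turn) {
    if (s->n_cached + n_new > TOY_CTX) return TOY_OVERFLOW;
    for (int i = 0; i < n_new; i++) s->kv[s->n_cached + i] = turn;
    s->n_cached += n_new;
    return n_new;
}

/* Assumed server-side mapping: -2 becomes 413, other failures 500. */
static int toy_http_status(int rc) {
    if (rc == TOY_OVERFLOW) return 413;
    return rc < 0 ? 500 : 200;
}
```

In this model, a caller that receives `TOY_OVERFLOW` calls `toy_reset` before retrying, which is the same decision the server and WASM callers make in items 5 and 6.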
## Audited but safe
- Server session concurrency: get_or_create_session is called inside
inference_mutex, so LRU bookkeeping is serialized.
- json_extract_string buffer safety: respects buf_size - 1 bound.
- WASM g_output overflow: tokens dropped from local buffer but
js_on_token still fires, so JS side gets all output. Acceptable.
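The two buffer observations above share one pattern: copy only what fits within `size - 1` bytes, keep the terminating NUL, and still forward the token to the streaming callback. The sketch below is an assumption-laden illustration of that pattern; `toy_output`, `toy_emit_token` and `toy_count_bytes` are hypothetical and are not the actual `g_output` / `js_on_token` code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_token_cb)(const char *tok, void *user);

typedef struct {
    char   buf[16];   /* deliberately tiny so the overflow case is easy to hit */
    size_t pos;       /* bytes used, excluding the terminating NUL */
    int    streamed;  /* tokens handed to the callback */
} toy_output;

/* Example callback: adds the token length to a size_t counter in `user`. */
static void toy_count_bytes(const char *tok, void *user) {
    *(size_t *)user += strlen(tok);
}

/* Returns 1 if the token was stored locally, 0 if it was dropped.
 * The callback fires in both cases, so the consumer sees every token. */
static int toy_emit_token(toy_output *o, const char *tok,
                          toy_token_cb cb, void *user) {
    size_t len = strlen(tok);
    int stored = 0;
    /* pos never exceeds sizeof buf - 1, so this subtraction cannot wrap. */
    if (len <= sizeof o->buf - 1 - o->pos) {
        memcpy(o->buf + o->pos, tok, len);
        o->pos += len;
        o->buf[o->pos] = '\0';
        stored = 1;
    }
    if (cb) cb(tok, user);
    o->streamed++;
    return stored;
}
```

Dropping whole tokens rather than truncating them keeps the local buffer a clean prefix of the streamed output, which is one reasonable reading of the "Acceptable" verdict above.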
## Verified end-to-end
alice/bob interleaved 5 turns each (real assistant replay):
alice: 339 → 514 ms (~50 ms/turn growth from O(n) attention)
bob: 310 → 518 ms (similar)
No regressions; all turns hit the FAST text-prefix path after turn 1.
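For readers unfamiliar with the text-prefix path, the check it relies on can be sketched as follows. This is an illustration under assumptions, not the code in quant.h: `toy_reusable_prefix` is a hypothetical helper, and it assumes the fast path applies exactly when the cached text is a byte-for-byte prefix of the new prompt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns how many leading bytes of `prompt` are already covered by
 * `cached_text`, or 0 when the cached text is empty or is not a prefix
 * of the prompt (in which case the whole prompt must be prefilled). */
static size_t toy_reusable_prefix(const char *cached_text, const char *prompt) {
    size_t n = strlen(cached_text);
    if (n == 0 || strncmp(cached_text, prompt, n) != 0) return 0;
    return n;
}
```

With interleaved sessions such as alice and bob, each session would keep its own cached text, so a turn from one user does not invalidate the other user's prefix.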
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 49c6605 commit da825bf
5 files changed
Lines changed: 89 additions & 23 deletions
Binary file not shown.