Commit 58a9d48
fix(quant.h): sync BPE encode/decode UTF-8 fix from split-source
The quant.h single header had the same two BPE bugs as the split-source tokenizer (tq_tokenizer.c):
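- decode: codepoints U+00A1-U+00FF were emitted as their UTF-8 encoding instead of the single raw byte they stand for → double encoding
- encode: direct bytes ≥ 0x80 were emitted as the raw byte instead of the UTF-8 encoding of their codepoint → invalid UTF-8 that does not match vocab entries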
Synced both fixes. scripts/check_sync.sh passes. single_header_example
builds and runs clean.
See commits 9c53491 (decode) and 58d3925 (encode) for root-cause analysis.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parent: 58d3925
1 file changed
Lines changed: 25 additions & 4 deletions
Changed hunks in quant.h (old lines → new lines):
- 8279-8286 → 8279-8293: 8 lines added, 1 removed
- 8327-8335 → 8334-8356: 17 lines added, 3 removed