New TQ_TYPE_TURBO_KV_5B following the Variant F architecture (single-stage
RHT + Lloyd-Max codebook + ‖x‖, no QJL). The 32-level codebook adds one bit
of precision per element over turbo_kv_4b at the cost of 16 extra bytes per
block.
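For the Lloyd-Max codebook step, quantizing an element reduces to a nearest-centroid lookup in a sorted table. A minimal sketch, assuming a sorted ascending codebook; the centroid values and the function name below are placeholders, not the actual Max (1960) Table I optimum or the tq_codebook.c API:

```c
#include <assert.h>

// Return the index of the centroid in sorted table cb[0..n-1] closest to x.
// Binary search for the first centroid >= x, then compare with its left
// neighbor. Ties go to the higher index.
static int nearest_centroid(float x, const float *cb, int n) {
    int lo = 0, hi = n - 1;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (cb[mid] < x) lo = mid + 1; else hi = mid;
    }
    if (lo > 0 && x - cb[lo - 1] < cb[lo] - x) lo--;
    return lo;
}
```

With b = 5 the table holds 32 centroids, so each lookup emits a 5-bit index.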
Llama 3.2 3B PPL on bench/data/ppl_1k.txt (FP32 baseline = 13.56):
  turbo_kv_4b  14.28  (+5.3%)
  turbo_kv_5b  13.60  (+0.34%)  ← near-lossless ⭐

SmolLM2 135M PPL (FP32 baseline = 18.62):
  turbo_kv_4b  19.70  (+5.8%)
  turbo_kv_5b  18.94  (+1.7%)
Block layout (88 bytes, vs 72 for 4b):
norm(2) + residual_norm(2) + inv_std(2) + _pad(2) + mse_5bit(80)
128 elements * 5 bits = 640 bits = 80 bytes for indices
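The 88-byte layout above can be sketched as a C struct. This is an illustrative assumption, not the actual tq_types.h definition: the header fields are shown as raw uint16_t (fp16 bit patterns), and the comments reflect my reading of the field names:

```c
#include <stdint.h>
#include <assert.h>

// Sketch of the 5-bit Variant F block, per the layout above.
struct block_tq_turbo_kv_5b {
    uint16_t norm;           // block norm ‖x‖, stored as fp16 bits
    uint16_t residual_norm;  // residual norm, fp16 bits
    uint16_t inv_std;        // inverse std for dequant scaling, fp16 bits
    uint16_t _pad;           // padding before the index stream
    uint8_t  mse_5bit[80];   // 128 elements * 5 bits = 640 bits = 80 bytes
};

// Mirror of the size assertion the tq_types.h change mentions.
static_assert(sizeof(struct block_tq_turbo_kv_5b) == 88,
              "5b block must be 88 bytes");
```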
Changes:
- tq_codebook.c: extend codebook table to b=5, add 32 Lloyd-Max-Gaussian
centroids (Max 1960 Table I), bounds check 1..5
- tq_types.h: TQ_TYPE_TURBO_KV_5B enum, block_tq_turbo_kv_5b struct,
size assertion
- tq_turbo_kv.c: pack_5bit/unpack_5bit helpers (5 bits/element, LSB-first
bit-stream packing), quantize/dequantize/attention impls following the
same Variant F pattern
- tq_traits.c: register TQ_TRAITS[TQ_TYPE_TURBO_KV_5B], add format spec case
- tools/quant.c: CLI parser accepts -k turbo_kv_5b
- integrations/llamacpp/tq_kv_cache.cpp: GGML_TYPE_TQ_TURBO_KV_5B + table
entry + wrappers + count bump
- tests/test_turbo_kv.cpp: FormatSpec test updated to drop the HAS_RESIDUAL
  assertion (Variant F already removed that flag from 3b/4b)
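The LSB-first bit-stream packing mentioned for tq_turbo_kv.c can be sketched as follows. This is a hedged illustration of the scheme (index i occupies bits [5i, 5i+5) of the stream, low bits first), not the actual pack_5bit/unpack_5bit implementations:

```c
#include <stdint.h>
#include <string.h>

// Pack n 5-bit indices (each 0..31) into out, LSB-first.
// out must hold at least (n*5 + 7)/8 bytes; for n = 128 that is exactly 80.
static void pack_5bit(const uint8_t *idx, int n, uint8_t *out) {
    memset(out, 0, (size_t)(n * 5 + 7) / 8);
    for (int i = 0; i < n; i++) {
        int bit = 5 * i, byte = bit >> 3, sh = bit & 7;
        out[byte] |= (uint8_t)(idx[i] << sh);
        if (sh > 3)  // 5 bits at offset > 3 straddle a byte boundary
            out[byte + 1] |= (uint8_t)(idx[i] >> (8 - sh));
    }
}

// Extract the i-th 5-bit index from the LSB-first stream.
static uint8_t unpack_5bit(const uint8_t *in, int i) {
    int bit = 5 * i, byte = bit >> 3, sh = bit & 7;
    uint16_t w = in[byte];
    if (sh > 3)  // only touch the next byte when the value straddles it
        w |= (uint16_t)in[byte + 1] << 8;
    return (uint8_t)((w >> sh) & 0x1F);
}
```

Note the straddle guard in unpack_5bit: the last index of a 128-element block ends exactly on the 80-byte boundary, so unconditionally reading the following byte would run past the buffer.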
All 35 tests pass.
Closes one of the follow-ups in issue #15.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>