Vulkan GPU auto-activation for KV cache operations
When built with TQ_BUILD_VULKAN=ON and a Vulkan device is available at
runtime, the KV cache quantize/attention functions are automatically
routed to GPU compute shaders by overriding the traits table at startup.
Changes:
- tools/quant.c: call tq_init_vulkan_backend() on startup
- tq_vulkan_init.c: add tq_vulkan_override_traits(), which replaces CPU
function pointers in TQ_TRAITS[] with Vulkan GPU versions
- tq_traits.c: make TQ_TRAITS[] non-const for runtime override
- tq_types.h: update extern declaration to match
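The override pattern described above can be sketched roughly as follows.
This is an illustrative sketch only, not the code in this commit: the
names demo_traits_t, DEMO_TRAITS, demo_gpu_available,
demo_override_traits() and demo_quantize() are hypothetical stand-ins,
and the real TQ_TRAITS[] entries and Vulkan device detection are assumed
to be more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kernel signature; the return value only tags which backend ran. */
typedef int (*demo_quantize_fn)(const float *src, size_t n);

typedef struct {
    const char      *name;
    demo_quantize_fn quantize;
} demo_traits_t;

static int quantize_cpu(const float *src, size_t n) { (void)src; (void)n; return 0; }
static int quantize_gpu(const float *src, size_t n) { (void)src; (void)n; return 1; }

/* The table is deliberately non-const so entries can be patched at startup. */
static demo_traits_t DEMO_TRAITS[] = {
    { "type_a", quantize_cpu },
    { "type_b", quantize_cpu },
};
#define DEMO_TRAITS_COUNT (sizeof DEMO_TRAITS / sizeof DEMO_TRAITS[0])

/* Stand-in for real device detection (assumption: set by backend init). */
static bool demo_gpu_available = false;

/* Swap CPU pointers for GPU ones; returns the number of entries overridden. */
static int demo_override_traits(void)
{
    if (!demo_gpu_available)
        return 0; /* no device: leave the CPU path untouched */

    int overridden = 0;
    for (size_t i = 0; i < DEMO_TRAITS_COUNT; i++) {
        DEMO_TRAITS[i].quantize = quantize_gpu;
        overridden++;
    }
    return overridden;
}

/* Callers always dispatch through the table, so they never change. */
static int demo_quantize(size_t type, const float *src, size_t n)
{
    assert(type < DEMO_TRAITS_COUNT);
    return DEMO_TRAITS[type].quantize(src, n);
}
```

Because call sites only go through the table, a failed or absent device
simply leaves the CPU pointers in place, which is one reason to prefer a
runtime override over compile-time selection.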
The full forward pass (matmul, FFN, norms) still runs on CPU.
Only the KV quantize, dequantize, and attention kernels run on Vulkan.
34/34 tests passing.
Addresses #9
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>