Environment
- GPU: NVIDIA GeForce RTX 5070 Ti Laptop GPU (Blackwell, compute capability 12.0)
- Driver: 595.79 (CUDA 13.2)
- OS: Windows 11
- Python: 3.14
- PyTorch: 2.11.0+cu130
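For anyone reproducing this, a small guarded check (assumption: only the standard `torch.cuda` API; it degrades gracefully when torch or CUDA is absent) that reports the compute capability PyTorch sees:

```python
import importlib.util

def cuda_capability_report() -> str:
    """Return the device's sm_XX string, or the reason it can't be read."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():
        return "CUDA not available"
    major, minor = torch.cuda.get_device_capability(0)
    return f"sm_{major}{minor}"  # expected to be "sm_120" on an RTX 5070 Ti

print(cuda_capability_report())
```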
Problem
bitsandbytes does not work on the RTX 5070 Ti out of the box:
- sm_120 missing from the supported architecture list - the shipped CUDA kernels are not compiled for Blackwell
- CUDA 13.0+ runtime - driver 595.79 reports CUDA 13.2, so bitsandbytes likely needs CUDA 13.x-compatible wheels
- Runtime error - loading 4-bit/8-bit quantized models fails with `RuntimeError: CUDA error: no kernel image is available for execution on the device`
- Windows support - bitsandbytes Windows wheels are limited, and custom builds require CUDA Toolkit 12.x, which conflicts with the CUDA 13.0+ driver requirement
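As a rough illustration of the first point (the function name and the architecture set below are assumptions for the sketch, not bitsandbytes' real build matrix): kernels are compiled for a fixed list of `sm_XX` targets, and a device whose capability falls outside that list hits exactly this "no kernel image" failure:

```python
# Hypothetical compiled-architecture list; real wheels may differ.
COMPILED_ARCHS = {"sm_75", "sm_80", "sm_86", "sm_89", "sm_90"}

def kernel_available(major: int, minor: int) -> bool:
    """True if a prebuilt kernel image matches this compute capability."""
    return f"sm_{major}{minor}" in COMPILED_ARCHS

print(kernel_available(8, 6))    # Ampere RTX 30xx: covered
print(kernel_available(12, 0))   # Blackwell sm_120: no kernel image
```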
Workaround
Currently, quantization must be done on a supported GPU (Ampere/Hopper) and the quantized weights transferred to the RTX 5070 Ti for inference. This is not ideal for Windows users who want end-to-end local quantization.
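To make the workaround concrete, here is a toy sketch of why the transfer works: the packed quantized bytes and their scale are plain data with no architecture dependence, so weights quantized on an Ampere/Hopper box deserialize identically on Blackwell. (This is a simplified absmax-style affine 4-bit scheme for illustration only, not bitsandbytes' actual NF4/FP4 format.)

```python
def quantize_4bit(values: list[float], scale: float) -> bytes:
    """Map floats to 4-bit codes (0..15) and pack two codes per byte.

    Assumes an even number of values; real formats pad as needed.
    """
    q = [max(0, min(15, round(v / scale) + 8)) for v in values]
    return bytes((q[i] << 4) | q[i + 1] for i in range(0, len(q), 2))

def dequantize_4bit(packed: bytes, scale: float) -> list[float]:
    """Unpack two 4-bit codes per byte and map them back to floats."""
    out: list[float] = []
    for b in packed:
        out.append(((b >> 4) - 8) * scale)
        out.append(((b & 0xF) - 8) * scale)
    return out

# Quantize "on machine A", then dequantize the raw bytes "on machine B".
packed = quantize_4bit([0.5, -0.25, 1.0, 0.0], scale=0.125)
print(dequantize_4bit(packed, scale=0.125))  # 1.0 clamps to 0.875
```

Note the clamping loss on 1.0: values at the edge of the 4-bit range lose precision, which is inherent to the quantization, not to the transfer between machines.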
Question
Is there a roadmap for Blackwell (sm_120) support in bitsandbytes? RTX 5070/5080/5090 are the first consumer Blackwell GPUs and many users will want to quantize models locally.
Happy to help test sm_120 CUDA kernel compilation if there's a development branch.