[Windows] RTX 5070 Ti (Blackwell sm_120) - quantization support missing #1937

### Environment
- GPU: NVIDIA GeForce RTX 5070 Ti Laptop GPU (Blackwell, compute capability 12.0)
- Driver: 595.79 (CUDA 13.2)
- OS: Windows 11
- Python: 3.14
- PyTorch: 2.11.0+cu130
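
For anyone reproducing this, the fields above can be gathered with a short script. This is a minimal sketch, not an official diagnostic: `collect_environment` is a hypothetical helper name, it reports what PyTorch sees (not what bitsandbytes was built for), and the GPU fields are only filled in when PyTorch is installed and a CUDA device is visible.

```python
import platform


def collect_environment():
    """Return a dict with the OS, Python, PyTorch and GPU details of this machine."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    try:
        import torch  # optional: the report still works without PyTorch
    except ImportError:
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version PyTorch was built against
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)  # e.g. (12, 0)
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

The driver version is not included because PyTorch does not expose it directly; `nvidia-smi` reports it.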

### Problem
bitsandbytes does not work on RTX 5070 Ti out of the box:

1. **sm_120 missing from the supported architecture list** - the shipped CUDA kernels are not compiled for Blackwell
2. **CUDA 13.0+ runtime** - driver 595.79 reports CUDA 13.2 and PyTorch is built against CUDA 13.0 (`cu130`), so bitsandbytes may need CUDA 13.x-compatible wheels
3. **Runtime error** - `RuntimeError: CUDA error: no kernel image is available for execution on the device` is raised when loading 4-bit/8-bit quantized models
4. **Windows support** - bitsandbytes Windows wheels are limited; custom builds require CUDA Toolkit 12.x, which appears to conflict with the CUDA 13.x environment described above
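
Point 1 can be checked mechanically by comparing the device's compute capability against a list of compiled architectures. The sketch below is an illustration under assumptions: `parse_arch` and `kernel_coverage` are hypothetical helpers, and the local check uses PyTorch's own `torch.cuda.get_arch_list()`, which describes the PyTorch build rather than the bitsandbytes binary. It relies on two general CUDA rules: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any equal or newer capability.

```python
import re


def parse_arch(entry):
    """Split 'sm_120' or 'compute_90' into (kind, (major, minor)); None if unrecognised."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", entry)
    if match is None:
        return None  # arch-specific variants such as 'sm_90a' are skipped
    kind, major, minor = match.groups()
    return kind, (int(major), int(minor))


def kernel_coverage(capability, arch_list):
    """Return 'native', 'ptx' or 'none' for a device capability such as (12, 0)."""
    result = "none"
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, arch = parsed
        # Compiled binaries are compatible within one major version only.
        if kind == "sm" and arch[0] == capability[0] and arch[1] <= capability[1]:
            return "native"
        # PTX can be JIT-compiled for any equal or newer architecture.
        if kind == "compute" and arch <= capability:
            result = "ptx"
    return result


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(cap, kernel_coverage(cap, torch.cuda.get_arch_list()))
```

A result of `none` for capability `(12, 0)` matches the "no kernel image is available" error: the binary contains neither an sm_120 kernel nor PTX that the driver could compile for it.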

### Workaround
Currently, quantization must be done on a supported GPU (Ampere/Hopper) and the quantized weights transferred to the RTX 5070 Ti for inference. This is not ideal for Windows users who want end-to-end local quantization.

### Question
Is there a roadmap for Blackwell (sm_120) support in bitsandbytes? RTX 5070/5080/5090 are the first consumer Blackwell GPUs and many users will want to quantize models locally.

Happy to help test sm_120 CUDA kernel compilation if there's a development branch.
