
[Windows] RTX 5070 Ti (Blackwell sm_120) - quantization support missing #1937

@loongmiaow-pixel

Environment

  • GPU: NVIDIA GeForce RTX 5070 Ti Laptop GPU (Blackwell, compute capability 12.0)
  • Driver: 595.79 (CUDA 13.2)
  • OS: Windows 11
  • Python: 3.14
  • PyTorch: 2.11.0+cu130
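
For anyone trying to reproduce this, the environment details above can be collected with a short script. This is a minimal sketch, not an official diagnostic: `collect_env` is a hypothetical helper name, and `torch.cuda.get_arch_list()` reports the architectures PyTorch itself was built for, which is not the same as the list bitsandbytes was compiled with.

```python
import importlib
import platform


def collect_env():
    """Gather the version details that matter for a CUDA kernel mismatch."""
    info = {"python": platform.python_version(), "os": platform.platform()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds)
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["device"] = torch.cuda.get_device_name(0)
        # (major, minor) tuple; (12, 0) would correspond to sm_120
        info["capability"] = torch.cuda.get_device_capability(0)
        # Architectures compiled into this PyTorch build, not into bitsandbytes
        info["torch_arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

The script degrades gracefully: without PyTorch or without a visible CUDA device it still prints the Python and OS details.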

Problem

bitsandbytes does not work on RTX 5070 Ti out of the box:

  1. sm_120 is not in the supported architecture list, so no CUDA kernels are compiled for Blackwell.
  2. CUDA 13.x runtime: driver 595.79 reports CUDA 13.2 and PyTorch is a cu130 build, so bitsandbytes may need CUDA 13.x-compatible wheels.
  3. Runtime error when loading 4-bit/8-bit quantized models: RuntimeError: CUDA error: no kernel image is available for execution on the device
  4. Windows support: bitsandbytes Windows wheels are limited, and custom builds require CUDA Toolkit 12.x, which conflicts with the CUDA 13.0+ driver requirements.
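
The "no kernel image is available" error in item 3 is what CUDA reports when a binary contains neither native code nor PTX that the device can run. The rule can be sketched as follows, assuming the standard CUDA compatibility model: an `sm_XY` cubin runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for any device of equal or higher capability. The helper names and the architecture list are hypothetical and are not taken from the actual bitsandbytes build configuration; suffixed targets such as `sm_90a` are ignored here.

```python
import re

# Matches plain targets like "sm_86" or "compute_90"; suffixed ones are skipped.
_ARCH_RE = re.compile(r"^(sm|compute)_(\d{2,})$")


def parse_arch(tag):
    """Split 'sm_120' into ('sm', 12, 0); return None for unrecognized tags."""
    match = _ARCH_RE.match(tag)
    if match is None:
        return None
    kind, digits = match.groups()
    # The last digit is the minor version, the rest is the major version.
    return kind, int(digits[:-1]), int(digits[-1])


def is_arch_covered(capability, compiled_archs):
    """Return True if a device with `capability` can run some compiled target."""
    major, minor = capability
    for tag in compiled_archs:
        parsed = parse_arch(tag)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        # Native cubin: same major version, compiled minor <= device minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX: forward compatible with any equal or newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


# Hypothetical build list that stops at Hopper.
hypothetical_build = ["sm_75", "sm_80", "sm_86", "sm_89", "sm_90"]
print(is_arch_covered((12, 0), hypothetical_build))               # False
print(is_arch_covered((12, 0), hypothetical_build + ["sm_120"]))  # True
```

Under these assumptions, a build without an `sm_12x` cubin or any embedded PTX leaves a compute capability 12.0 device with nothing to execute, which matches the error above.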

Workaround

Currently, quantization must be done on a supported GPU (Ampere/Hopper) and the quantized weights transferred to the RTX 5070 Ti for inference. This is not ideal for Windows users who want end-to-end local quantization.

Question

Is there a roadmap for Blackwell (sm_120) support in bitsandbytes? RTX 5070/5080/5090 are the first consumer Blackwell GPUs and many users will want to quantize models locally.

Happy to help test sm_120 CUDA kernel compilation if there's a development branch.

Labels: CUDA, Waiting for Info, Windows
