From 655b04e47a2078ce24993d6a561ac1c68992a562 Mon Sep 17 00:00:00 2001
From: neil-the-nowledgable <254185769+neil-the-nowledgable@users.noreply.github.com>
Date: Thu, 7 May 2026 14:46:58 -0400
Subject: [PATCH] docs(install): note aarch64 wheels are aarch64-sbsa, not L4T
 (Jetson)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a WARNING callout after the Linux aarch64 row of the PyPI
build-targets table, explaining that:

1. Wheels are built on aarch64-sbsa runners (standard CUDA Toolkit), not
   the L4T / JetPack runtime that Jetson Orin / Xavier / Thor (on CUDA 12)
   use.
2. The mismatch surfaces as 'Error named symbol not found in
   /src/csrc/ops.cu' on the first CUDA op — a symbol-resolution error,
   NOT a kernel-image-for-device error. The cubins ARE binary-compatible
   with the device per Ampere-family binary compat (sm_80 SASS runs on
   sm_87 hardware natively).
3. Working options on Jetson: on-device source build, or third-party
   prebuilt from Jetson AI Lab.

References #1218 and #1930 for the original error reports, and #1939 for
the empirical confirmation that the fault is the toolchain delta, not the
arch list (sm_80-only cubin built on-device runs cleanly on sm_87
hardware).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/source/installation.mdx | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index b9d48603a..d9d5ad315 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -66,6 +66,14 @@ Use `pip` or `uv` to install the latest release:
 pip install bitsandbytes
 ```
 
+> [!WARNING]
+> **NVIDIA Jetson (L4T / JetPack) — source build required.** The `Linux aarch64` wheels above are built on aarch64-sbsa runners (server-class ARM with the standard CUDA Toolkit). They are **not compatible** with the L4T runtime on Jetson devices (Orin Nano / NX / AGX, Xavier, Thor on CUDA 12), even though both are aarch64 and even though the cubins are binary-compatible with the device's compute capability (e.g., `sm_80` cubin runs on `sm_87` hardware via Ampere-family binary compat — see [NVIDIA's docs on binary compatibility](https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/#binary_compatibility)). The mismatch is at the CUDA library / ABI layer (JetPack ships its own CUDA Toolkit and system libraries), and surfaces as a runtime symbol-resolution error like `Error named symbol not found in /src/csrc/ops.cu` on the first CUDA op.
+>
+> **Two working options on Jetson:**
+>
+> 1. **Source build on-device.** Use the [Compile from Source](#cuda-compile) instructions below, passing your device's compute capability explicitly (sm_87 for Orin family, sm_72 for Xavier). On an Orin Nano Super: `cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 . && make -j4 && pip install .`
+> 2. **Third-party prebuilt** from [Jetson AI Lab's package index](https://pypi.jetson-ai-lab.io/) (e.g., `pypi.jetson-ai-lab.io/jp6/cu126/bitsandbytes/`).
+
 ### Compile from Source[[cuda-compile]]
 
 > [!TIP]
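
Supplementary note (not part of the patch): since the failure only shows up on the first CUDA op, it can help to know up front whether a machine is L4T or aarch64-sbsa. Below is a minimal, hedged sketch of such a check. It assumes that JetPack/L4T images ship the marker file `/etc/nv_tegra_release` and that sbsa servers do not; the function name `running_on_jetson_l4t` is illustrative and is not part of bitsandbytes.

```python
import os
import platform


def running_on_jetson_l4t() -> bool:
    """Heuristic L4T detection (assumed marker file, not an official check).

    JetPack/L4T root filesystems typically include /etc/nv_tegra_release;
    aarch64-sbsa server installs of the standard CUDA Toolkit do not.
    """
    return platform.machine() == "aarch64" and os.path.exists("/etc/nv_tegra_release")


if __name__ == "__main__":
    if running_on_jetson_l4t():
        # The PyPI aarch64 wheel was built against the sbsa toolchain and
        # will fail at the first CUDA op; build from source on-device instead.
        print("Jetson/L4T detected: build bitsandbytes from source on-device.")
    else:
        print("Not an L4T system: the PyPI aarch64 (sbsa) wheel should apply.")
```

On a Jetson this would steer users toward the on-device source build described in the patch; on any other machine it reports that the stock wheel is the intended path.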