From 655b04e47a2078ce24993d6a561ac1c68992a562 Mon Sep 17 00:00:00 2001
From: neil-the-nowledgable <254185769+neil-the-nowledgable@users.noreply.github.com>
Date: Thu, 7 May 2026 14:46:58 -0400
Subject: [PATCH] docs(install): note aarch64 wheels are aarch64-sbsa, not L4T
 (Jetson)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a WARNING callout after the Linux aarch64 row of the PyPI
build-targets table, explaining that:

1. Wheels are built on aarch64-sbsa runners (standard CUDA Toolkit), not
   the L4T / JetPack runtime that Jetson Orin / Xavier / Thor (on CUDA 12)
   use.
2. The mismatch surfaces as 'Error named symbol not found in
   /src/csrc/ops.cu' on the first CUDA op — a symbol-resolution error,
   NOT a kernel-image-for-device error. The cubins ARE binary-compatible
   with the device per Ampere-family binary compat (sm_80 SASS runs on
   sm_87 hardware natively).
3. Working options on Jetson: on-device source build, or third-party
   prebuilt from Jetson AI Lab.

References #1218 and #1930 for the original error reports, and #1939 for
the empirical confirmation that the fault is the toolchain delta, not the
arch list (sm_80-only cubin built on-device runs cleanly on sm_87
hardware).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/source/installation.mdx | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
index b9d48603a..d9d5ad315 100644
--- a/docs/source/installation.mdx
+++ b/docs/source/installation.mdx
@@ -66,6 +66,14 @@ Use `pip` or `uv` to install the latest release:
 pip install bitsandbytes
 ```
 
+> [!WARNING]
+> **NVIDIA Jetson (L4T / JetPack) — source build required.** The `Linux aarch64` wheels above are built on aarch64-sbsa runners (server-class ARM with the standard CUDA Toolkit). They are **not compatible** with the L4T runtime on Jetson devices (Orin Nano / NX / AGX, Xavier, Thor on CUDA 12), even though both are aarch64 and even though the cubins are binary-compatible with the device's compute capability (e.g., `sm_80` cubin runs on `sm_87` hardware via Ampere-family binary compat — see [NVIDIA's docs on binary compatibility](https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/#binary_compatibility)). The mismatch is at the CUDA library / ABI layer (JetPack ships its own CUDA Toolkit and system libraries), and surfaces as a runtime symbol-resolution error like `Error named symbol not found in /src/csrc/ops.cu` on the first CUDA op.
+>
+> **Two working options on Jetson:**
+>
+> 1. **Source build on-device.** Use the [Compile from Source](#cuda-compile) instructions below, passing your device's compute capability explicitly (sm_87 for Orin family, sm_72 for Xavier). On an Orin Nano Super: `cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 . && make -j4 && pip install .`
+> 2. **Third-party prebuilt** from [Jetson AI Lab's package index](https://pypi.jetson-ai-lab.io/) (e.g., `pypi.jetson-ai-lab.io/jp6/cu126/bitsandbytes/`).
+
 ### Compile from Source[[cuda-compile]]
 
 > [!TIP]
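
Supplementary note (not part of the patch): since the failure only shows up on the first CUDA op, it can help to know up front whether a machine is L4T or aarch64-sbsa. Below is a minimal, hedged sketch of such a check. It assumes that JetPack/L4T images ship the marker file `/etc/nv_tegra_release` and that sbsa servers do not; the function name `running_on_jetson_l4t` is illustrative and is not part of bitsandbytes.

```python
import os
import platform


def running_on_jetson_l4t() -> bool:
    """Heuristic L4T detection (assumed marker file, not an official check).

    JetPack/L4T root filesystems typically include /etc/nv_tegra_release;
    aarch64-sbsa server installs of the standard CUDA Toolkit do not.
    """
    return platform.machine() == "aarch64" and os.path.exists("/etc/nv_tegra_release")


if __name__ == "__main__":
    if running_on_jetson_l4t():
        # The PyPI aarch64 wheel was built against the sbsa toolchain and
        # will fail at the first CUDA op; build from source on-device instead.
        print("Jetson/L4T detected: build bitsandbytes from source on-device.")
    else:
        print("Not an L4T system: the PyPI aarch64 (sbsa) wheel should apply.")
```

On a Jetson this would steer users toward the on-device source build described in the patch; on any other machine it reports that the stock wheel is the intended path.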