bitsandbytes-foundation · matthewdouglas · May 7, 2026 · May 7, 2026
diff --git a/docs/source/installation.mdx b/docs/source/installation.mdx
@@ -66,6 +66,14 @@ Use `pip` or `uv` to install the latest release:
 pip install bitsandbytes
 ```
 
+> [!WARNING]
+> **NVIDIA Jetson (L4T / JetPack): source build required.** The `Linux aarch64` wheels above are built on aarch64-sbsa runners (server-class ARM with the standard CUDA Toolkit) and are **not compatible** with the L4T runtime on Jetson devices (Orin Nano / NX / AGX, Xavier, Thor on CUDA 12). Sharing the aarch64 architecture is not enough, and neither is cubin compatibility with the device's compute capability (for example, an `sm_80` cubin runs on `sm_87` hardware through Ampere-family binary compatibility; see [NVIDIA's docs on binary compatibility](https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/#binary_compatibility)). The mismatch is at the CUDA library / ABI layer, because JetPack ships its own CUDA Toolkit and system libraries. It surfaces on the first CUDA op as a runtime symbol-resolution error such as `Error named symbol not found in /src/csrc/ops.cu`.
+>
+> **Two working options on Jetson:**
+>
+> 1. **Source build on-device.** Follow the [Compile from Source](#cuda-compile) instructions below and pass your device's compute capability explicitly (`sm_87` for the Orin family, `sm_72` for Xavier). For example, on an Orin Nano Super: `cmake -DCOMPUTE_BACKEND=cuda -DCOMPUTE_CAPABILITY=87 . && make -j4 && pip install .`
+> 2. **Third-party prebuilt wheel** from [Jetson AI Lab's package index](https://pypi.jetson-ai-lab.io/) (e.g., `pypi.jetson-ai-lab.io/jp6/cu126/bitsandbytes/`).
+
 ### Compile from Source[[cuda-compile]]
 
 > [!TIP]