docs(install): note aarch64 wheels are aarch64-sbsa, not L4T (Jetson) #1941
Merged
matthewdouglas merged 1 commit into bitsandbytes-foundation:main on May 7, 2026
Adds a WARNING callout after the Linux aarch64 row of the PyPI build-targets table, explaining that:

1. Wheels are built on aarch64-sbsa runners (standard CUDA Toolkit), not the L4T / JetPack runtime that Jetson Orin / Xavier / Thor (on CUDA 12) use.
2. The mismatch surfaces as 'Error named symbol not found in /src/csrc/ops.cu' on the first CUDA op: a symbol-resolution error, NOT a kernel-image-for-device error. The cubins ARE binary-compatible with the device per Ampere-family binary compat (sm_80 SASS runs on sm_87 hardware natively).
3. Working options on Jetson: on-device source build, or third-party prebuilt from Jetson AI Lab.

References bitsandbytes-foundation#1218 and bitsandbytes-foundation#1930 for the original error reports, and bitsandbytes-foundation#1939 for the empirical confirmation that the fault is the toolchain delta, not the arch list (sm_80-only cubin built on-device runs cleanly on sm_87 hardware).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
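A pre-install check along the following lines could spare a Jetson user the symbol-not-found error. This is a minimal sketch, assuming that L4T / JetPack images ship the file /etc/nv_tegra_release (a common heuristic, not something stated in this PR). The function names are hypothetical and nothing here is bitsandbytes API.

```python
import platform
from pathlib import Path
from typing import Optional

# Assumption: this marker file is present on L4T / JetPack images.
L4T_RELEASE_FILE = Path("/etc/nv_tegra_release")


def looks_like_l4t(machine: Optional[str] = None,
                   release_file: Path = L4T_RELEASE_FILE) -> bool:
    """Heuristically report whether the host looks like a Jetson (L4T) system."""
    if machine is None:
        machine = platform.machine()
    return machine == "aarch64" and release_file.exists()


def install_hint(is_l4t: bool) -> str:
    """Return a one-line hint that mirrors the options named in the callout."""
    if is_l4t:
        return ("aarch64 PyPI wheels target aarch64-sbsa, not L4T: "
                "build from source on-device or use a Jetson AI Lab prebuilt.")
    return "No L4T marker found; the PyPI wheel may apply to this platform."


if __name__ == "__main__":
    print(install_hint(looks_like_l4t()))
```

The marker path and machine string are parameters so the check can be exercised on a non-Jetson host.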
matthewdouglas
approved these changes
May 7, 2026
Summary
Adds a short WARNING callout to docs/source/installation.mdx, right after the Linux aarch64 row of the PyPI build-targets table, telling users that aarch64 wheels are built on aarch64-sbsa (server ARM with the standard CUDA Toolkit), not the L4T / JetPack runtime that Jetson Orin / Xavier / Thor on CUDA 12 use. Points readers at the working options: on-device source build, or Jetson AI Lab's prebuilt index.
Why
Per @matthewdouglas's review on #1939 and the original report at #1930: the failure mode is a CUDA symbol-resolution error from the toolchain mismatch (aarch64-sbsa CUDA libs vs L4T-bundled CUDA libs), not a missing-arch kernel-image issue. sm_80 cubins are binary-compatible with sm_87 hardware via Ampere-family binary compat, as empirically confirmed in the #1939 thread (built bnb with -DCOMPUTE_CAPABILITY=80 only, verified that cuobjdump --list-elf shows only sm_80.cubin, ran cleanly on sm_87 Jetson Orin).

Documenting the workaround is the lowest-effort way to help the next person who hits the symbol-not-found error find the right answer (source build or JAL prebuilt) without chasing the arch list as I originally did.
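The cuobjdump check described above can be scripted. This is a minimal sketch, assuming cuobjdump is on PATH and that its --list-elf output names each embedded cubin with an sm_XX component. The function names and the library path are hypothetical placeholders, not part of bitsandbytes.

```python
import re
import subprocess
from typing import Set

# Matches architecture tags such as sm_80 or sm_90a inside cubin names.
_SM_PATTERN = re.compile(r"\bsm_(\d+[a-z]?)\b")


def archs_from_list_elf(listing: str) -> Set[str]:
    """Collect the distinct sm_XX tags found in `cuobjdump --list-elf` text."""
    return {f"sm_{tag}" for tag in _SM_PATTERN.findall(listing)}


def embedded_archs(library_path: str) -> Set[str]:
    """Run cuobjdump on a shared library and return its embedded cubin archs."""
    result = subprocess.run(
        ["cuobjdump", "--list-elf", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return archs_from_list_elf(result.stdout)


if __name__ == "__main__":
    import sys

    # Usage: python check_archs.py /path/to/built/library.so  (hypothetical path)
    print(sorted(embedded_archs(sys.argv[1])))
```

For a build made with -DCOMPUTE_CAPABILITY=80 only, a result containing just sm_80 would match what the #1939 thread reports. The parser is kept separate from the subprocess call so it can be tested without a CUDA toolkit installed.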
Scope
docs/source/installation.mdx: > [!WARNING] callout placed right after the Linux aarch64 PyPI row (line 67)
Test plan
- pre-commit run --files docs/source/installation.mdx passes (typos checker confirms no typos in the new prose; format hooks pass)
- > [!WARNING] syntax matching existing usage in the same file
References