Universal CUDA/ROCm wheel via runtime platform detection #78

Merged
takeshi-yoshimura merged 1 commit into foundation-model-stack:main from wjabbour:universal-wheel on May 22, 2026
Conversation

@wjabbour
Contributor

@wjabbour wjabbour commented May 19, 2026

Summary

  • Replaces compile-time ROCM_PATH / USE_ROCM platform selection with runtime dlopen detection
  • On startup, tries libcudart.so first, then libamdhip64.so — whichever succeeds sets the GPU function pointers
  • is_cuda_found() and is_hip_found() are now real runtime queries (both always exported), not compile-time constants
  • is_gds_supported() uses the runtime is_hip_runtime flag instead of #ifdef USE_ROCM
  • Renames cuda_compat.h → gpu_compat.h with updated include guards and two parallel CUDA_SYM_* / HIP_SYM_* define sets (no #ifdef guards)
  • Removes detect_platform() from setup.py — single build works on both platforms
  • Adds missing is_hip_found() stub to cpp.pyi
  • Updates README ROCm install instructions (no longer need ROCM_PATH= prefix)
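
The probe order described in the summary can be sketched in Python with `ctypes`, which wraps the same `dlopen` mechanism. This is only an illustration, not the PR's code: the actual change lives in the C++ extension, and `detect_gpu_runtime`, `CANDIDATES`, and the `loader` parameter are hypothetical names introduced here. Only the two library names and the CUDA-then-HIP order come from the summary.

```python
import ctypes

# Candidate runtimes in the order the summary gives: CUDA first, then HIP.
# The unversioned sonames are taken from the summary; a real system may only
# ship versioned names, which this sketch does not try to resolve.
CANDIDATES = (("cuda", "libcudart.so"), ("hip", "libamdhip64.so"))


def detect_gpu_runtime(candidates=CANDIDATES, loader=ctypes.CDLL):
    """Return (platform, handle) for the first library that loads.

    Returns (None, None) when no candidate can be loaded. `loader` is
    injectable (hypothetical design choice) so the probe order can be
    exercised without a GPU runtime installed.
    """
    for platform, soname in candidates:
        try:
            handle = loader(soname)  # ctypes.CDLL raises OSError if dlopen fails
        except OSError:
            continue
        return platform, handle
    return None, None


platform, _handle = detect_gpu_runtime()
# Runtime queries in the spirit of is_cuda_found() / is_hip_found().
print("cuda found:", platform == "cuda")
print("hip found:", platform == "hip")
```

On a machine with neither runtime installed, both values print as False. That matches the idea that the same build imports everywhere and only reports what it found at startup.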

Test plan

  • Built and installed locally on ROCm machine (RDNA 4)
  • is_cuda_found() → False, is_hip_found() → True on ROCm
  • Loaded Qwen3-0.6B (311 tensors, single shard) to cuda:0
  • Loaded Hermes-3-Llama-3.1-8B (291 tensors, 4 shards) to cuda:0 in 9.5s
  • vLLM --load-format fastsafetensors with ibm-granite/granite-3.3-2b-instruct on ROCm (weights loaded in 0.43s, server fully up)
  • CUDA machine smoke test (CI)
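
The flag check in the test plan could be scripted roughly as follows. This is a sketch under assumptions: `check_platform_flags` is a hypothetical helper, and because the import path of the compiled extension is not stated in this PR, the helper takes the module as an argument. A stub object stands in for the extension on a ROCm machine.

```python
from types import SimpleNamespace


def check_platform_flags(ext, expect_cuda, expect_hip):
    """Compare an extension's runtime flags against expected values.

    `ext` is any object exposing is_cuda_found() and is_hip_found().
    Returns a list of mismatch messages; an empty list means all matched.
    """
    observed = {
        "is_cuda_found": bool(ext.is_cuda_found()),
        "is_hip_found": bool(ext.is_hip_found()),
    }
    expected = {"is_cuda_found": expect_cuda, "is_hip_found": expect_hip}
    return [
        f"{name}: expected {expected[name]}, got {observed[name]}"
        for name in expected
        if observed[name] != expected[name]
    ]


# Hypothetical stand-in for the compiled extension on a ROCm machine.
rocm_stub = SimpleNamespace(is_cuda_found=lambda: False, is_hip_found=lambda: True)
print(check_platform_flags(rocm_stub, expect_cuda=False, expect_hip=True))
```

In a real smoke test the stub would be replaced by the installed extension module, with the expectations flipped on a CUDA machine.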

Closes #68

🤖 Generated with Claude Code

@wjabbour wjabbour marked this pull request as draft May 19, 2026 04:39
@wjabbour wjabbour marked this pull request as ready for review May 21, 2026 02:42
@wjabbour wjabbour force-pushed the universal-wheel branch 2 times, most recently from 5eaf1c9 to 51e25fe May 22, 2026 01:48
…latforms

Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>
@wjabbour
Contributor Author

@takeshi-yoshimura Hey Takeshi, this PR is good for review!

@takeshi-yoshimura takeshi-yoshimura merged commit 799a5de into foundation-model-stack:main May 22, 2026
12 checks passed
@takeshi-yoshimura
Collaborator

@wjabbour
Cool! This is exactly what I envisioned a while ago to support both ROCm and CUDA within a single package. Thanks.

Development

Successfully merging this pull request may close these issues.

[Proposal] Universal wheel: runtime CUDA/ROCm detection to eliminate separate builds

2 participants