
Commit b6f2344

cpcloud, cursoragent, and rwgk authored
feat(pathfinder): add CTK root canary probe for non-standard-path libs (#1595)
* feat(pathfinder): add CTK root canary probe for non-standard-path libs

  Libraries like nvvm whose shared object lives in a subdirectory (/nvvm/lib64/) that is not on the system linker path cannot be found via bare dlopen on system CTK installs without CUDA_HOME.

  Add a "canary probe" search step: when direct system search fails, system-load a well-known CTK lib that IS on the linker path (cudart), derive the CTK installation root from its resolved path, and look for the target lib relative to that root via the existing anchor-point logic.

  The mechanism is generic -- any future lib with a non-standard path just needs its entry in _find_lib_dir_using_anchor_point.

  The canary probe is intentionally placed after CUDA_HOME in the search cascade to preserve backward compatibility: users who have CUDA_HOME set expect it to be authoritative, and existing code relying on that ordering should not silently change behavior.

* style(pathfinder): update copyright header date in test file

* refactor(pathfinder): use pytest-mock instead of unittest.mock in tests

* chore: fix typing

* fix(pathfinder): make CTK root discovery tests platform-aware

  Tests that create fake CTK directory layouts were hardcoded to Linux paths (lib64/, libnvvm.so) and failed on Windows where the code expects Windows layouts (bin/, nvvm64.dll). Extract platform-aware helpers (_create_nvvm_in_ctk, _create_cudart_in_ctk, _fake_canary_path) that create the right layout and filenames based on IS_WINDOWS.

* chore: style

* fix(pathfinder): normalize paths from _find_lib_dir_using_anchor_point

  The rel_paths for nvvm use forward slashes (e.g. "nvvm/bin") which os.path.join on Windows doesn't normalize, producing mixed-separator paths like "...\nvvm/bin\nvvm64.dll". Apply os.path.normpath to the returned directory so all separators are consistent.

* refactor(pathfinder): isolate CTK canary probe in subprocess

  Resolve CTK canary absolute paths in a spawned Python process so probing cudart does not mutate loader state in the caller process while preserving the nvvm discovery fallback order. Keep JSON as the child-to-parent wire format because it cleanly represents both path and no-result states and avoids fragile stdout/path parsing across platforms.

* fix(pathfinder): satisfy pre-commit typing for canary probe

  Make canary subprocess path extraction explicitly typed and validated so mypy does not treat platform-specific loader results as Any while keeping probe behavior unchanged. Keep import ordering aligned with Ruff so pre-commit is green.

* fix(pathfinder): use spawn isolation for CTK canary probing

  Switch canary path resolution from subprocess.run to a shared multiprocessing spawn runner so child probes do not inherit potentially preloaded CUDA libraries from a forked parent. Reuse that runner from tests to keep one implementation for spawned process behavior.

* fix(pathfinder): satisfy pre-commit for spawned runner utilities

  Add the missing type annotations required by mypy and keep the test shim exporting only the runner entry point so lint checks pass cleanly.

* fix(pathfinder): fail fast on canary probe child errors

  Treat only a missing canary library as a recoverable probe miss and surface all other child-process failures immediately. This prevents development bugs from being silently masked as normal CTK canary fallback behavior.

* canary_probe_subprocess.py: remove unused main()

* refactor(pathfinder): simplify spawned runner usage in tests

  Remove the tests-only re-export shim and import the shared spawned-process runner directly from pathfinder utils. This makes it more obvious that there is only one implementation.

* Extend copyright date back to original code.

* test(pathfinder): assert canary probe subprocess rethrows failures

  Lock in the fail-fast canary policy by asserting the subprocess runner is invoked with rethrow enabled. This guards against accidental regressions that would silently downgrade child-process errors to probe misses.

* docs(pathfinder): clarify why canary probe must use spawn

  Document that the canary probe runs in a spawned (not forked) child process so it starts from a clean interpreter state without inherited preloaded CUDA libraries. This explains why spawn is required for an independent system-search probe.

* refactor(pathfinder): simplify CTK canary fallback scope

  Centralize canary configuration in supported_nvidia_libs and gate CTK-root canary probing to discoverable libnames (currently nvvm). This avoids unnecessary canary subprocess work for other libraries and locks the behavior with a focused regression test.

* chore(pathfinder): cache canary anchor probe lookups

  Cache canary anchor-path resolution to avoid redundant spawned probe work while preserving retry-on-exception behavior. This is mostly a completeness/quality-of-implementation improvement today since only one discoverable lib currently uses the canary path, and tests now clear the cache per case to preserve isolation.

* fix(pathfinder): derive CTK root from Linux targets layouts

  Handle cudart paths under targets/<triple>/lib{,64} when deriving CTK root for canary-based nvvm discovery. Add focused regression tests for both lib64 and lib variants.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>
1 parent 67251a3 commit b6f2344
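
The mixed-separator problem described in the commit message can be reproduced on any platform with the standard-library `ntpath` module, which implements Windows path semantics. This is only an illustrative sketch: the root `C:\CTK` is a made-up value, not a path taken from the commit.

```python
import ntpath

# Hypothetical CTK root, chosen only for illustration.
ctk_root = "C:\\CTK"

# Joining a forward-slash relative path keeps the "/" as-is on Windows.
lib_dir = ntpath.join(ctk_root, "nvvm/bin")
mixed = ntpath.join(lib_dir, "nvvm64.dll")

# normpath rewrites every separator to the Windows backslash.
normalized = ntpath.normpath(mixed)

print(mixed)
print(normalized)
```

Normalizing the directory once, at the point where it is returned, means every caller that later joins a filename onto it gets consistent separators.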

9 files changed

Lines changed: 576 additions & 13 deletions
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+#!/usr/bin/env python
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+import json
+
+from cuda.pathfinder._dynamic_libs.load_dl_common import DynamicLibNotFoundError, LoadedDL
+from cuda.pathfinder._utils.platform_aware import IS_WINDOWS
+
+if IS_WINDOWS:
+    from cuda.pathfinder._dynamic_libs.load_dl_windows import load_with_system_search
+else:
+    from cuda.pathfinder._dynamic_libs.load_dl_linux import load_with_system_search
+
+
+def _probe_canary_abs_path(libname: str) -> str | None:
+    try:
+        loaded: LoadedDL | None = load_with_system_search(libname)
+    except DynamicLibNotFoundError:
+        return None
+    if loaded is None:
+        return None
+    abs_path = loaded.abs_path
+    if not isinstance(abs_path, str):
+        return None
+    return abs_path
+
+
+def probe_canary_abs_path_and_print_json(libname: str) -> None:
+    print(json.dumps(_probe_canary_abs_path(libname)))  # noqa: T201
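
The child-to-parent wire format used above (one JSON value on stdout, either a string path or `null`) can be sketched in isolation. The helper names `encode_probe_result` and `decode_probe_stdout` and the example path are hypothetical; the decoding rules follow what the parent side of this commit does (take the last non-empty stdout line, accept only a string or `null`).

```python
from __future__ import annotations

import json


def encode_probe_result(abs_path: str | None) -> str:
    # A found path becomes a JSON string; a probe miss becomes JSON null.
    return json.dumps(abs_path)


def decode_probe_stdout(stdout: str) -> str | None:
    # Only the last non-empty line is treated as the payload, so stray
    # output printed earlier by the child does not break parsing.
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("child produced no stdout payload")
    payload = json.loads(lines[-1])
    if payload is None or isinstance(payload, str):
        return payload
    raise RuntimeError(f"unexpected payload: {payload!r}")


# Hypothetical path, for illustration only.
example_path = "/usr/local/cuda/lib64/libcudart.so.13"
noisy_stdout = "some earlier line\n" + encode_probe_result(example_path) + "\n"
print(decode_probe_stdout(noisy_stdout))
print(decode_probe_stdout(encode_probe_result(None)))
```

Using JSON rather than a bare path string lets "not found" be distinguished from an empty or malformed line without inventing a sentinel value.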

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py

Lines changed: 69 additions & 1 deletion
@@ -101,7 +101,7 @@ def _find_lib_dir_using_anchor_point(libname: str, anchor_point: str, linux_lib_
     for rel_path in rel_paths:
         for dirname in sorted(glob.glob(os.path.join(anchor_point, rel_path))):
             if os.path.isdir(dirname):
-                return dirname
+                return os.path.normpath(dirname)
 
     return None
 
@@ -152,6 +152,64 @@ def _find_dll_using_lib_dir(
     return None
 
 
+def _derive_ctk_root_linux(resolved_lib_path: str) -> str | None:
+    """Derive the CTK installation root from a resolved library path on Linux.
+
+    Standard system CTK layout: ``$CTK_ROOT/lib64/libfoo.so.XX``
+    (some installs use ``lib`` instead of ``lib64``).
+    Also handles target-specific layouts:
+    ``$CTK_ROOT/targets/<triple>/lib64/libfoo.so.XX`` (or ``lib``).
+
+    Returns None if the path doesn't match a recognized layout.
+    """
+    lib_dir = os.path.dirname(resolved_lib_path)
+    basename = os.path.basename(lib_dir)
+    if basename in ("lib64", "lib"):
+        parent = os.path.dirname(lib_dir)
+        grandparent = os.path.dirname(parent)
+        if os.path.basename(grandparent) == "targets":
+            # This corresponds to /.../targets/<triple>/lib{,64}
+            return os.path.dirname(grandparent)
+        return parent
+    return None
+
+
+def _derive_ctk_root_windows(resolved_lib_path: str) -> str | None:
+    """Derive the CTK installation root from a resolved library path on Windows.
+
+    Handles two CTK layouts:
+    - CTK 13: ``$CTK_ROOT/bin/x64/foo.dll``
+    - CTK 12: ``$CTK_ROOT/bin/foo.dll``
+
+    Returns None if the path doesn't match a recognized layout.
+
+    Uses ``ntpath`` explicitly so the function is testable on any platform.
+    """
+    import ntpath
+
+    lib_dir = ntpath.dirname(resolved_lib_path)
+    basename = ntpath.basename(lib_dir).lower()
+    if basename == "x64":
+        parent = ntpath.dirname(lib_dir)
+        if ntpath.basename(parent).lower() == "bin":
+            return ntpath.dirname(parent)
+    elif basename == "bin":
+        return ntpath.dirname(lib_dir)
+    return None
+
+
+def derive_ctk_root(resolved_lib_path: str) -> str | None:
+    """Derive the CTK installation root from a resolved library path.
+
+    Given the absolute path of a loaded CTK shared library, walk up the
+    directory tree to find the CTK root. Returns None if the path doesn't
+    match any recognized CTK directory layout.
+    """
+    if IS_WINDOWS:
+        return _derive_ctk_root_windows(resolved_lib_path)
+    return _derive_ctk_root_linux(resolved_lib_path)
+
+
 class _FindNvidiaDynamicLib:
     def __init__(self, libname: str):
         self.libname = libname
@@ -185,6 +243,16 @@ def try_with_conda_prefix(self) -> str | None:
     def try_with_cuda_home(self) -> str | None:
         return self._find_using_lib_dir(_find_lib_dir_using_cuda_home(self.libname))
 
+    def try_via_ctk_root(self, ctk_root: str) -> str | None:
+        """Find the library under a derived CTK root directory.
+
+        Uses :func:`_find_lib_dir_using_anchor_point` which already knows
+        about non-standard sub-paths (e.g. ``nvvm/lib64`` for nvvm).
+        """
+        return self._find_using_lib_dir(
+            _find_lib_dir_using_anchor_point(self.libname, anchor_point=ctk_root, linux_lib_dir="lib64")
+        )
+
     def _find_using_lib_dir(self, lib_dir: str | None) -> str | None:
         if lib_dir is None:
             return None
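
The root-derivation rules in this file can be exercised without a CUDA install. The sketch below re-states them as two standalone functions (`ctk_root_from_linux_path` and `ctk_root_from_windows_path` are hypothetical names, and all example paths are invented); it uses `posixpath` and `ntpath` explicitly so it behaves the same on any host OS.

```python
from __future__ import annotations

import ntpath
import posixpath


def ctk_root_from_linux_path(resolved_lib_path: str) -> str | None:
    lib_dir = posixpath.dirname(resolved_lib_path)
    if posixpath.basename(lib_dir) not in ("lib64", "lib"):
        return None
    parent = posixpath.dirname(lib_dir)
    grandparent = posixpath.dirname(parent)
    if posixpath.basename(grandparent) == "targets":
        # .../targets/<triple>/lib{,64}: the root is above "targets".
        return posixpath.dirname(grandparent)
    return parent


def ctk_root_from_windows_path(resolved_lib_path: str) -> str | None:
    lib_dir = ntpath.dirname(resolved_lib_path)
    name = ntpath.basename(lib_dir).lower()
    if name == "x64":
        parent = ntpath.dirname(lib_dir)
        if ntpath.basename(parent).lower() == "bin":
            return ntpath.dirname(parent)
        return None
    if name == "bin":
        return ntpath.dirname(lib_dir)
    return None


# Invented example paths, for illustration only.
print(ctk_root_from_linux_path("/usr/local/cuda-13.0/lib64/libcudart.so.13"))
print(ctk_root_from_linux_path("/usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.13"))
print(ctk_root_from_linux_path("/usr/lib/x86_64-linux-gnu/libcudart.so.13"))
print(ctk_root_from_windows_path("C:\\CTK\\bin\\x64\\cudart64_13.dll"))
```

Note that a path whose parent directory is not `lib`, `lib64`, `bin`, or `bin\x64` yields `None`, so a canary found in an unrecognized layout simply produces no root rather than a wrong one.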

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py

Lines changed: 94 additions & 4 deletions
@@ -2,16 +2,24 @@
 # SPDX-License-Identifier: Apache-2.0
 
 import functools
+import json
 import struct
 import sys
 
-from cuda.pathfinder._dynamic_libs.find_nvidia_dynamic_lib import _FindNvidiaDynamicLib
+from cuda.pathfinder._dynamic_libs.canary_probe_subprocess import probe_canary_abs_path_and_print_json
+from cuda.pathfinder._dynamic_libs.find_nvidia_dynamic_lib import (
+    _FindNvidiaDynamicLib,
+    derive_ctk_root,
+)
 from cuda.pathfinder._dynamic_libs.load_dl_common import DynamicLibNotFoundError, LoadedDL, load_dependencies
 from cuda.pathfinder._dynamic_libs.supported_nvidia_libs import (
+    _CTK_ROOT_CANARY_ANCHOR_LIBNAMES,
+    _CTK_ROOT_CANARY_DISCOVERABLE_LIBNAMES,
     SUPPORTED_LINUX_SONAMES,
     SUPPORTED_WINDOWS_DLLS,
 )
 from cuda.pathfinder._utils.platform_aware import IS_WINDOWS
+from cuda.pathfinder._utils.spawned_process_runner import run_in_spawned_child_process
 
 if IS_WINDOWS:
     from cuda.pathfinder._dynamic_libs.load_dl_windows import (
@@ -60,6 +68,67 @@ def _load_driver_lib_no_cache(libname: str) -> LoadedDL:
     )
 
 
+@functools.cache
+def _resolve_system_loaded_abs_path_in_subprocess(libname: str) -> str | None:
+    """Resolve a library's system-search absolute path in a child process.
+
+    This runs in a spawned (not forked) child process. Spawning is important
+    because it starts from a fresh interpreter state, so the child does not
+    inherit already-loaded CUDA dynamic libraries from the parent process
+    (especially the well-known canary probe library).
+
+    That keeps any side-effects of loading the canary library scoped to the
+    child process instead of polluting the current process, and ensures the
+    canary probe is an independent system-search attempt.
+    """
+    result = run_in_spawned_child_process(
+        probe_canary_abs_path_and_print_json,
+        args=(libname,),
+        timeout=10.0,
+        rethrow=True,
+    )
+
+    # Read the final non-empty stdout line in case earlier lines are emitted.
+    lines = [line for line in result.stdout.splitlines() if line.strip()]
+    if not lines:
+        raise RuntimeError(f"Canary probe child process produced no stdout payload for {libname!r}")
+    try:
+        payload = json.loads(lines[-1])
+    except json.JSONDecodeError:
+        raise RuntimeError(
+            f"Canary probe child process emitted invalid JSON payload for {libname!r}: {lines[-1]!r}"
+        ) from None
+    if isinstance(payload, str):
+        return payload
+    if payload is None:
+        return None
+    raise RuntimeError(f"Canary probe child process emitted unexpected payload for {libname!r}: {payload!r}")
+
+
+def _try_ctk_root_canary(finder: _FindNvidiaDynamicLib) -> str | None:
+    """Derive the CTK root from a system-installed canary lib.
+
+    For discoverable libs (currently nvvm) whose shared object doesn't reside
+    on the standard linker path, we locate a well-known CTK lib that IS on
+    the linker path via system search, derive the CTK installation root from
+    its resolved path, and then look for the target lib relative to that root.
+
+    The canary load is performed in a subprocess to avoid introducing loader
+    state into the current process.
+    """
+    for canary_libname in _CTK_ROOT_CANARY_ANCHOR_LIBNAMES:
+        canary_abs_path = _resolve_system_loaded_abs_path_in_subprocess(canary_libname)
+        if canary_abs_path is None:
+            continue
+        ctk_root = derive_ctk_root(canary_abs_path)
+        if ctk_root is None:
+            continue
+        abs_path: str | None = finder.try_via_ctk_root(ctk_root)
+        if abs_path is not None:
+            return abs_path
+    return None
+
+
 def _load_lib_no_cache(libname: str) -> LoadedDL:
     if libname in _DRIVER_ONLY_LIBNAMES:
         return _load_driver_lib_no_cache(libname)
@@ -90,11 +159,24 @@ def _load_lib_no_cache(libname: str) -> LoadedDL:
             loaded = load_with_system_search(libname)
             if loaded is not None:
                 return loaded
+
             abs_path = finder.try_with_cuda_home()
-            if abs_path is None:
-                finder.raise_not_found_error()
-            else:
+            if abs_path is not None:
                 found_via = "CUDA_HOME"
+            else:
+                if libname not in _CTK_ROOT_CANARY_DISCOVERABLE_LIBNAMES:
+                    finder.raise_not_found_error()
+
+                # Canary probe (discoverable libs only): if the direct system
+                # search and CUDA_HOME both failed (e.g. nvvm isn't on the linker
+                # path and CUDA_HOME is unset), try to discover the CTK root by
+                # loading a well-known CTK lib in a subprocess, then look for the
+                # target lib relative to that root.
+                abs_path = _try_ctk_root_canary(finder)
+                if abs_path is not None:
+                    found_via = "system-ctk-root"
+                else:
+                    finder.raise_not_found_error()
 
     return load_with_abs_path(libname, abs_path, found_via)
 
@@ -164,6 +246,14 @@ def load_nvidia_dynamic_lib(libname: str) -> LoadedDL:
 
            - If set, use ``CUDA_HOME`` or ``CUDA_PATH`` (in that order).
 
+        5. **CTK root canary probe (discoverable libs only)**
+
+           - For selected libraries whose shared object doesn't reside on the
+             standard linker path (currently ``nvvm``),
+             attempt to discover the CTK installation root by system-loading a
+             well-known CTK library (``cudart``) in a subprocess, then derive
+             the root from its resolved absolute path.
+
     **Driver libraries** (``"cuda"``, ``"nvml"``):
 
         These are part of the NVIDIA display driver (not the CUDA Toolkit) and
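
The commit message says the `functools.cache` wrapper preserves "retry-on-exception behavior". That follows from how `functools.cache` works: return values (including `None`) are memoized, but a call that raises stores nothing, so the next call runs the function again. A minimal sketch with a made-up `flaky_probe` function standing in for the spawned probe:

```python
import functools

call_log = []  # records every real (non-cached) invocation


@functools.cache
def flaky_probe(libname):
    # Hypothetical stand-in for the spawned canary probe.
    call_log.append(libname)
    if len(call_log) == 1:
        raise RuntimeError("transient child-process failure")
    return None  # a probe miss is a normal return value and IS cached


try:
    flaky_probe("cudart")
except RuntimeError:
    pass  # nothing was cached for this call

first_result = flaky_probe("cudart")   # runs the function again
second_result = flaky_probe("cudart")  # served from the cache
print(first_result, second_result, len(call_log))
```

One consequence worth keeping in mind: a cached miss stays a miss for the lifetime of the process unless the cache is cleared, which is why the tests in this commit clear it per case.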

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/supported_nvidia_libs.py

Lines changed: 7 additions & 0 deletions
@@ -358,6 +358,13 @@
 
 LIBNAMES_REQUIRING_RTLD_DEEPBIND = ("cufftMp",)
 
+# CTK root canary probe config:
+# - anchor libs: expected on the standard system loader path and used to derive
+#   CTK root in an isolated child process.
+# - discoverable libs: libs that are allowed to use the CTK-root canary fallback.
+_CTK_ROOT_CANARY_ANCHOR_LIBNAMES = ("cudart",)
+_CTK_ROOT_CANARY_DISCOVERABLE_LIBNAMES = ("nvvm",)
+
 # Based on output of toolshed/make_site_packages_libdirs_linux.py
 SITE_PACKAGES_LIBDIRS_LINUX_CTK = {
     "cublas": ("nvidia/cu13/lib", "nvidia/cublas/lib"),
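
How the two tuples above interact can be sketched with the probe, root derivation, and lookup passed in as plain callables. Everything here is a simplified assumption for illustration: `find_via_canary` and the fake callables are hypothetical, and where the real loader raises a not-found error for non-discoverable libs, this sketch just returns `None`.

```python
from __future__ import annotations

import posixpath
from typing import Callable

ANCHOR_LIBNAMES = ("cudart",)
DISCOVERABLE_LIBNAMES = ("nvvm",)


def find_via_canary(
    libname: str,
    resolve_anchor: Callable[[str], str | None],
    derive_root: Callable[[str], str | None],
    find_under_root: Callable[[str, str], str | None],
) -> str | None:
    # Gate: only libs on the discoverable list may trigger a probe at all.
    if libname not in DISCOVERABLE_LIBNAMES:
        return None
    for anchor in ANCHOR_LIBNAMES:
        anchor_path = resolve_anchor(anchor)
        if anchor_path is None:
            continue  # anchor not on the loader path; try the next one
        root = derive_root(anchor_path)
        if root is None:
            continue  # unrecognized layout; try the next anchor
        found = find_under_root(libname, root)
        if found is not None:
            return found
    return None


probed: list[str] = []


def fake_resolve(anchor: str) -> str | None:
    probed.append(anchor)
    return "/usr/local/cuda/lib64/libcudart.so.13"  # invented path


def fake_derive(path: str) -> str | None:
    return posixpath.dirname(posixpath.dirname(path))


def fake_find(libname: str, root: str) -> str | None:
    return posixpath.join(root, "nvvm", "lib64", "libnvvm.so.4")  # invented


nvvm_path = find_via_canary("nvvm", fake_resolve, fake_derive, fake_find)
cublas_path = find_via_canary("cublas", fake_resolve, fake_derive, fake_find)
print(nvvm_path, cublas_path, probed)
```

The gate is what keeps the (comparatively expensive) child-process probe from running for libraries that are already expected on the loader path.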

cuda_pathfinder/tests/spawned_process_runner.py renamed to cuda_pathfinder/cuda/pathfinder/_utils/spawned_process_runner.py

Lines changed: 9 additions & 3 deletions
@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
 import multiprocessing
@@ -24,13 +24,19 @@ class CompletedProcess:
 
 
 class ChildProcessWrapper:
-    def __init__(self, result_queue, target, args, kwargs):
+    def __init__(
+        self,
+        result_queue: Any,
+        target: Callable[..., None],
+        args: Sequence[Any] | None,
+        kwargs: dict[str, Any] | None,
+    ) -> None:
         self.target = target
         self.args = () if args is None else args
         self.kwargs = {} if kwargs is None else kwargs
         self.result_queue = result_queue
 
-    def __call__(self) -> None:
+    def __call__(self):
+    def __call__(self) -> None:
         # Capture stdout/stderr
         old_stdout = sys.stdout
         old_stderr = sys.stderr
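
The wrapper's body is cut off by the hunk boundary, but the visible lines (saving `sys.stdout` and `sys.stderr` before running the target) suggest the usual swap-and-restore pattern. The sketch below shows that pattern for stdout only; `call_and_capture_stdout` is a hypothetical helper, not the runner's actual code, and it runs in-process rather than in a spawned child.

```python
import io
import sys
from typing import Any, Callable


def call_and_capture_stdout(target: Callable[..., None], *args: Any) -> str:
    old_stdout = sys.stdout
    buffer = io.StringIO()
    sys.stdout = buffer
    try:
        target(*args)
    finally:
        # Always restore the real stream, even if the target raises.
        sys.stdout = old_stdout
    return buffer.getvalue()


stream_before = sys.stdout
captured = call_and_capture_stdout(print, "hello from target")
stream_after = sys.stdout
print(repr(captured))
```

Replacing `sys.stdout` only intercepts writes made through Python's stream object; output that a native library writes directly to file descriptor 1 would not be captured this way.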
