Standardize internal version checks in cuda.core (#1825)
* Cythonize _graph/_graph_builder (move from pure Python to .pyx)
Move the GraphBuilder/Graph/GraphCompleteOptions/GraphDebugPrintOptions
implementation out of _graph/__init__.py into _graph/_graph_builder.pyx
so it is compiled by Cython. A thin __init__.py re-exports the public
names so all existing import sites continue to work unchanged.
Cython compatibility adjustments:
- Remove `from __future__ import annotations` (unsupported by Cython)
- Remove TYPE_CHECKING guard; quote annotations that reference Stream
(circular import), forward-reference GraphBuilder/Graph, or use
X | None union syntax
- Update _graphdef.pyx lazy imports to point directly at _graph_builder
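To illustrate the annotation quoting described in the list above, here is a minimal sketch in plain Python. The class bodies and the `stream` parameter are hypothetical stand-ins, not the actual cuda.core implementation; the point is only that a quoted annotation is stored as a string and never evaluated at definition time, so `Stream` does not need to be importable.

```python
class Graph:
    """Hypothetical stand-in for the compiled graph type."""


class GraphBuilder:
    """Hypothetical stand-in; only the annotation style matters here."""

    # "Stream" is never imported in this module. Because the annotation is
    # quoted, it is kept as a plain string and no name lookup happens.
    def __init__(self, stream: "Stream | None" = None):
        self.stream = stream

    # Forward reference to the class currently being defined.
    def begin_building(self) -> "GraphBuilder":
        return self

    # Quoted reference to another class in the same module.
    def complete(self) -> "Graph":
        return Graph()


builder = GraphBuilder().begin_building()
graph = builder.complete()
```

Quoting only the annotations that need it keeps the rest of the signatures checkable by the compiler, which is presumably why the guard was removed rather than every annotation being deferred.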
No build_hooks.py changes needed — the build system auto-discovers .pyx
files via glob.
Ref: #1076
Made-with: Cursor
* Remove _lazy_init from _graph_builder; add cached get_driver_version
Replace the per-module _lazy_init / _inited / _driver_ver / _py_major_minor
pattern in _graph_builder.pyx with direct calls to centralized cached
functions in cuda_utils:
- Add get_driver_version() with @functools.cache alongside get_binding_version
- Switch get_binding_version from @functools.lru_cache to @functools.cache
(cleaner for nullary functions)
- Fix split() to return tuple(result) — Cython enforces return type
annotations unlike pure Python
- Fix _cond_with_params annotation from -> GraphBuilder to -> tuple
to match actual return value
Made-with: Cursor
* Add CPU callbacks for stream capture (GraphBuilder.callback)
Implements #1328: host callbacks during stream capture via
cuLaunchHostFunc, mirroring the existing GraphDef.callback API.
Extracts shared callback infrastructure (_attach_user_object,
_attach_host_callback_to_graph, trampoline/destructor) into a new
_graph/_utils.pyx module to avoid circular imports between
_graph_builder and _graphdef.
Made-with: Cursor
* Standardize internal version checks in cuda.core
Move binding and driver version queries into a dedicated
cuda/core/_utils/version.{pyx,pxd} module, providing both Python
(binding_version, driver_version) and Cython (cy_binding_version,
cy_driver_version) entry points. All functions return version tuples
(major, minor, patch) and are cached: Python via @functools.cache,
Cython via module-level globals.
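A minimal sketch of what a cached, tuple-returning version query might look like on the Python side. It assumes the integer encoding 1000 * major + 10 * minor used by the CUDA headers for CUDA_VERSION; treating the last decimal digit as the patch component is an assumption made for this example, and every name below is hypothetical rather than taken from version.pyx.

```python
import functools


def decode_cuda_version(encoded: int) -> tuple[int, int, int]:
    """Split an encoded CUDA version into (major, minor, patch).

    Assumes encoded == 1000 * major + 10 * minor + patch, where the patch
    digit is an assumption of this sketch.
    """
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    patch = encoded % 10
    return (major, minor, patch)


def _query_encoded_driver_version() -> int:
    """Hypothetical placeholder for the real driver call."""
    return 12040


@functools.cache
def driver_version_sketch() -> tuple[int, int, int]:
    """Query once, decode, and reuse the tuple on every later call."""
    return decode_cuda_version(_query_encoded_driver_version())


# Tuples compare element by element, so version checks read naturally.
supports_12_2 = driver_version_sketch() >= (12, 2, 0)
```

Returning a tuple everywhere lets call sites write comparisons such as `>= (12, 2, 0)` instead of repeating the integer arithmetic.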
Remove get_binding_version / get_driver_version from cuda_utils.pyx
and update all internal call sites and tests to import from the new
module. Remove version checks for CUDA < 12.0 (now the minimum) and
eliminate dead code exposed by the migration: _lazy_init / _use_ex /
_kernel_ctypes / _is_cukernel_get_library_supported machinery in
_module.pyx, _launcher.pyx, and _launch_config.pyx.
The public NVML-based system.get_driver_version API is unrelated and
left unchanged.
Made-with: Cursor
* Fix unused imports after merge with main
Remove unused imports flagged by cython-lint and ruff after
resolving merge conflicts with origin/main.
Made-with: Cursor
* Replace _reduce_3_tuple with math.prod in _launcher.pyx
Remove the now-dead _reduce_3_tuple helper from cuda_utils.pyx.
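For context, a small sketch of the replacement. The helper below is a hypothetical reconstruction that assumes the removed function multiplied the three components of a tuple; its real body is not shown in this commit message.

```python
import math


def reduce_3_tuple_sketch(t):
    """Hypothetical reconstruction: product of a 3-element tuple."""
    return t[0] * t[1] * t[2]


grid = (4, 2, 3)  # example dimensions, chosen arbitrarily
total_via_helper = reduce_3_tuple_sketch(grid)
total_via_prod = math.prod(grid)  # standard library, Python 3.8+
```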
Made-with: Cursor
* Remove _driver_ver from _linker.pyx; use _use_nvjitlink_backend as guard
Initialize _use_nvjitlink_backend to None so it can serve as its own
"already decided" sentinel, eliminating the redundant _driver_ver variable
and the driver_version() call that was only used to set it.
Made-with: Cursor
* Add return type annotations to version.pyx; fix minor arithmetic
Add -> tuple[int, int, int] annotations to binding_version and
driver_version. Align driver_version arithmetic with _system.pyx.
Made-with: Cursor