feat(package,python): batched parallel parse_static_packages_py (closes #94) #95
Merged
…oses #94)

Adds `parse_static_packages_py(paths) -> list[PackageInfo | None]` to `rer-package` and exposes it as `pyrer.parse_static_packages_py` on the Python side. The function reads + parses every path in one Rust call across a Rayon thread pool, with the GIL released for the duration via `Python::allow_threads`.

## Why

Issue #94: after the static parser landed, cProfile shows the rez shim's per-resolve cost is no longer in the parser itself; it's in the serial Python loop of `open()` calls feeding the parser. On a 132-package Fortiche resolve, that's 3,809 file opens taking 3.20 s (35% of total wall time), one per call with everything but one core idle.

`parse_static_packages_py` replaces that loop with a single batched Rust call. Same parse semantics per file, parallel I/O and parsing across cores.

## Result

Measured on `/thierry/rez/pkg` (Fortiche on CIFS):

- 500 files (warm cache): 56.71 ms → 40.76 ms (1.39× speedup)
- 2000 files: 4234.41 ms → 1508.05 ms (2.81× speedup, 2.73 s saved)

Both paths accept the same set of files (1864/2000 → static-parseable fraction matches `parse_static_package_py` on per-file calls).

Per-file saving on the 2000-file bench: ~1.36 ms. Extrapolated to the issue's 132-package / 2,600-file resolve: ~3.5 s saved per resolve.

The 500-file bench was bottlenecked on the warm-page-cache floor (the Rayon overhead amortizes less). At larger batch sizes the parallel I/O win shows through.

## API

```python
pyrer.parse_static_packages_py(paths: list[str | os.PathLike]) -> list[PackageData | None]
```

- Output is **positionally aligned** with `paths`. Missing files, unreadable bytes, and parser-bails all produce `None` at the same index.
- No exception ever escapes the call.
- Pool size follows `RAYON_NUM_THREADS` (Rayon default = logical core count). No per-call knob; cap via env var on shared CI.
- Pure addition. The single-file `parse_static_package_py` stays for callers that haven't been updated.
## Shim integration shape

```python
def load_family(name, package_paths):
    pkgs, paths = [], []
    for pkg in iter_packages(name, paths=package_paths):
        filepath = getattr(pkg, "filepath", None)
        if not filepath or not filepath.endswith(".py"):
            pkgs.append((pkg, None))
            continue
        pkgs.append((pkg, filepath))
        paths.append(filepath)

    pds = pyrer.parse_static_packages_py(paths)
    pds_iter = iter(pds)
    out = []
    for pkg, filepath in pkgs:
        if filepath is None:
            out.append(pyrer.PackageData.from_rez(pkg))
            continue
        pd = next(pds_iter)
        out.append(pd if pd is not None else pyrer.PackageData.from_rez(pkg))
    return out
```

The shim feature-detects via `hasattr(pyrer, "parse_static_packages_py")` and falls back to the per-file loop on older pyrer.

## Tests

Rust unit tests in `rer-package`:

- batch_empty_returns_empty
- batch_parses_each_file_independently (static + dynamic + missing)
- batch_missing_file_becomes_none
- batch_preserves_input_order (20 files, alternating static/dynamic)

Python tests in `tests/test_rich_api.py`:

- test_parse_static_packages_py_empty_input
- test_parse_static_packages_py_each_file_independent (static + dynamic + missing + static, aligned)
- test_parse_static_packages_py_preserves_order (20 alternating, par_iter ordering check)
- test_parse_static_packages_py_accepts_pathlib_paths
- test_parse_static_packages_py_drives_solve (batch → pyrer.solve E2E)
- test_parse_static_packages_py_matches_single_file

Plus `scripts/bench_batched_parser.py` for measuring against any rez repo.

## Verification

- `cargo test --lib`: 34/34 (rer-package) + 46/46 (rer-resolver) + 34/34 (rer-version) = 114/114
- `pytest tests/`: 111/111 (was 105 + 6 new)
- 188-case strict rez differential: 188/188 in 16.91 s
- cargo build: clean

## Docs

`docs/content/docs/getting-started/rez-integration.md` gets a new "Faster: batched parallel parse (issue #94)" subsection with the shim integration shape, semantics, and a capability-detect recommendation.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Picks up the batched parallel `parse_static_packages_py` API (closes #94, this branch). On the Fortiche corpus the batched call delivered a 2.81× cut to the open+parse phase (4.23 s → 1.51 s on 2,000 files) over the serial Python loop the shim ran today, with no per-file correctness drift.

Four touchpoints, same as every previous rc bump:

- Cargo.toml: workspace version + the three internal-dep pins
- docs/config.toml: GitHub-pill version
- docs/content/_index.md: homepage repo_version
- docs/content/docs/getting-started/quick-start.md: Rust dep snippet

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…94)

Two doc surfaces updated:

1. `docs/content/docs/getting-started/rez-integration.md`: "Faster: batched parallel parse (issue #94)" section expanded from a sketch to a full production guide matching the depth of the single-file parser section. Adds:
   - "What it does": full semantics (positional alignment, no escaped exceptions, GIL release, pool size).
   - "Measured impact on Fortiche": table with the 1.39× / 2.81× numbers and the per-file saving.
   - "Integration": complete two-tier load_family snippet wiring #86 + #92 + static parser + #94 together; the actual recipe a shim author can paste.
   - "Backward compatibility / feature detection": `hasattr` pattern matching the rest of pyrer's optional APIs.
   - "Shadow-validation mode": `REZ_PYRER_VALIDATE_BATCHED` recipe that reuses the from_rez(pkg) comparison from Stage 2.
   - "Metrics": class-level counters for batched_hits / batched_misses_io / batched_misses_dynamic / non_py_packages so production rollouts can confirm hit rate.
   - "Rollout plan": 4-week table with progressive user-percentage gates, mirroring the parser rollout shape.
   - "Where this WON'T help": honest caveat list (tiny resolves, dynamic 7%, load_family cache hits, cross-invocation cost).
2. `docs/content/docs/engineering/fast-package-py-parser.md`: new "Stage 4: Batched parallel parse (issue #94)" section slotted before "Considered alternatives". Captures the design decisions, the result table, and the safety-net carry-over from Stage 2 (per-file semantics are identical so the differential coverage transfers byte-for-byte). Top-of-doc status banner updated to "Stages 1–4 shipped" with all the relevant script paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
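The "Metrics" counters named above could be kept roughly as follows. This is a minimal sketch, not the code from the docs: the `BatchMetrics` class, its `record` method, and the rule for separating I/O misses from dynamic misses (a `None` result for a path that is not a file counts as I/O, any other `None` as dynamic) are all assumptions made for illustration. Only the four counter names come from the commit message.

```python
import os
from collections import Counter


class BatchMetrics:
    """Hypothetical holder for the counters named in the commit message."""

    # Class-level, so every shim instance in the process shares one tally.
    counts = Counter()

    @classmethod
    def record(cls, entries, results):
        """Tally one batch.

        entries: one item per package, a filepath or None for non-.py packages.
        results: batched parser output for the non-None entries, in order.
        """
        results_iter = iter(results)
        for filepath in entries:
            if filepath is None:
                cls.counts["non_py_packages"] += 1
                continue
            parsed = next(results_iter)
            if parsed is not None:
                cls.counts["batched_hits"] += 1
            elif not os.path.isfile(filepath):
                # Assumption: a None for a path that is not a file is an I/O miss.
                cls.counts["batched_misses_io"] += 1
            else:
                # Assumption: the file exists, so the parser bailed (dynamic).
                cls.counts["batched_misses_dynamic"] += 1

    @classmethod
    def hit_rate(cls):
        """Fraction of .py packages served by the batched parser."""
        misses = cls.counts["batched_misses_io"] + cls.counts["batched_misses_dynamic"]
        total = cls.counts["batched_hits"] + misses
        return cls.counts["batched_hits"] / total if total else 0.0
```

An existing but unreadable file would be counted as a dynamic miss under this rule; a production shim may want a finer check.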
Summary
Adds `pyrer.parse_static_packages_py(paths)`, a batched, Rayon-parallel variant of `parse_static_package_py` that opens and parses every path in one Rust call, with the GIL released for the whole batch.

Pure addition. The single-file API stays. Closes #94.
Stacked on `experimental` because the existing static parser lives there.

Measured on /thierry/rez/pkg + CIFS
Per-file saving on the 2,000-file run: ~1.36 ms. Extrapolated to the issue's 132-package / 2,600-file resolve: ~3.5 s saved per resolve.
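The per-file and extrapolated figures can be re-derived from the two 2,000-file timings quoted in this PR. The check below uses only those numbers; scaling linearly to 2,600 files is the same assumption the extrapolation itself makes.

```python
# Timings quoted in this PR for the 2,000-file bench, in milliseconds.
serial_ms = 4234.41
batched_ms = 1508.05
files = 2000

saved_ms = serial_ms - batched_ms   # total time saved on the bench
per_file_ms = saved_ms / files      # average saving per file
speedup = serial_ms / batched_ms    # serial time over batched time

# Assumption: the per-file saving scales linearly to a 2,600-file resolve.
extrapolated_s = per_file_ms * 2600 / 1000

print(f"saved: {saved_ms / 1000:.2f} s")
print(f"per file: {per_file_ms:.2f} ms")
print(f"speedup: {speedup:.2f}x")
print(f"extrapolated: {extrapolated_s:.1f} s per resolve")
```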
The smaller-sample bench was bottlenecked on warm-page-cache parsing CPU, where the Rayon overhead amortizes less; the win scales as the batch grows and as more files require fresh reads. On cold CIFS (Windows production) the parallel-I/O overlap should compound further, so the bench is a lower bound.
Correctness
Both paths produce identical `PackageData` for every file (1864/2000 → static-parseable fraction matches the single-file parser on warm-cache).
API
```python
pyrer.parse_static_packages_py(paths: list[str | os.PathLike])
-> list[PackageData | None]
```
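To make the return contract concrete, here is a pure-Python stand-in that mimics the semantics described in this PR: output positionally aligned with the input, `None` for every failure, no exception escaping. It is not pyrer's implementation (that is Rust + Rayon). `parse_one`, `parse_batch`, and the rule that a static file starts with `name = ` are hypothetical, chosen only to make the sketch runnable.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parse_one(path):
    """Hypothetical per-file step: a dict on success, None on any failure."""
    try:
        with open(os.fspath(path), encoding="utf-8") as fh:
            text = fh.read()
    except (OSError, UnicodeDecodeError):
        return None  # missing file or unreadable bytes
    if not text.startswith("name = "):
        return None  # stand-in for the parser bailing on a dynamic package.py
    return {"source": text}


def parse_batch(paths):
    """Run parse_one over paths in a thread pool; Executor.map keeps input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(parse_one, paths))


with tempfile.TemporaryDirectory() as tmp:
    static = os.path.join(tmp, "static.py")
    dynamic = os.path.join(tmp, "dynamic.py")
    missing = os.path.join(tmp, "missing.py")  # never created
    with open(static, "w", encoding="utf-8") as fh:
        fh.write("name = 'foo'\n")
    with open(dynamic, "w", encoding="utf-8") as fh:
        fh.write("import os\nname = os.environ['PKG']\n")
    results = parse_batch([static, missing, dynamic, static])

# Each slot lines up with the path at the same index.
print([r is not None for r in results])
```

Because failures become `None` in place rather than being dropped, a caller can zip the result with its own path list without re-keying.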
Shim integration
The new section in `docs/content/docs/getting-started/rez-integration.md` shows the recommended `load_family` shape: gather paths first, single `parse_static_packages_py` call, then iterate aligned results and fall back to `from_rez(pkg)` on `None`.
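For shims that must also run against an older pyrer, the capability check could look like the sketch below. The `parse_paths` helper and its `per_file_fallback` parameter are hypothetical; the fallback is passed in as a callable because the shim's existing per-file loop is not shown in this PR. The stand-in modules built with `SimpleNamespace` exist only to exercise both branches.

```python
from types import SimpleNamespace


def parse_paths(pyrer_module, paths, per_file_fallback):
    """Use the batched API when the module has it, else call the fallback per path."""
    batched = getattr(pyrer_module, "parse_static_packages_py", None)
    if batched is not None:
        return batched(paths)
    # Older pyrer: keep the serial loop, preserving positional alignment.
    return [per_file_fallback(p) for p in paths]


# Stand-ins for a new and an old pyrer module (illustration only).
new_pyrer = SimpleNamespace(
    parse_static_packages_py=lambda paths: [f"batched:{p}" for p in paths]
)
old_pyrer = SimpleNamespace()

paths = ["a/package.py", "b/package.py"]
print(parse_paths(new_pyrer, paths, lambda p: f"single:{p}"))
print(parse_paths(old_pyrer, paths, lambda p: f"single:{p}"))
```

Both branches return one entry per input path, so the rest of `load_family` does not need to know which one ran.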
Test plan
🤖 Generated with Claude Code