
feat(package,python): batched parallel parse_static_packages_py (closes #94)#95

Merged
doubleailes merged 3 commits into experimental from batched-static-parser
May 18, 2026


@doubleailes
Owner

Summary

Adds pyrer.parse_static_packages_py(paths) — a batched, Rayon-parallel variant of parse_static_package_py that opens and parses every path in one Rust call, with the GIL released for the whole batch.

Pure addition. The single-file API stays. Closes #94.

Stacked on experimental because the existing static parser lives there.

Measured on /thierry/rez/pkg (the Fortiche repository on CIFS)

| Sample | Serial loop | Batched | Speedup |
|---|---|---|---|
| 500 files (warm cache) | 56.71 ms | 40.76 ms | 1.39× |
| 2,000 files | 4,234 ms | 1,508 ms | 2.81× (2.73 s saved) |

Per-file saving on the 2,000-file run: ~1.36 ms. Extrapolated linearly to the issue's 132-package / 2,600-file resolve, that is ~3.5 s saved per resolve.
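The headline figures can be re-derived from the raw timings given in the commit message (4,234.41 ms and 1,508.05 ms for 2,000 files). This is a small sanity-check sketch; the 2,600-file figure is the issue's resolve size, and the per-resolve number assumes the saving scales linearly with file count.

```python
# Raw timings from the bench, in milliseconds.
serial_ms = {"500 files": 56.71, "2,000 files": 4234.41}
batched_ms = {"500 files": 40.76, "2,000 files": 1508.05}


def speedup(sample: str) -> float:
    return serial_ms[sample] / batched_ms[sample]


saved_ms = serial_ms["2,000 files"] - batched_ms["2,000 files"]  # ~2726.36 ms
per_file_ms = saved_ms / 2000                                    # ~1.36 ms per file
# Assumption: the per-file saving scales linearly to a 2,600-file resolve.
resolve_saving_s = per_file_ms * 2600 / 1000                     # ~3.5 s

print(f"{speedup('500 files'):.2f}x, {speedup('2,000 files'):.2f}x")
print(f"{saved_ms / 1000:.2f} s saved, {per_file_ms:.2f} ms/file, ~{resolve_saving_s:.1f} s per resolve")
```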

The smaller 500-file bench ran against a warm page cache, so it was bound by parsing CPU, and the fixed Rayon overhead amortizes over fewer files. The win grows with batch size and with the share of files that need fresh reads. On a cold CIFS mount (Windows production), overlapping I/O across threads should widen the gap further, so treat this bench as a lower bound.
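An idealized cost model makes the amortization argument concrete. It is an illustrative assumption and was not fitted to the measurements above. Take $n$ files, $p$ worker threads, a per-file read time $t_{\text{io}}$ and parse time $t_{\text{parse}}$, a fixed pool dispatch cost $t_{\text{pool}}$, and a per-file cost $t_{\text{conv}}$ that stays serial in both paths. As an example of the last term, we assume building the Python result objects under the GIL:

$$
T_{\text{serial}}(n) \approx n\,(t_{\text{io}} + t_{\text{parse}} + t_{\text{conv}}),
\qquad
T_{\text{batched}}(n) \approx t_{\text{pool}} + n\,t_{\text{conv}} + \frac{n\,(t_{\text{io}} + t_{\text{parse}})}{p}
$$

$$
\lim_{n \to \infty} \frac{T_{\text{serial}}(n)}{T_{\text{batched}}(n)}
= \frac{t_{\text{io}} + t_{\text{parse}} + t_{\text{conv}}}{t_{\text{conv}} + (t_{\text{io}} + t_{\text{parse}})/p}
$$

For small $n$, the fixed $t_{\text{pool}}$ term eats into the gain. As $t_{\text{io}}$ grows (cold CIFS reads), the limiting ratio moves toward $p$.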

Correctness

Both paths produce identical `PackageData` for every file. On the warm-cache 2,000-file sample, both parse the same 1,864 files statically, the same fraction the single-file parser reaches.
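A differential check of this kind can be sketched as a small helper. It takes the two parsers as parameters, so `batch_fn` and `single_fn` are hypothetical wrappers around the pyrer calls. Comparing results with `!=` assumes `PackageData` implements equality, which is an assumption here. The helper reports the static-parse count (the "1864/2000" figure) and every index where the two paths disagree.

```python
from typing import Any, Callable, Optional, Sequence


def diff_batched_vs_single(
    paths: Sequence[str],
    batch_fn: Callable[[Sequence[str]], list],
    single_fn: Callable[[str], Optional[Any]],
) -> tuple[int, list[int]]:
    """Compare a batched parser against a per-file parser.

    Returns (static_count, mismatched_indices): how many paths the batched
    call parsed statically (non-None), and every index where the two differ.
    """
    batched = batch_fn(paths)
    # Positional alignment is part of the contract: one result per input path.
    if len(batched) != len(paths):
        raise AssertionError("batched output is not aligned with the input paths")
    static_count = sum(r is not None for r in batched)
    mismatches = [
        i for i, (path, got) in enumerate(zip(paths, batched)) if got != single_fn(path)
    ]
    return static_count, mismatches
```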

| Suite | Result |
|---|---|
| `cargo test --lib` | 114/114 (34 rer-package + 46 rer-resolver + 34 rer-version) |
| `pytest tests/` | 111/111 (was 105, plus 6 new) |
| 188-case strict rez differential | 188/188 in 16.91 s; solver unchanged |
| `cargo build` | clean |

API

```python
pyrer.parse_static_packages_py(paths: list[str | os.PathLike])
-> list[PackageData | None]
```

  • Output is positionally aligned with `paths`. Missing files, unreadable bytes, and parser bail-outs all become `None` at the same index.
  • No exception escapes. Per-file failures map to `None`.
  • GIL released for the whole batch via `Python::allow_threads`.
  • Pool size follows `RAYON_NUM_THREADS` (default: logical core count).
  • Feature-detectable: `hasattr(pyrer, "parse_static_packages_py")`.
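The contract above can be modelled in pure Python for illustration. This is not pyrer's implementation: pyrer does the work in Rust on a Rayon pool with the GIL released. The model uses `concurrent.futures.ThreadPoolExecutor`, and `parse_text` is a hypothetical stand-in for the static parser that returns `None` when it bails.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence, TypeVar, Union

T = TypeVar("T")
PathArg = Union[str, os.PathLike]


def parse_batch_model(
    paths: Sequence[PathArg],
    parse_text: Callable[[str], Optional[T]],
    max_workers: Optional[int] = None,
) -> list[Optional[T]]:
    """Reference model of the batched contract (not pyrer's implementation)."""

    def one(path: PathArg) -> Optional[T]:
        try:
            with open(os.fspath(path), encoding="utf-8") as f:
                return parse_text(f.read())
        except Exception:
            # Missing file, undecodable bytes, or parser error: None, never raise.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so out[i] belongs to paths[i].
        return list(pool.map(one, paths))
```

Order preservation comes for free from `Executor.map`. On the Rust side, `par_iter().map(...).collect::<Vec<_>>()` over a slice likewise keeps input order, which is the property the `batch_preserves_input_order` tests check.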

Shim integration

The new section in `docs/content/docs/getting-started/rez-integration.md` shows the recommended `load_family` shape: gather paths first, make a single `parse_static_packages_py` call, then iterate the aligned results and fall back to `from_rez(pkg)` on `None`.

Test plan

  • Rust unit tests pass (114/114)
  • Python tests pass (111/111)
  • Strict 188-case rez differential still 188/188
  • `scripts/bench_batched_parser.py` shows >2× on the Fortiche corpus
  • Wire into the Fortiche shim, capture before/after profile of a real `rez env`
  • Confirm ~3.5 s saved per resolve in production

🤖 Generated with Claude Code

…oses #94)

Adds `parse_static_packages_py(paths) -> list[PackageInfo | None]`
to `rer-package` and exposes it as `pyrer.parse_static_packages_py`
on the Python side. The function reads and parses every path in one
Rust call across a Rayon thread pool, with the GIL released for
the duration via `Python::allow_threads`.

## Why

Issue #94: after the static parser landed, cProfile shows the rez
shim's per-resolve cost is no longer in the parser itself but in
the serial Python loop of `open()` calls feeding the parser.
On a 132-package Fortiche resolve, that is 3,809 file opens taking
3.20 s (35% of total wall time), made one at a time while every
core but one sits idle.

`parse_static_packages_py` replaces that loop with a single
batched Rust call. Same parse semantics per file, parallel I/O and
parsing across cores.

## Result

Measured on `/thierry/rez/pkg` (Fortiche on CIFS):

  - 500 files (warm cache):     56.71 ms → 40.76 ms  (1.39× speedup)
  - 2000 files:               4234.41 ms → 1508.05 ms (2.81× speedup,
                                                       2.73 s saved)

Both paths accept the same set of files: 1,864 of 2,000 parse
statically, the same fraction `parse_static_package_py` reaches on
per-file calls. Per-file saving on the 2000-file bench: ~1.36 ms.
Extrapolated linearly to the issue's 132-package / 2,600-file
resolve: ~3.5 s saved per resolve.

The 500-file bench ran against a warm page cache, so there was
little I/O to overlap and the fixed Rayon overhead amortizes over
fewer files. At larger batch sizes the parallel-I/O win shows
through.

## API

```python
pyrer.parse_static_packages_py(paths: list[str | os.PathLike])
    -> list[PackageData | None]
```

- Output is **positionally aligned** with `paths`. Missing files,
  unreadable bytes, and parser-bails all produce `None` at the
  same index.
- No exception ever escapes the call.
- Pool size follows `RAYON_NUM_THREADS` (Rayon default = logical
  core count). No per-call knob; cap via env var on shared CI.
- Pure addition. The single-file `parse_static_package_py` stays
  for callers that haven't been updated.
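On shared CI, the simplest cap is exporting `RAYON_NUM_THREADS` in the job environment. If it has to be done from Python, a sketch follows. It assumes pyrer uses Rayon's global pool, which the env-var behaviour suggests. Rayon reads the variable once, when that pool is first built, so the cap must be set before the first batched parse in the process. The helper name `cap_rayon_threads` is hypothetical.

```python
import os
from typing import MutableMapping, Optional


def cap_rayon_threads(limit: int, env: Optional[MutableMapping[str, str]] = None) -> int:
    """Set RAYON_NUM_THREADS to at most `limit` unless it is already set.

    Must run before the first batched parse: Rayon's global pool reads the
    variable when it is initialised. Returns the thread count in effect.
    """
    target = os.environ if env is None else env
    if "RAYON_NUM_THREADS" not in target:
        # Never exceed the machine's logical core count, never go below 1.
        target["RAYON_NUM_THREADS"] = str(max(1, min(limit, os.cpu_count() or 1)))
    return int(target["RAYON_NUM_THREADS"])
```

An operator-provided value always wins, so a CI job that already exports the variable is left untouched.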

## Shim integration shape

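```python
import pyrer
from rez.packages import iter_packages


def load_family(name, package_paths):
    # Pass 1: collect every package; remember the path of each package.py.
    pkgs, paths = [], []
    for pkg in iter_packages(name, paths=package_paths):
        filepath = getattr(pkg, "filepath", None)
        if not filepath or not filepath.endswith(".py"):
            # Not a package.py (e.g. package.yaml): always use from_rez.
            pkgs.append((pkg, None))
            continue
        pkgs.append((pkg, filepath))
        paths.append(filepath)

    # One batched Rust call; results are positionally aligned with `paths`.
    pds_iter = iter(pyrer.parse_static_packages_py(paths))

    # Pass 2: rebuild the output in package order, consuming one batched
    # result per .py package and falling back to from_rez on None.
    out = []
    for pkg, filepath in pkgs:
        if filepath is None:
            out.append(pyrer.PackageData.from_rez(pkg))
            continue
        pd = next(pds_iter)
        out.append(pd if pd is not None else pyrer.PackageData.from_rez(pkg))
    return out
```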

The shim feature-detects via `hasattr(pyrer, "parse_static_packages_py")`
and falls back to the per-file loop on older pyrer.
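The capability check can be wrapped so that `load_family` always sees one batch-shaped callable. This is a sketch: `per_file` is a hypothetical stand-in for whatever per-file routine the shim already uses, and the serial fallback mirrors the batched contract by turning failures into `None` at the same index.

```python
from typing import Any, Callable, Optional, Sequence

BatchParser = Callable[[Sequence[str]], list]


def pick_batch_parser(mod: Any, per_file: Callable[[str], Optional[Any]]) -> BatchParser:
    """Return the batched API if `mod` has it, else a serial fallback with the same contract."""
    batched = getattr(mod, "parse_static_packages_py", None)
    if batched is not None:
        return batched

    def serial(paths: Sequence[str]) -> list:
        out = []
        for path in paths:
            try:
                out.append(per_file(path))
            except Exception:
                # Mirror the batched contract: a failure becomes None at the same index.
                out.append(None)
        return out

    return serial
```

In the shim this would be resolved once, e.g. `parse_many = pick_batch_parser(pyrer, per_file)`, and `load_family` would call `parse_many(paths)` in place of the direct batched call.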

## Tests

Rust unit tests in `rer-package`:
  - batch_empty_returns_empty
  - batch_parses_each_file_independently (static + dynamic + missing)
  - batch_missing_file_becomes_none
  - batch_preserves_input_order (20 files, alternating static/dynamic)

Python tests in `tests/test_rich_api.py`:
  - test_parse_static_packages_py_empty_input
  - test_parse_static_packages_py_each_file_independent (static + dynamic + missing + static, aligned)
  - test_parse_static_packages_py_preserves_order (20 alternating, par_iter ordering check)
  - test_parse_static_packages_py_accepts_pathlib_paths
  - test_parse_static_packages_py_drives_solve (batch → pyrer.solve E2E)
  - test_parse_static_packages_py_matches_single_file

Plus `scripts/bench_batched_parser.py` for measuring against any rez repo.

## Verification

  - `cargo test --lib`: 34/34 (rer-package) + 46/46 (rer-resolver) +
    34/34 (rer-version) = 114/114
  - `pytest tests/`: 111/111 (was 105 + 6 new)
  - 188-case strict rez differential: 188/188 in 16.91 s
  - cargo build: clean

## Docs

`docs/content/docs/getting-started/rez-integration.md` gets a new
"Faster: batched parallel parse (issue #94)" subsection with the
shim integration shape, semantics, and a capability-detect
recommendation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

doubleailes and others added 2 commits May 18, 2026 19:25
Picks up the batched parallel `parse_static_packages_py` API
(closes #94, this branch). On the Fortiche corpus the batched
call made the open+parse phase 2.81× faster (4.23 s →
1.51 s on 2,000 files) than the serial Python loop the shim
currently runs, with no per-file correctness drift.

Four touchpoints, same as every previous rc bump:
  - Cargo.toml: workspace version + the three internal-dep pins
  - docs/config.toml: GitHub-pill version
  - docs/content/_index.md: homepage repo_version
  - docs/content/docs/getting-started/quick-start.md: Rust dep snippet

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…94)

Two doc surfaces updated:

1. `docs/content/docs/getting-started/rez-integration.md` —
   "Faster: batched parallel parse (issue #94)" section expanded
   from a sketch to a full production guide matching the depth of
   the single-file parser section. Adds:

   - "What it does" — full semantics (positional alignment, no
     escaped exceptions, GIL release, pool size).
   - "Measured impact on Fortiche" — table with the 1.39× / 2.81×
     numbers and the per-file saving.
   - "Integration" — complete two-tier load_family snippet wiring
     #86 + #92 + static parser + #94 together; the actual recipe
     a shim author can paste.
   - "Backward compatibility / feature detection" — `hasattr`
     pattern matching the rest of pyrer's optional APIs.
   - "Shadow-validation mode" — `REZ_PYRER_VALIDATE_BATCHED` recipe
     that reuses the from_rez(pkg) comparison from Stage 2.
   - "Metrics" — class-level counters for batched_hits /
     batched_misses_io / batched_misses_dynamic / non_py_packages
     so production rollouts can confirm hit rate.
   - "Rollout plan" — 4-week table with progressive user-percentage
     gates, mirroring the parser rollout shape.
   - "Where this WON'T help" — honest caveat list (tiny resolves,
     dynamic 7%, load_family cache hits, cross-invocation cost).

2. `docs/content/docs/engineering/fast-package-py-parser.md` —
   new "Stage 4 — Batched parallel parse (issue #94)" section
   slotted before "Considered alternatives". Captures the design
   decisions, the result table, and the safety-net carry-over
   from Stage 2 (per-file semantics are identical so the
   differential coverage transfers byte-for-byte). Top-of-doc
   status banner updated to "Stages 1–4 shipped" with all the
   relevant script paths.
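The "Metrics" counters described above can be sketched as class-level counters. The counter names come from the doc, but this class and its `record` method are hypothetical. Because the batched API returns a plain `None` for every kind of miss, splitting I/O misses from dynamic-package misses needs a heuristic. Here, a path that is not readable on disk counts as an I/O miss, and any other `None` is assumed to be a dynamic package. Note that this also lumps undecodable files in with dynamic ones.

```python
import os
from typing import Any, Optional, Sequence


class BatchedParseMetrics:
    """Hypothetical process-wide counters for a batched-parser rollout."""

    batched_hits = 0
    batched_misses_io = 0
    batched_misses_dynamic = 0
    non_py_packages = 0

    @classmethod
    def record(cls, paths: Sequence[str], results: Sequence[Optional[Any]], non_py: int = 0) -> None:
        cls.non_py_packages += non_py
        for path, result in zip(paths, results):
            if result is not None:
                cls.batched_hits += 1
            elif not os.access(path, os.R_OK):
                # Heuristic: missing or unreadable file -> I/O miss.
                cls.batched_misses_io += 1
            else:
                # Otherwise assume the parser bailed on a dynamic package.
                cls.batched_misses_dynamic += 1

    @classmethod
    def hit_rate(cls) -> float:
        total = cls.batched_hits + cls.batched_misses_io + cls.batched_misses_dynamic
        return cls.batched_hits / total if total else 0.0
```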

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@doubleailes doubleailes merged commit e9c743a into experimental May 18, 2026
23 checks passed
@doubleailes doubleailes deleted the batched-static-parser branch May 18, 2026 19:07