From a21aceb32a4f5235e99dca97ef9b6c1f761b977d Mon Sep 17 00:00:00 2001
From: Philippe Llerena
Date: Sat, 16 May 2026 18:17:22 +0200
Subject: [PATCH 1/4] feat(resolver,python): lazy package discovery via a Python callback
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes #86.

## Why

`pyrer.solve()` previously required the complete `list[PackageData]` up
front, so any integration had to materialise every package the solve
*might* touch before the Rust algorithm started. For loader-driven
integrations against rez (where each `package.py` is arbitrary Python
that's only AST-evaluated on attribute access), this defeats rez's own
lazy-load advantage: rez can bail on an early conflict after touching two
families while the rest of the dep graph stays on disk.

## Resolver: `PackageRepo` becomes a struct

`PackageRepo` was a type alias for
`HashMap<String, HashMap<String, PackageData>>`: eager and immutable.
It's now a struct holding:

    RefCell<HashMap<String, Option<Rc<FamilyMap>>>>          # cache
    Option<Box<dyn Fn(&str) -> Vec<(String, PackageData)>>>  # loader

`get_family(name) -> Option<Rc<FamilyMap>>` routes every lookup through
the cache; on miss it calls the loader (if any), memoising both the hit
and the "no such family" answer so the loader fires at most once per
family per repo.

Constructors:

    PackageRepo::from_map(HashMap<…>)      # eager, no loader (back-compat)
    HashMap<…>::into() → PackageRepo       # same, via From
    PackageRepo::with_loader(loader)       # lazy
    PackageRepo::insert_family(name, fam)  # pre-seed a lazy repo

Plus `family_count()` for the bench reporter (was `.len()` on the old
alias).

The two repo-access sites in the codebase change cleanly:

- `PackageVariantList::new` (`variant.rs:372`) now does
  `ctx.repo.get_family(name)?` and stores `Rc<FamilyMap>` for the one
  family it represents (was `Rc<PackageRepo>` covering all of them).
- The version-data lookup inside `get_intersection` becomes
  `self.versions[&version_str]`: a single index, no longer
  double-indexed through the whole repo.

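For readers skimming the message, the memoisation that `get_family` performs can be sketched roughly as below. This is a simplified, standalone illustration under stated assumptions, not the code in this patch: `LazyRepo`, the trimmed `PackageData`, and the `loader_calls_for` helper are hypothetical names used only here.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Trimmed, hypothetical stand-in for the crate's `PackageData`.
#[derive(Debug, Clone)]
struct PackageData {
    requires: Vec<String>,
}

type FamilyMap = HashMap<String, PackageData>;
type FamilyLoader = Box<dyn Fn(&str) -> Vec<(String, PackageData)>>;

/// Hypothetical miniature of a loader-backed repo.
struct LazyRepo {
    /// `Some(map)` = family present; `None` = loader confirmed it is absent.
    families: RefCell<HashMap<String, Option<Rc<FamilyMap>>>>,
    loader: Option<FamilyLoader>,
}

impl LazyRepo {
    fn with_loader(loader: FamilyLoader) -> Self {
        LazyRepo {
            families: RefCell::new(HashMap::new()),
            loader: Some(loader),
        }
    }

    fn get_family(&self, name: &str) -> Option<Rc<FamilyMap>> {
        // Cache hit, positive or negative: answer without calling the loader.
        if let Some(slot) = self.families.borrow().get(name) {
            return slot.clone();
        }
        // Miss: ask the loader once; an empty result means "no such family".
        let loaded = self.loader.as_ref().and_then(|load| {
            let entries = load(name);
            if entries.is_empty() {
                None
            } else {
                Some(Rc::new(entries.into_iter().collect::<FamilyMap>()))
            }
        });
        // Memoise whichever answer we got, including the negative one.
        self.families
            .borrow_mut()
            .insert(name.to_string(), loaded.clone());
        loaded
    }
}

/// Runs `lookups` against a repo whose loader only knows "lib" and
/// returns the family names the loader was actually asked for, in order.
fn loader_calls_for(lookups: &[&str]) -> Vec<String> {
    let calls: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let log = Rc::clone(&calls);
    let repo = LazyRepo::with_loader(Box::new(move |name: &str| {
        log.borrow_mut().push(name.to_string());
        match name {
            "lib" => vec![("1.0".to_string(), PackageData { requires: Vec::new() })],
            _ => Vec::new(),
        }
    }));
    for name in lookups {
        let _ = repo.get_family(name);
    }
    let out = calls.borrow().clone();
    out
}
```

In this sketch, looking up `lib` twice and an unknown family twice should invoke the loader once for each name, since the "absent" answer is cached alongside the hit.
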
The cache (`PackageVariantCache`) is unchanged: it still memoises the
parsed `PackageVariantList` per family, on top of the new repo's own
cache. Two cheap memos for two distinct things (parsed variants vs. raw
family map); no double work.

## Python: `load_family` kwarg

```python
result = pyrer.solve(
    requests,
    None,                    # packages: eager seed, optional (positional-only)
    load_family=my_loader,   # NEW: Callable[[str], list[PackageData]]
)
```

- The eager `packages` arg now defaults to `None`. When `load_family` is
  given, anything in `packages` is pre-seeded into the cache so the
  loader is never called for those families.
- The loader returns a `list[PackageData]` for the requested family.
  Empty list ⇒ "no such family" (cached, never re-asked).
- Defensive: entries whose `name` ≠ the requested family are dropped (a
  misbehaving loader can't poison the repo for other families).
  Duplicate versions inside one loader response surface as
  `status='error'`.
- If the Python callback raises, the loader stores the exception's
  message in a shared `RefCell`, returns empty (so the solver doesn't
  pile up errors), and the outer `solve()` surfaces it as
  `status='error'` before any other status. Never a Python exception out
  of pyrer.
- GIL: kept held throughout the solve for v1. `Solver` is `!Send` because
  it holds `Rc`, so `Python::allow_threads` can't move it across the GIL
  release boundary. Practical effect: other Python threads are blocked
  during the resolve. Worth revisiting after the internal `Rc → Arc`
  switch (separate change; not load-bearing for this issue's wins).

## Tests

Rust (in `solver::tests`):

- `test_loader_called_only_for_needed_families`: a loader-backed repo
  solves correctly and the loader is *not* called for an unrelated
  family the solver never touches.
- `test_loader_called_once_per_family`: a diamond
  (`app → lib & util; util → lib`) loads `lib` exactly once.
- `test_loader_empty_means_missing_family`: an empty result for an
  unknown name produces a failed resolve, not a panic.

Python (in `tests/test_rich_api.py`):

- `test_load_family_lazy_only_touched_families`
- `test_load_family_called_at_most_once_per_family`
- `test_load_family_empty_means_no_such_family`
- `test_load_family_works_with_eager_seed`
- `test_load_family_callback_exception_surfaces_as_error`
- `test_load_family_filters_mismatched_name`
- `test_load_family_duplicate_versions_reports_error`

## Verification

- `cargo test --lib -p rer-resolver`: **44/44** (was 41 + 3 new).
- `pytest tests/`: **87/87** (was 80 + 7 new).
- `cargo test --release … --ignored` (strict 188-case differential
  against recorded rez resolves): **188/188** in 17.37 s, unchanged.
- Eager-path benchmark on the same machine as the README reference
  (Intel Xeon E5-2699 v4): total 11.29 s, mean 60.07 ms, 33.82× rez.
  README reference is 11.35 s / 60 ms / 34.1×, so within noise. The
  `RefCell` hop pays nothing because each family's `Rc` is borrowed once
  during `PackageVariantList::new` and the cache holds a direct `Rc`
  thereafter.

Co-Authored-By: Claude Opus 4.7
---
 crates/examples/rez_benchmark_dataset.rs      |  12 +-
 crates/rer-python/src/lib.rs                  | 119 ++++++++++++++--
 crates/rer-resolver/src/rez_solver/context.rs | 127 +++++++++++++++++-
 crates/rer-resolver/src/rez_solver/mod.rs     |   3 +-
 crates/rer-resolver/src/rez_solver/scope.rs   |   7 +-
 crates/rer-resolver/src/rez_solver/solver.rs  | 127 ++++++++++++++----
 crates/rer-resolver/src/rez_solver/variant.rs |  27 ++--
 .../rer-resolver/tests/test_rez_benchmark.rs  |  23 +++-
 tests/test_rich_api.py                        | 120 +++++++++++++++++
 9 files changed, 494 insertions(+), 71 deletions(-)

diff --git a/crates/examples/rez_benchmark_dataset.rs b/crates/examples/rez_benchmark_dataset.rs
index 437896d..e414fcb 100644
--- a/crates/examples/rez_benchmark_dataset.rs
+++ b/crates/examples/rez_benchmark_dataset.rs
@@ -8,9 +8,8 @@
 //!   cargo run --release -p examples --example rez_benchmark_dataset
 //! 
``` -use rer_resolver::rez_solver::{ - make_shared_cache, PackageRepo, Requirement, Solver, SolverStatus, -}; +use rer_resolver::rez_solver::{make_shared_cache, PackageRepo, Requirement, Solver, SolverStatus}; +use std::collections::HashMap; // Callgrind on this binary shows ~33 % of cycles in libc malloc/free — // `SmallVec` extends inside `Ranges`, per-call `FxHashMap`s in `reduce_by`, @@ -57,13 +56,16 @@ fn percentile(sorted: &[f64], p: f64) -> f64 { } fn main() { - let repo: Rc = Rc::new(load_json("benchmark_packages.json")); + type RepoMap = HashMap>; + let repo_map: RepoMap = load_json("benchmark_packages.json"); + let family_count = repo_map.len(); + let repo: Rc = Rc::new(PackageRepo::from_map(repo_map)); let cases: Vec = load_json("benchmark_expected.json"); println!( "running {} requests against {} package families\n", cases.len(), - repo.len() + family_count ); let mut times_ms: Vec = Vec::with_capacity(cases.len()); diff --git a/crates/rer-python/src/lib.rs b/crates/rer-python/src/lib.rs index 78a195d..75952d8 100644 --- a/crates/rer-python/src/lib.rs +++ b/crates/rer-python/src/lib.rs @@ -14,9 +14,10 @@ use pyo3::exceptions::PyValueError; use pyo3::prelude::*; use pyo3::types::PyType; use rer_resolver::rez_solver::{ - make_shared_cache, PackageRepo, Requirement, ScopeError, Solver, SolverStatus, - VariantSelectMode, + make_shared_cache, FamilyLoader, FamilyMap, PackageRepo, Requirement, ScopeError, Solver, + SolverStatus, VariantSelectMode, }; +use std::cell::RefCell; use std::collections::HashMap; use std::panic::{catch_unwind, AssertUnwindSafe}; use std::rc::Rc; @@ -235,13 +236,13 @@ impl SolveResult { // Repository conversion // --------------------------------------------------------------------------- -/// Fold a flat `list[PackageData]` into the `family → version → data` repo +/// Fold a flat `list[PackageData]` into the `family → version → data` map /// shape the solver works on. 
Duplicates (same family + version) raise an /// `error` result rather than silently shadowing. -fn packages_to_repo(packages: Vec) -> Result { - let mut repo: PackageRepo = HashMap::new(); +fn packages_to_map(packages: Vec) -> Result, String> { + let mut map: HashMap = HashMap::new(); for p in packages { - let entry = repo.entry(p.name.clone()).or_default(); + let entry = map.entry(p.name.clone()).or_default(); if entry .insert( p.version.clone(), @@ -255,7 +256,68 @@ fn packages_to_repo(packages: Vec) -> Result { return Err(format!("duplicate package: {}-{}", p.name, p.version)); } } - Ok(repo) + Ok(map) +} + +/// Build a [`FamilyLoader`] that calls the given Python callable for each +/// family the solver hasn't yet seen, mirroring issue #86's lazy-discovery +/// shape. +/// +/// `load_err` is shared with the caller — if the Python callback raises, +/// the loader stores the error there and returns an empty `Vec`, which the +/// repo memoises as "no such family". The outer `solve()` checks the +/// `RefCell` after the solver finishes and surfaces the captured error as +/// a `"error"`-status `SolveResult`. +/// +/// Entries whose `name` doesn't match the requested family are dropped +/// defensively — a misbehaving loader can't poison the repo for unrelated +/// families. +fn make_loader(callback: Py, load_err: Rc>>) -> FamilyLoader { + Box::new( + move |name: &str| -> Vec<(String, rer_resolver::PackageData)> { + // Already errored on a previous call — short-circuit so we don't + // pile up errors and don't keep calling a broken callback. 
+ if load_err.borrow().is_some() { + return Vec::new(); + } + let result: PyResult> = + Python::with_gil(|py| -> PyResult<_> { + let ret = callback.bind(py).call1((name,))?; + let pkgs: Vec = ret.extract()?; + let mut out: Vec<(String, rer_resolver::PackageData)> = + Vec::with_capacity(pkgs.len()); + let mut seen_versions: std::collections::HashSet = + std::collections::HashSet::new(); + for p in pkgs { + if p.name != name { + continue; + } + if !seen_versions.insert(p.version.clone()) { + return Err(PyValueError::new_err(format!( + "load_family({:?}) returned duplicate version {:?}", + name, p.version + ))); + } + out.push(( + p.version, + rer_resolver::PackageData { + requires: p.requires, + variants: p.variants, + }, + )); + } + Ok(out) + }); + match result { + Ok(pairs) => pairs, + Err(err) => { + let msg = Python::with_gil(|py| err.value(py).to_string()); + *load_err.borrow_mut() = Some(format!("load_family({name:?}) raised: {msg}")); + Vec::new() + } + } + }, + ) } // --------------------------------------------------------------------------- @@ -284,7 +346,12 @@ fn parse_variant_select_mode(s: &str) -> PyResult { /// * `packages` — a `list[PackageData]`, mirroring rez's already-loaded /// packages. Construct each entry from a `rez.Package` (via /// `rez.packages.iter_package_families` etc.) — `pyrer` does not read -/// the filesystem itself. +/// the filesystem itself. Optional if `load_family` is supplied. +/// * `load_family` — Optional `Callable[[str], list[PackageData]]` invoked +/// on demand the first time the solver needs a family that isn't already +/// in `packages`. The result is cached for the lifetime of the solve, +/// so each family is loaded at most once. An empty list means "no such +/// family" and is treated the same as an absent family. See issue #86. /// * `variant_select_mode` — either `"version_priority"` (default, rez's /// default config) or `"intersection_priority"`. Mirrors rez's /// `config.variant_select_mode`. 
@@ -299,14 +366,17 @@ fn parse_variant_select_mode(s: &str) -> PyResult { #[pyfunction] #[pyo3( signature = ( - package_requests, packages, /, + package_requests, packages=None, /, + *, + load_family=None, variant_select_mode="version_priority", filters=None, max_iterations=None, ) )] fn solve( package_requests: Vec, - packages: Vec, + packages: Option>, + load_family: Option>, variant_select_mode: &str, filters: Option>, max_iterations: Option, @@ -316,11 +386,27 @@ fn solve( let mode = parse_variant_select_mode(variant_select_mode)?; - let repo: PackageRepo = match packages_to_repo(packages) { - Ok(repo) => repo, + let initial_map = match packages_to_map(packages.unwrap_or_default()) { + Ok(map) => map, Err(msg) => return Ok(SolveResult::error(msg, start)), }; + // Shared error slot for the loader. Populated if the Python callback + // raises; checked after the solver finishes to surface the failure. + let load_err: Rc>> = Rc::new(RefCell::new(None)); + + let repo = if let Some(callback) = load_family { + let lazy = PackageRepo::with_loader(make_loader(callback, Rc::clone(&load_err))); + // Seed the eager set so the loader is never called for families + // the caller already supplied. + for (name, fam) in initial_map { + lazy.insert_family(name, fam); + } + lazy + } else { + PackageRepo::from_map(initial_map) + }; + // `Requirement::parse` panics on a syntactically invalid version range; // catch that at the FFI boundary and report it as `"error"` rather than // letting it surface as a Python `PanicException`. @@ -329,12 +415,17 @@ fn solve( .iter() .map(|s| Requirement::parse(s)) .collect(); - let mut solver = - Solver::new_with_options(reqs, Rc::new(repo), make_shared_cache(), mode)?; + let mut solver = Solver::new_with_options(reqs, Rc::new(repo), make_shared_cache(), mode)?; solver.solve(); Ok::(solver) })); + // If the loader raised, that's the user-facing error — surface it + // before whatever fallback status the solver may have produced. 
+ if let Some(msg) = load_err.borrow_mut().take() { + return Ok(SolveResult::error(msg, start)); + } + let solver = match outcome { Ok(Ok(solver)) => solver, // A missing top-level package family/version. rez reports this as a diff --git a/crates/rer-resolver/src/rez_solver/context.rs b/crates/rer-resolver/src/rez_solver/context.rs index c5c600f..378bb35 100644 --- a/crates/rer-resolver/src/rez_solver/context.rs +++ b/crates/rer-resolver/src/rez_solver/context.rs @@ -10,11 +10,130 @@ use std::cell::RefCell; use std::collections::HashMap; use std::rc::Rc; -/// The in-memory package repository: `family -> version -> PackageData`. +/// One package family's `version -> PackageData` map. +pub type FamilyMap = HashMap; + +/// Callback invoked on the first lookup for a family that is not already in +/// the repo. Returns `(version_string, PackageData)` pairs — every version of +/// the family. An empty result means "no such family"; the repo caches that +/// answer and never calls the loader for the same name again. +/// +/// Mirrors the lazy-load behaviour rez gets from its `Package` resource +/// wrapper (each `package.py` is AST-evaluated on first attribute access). +/// `pyrer` builds one of these from a Python callable for issue #86. +pub type FamilyLoader = Box Vec<(String, PackageData)>>; + +/// The package repository — `family -> version -> PackageData`. +/// +/// Replaces rez's on-disk `iter_packages(paths)` for callers that have the +/// data already loaded. With a [`FamilyLoader`] attached (see +/// [`Self::with_loader`]) it can also discover families lazily, the way rez's +/// own solver does. /// -/// This replaces rez's on-disk `iter_packages(paths)` — the data is already -/// loaded, so the port does not need rez's lazy package-loading machinery. -pub type PackageRepo = HashMap>; +/// Lookups are routed through [`Self::get_family`], which: +/// 1. Returns the cached `Rc` if the family has been seen. +/// 2. 
Otherwise calls the loader (if any), memoising both the hit and the +/// "no such family" answer. +/// 3. Otherwise returns `None`. +/// +/// Construction: +/// - [`Self::from_map`] / `impl From>` — eager, no loader. +/// - [`Self::with_loader`] — lazy; the loader is consulted on miss. +#[derive(Default)] +pub struct PackageRepo { + /// `Some(map)` for present families, `None` for families the loader + /// confirmed as absent (so we don't re-call it on miss). + families: RefCell>>>, + loader: Option, +} + +impl std::fmt::Debug for PackageRepo { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("PackageRepo") + .field("families", &self.families) + .field("loader", &self.loader.as_ref().map(|_| "")) + .finish() + } +} + +impl PackageRepo { + /// Empty repo, no loader. Mostly useful as a starting point for tests + /// that build the repo via [`Self::insert_family`]. + pub fn empty() -> Self { + Self::default() + } + + /// Eager repo from a `family -> version -> PackageData` map. The loader + /// is `None`, so any family not in `map` is reported as absent on lookup. + pub fn from_map(map: HashMap) -> Self { + let families = map + .into_iter() + .map(|(name, fam)| (name, Some(Rc::new(fam)))) + .collect(); + PackageRepo { + families: RefCell::new(families), + loader: None, + } + } + + /// Repo backed by a loader. The loader is called the first time the + /// solver asks for a family that isn't already cached — both hits and + /// "no such family" answers are memoised, so the loader fires at most + /// once per family per repo. + /// + /// Use [`Self::insert_family`] to pre-seed families that are already + /// in memory (e.g. ones produced by the caller's BFS seed pass). + pub fn with_loader(loader: FamilyLoader) -> Self { + PackageRepo { + families: RefCell::default(), + loader: Some(loader), + } + } + + /// Pre-populate a family. Useful with [`Self::with_loader`] to skip + /// the loader for families already in memory. 
+ pub fn insert_family(&self, name: String, fam: FamilyMap) { + self.families.borrow_mut().insert(name, Some(Rc::new(fam))); + } + + /// Number of families currently cached in the repo. With a loader + /// attached this grows as the solve progresses — it only reflects the + /// eager-seeded set + whatever the loader has been asked for so far. + pub fn family_count(&self) -> usize { + self.families + .borrow() + .values() + .filter(|v| v.is_some()) + .count() + } + + /// `Some(family map)` if the family exists (cached or lazily loaded); + /// `None` if there's no loader and it isn't cached, or if the loader + /// returned no entries for it. + pub fn get_family(&self, name: &str) -> Option> { + if let Some(slot) = self.families.borrow().get(name) { + return slot.clone(); + } + let loaded = self.loader.as_ref().and_then(|load| { + let entries = load(name); + if entries.is_empty() { + None + } else { + Some(Rc::new(entries.into_iter().collect::>())) + } + }); + self.families + .borrow_mut() + .insert(name.to_string(), loaded.clone()); + loaded + } +} + +impl From> for PackageRepo { + fn from(map: HashMap) -> Self { + Self::from_map(map) + } +} /// A `PackageVariantCache` that can be shared between solves of the same /// repository. 
Building a variant list (parsing every variant's requires) diff --git a/crates/rer-resolver/src/rez_solver/mod.rs b/crates/rer-resolver/src/rez_solver/mod.rs index 79e0e0c..a91a6d0 100644 --- a/crates/rer-resolver/src/rez_solver/mod.rs +++ b/crates/rer-resolver/src/rez_solver/mod.rs @@ -35,7 +35,8 @@ pub mod variant; pub type Name = std::rc::Rc; pub use context::{ - make_shared_cache, PackageRepo, SharedVariantCache, SolverContext, VariantSelectMode, + make_shared_cache, FamilyLoader, FamilyMap, PackageRepo, SharedVariantCache, SolverContext, + VariantSelectMode, }; pub use failure::{DependencyConflict, FailureReason, SolverStatus}; pub use phase::ResolvePhase; diff --git a/crates/rer-resolver/src/rez_solver/scope.rs b/crates/rer-resolver/src/rez_solver/scope.rs index caf1568..07d0ba5 100644 --- a/crates/rer-resolver/src/rez_solver/scope.rs +++ b/crates/rer-resolver/src/rez_solver/scope.rs @@ -342,7 +342,7 @@ impl std::fmt::Display for PackageScope { #[cfg(test)] mod tests { - use super::super::context::{PackageRepo, SolverContext}; + use super::super::context::{FamilyMap, PackageRepo, SolverContext}; use super::super::requirement::{Requirement, RequirementList}; use super::*; use crate::PackageData; @@ -358,7 +358,7 @@ mod tests { } fn repo(entries: Vec<(&str, Vec<(&str, PackageData)>)>) -> PackageRepo { - entries + let map: std::collections::HashMap = entries .into_iter() .map(|(name, versions)| { ( @@ -369,7 +369,8 @@ mod tests { .collect(), ) }) - .collect() + .collect(); + PackageRepo::from_map(map) } fn ctx_with(repo: PackageRepo, requests: &[&str]) -> Rc { diff --git a/crates/rer-resolver/src/rez_solver/solver.rs b/crates/rer-resolver/src/rez_solver/solver.rs index c7f4a9b..50deba4 100644 --- a/crates/rer-resolver/src/rez_solver/solver.rs +++ b/crates/rer-resolver/src/rez_solver/solver.rs @@ -37,11 +37,7 @@ impl Solver { package_requests: Vec, repo: Rc, ) -> Result { - Self::new_with_cache( - package_requests, - repo, - super::context::make_shared_cache(), - 
) + Self::new_with_cache(package_requests, repo, super::context::make_shared_cache()) } /// Create a solver sharing the given variant cache. @@ -57,12 +53,7 @@ impl Solver { repo: Rc, cache: SharedVariantCache, ) -> Result { - Self::new_with_options( - package_requests, - repo, - cache, - VariantSelectMode::default(), - ) + Self::new_with_options(package_requests, repo, cache, VariantSelectMode::default()) } /// Create a solver with full control over both the shared cache and the @@ -76,12 +67,13 @@ impl Solver { ) -> Result { let request_list = RequirementList::new(package_requests); - let build_ctx = |repo: Rc, request_list: RequirementList| -> Rc { - Rc::new( - SolverContext::new_with_cache(repo, request_list, Rc::clone(&cache)) - .with_variant_select_mode(variant_select_mode), - ) - }; + let build_ctx = + |repo: Rc, request_list: RequirementList| -> Rc { + Rc::new( + SolverContext::new_with_cache(repo, request_list, Rc::clone(&cache)) + .with_variant_select_mode(variant_select_mode), + ) + }; // A conflicting request fails immediately, with no scopes. if let Some((req1, req2)) = request_list.conflict() { @@ -205,9 +197,7 @@ impl Solver { /// solve did not succeed; otherwise an iterator that yields each resolved /// variant (cheap `Rc` refcount bumps) without allocating an intermediate /// `Vec`. 
- pub fn resolved_packages_iter( - &self, - ) -> Option> + '_> { + pub fn resolved_packages_iter(&self) -> Option> + '_> { if self.status() != SolverStatus::Solved { return None; } @@ -273,7 +263,7 @@ mod tests { } fn repo(entries: Vec<(&str, Vec<(&str, PackageData)>)>) -> PackageRepo { - entries + let map: std::collections::HashMap = entries .into_iter() .map(|(name, versions)| { ( @@ -284,7 +274,8 @@ mod tests { .collect(), ) }) - .collect() + .collect(); + PackageRepo::from_map(map) } fn solve(repo: PackageRepo, requests: &[&str]) -> Solver { @@ -305,6 +296,95 @@ mod tests { out } + #[test] + fn test_loader_called_only_for_needed_families() { + // Two families on disk: "app" depends on "lib". A bystander + // family "unrelated" exists but the solver never touches it, + // so the loader must never be called for it. + use std::cell::RefCell; + + let calls: Rc>> = Rc::new(RefCell::new(Vec::new())); + let calls_inner = Rc::clone(&calls); + + let repo = crate::rez_solver::PackageRepo::with_loader(Box::new(move |name: &str| { + calls_inner.borrow_mut().push(name.to_string()); + match name { + "app" => vec![("1.0".to_string(), pkg(&["lib-2"], &[]))], + "lib" => vec![ + ("1.0".to_string(), pkg(&[], &[])), + ("2.0".to_string(), pkg(&[], &[])), + ], + "unrelated" => vec![("1.0".to_string(), pkg(&[], &[]))], + _ => Vec::new(), + } + })); + + let reqs = vec![Requirement::parse("app")]; + let mut solver = Solver::new(reqs, Rc::new(repo)).expect("solver construction"); + solver.solve(); + assert_eq!(solver.status(), SolverStatus::Solved); + assert_eq!( + resolved_set(&solver), + vec![("app".into(), "1.0".into()), ("lib".into(), "2.0".into())] + ); + + let calls = calls.borrow(); + // app and lib were touched; unrelated was not. 
+ assert!(calls.contains(&"app".to_string())); + assert!(calls.contains(&"lib".to_string())); + assert!(!calls.contains(&"unrelated".to_string())); + } + + #[test] + fn test_loader_called_once_per_family() { + use std::cell::RefCell; + + let calls: Rc>> = Rc::new(RefCell::new(Vec::new())); + let calls_inner = Rc::clone(&calls); + + // A diamond: app -> lib & util; util -> lib. lib is reached twice + // but the loader must only be invoked once. + let repo = crate::rez_solver::PackageRepo::with_loader(Box::new(move |name: &str| { + calls_inner.borrow_mut().push(name.to_string()); + match name { + "app" => vec![("1.0".into(), pkg(&["lib", "util"], &[]))], + "util" => vec![("1.0".into(), pkg(&["lib"], &[]))], + "lib" => vec![("1.0".into(), pkg(&[], &[]))], + _ => Vec::new(), + } + })); + + let reqs = vec![Requirement::parse("app")]; + let mut solver = Solver::new(reqs, Rc::new(repo)).expect("solver construction"); + solver.solve(); + assert_eq!(solver.status(), SolverStatus::Solved); + + let calls = calls.borrow(); + let lib_calls = calls.iter().filter(|n| *n == "lib").count(); + assert_eq!( + lib_calls, 1, + "loader should be called at most once per family" + ); + } + + #[test] + fn test_loader_empty_means_missing_family() { + // The loader returns no entries for an unknown name; the solver + // treats that as a missing family (failed resolve), not a panic. + let repo = crate::rez_solver::PackageRepo::with_loader(Box::new(|_| Vec::new())); + let reqs = vec![Requirement::parse("doesnotexist")]; + let solver = Solver::new(reqs, Rc::new(repo)); + // Either Solver::new returns a ScopeError or the solve fails; + // both are valid encodings of "no such top-level family". 
+ match solver { + Err(_) => {} // expected + Ok(mut solver) => { + solver.solve(); + assert_ne!(solver.status(), SolverStatus::Solved); + } + } + } + #[test] fn test_trivial_single_package() { let solver = solve(repo(vec![("foo", vec![("1.0", pkg(&[], &[]))])]), &["foo"]); @@ -492,8 +572,7 @@ mod tests { assert_eq!(owned, borrowed); // The borrowing form yields `&Requirement` — confirm callers can // collect without forcing an owned copy of the requirements. - let by_ref: Vec<&Requirement> = - solver.resolved_ephemerals_iter().unwrap().collect(); + let by_ref: Vec<&Requirement> = solver.resolved_ephemerals_iter().unwrap().collect(); assert_eq!(by_ref.len(), 1); assert_eq!(by_ref[0].to_string(), ".feature-2+<3"); } diff --git a/crates/rer-resolver/src/rez_solver/variant.rs b/crates/rer-resolver/src/rez_solver/variant.rs index c5009d1..27110e0 100644 --- a/crates/rer-resolver/src/rez_solver/variant.rs +++ b/crates/rer-resolver/src/rez_solver/variant.rs @@ -7,7 +7,7 @@ //! requirements (`reduce_by`), and by peeling off the search space (`split`). //! `extract` pulls out a dependency common to every remaining variant. -use super::context::{PackageRepo, SolverContext}; +use super::context::{FamilyMap, SolverContext}; use super::requirement::{Requirement, RequirementList}; use super::Name; use rer_version::{RerVersion, VersionRange}; @@ -357,7 +357,11 @@ struct LazyEntry { #[derive(Debug)] pub struct PackageVariantList { package_name: Name, - repo: Rc, + /// The family's `version_str -> PackageData` map. Held as `Rc` so the + /// repo and the variant list share storage cheaply; this also lets the + /// repo discard the map after handing it out (relevant for the + /// loader-backed case). + versions: Rc, /// One entry per version, version-sorted ascending for deterministic /// iteration; `_PackageVariantSlice::sort_versions` re-sorts when ordering /// actually matters. @@ -369,7 +373,7 @@ impl PackageVariantList { /// absent from the repository. 
Only version strings are parsed here — the /// requirement strings are parsed on demand by [`Self::get_intersection`]. pub fn new(ctx: &SolverContext, package_name: &str) -> Option { - let versions = ctx.repo.get(package_name)?; + let versions = ctx.repo.get_family(package_name)?; let mut entries: Vec = versions .keys() .map(|version_str| { @@ -386,7 +390,7 @@ impl PackageVariantList { Some(PackageVariantList { package_name: Name::from(package_name), - repo: Rc::clone(&ctx.repo), + versions, entries, }) } @@ -403,7 +407,7 @@ impl PackageVariantList { } let mut slot = lazy.entry.borrow_mut(); let built = slot.get_or_insert_with(|| { - let data = &self.repo[self.package_name.as_ref()][&lazy.version_str]; + let data = &self.versions[&lazy.version_str]; Rc::new(PackageEntry { version: lazy.version.clone(), variants: build_variants(&self.package_name, &lazy.version, data), @@ -514,11 +518,7 @@ pub struct PackageVariantSlice { impl PackageVariantSlice { /// Build a slice over the given entries. 
- pub fn new( - ctx: Rc, - package_name: Name, - entries: Vec>, - ) -> Self { + pub fn new(ctx: Rc, package_name: Name, entries: Vec>) -> Self { PackageVariantSlice { ctx, package_name, @@ -918,7 +918,7 @@ impl PackageVariantCache { #[cfg(test)] mod tests { - use super::super::context::PackageRepo; + use super::super::context::{FamilyMap, PackageRepo}; use super::*; use crate::PackageData; @@ -933,7 +933,7 @@ mod tests { } fn repo(entries: Vec<(&str, Vec<(&str, PackageData)>)>) -> PackageRepo { - entries + let map: std::collections::HashMap = entries .into_iter() .map(|(name, versions)| { ( @@ -944,7 +944,8 @@ mod tests { .collect(), ) }) - .collect() + .collect(); + PackageRepo::from_map(map) } fn ctx_with(repo: PackageRepo, requests: &[&str]) -> Rc { diff --git a/crates/rer-resolver/tests/test_rez_benchmark.rs b/crates/rer-resolver/tests/test_rez_benchmark.rs index 6470ee5..7360745 100644 --- a/crates/rer-resolver/tests/test_rez_benchmark.rs +++ b/crates/rer-resolver/tests/test_rez_benchmark.rs @@ -27,7 +27,7 @@ //! run explicitly by the `benchmark` CI workflow (`cargo test --release ... //! -- --ignored`), and can be chunked locally with `BENCH_RANGE=start:end`. -use rer_resolver::rez_solver::{Requirement, Solver, SolverStatus}; +use rer_resolver::rez_solver::{PackageRepo, Requirement, Solver, SolverStatus}; use rer_resolver::PackageData; use serde::Deserialize; use std::collections::HashMap; @@ -36,7 +36,9 @@ use std::path::PathBuf; use std::rc::Rc; /// `package name → version → PackageData`, as produced by the prep script. -type PackageRepo = HashMap>; +/// Distinct from the resolver crate's [`PackageRepo`] (which is now a struct +/// with optional loader support — issue #86); this is just the JSON shape. +type RepoMap = HashMap>; /// One trimmed entry from rez's recorded `resolves.json`. 
#[derive(Deserialize)] @@ -114,7 +116,10 @@ fn normalize_rez(entries: &[String]) -> Vec<(String, String, Option)> { /// success, or `None` if the solve failed (including a construction error — /// rez would error, but the benchmark records no error cases, so we treat /// it as a failed solve). -fn solve(request: &[String], repo: Rc) -> Option)>> { +fn solve( + request: &[String], + repo: Rc, +) -> Option)>> { let reqs: Vec = request.iter().map(|s| Requirement::parse(s)).collect(); let mut solver = Solver::new(reqs, repo).ok()?; solver.solve(); @@ -139,7 +144,9 @@ fn solve(request: &[String], repo: Rc) -> Option("benchmark_packages.json").map(Rc::new) else { + let Some(repo) = + load_json::("benchmark_packages.json").map(|m| Rc::new(PackageRepo::from_map(m))) + else { eprintln!( "benchmark fixtures missing — skipping. Generate them with:\n \ git submodule update --init\n \ @@ -152,7 +159,7 @@ fn test_rez_benchmark_correctness() { println!( "loaded {} package families, {} benchmark requests", - repo.len(), + repo.family_count(), cases.len() ); @@ -205,8 +212,10 @@ fn test_rez_benchmark_correctness() { // entries are only in one side. Helps narrow down whether // the divergence is a missing package, a wrong version, // or just a different variant index. 
- let rer_only: Vec<_> = rer_set.iter().filter(|t| !rez_set.contains(t)).collect(); - let rez_only: Vec<_> = rez_set.iter().filter(|t| !rer_set.contains(t)).collect(); + let rer_only: Vec<_> = + rer_set.iter().filter(|t| !rez_set.contains(t)).collect(); + let rez_only: Vec<_> = + rez_set.iter().filter(|t| !rer_set.contains(t)).collect(); divergent.push(i); divergent_details.push(format!( "case {i}: request={:?}\n rer-only: {:?}\n rez-only: {:?}", diff --git a/tests/test_rich_api.py b/tests/test_rich_api.py index c48e727..3370e8b 100644 --- a/tests/test_rich_api.py +++ b/tests/test_rich_api.py @@ -331,6 +331,126 @@ def test_variant_select_mode_invalid_raises_valueerror(): pyrer.solve(["foo"], [_pkg("foo", "1.0.0")], variant_select_mode="nope") +# --------------------------------------------------------------------------- +# load_family — lazy package discovery (issue #86) +# --------------------------------------------------------------------------- + + +def test_load_family_lazy_only_touched_families(): + """The callback fires only for families the solver actually needs.""" + calls = [] + + def loader(name): + calls.append(name) + if name == "app": + return [_pkg("app", "1.0.0", requires=["lib-2"])] + if name == "lib": + return [_pkg("lib", "1.0.0"), _pkg("lib", "2.0.0")] + if name == "unrelated": + return [_pkg("unrelated", "1.0.0")] + return [] + + result = pyrer.solve(["app"], None, load_family=loader) + assert result.status == "solved", result.failure_description + names = {v.name: v.version for v in result.resolved_packages} + assert names == {"app": "1.0.0", "lib": "2.0.0"} + + assert "app" in calls and "lib" in calls + assert "unrelated" not in calls, "loader should not be called for unrelated families" + + +def test_load_family_called_at_most_once_per_family(): + """Diamond dep: app -> lib & util, util -> lib. 
lib loaded once only.""" + calls = [] + + def loader(name): + calls.append(name) + if name == "app": + return [_pkg("app", "1.0.0", requires=["lib", "util"])] + if name == "util": + return [_pkg("util", "1.0.0", requires=["lib"])] + if name == "lib": + return [_pkg("lib", "1.0.0")] + return [] + + result = pyrer.solve(["app"], None, load_family=loader) + assert result.status == "solved" + assert calls.count("lib") == 1, f"loader called {calls.count('lib')}x for 'lib'" + + +def test_load_family_empty_means_no_such_family(): + """A loader that returns [] for an unknown name produces a failed resolve, + not a crash.""" + def loader(_name): + return [] + + result = pyrer.solve(["doesnotexist"], None, load_family=loader) + assert result.status == "failed" + assert result.failure_description + + +def test_load_family_works_with_eager_seed(): + """Caller can pre-seed some families; the loader is consulted only + for ones not in the eager list.""" + calls = [] + + def loader(name): + calls.append(name) + if name == "lib": + return [_pkg("lib", "2.0.0")] + return [] + + seed = [_pkg("app", "1.0.0", requires=["lib-2"])] + result = pyrer.solve(["app"], seed, load_family=loader) + assert result.status == "solved" + names = {v.name: v.version for v in result.resolved_packages} + assert names == {"app": "1.0.0", "lib": "2.0.0"} + # 'app' came from the eager seed; loader was only asked for 'lib'. 
+ assert "app" not in calls + assert calls == ["lib"] + + +def test_load_family_callback_exception_surfaces_as_error(): + """If the callback raises, the solve returns status='error' with the + error in the description — never a Python exception out of pyrer.""" + def loader(name): + raise RuntimeError(f"boom for {name}") + + result = pyrer.solve(["whatever"], None, load_family=loader) + assert result.status == "error" + assert "boom for whatever" in (result.failure_description or "") + + +def test_load_family_filters_mismatched_name(): + """A loader that returns entries for the wrong family name has those + entries dropped, not silently mixed in.""" + def loader(name): + if name == "app": + return [ + _pkg("app", "1.0.0"), + _pkg("not-app", "1.0.0"), # bogus — must be dropped + ] + return [] + + result = pyrer.solve(["app"], None, load_family=loader) + assert result.status == "solved" + names = {v.name for v in result.resolved_packages} + assert names == {"app"} + + +def test_load_family_duplicate_versions_reports_error(): + """A loader returning two PackageData for the same (family, version) is + a data bug — surface it rather than silently shadowing.""" + def loader(name): + if name == "app": + return [_pkg("app", "1.0.0"), _pkg("app", "1.0.0")] + return [] + + result = pyrer.solve(["app"], None, load_family=loader) + assert result.status == "error" + assert "duplicate" in (result.failure_description or "").lower() + + def test_from_rez_used_in_solve(): """End-to-end: from_rez → solve produces the same result as constructor.""" From 1384efd6860c941f73f33edcb78a530e2b248cfb Mon Sep 17 00:00:00 2001 From: Philippe Llerena Date: Sat, 16 May 2026 18:39:09 +0200 Subject: [PATCH 2/4] chore(release): bump workspace version to 0.1.0-rc.8 Picks up the lazy package-discovery feature (issue #86): `pyrer.solve` gains an optional `load_family` callback, and `PackageRepo` becomes a struct with cache + optional loader instead of a `HashMap` type alias. 
Eager-path perf and the strict 188-case rez differential are unchanged. Co-Authored-By: Claude Opus 4.7 --- Cargo.toml | 6 +++--- docs/config.toml | 2 +- docs/content/_index.md | 2 +- docs/content/docs/getting-started/quick-start.md | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Cargo.toml b/Cargo.toml index 47f39bb..4e3b845 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -3,7 +3,7 @@ members = ["crates/*"] resolver = "2" [workspace.package] -version = "0.1.0-rc.7" +version = "0.1.0-rc.8" authors = [ "Lorenzo Montant ", "Maxim Doucet ", @@ -23,8 +23,8 @@ lazy_static = "1.5.0" rand = "0.8.5" serde = { version = "1.0", features = ["derive"] } serde_json = "1.0" -rer-version = { path = "crates/rer-version", version = "0.1.0-rc.7" } -rer-resolver = { path = "crates/rer-resolver", version = "0.1.0-rc.7" } +rer-version = { path = "crates/rer-version", version = "0.1.0-rc.8" } +rer-resolver = { path = "crates/rer-resolver", version = "0.1.0-rc.8" } pyo3 = { version = "0.23.5", features = ["extension-module"] } # `mimalloc` is wired into the bench binary as a `#[global_allocator]`. # Callgrind shows ~33 % of cycles in libc malloc/free; mimalloc has measurably diff --git a/docs/config.toml b/docs/config.toml index 311af1c..f1cd0b6 100644 --- a/docs/config.toml +++ b/docs/config.toml @@ -125,7 +125,7 @@ weight = 10 name = "GitHub" pre = '' url = "https://github.com/doubleailes/rer" -post = "v0.1.0-rc.7" +post = "v0.1.0-rc.8" weight = 20 # Footer contents diff --git a/docs/content/_index.md b/docs/content/_index.md index 5469c43..f16bd1e 100644 --- a/docs/content/_index.md +++ b/docs/content/_index.md @@ -7,7 +7,7 @@ title = "rer — Rez En Rust" lead = "A faithful Rust port of rez's package solver — callable from Python via PyO3, resolves match rez 1:1." url = "/docs/getting-started/introduction/" url_button = "Get started" -repo_version = "GitHub v0.1.0-rc.7" +repo_version = "GitHub v0.1.0-rc.8" repo_license = "MIT-licensed." 
repo_url = "https://github.com/doubleailes/rer" diff --git a/docs/content/docs/getting-started/quick-start.md b/docs/content/docs/getting-started/quick-start.md index 6f462b8..13a027d 100644 --- a/docs/content/docs/getting-started/quick-start.md +++ b/docs/content/docs/getting-started/quick-start.md @@ -74,7 +74,7 @@ Add the resolver crate to your `Cargo.toml`: ```toml [dependencies] -rer-resolver = "0.1.0-rc.7" +rer-resolver = "0.1.0-rc.8" ``` Then call the solver against an in-memory repository: From 320b7f6b61275144ed85f8d20ebb8a00db9ff1dd Mon Sep 17 00:00:00 2001 From: Philippe Llerena Date: Sat, 16 May 2026 19:59:55 +0200 Subject: [PATCH 3/4] docs: document load_family callback for lazy package discovery MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the `load_family` story to the user-facing docs alongside the #86 implementation: - `docs/getting-started/rez-integration.md` — new "Lazy package discovery on cold caches" section covering API, semantics, when the win is real vs flat, a worked Windows+CIFS example, the lazy variant of the monkey-patch shim, and what lazy loading does *not* fix (cross-invocation cost, GIL contention, solve-phase CPU). The pre-existing eager note picks up a forward pointer. - `docs/getting-started/quick-start.md` — short callout with the basic shape and a link to the integration page. - `docs/help/faq.md` — "Where does rer get package data from?" updated to mention both eager and lazy supply. - `CHANGELOG.md` — Unreleased section gets `load_family`, `resolved_ephemerals`, the borrowing-iterator forms, and the `PackageRepo` struct conversion. The Windows+CIFS framing matches the actual canonical motivating case: Samba-served package stores, no Windows-side page cache for SMB, every `rez env` invocation pays full network roundtrips for every reachable family. Lazy loading is the right primitive to close that gap. 
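For reviewers, the caching contract these docs describe can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions, not pyrer's Rust implementation: `FamilyCache` is a hypothetical class, `(name, version)` tuples stand in for `pyrer.PackageData`, and the toy raises `ValueError` on duplicates where pyrer reports `status='error'`.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for pyrer.PackageData: (family name, version). Hypothetical.
Package = Tuple[str, str]


class FamilyCache:
    """Toy model of the documented load_family contract (hypothetical class)."""

    def __init__(self, load_family: Callable[[str], List[Package]]):
        self._load = load_family
        self._cache: Dict[str, List[Package]] = {}

    def seed(self, packages: List[Package]) -> None:
        # Eager entries are stored first, so the loader is never asked for them.
        for pkg in packages:
            self._cache.setdefault(pkg[0], []).append(pkg)

    def get_family(self, name: str) -> List[Package]:
        if name in self._cache:
            # Cache hit, including a memoised "no such family" (empty list).
            return self._cache[name]
        loaded = self._load(name)
        # Drop entries filed under the wrong family name.
        kept = [pkg for pkg in loaded if pkg[0] == name]
        versions = [version for _, version in kept]
        if len(versions) != len(set(versions)):
            # pyrer surfaces this as status='error'; the toy raises instead.
            raise ValueError(f"duplicate version in family {name!r}")
        self._cache[name] = kept  # an empty list is cached too
        return kept


calls: List[str] = []


def loader(name: str) -> List[Package]:
    calls.append(name)
    # One valid entry plus one bogus entry that must be dropped.
    return {"lib": [("lib", "2.0.0"), ("not-lib", "1.0.0")]}.get(name, [])


cache = FamilyCache(loader)
cache.seed([("app", "1.0.0")])
for family in ["app", "lib", "lib", "missing", "missing"]:
    cache.get_family(family)
```

In this model the loader is consulted once for `lib` and once for `missing`, and never for the seeded `app`, which is the behaviour the tests in `tests/test_rich_api.py` assert against the real solver.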
Co-Authored-By: Claude Opus 4.7 --- CHANGELOG.md | 31 ++++ .../docs/getting-started/quick-start.md | 21 +++ .../docs/getting-started/rez-integration.md | 172 +++++++++++++++++- docs/content/docs/help/faq.md | 13 +- 4 files changed, 226 insertions(+), 11 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 8b0e5ab..2aa6ea4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,37 @@ page. ## [Unreleased] +### Added + +- **`load_family` callback** on `pyrer.solve()` — opt-in lazy package + discovery: pass `load_family: Callable[[str], list[PackageData]]` and the + solver calls it on demand the first time it needs a family it hasn't seen. + Each family is loaded at most once per solve; returning `[]` means "no + such family". Aimed at cold-cache / network-filesystem integrations + (Windows + CIFS in particular) where the up-front BFS of every reachable + family dominates the wall-clock cost of `rez env`. See the + [lazy-discovery section of the rez integration page](https://doubleailes.github.io/rer/docs/getting-started/rez-integration/#lazy-package-discovery-on-cold-caches). + Closes #86. +- **`resolved_ephemerals`** on `pyrer.SolveResult` — list of rez-style + ephemeral requirement strings (e.g. `[".feature-1.5", ".mode-debug"]`) + surfaced from the solver, matching `rez.solver.Solver.resolved_ephemerals`. + Closes #84. +- **Borrowing-iterator forms** on the Rust API: `Solver::resolved_packages_iter` + / `resolved_ephemerals_iter` and `ResolvePhase::iter_solved_variants` / + `iter_solved_ephemerals`. Avoid the intermediate `Vec` (and, for + ephemerals, the per-element `Requirement::clone`) when callers just want + to iterate. + +### Changed + +- **`PackageRepo` is now a struct**, not a `HashMap` type alias. Carries a + cache (`RefCell>`) and an optional `FamilyLoader` closure. + Construct with `PackageRepo::from_map(map)` for the eager case, or + `PackageRepo::with_loader(loader)` for lazy. 
`From>` is + implemented for back-compat with the old type-alias shape. The eager + path's perf is unchanged in measurement (within run-to-run noise of the + README baseline). + ## [1.0.0] — TBD The first stable release. Public API is now under semver — see the diff --git a/docs/content/docs/getting-started/quick-start.md b/docs/content/docs/getting-started/quick-start.md index 13a027d..c04fdcc 100644 --- a/docs/content/docs/getting-started/quick-start.md +++ b/docs/content/docs/getting-started/quick-start.md @@ -68,6 +68,27 @@ It returns a `SolveResult` with: Failures and bad input are reported via `status`, never as Python exceptions (except `TypeError` if `packages` is the wrong type). +### Lazy package discovery + +For large repositories on slow filesystems (network mounts, CIFS, +NFS without a useful page cache), `pyrer.solve` accepts a +`load_family` callback that is invoked on demand only for families +the solver actually touches: + +```python +def load_family(name): + """Return every PackageData for `name`, or [] if no such family.""" + return [pyrer.PackageData(name, "1.0.0"), ...] + +result = pyrer.solve(["app"], packages=None, load_family=load_family) +``` + +`load_family` is called at most once per family per solve. Returning +`[]` is treated as "no such family". Useful in particular when +wiring `pyrer` into `rez` over a slow share — see the +[rez integration page](../rez-integration/#lazy-package-discovery-on-cold-caches) +for the full story. + ## From Rust Add the resolver crate to your `Cargo.toml`: diff --git a/docs/content/docs/getting-started/rez-integration.md b/docs/content/docs/getting-started/rez-integration.md index 3f7052b..d633c33 100644 --- a/docs/content/docs/getting-started/rez-integration.md +++ b/docs/content/docs/getting-started/rez-integration.md @@ -81,14 +81,164 @@ fixture). Two notes on this step: -- It is **eager** — every package on every path is loaded. 
`rez` - normally loads lazily; the trade-off is one upfront cost vs many - small ones during the solve. On a real repo, eager loading is - typically a few seconds; on the rez 188-case benchmark it is the - dominant pre-solve cost. -- If you're running many resolves against the same repo (CI, batch - validation, a long-lived daemon), build the list **once** and reuse - it. +- It is **eager** — every package on every path is loaded before the + solve starts. `rez` normally loads lazily; the trade-off is one + upfront cost vs many small ones during the solve. On a real repo + on local disk with a warm page cache, eager loading is typically a + few seconds; on the rez 188-case benchmark it is the dominant + pre-solve cost. +- If you're running many resolves against the same repo in one + process (CI, batch validation, a long-lived daemon), build the + list **once** and reuse it. + +If your repository sits on a slow filesystem (network mount, no +useful page cache), the eager load can easily exceed the solve +itself. The [next section](#lazy-package-discovery-on-cold-caches) +covers a callback-driven alternative that loads families on demand. + +## Lazy package discovery on cold caches + +`pyrer.solve` accepts an optional `load_family` callback that is +invoked the first time the solver needs a family it hasn't already +been given: + +```python +import pyrer + +def load_family(name): + """Return every PackageData for `name`, or [] if no such family.""" + pkgs = [] + for pkg in iter_packages(name, paths=PACKAGE_PATHS): + pkgs.append(pyrer.PackageData.from_rez(pkg)) + return pkgs + +result = pyrer.solve( + ["maya-2024", "nuke-14"], + packages=None, # or a small eager seed + load_family=load_family, +) +``` + +Semantics: + +- The callback is called **at most once per family** in one solve + (results are cached internally), and **only for families the + solver actually exercises**. 
+- Returning `[]` means "no such family" — treated the same as a + family that was never added. +- The `packages` argument is still accepted; entries supplied that + way are pre-seeded into the cache and the callback is never asked + for those families. Useful for a hybrid where you pre-load + hot families and lazy-load the long tail. +- If the callback raises, the solve returns + `result.status == "error"` with the exception message in + `result.failure_description`. No exception escapes `pyrer.solve`. +- Defensive: entries whose `name` does not match the requested + family are dropped; a duplicate `(family, version)` from the + callback surfaces as `status="error"`. + +### When this actually helps + +The win is in I/O avoided, not in CPU. Specifically: + +| Scenario | Lazy vs eager | +|---|---| +| Local disk, warm page cache, wide healthy resolve | Roughly equal — reachable ≈ touched, the eager cost is small anyway | +| Network filesystem (NFS / CIFS / SMB), studio-scale repo | **Substantial win** — every cold roundtrip avoided is a direct latency saving | +| Early-fail conflict resolves (e.g. `maya-2024 maya-2025`) | **Substantial win** — touches a handful of families instead of the whole reachable closure | +| Selective deep resolves in a large package universe | **Substantial win** — sparse subgraph means most reachable families are never opened | +| Single tool / CI probe inside a 5000-package store | **Substantial win** — same reason as above | + +The shape of the win depends on the gap between the *reachable* +subgraph (eager BFS) and the *exercised* subgraph (what the solver +actually opens). When those diverge, lazy loading is essentially +free latency back. + +### Worked example: Windows + CIFS + +A common case: the rez repository lives on a Samba / CIFS share, +mounted on Windows clients. 
Windows has no equivalent of Linux's +page cache for SMB content, so every `rez env` invocation pays the +full network roundtrip for every `package.py` it opens — there is +no cross-invocation caching to amortise it. On that combination, the +eager BFS in the basic shim can easily dominate the wall-clock cost +of `rez env`, even though the solve itself runs in tens of +milliseconds. + +`load_family` is the right primitive for this case: the solver only +asks the network for families it genuinely needs to inspect, and +each one is fetched at most once per resolve. + +### Lazy variant of the shim + +The monkey-patch shim becomes slightly simpler with the callback +form — no upfront BFS: + +```python +import pyrer +import rez.solver as _rez_solver +import rez.resolver as _rez_resolver +from rez.packages import iter_packages +from rez.config import config as _rez_config + +_original_resolve = _rez_resolver.Resolver._solve + + +def _pyrer_resolve(self): + if self.package_filter or self.package_orderers: + return _original_resolve(self) + + # Closure over the resolver's package paths — pyrer calls this + # only for families the solver actually needs. + def load_family(name): + return [ + pyrer.PackageData.from_rez(pkg) + for pkg in iter_packages(name, paths=self.package_paths) + ] + + requests = [str(r) for r in self.package_requests] + result = pyrer.solve( + requests, + packages=None, + load_family=load_family, + variant_select_mode=_rez_config.variant_select_mode, + ) + + if result.status != "solved": + return _original_resolve(self) + + self.resolved_packages_ = resolve_to_rez_variants( + result, self.package_paths, + ) + self.status_ = _rez_solver.SolverStatus.solved + return self + + +_rez_resolver.Resolver._solve = _pyrer_resolve +``` + +If the studio's `package_filter` configuration matters, apply it +inside `load_family` before returning the list — the filter then +runs only on families the solver actually exercises, instead of +every reachable family. 
+ +### What lazy loading does *not* fix + +- **Cross-invocation cost.** `load_family` caches inside one solve; + the next `rez env` invocation pays the load cost again for every + family it touches. Closing that gap would need a persistent cache + in the shim itself (keyed e.g. by `package.py` mtime). That sits + outside `pyrer` — but `load_family` is the prerequisite that + makes such a cache implementable as a wrapper around the + callback. +- **GIL contention during the solve.** `pyrer.solve` currently + holds the GIL for the duration of the resolve. In practice this + rarely matters: the callback itself, when it does I/O via rez's + loaders, releases the GIL inside the underlying C call. Other + Python threads block only during the pure-Rust portions, which + are short. +- **Solve-phase CPU.** The solver itself runs the same algorithm + either way. Lazy loading is purely about avoiding pre-solve I/O. ## Solving @@ -171,8 +321,10 @@ intercept the happy path, fall back to the real rez solver on any non-default config (custom orderer, `late` binding requires, `@early` evaluation, etc.). -The minimum viable shim — for studios with default-configured repos — -looks roughly like: +The eager-loading shim below is the simplest form; for cold-cache +repos prefer the lazy variant shown +[earlier](#lazy-variant-of-the-shim), which lets the solver drive +the loading directly: ```python import pyrer diff --git a/docs/content/docs/help/faq.md b/docs/content/docs/help/faq.md index 4540b3e..6abac27 100644 --- a/docs/content/docs/help/faq.md +++ b/docs/content/docs/help/faq.md @@ -62,9 +62,20 @@ phases, and implicit backtracking all behave as rez does. From the caller. rer never reads the filesystem — there is no `package.py` parser in Rust. The host (rez, or a test harness) loads -packages and passes them in as JSON in the `PackageData` schema: +packages and passes them in as `pyrer.PackageData` instances: `name -> version -> {requires, variants}`. 
+The host can pass them in two ways: + +- **Eager** — a `list[PackageData]` built up front. Simple; best + when the repo is on local disk or already cached in memory. +- **Lazy** — a `load_family(name) -> list[PackageData]` callback + that the solver invokes only for families it actually touches. + Better when the repo lives on a slow filesystem (network mounts, + CIFS, NFS without a useful page cache) or when typical resolves + exercise a small subgraph of a large package store. See the + [rez integration page](../../getting-started/rez-integration/#lazy-package-discovery-on-cold-caches). + ## Is there a CLI? Not yet. There was a placeholder `rer` binary; it was removed because From f192e950ec43c8b24cafda08c0a533050ccf5f9a Mon Sep 17 00:00:00 2001 From: Philippe Llerena Date: Sat, 16 May 2026 21:52:44 +0200 Subject: [PATCH 4/4] chore(resolver): port solver_micro bench to PackageRepo struct MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The bench file built its repo as `HashMap>` and wrapped it in `Rc::new(...)`, which matched the old `PackageRepo` type alias. After the issue #86 conversion to a struct, two construction sites (`slice_for` and `bench_solve`) needed to wrap via `PackageRepo::from_map`. Same shape, no behaviour change — just keeps `cargo bench` compiling. Last run on this machine, criterion comparing against the prior local baseline: every bench neutral or faster, none slower. `reduce_by(fast-path)` -5.97% (p=0.00), `Solver/triple-with-pin` -3.56% (p=0.01); the rest in the noise band on the improvement side. Consistent with the `RefCell` indirection in `PackageRepo` being cold-pathed (one borrow per family at variant-list construction, never in the hot loop). 
Co-Authored-By: Claude Opus 4.7 --- crates/rer-resolver/benches/solver_micro.rs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/crates/rer-resolver/benches/solver_micro.rs b/crates/rer-resolver/benches/solver_micro.rs index c9fd03f..0413a7c 100644 --- a/crates/rer-resolver/benches/solver_micro.rs +++ b/crates/rer-resolver/benches/solver_micro.rs @@ -141,7 +141,7 @@ fn fam( /// through the same cache code path the solver uses. Used by the slice-level /// benches below. fn slice_for(family: &str, range: &VersionRange) -> PackageVariantSlice { - let repo = Rc::new(build_repo()); + let repo = Rc::new(rer_resolver::rez_solver::PackageRepo::from_map(build_repo())); let ctx = Rc::new(SolverContext::new(repo, RequirementList::new(vec![]))); ctx.get_variant_slice(family, range) .expect("slice for the requested family/range") @@ -300,7 +300,7 @@ fn bench_slice(c: &mut Criterion) { fn bench_solve(c: &mut Criterion) { let mut group = c.benchmark_group("Solver"); - let repo = Rc::new(build_repo()); + let repo = Rc::new(rer_resolver::rez_solver::PackageRepo::from_map(build_repo())); let cache = make_shared_cache(); let cases: &[(&str, &[&str])] = &[