Parent sprint: #87
Depends on: #88
Recommended order: 2
Codex-ready: yes
Goal
Refactor collection indexing so SynBioHub fetching and SBOL document indexing are separate steps.
Background
Today _index_collections(collections) does too much:
- assumes every collection is a SynBioHub URI;
- pulls each URI into self.sbol_doc;
- scans implementations;
- classifies strains, plasmids, backbones, enzymes, and ligases;
- mutates indexed_plasmids, indexed_backbones, restriction_enzyme_implementations, and ligase_implementations;
- does additional strain and component-definition scans.
This makes it hard to debug whether a failure is due to a fetch problem, local SBOL structure, role mismatch, implementation linking, or indexing logic.
Scope
Extract the existing behavior into smaller helpers. Suggested shape:
def pull_collection_uris(self, uris: list[str]) -> sbol2.Document: ...
def index_sbol_document(self, doc: sbol2.Document, source: str = "local") -> InventoryAudit | dict: ...
def _index_implementation(self, implementation: sbol2.Implementation, doc: sbol2.Document, source: str): ...
def _index_strain_module(self, strain: sbol2.ModuleDefinition, implementation: sbol2.Implementation | None, doc: sbol2.Document): ...
def _index_plasmid_or_backbone_definition(self, definition: sbol2.ComponentDefinition, implementation: sbol2.Implementation | None, doc: sbol2.Document): ...
def _index_reagent_implementation(self, implementation: sbol2.Implementation, built_object): ...
The exact names may differ. The important requirement is separation of concerns.
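As a rough illustration of the split, the sketch below separates a fetch step from an indexing step, so the indexing step can be exercised on a local document with no network access. It is only a sketch under stated assumptions: StubImplementation, StubDocument, InventorySketch, the injected fetch callable, and the kind field are hypothetical stand-ins (the real code works with sbol2 objects and classifies by SBOL roles and types, not by a string label), and the dict return value is just one possible shape for the audit result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StubImplementation:
    """Hypothetical stand-in for sbol2.Implementation."""
    identity: str
    kind: str  # simplification: real classification inspects SBOL roles/types


@dataclass
class StubDocument:
    """Hypothetical stand-in for sbol2.Document."""
    implementations: List[StubImplementation] = field(default_factory=list)


class InventorySketch:
    """Keeps the existing list-based state; splits fetching from indexing."""

    def __init__(self, fetch: Callable[[str], List[StubImplementation]]):
        # fetch is an assumed injection point standing in for the SynBioHub pull.
        self._fetch = fetch
        self.indexed_plasmids: List[StubImplementation] = []
        self.indexed_backbones: List[StubImplementation] = []
        self.restriction_enzyme_implementations: List[StubImplementation] = []
        self.ligase_implementations: List[StubImplementation] = []

    def pull_collection_uris(self, uris: List[str]) -> StubDocument:
        """Fetch only: gather remote content into a document, index nothing."""
        doc = StubDocument()
        for uri in uris:
            doc.implementations.extend(self._fetch(uri))
        return doc

    def index_sbol_document(self, doc: StubDocument, source: str = "local") -> Dict:
        """Index only: never touches the network, reports what it did."""
        audit = {"source": source, "indexed": 0, "skipped": []}
        for implementation in doc.implementations:
            if self._index_implementation(implementation):
                audit["indexed"] += 1
            else:
                audit["skipped"].append(implementation.identity)
        return audit

    def _index_implementation(self, implementation: StubImplementation) -> bool:
        """Route one implementation to its list; return False if unclassified."""
        targets = {
            "plasmid": self.indexed_plasmids,
            "backbone": self.indexed_backbones,
            "restriction_enzyme": self.restriction_enzyme_implementations,
            "ligase": self.ligase_implementations,
        }
        target = targets.get(implementation.kind)
        if target is None:
            return False
        target.append(implementation)
        return True
```

With this shape, a fetch problem surfaces in pull_collection_uris and a structure or role problem surfaces in index_sbol_document (for example as an entry in the skipped list), which addresses the debugging concern described in the background.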
Non-goals
- Do not change biological classification semantics unless tests expose a bug.
- Do not implement the full future inventory/collection_indexer.py architecture unless it is the smallest clean path.
- Do not remove existing constructor behavior yet.
Acceptance criteria
Verification
Run:
pytest -k "index or collection or synbiohub"
ruff check .
Codex implementation notes
- Keep PR size moderate; extract functions without rewriting every caller.
- Preserve current list-based state until a later sprint introduces a richer inventory object.
- Avoid broad renames unless they simplify tests and reduce duplicated code.
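One way to keep callers untouched, sketched here under assumptions, is to leave _index_collections in place as a thin wrapper that delegates to the two extracted steps. The class name LegacyInventorySketch, the dict used as a placeholder document, and the placeholder method bodies are all hypothetical. Only the delegation pattern is the point.

```python
class LegacyInventorySketch:
    """Hypothetical class showing the old entry point delegating to new helpers."""

    def __init__(self, collections=None):
        # Existing list-based state is preserved.
        self.indexed_plasmids = []
        # Existing constructor behavior is preserved: index on construction.
        if collections:
            self._index_collections(collections)

    def pull_collection_uris(self, uris):
        # Placeholder fetch: the real helper would return an sbol2.Document.
        return {"plasmids": ["plasmid-from:" + uri for uri in uris]}

    def index_sbol_document(self, doc, source="local"):
        # Placeholder indexing: the real helper would scan implementations.
        self.indexed_plasmids.extend(doc["plasmids"])
        return {"source": source, "indexed": len(doc["plasmids"])}

    def _index_collections(self, collections):
        # Same name and argument as before, so existing callers need no changes.
        doc = self.pull_collection_uris(collections)
        return self.index_sbol_document(doc, source="synbiohub")
```

Existing code that constructs the object with a list of collections keeps working, while new tests can call index_sbol_document directly on a locally built document.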