
[SPRINT-02-02] Split collection pulling from SBOL document indexing #89

@Gonza10V

Parent sprint: #87
Depends on: #88
Recommended order: 2
Codex-ready: yes

Goal

Refactor collection indexing so SynBioHub fetching and SBOL document indexing are separate steps.

Background

Today, _index_collections(collections) does too much in a single call:

  1. assumes every collection is a SynBioHub URI;
  2. pulls each URI into self.sbol_doc;
  3. scans implementations;
  4. classifies strains, plasmids, backbones, enzymes, and ligases;
  5. mutates indexed_plasmids, indexed_backbones, restriction_enzyme_implementations, and ligase_implementations;
  6. does additional strain and component-definition scans.

This makes it hard to tell whether a failure comes from a fetch problem, the local SBOL structure, a role mismatch, implementation linking, or the indexing logic itself.

Scope

Extract the existing behavior into smaller helpers. Suggested shape:

def pull_collection_uris(self, uris: list[str]) -> sbol2.Document: ...
def index_sbol_document(self, doc: sbol2.Document, source: str = "local") -> InventoryAudit | dict: ...
def _index_implementation(self, implementation: sbol2.Implementation, doc: sbol2.Document, source: str): ...
def _index_strain_module(self, strain: sbol2.ModuleDefinition, implementation: sbol2.Implementation | None, doc: sbol2.Document): ...
def _index_plasmid_or_backbone_definition(self, definition: sbol2.ComponentDefinition, implementation: sbol2.Implementation | None, doc: sbol2.Document): ...
def _index_reagent_implementation(self, implementation: sbol2.Implementation, built_object): ...

The exact names may differ. The important requirement is separation of concerns.
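
A minimal sketch of the split, using stand-in types so it runs without pySBOL2 or network access. FakeImplementation, FakeDocument, InventoryIndexer, the injected fetch callable, and the "kind" field are hypothetical and exist only for illustration. Only the names pull_collection_uris, index_sbol_document, and _index_implementation come from the suggested shape above, and their signatures are simplified here. The real code classifies by SBOL roles, not by a string field.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeImplementation:
    """Stand-in for sbol2.Implementation (hypothetical, illustration only)."""
    identity: str
    kind: str  # assumption: the real code derives this from SBOL roles


@dataclass
class FakeDocument:
    """Stand-in for sbol2.Document (hypothetical, illustration only)."""
    implementations: list = field(default_factory=list)


class InventoryIndexer:
    def __init__(self, fetch: Callable[[str, FakeDocument], None]):
        # `fetch` is an injected stand-in for the SynBioHub pull step.
        self._fetch = fetch
        self.indexed_plasmids = []
        self.indexed_backbones = []

    def pull_collection_uris(self, uris):
        """Step 1: fetching only. No classification happens here."""
        doc = FakeDocument()
        for uri in uris:
            self._fetch(uri, doc)
        return doc

    def index_sbol_document(self, doc, source="local"):
        """Step 2: indexing only. Same path for fetched and local documents."""
        audit = {"source": source, "indexed": 0, "skipped": 0}
        for implementation in doc.implementations:
            if self._index_implementation(implementation):
                audit["indexed"] += 1
            else:
                audit["skipped"] += 1
        return audit

    def _index_implementation(self, implementation):
        # Returns True when the implementation landed in one of the lists.
        if implementation.kind == "plasmid":
            self.indexed_plasmids.append(implementation)
        elif implementation.kind == "backbone":
            self.indexed_backbones.append(implementation)
        else:
            return False
        return True


def fake_fetch(uri, doc):
    # Pretend every pulled URI yields one plasmid implementation.
    doc.implementations.append(FakeImplementation(uri, "plasmid"))


indexer = InventoryIndexer(fake_fetch)
pulled = indexer.pull_collection_uris(["https://example.org/collection/1"])
remote_audit = indexer.index_sbol_document(pulled, source="synbiohub")

local_doc = FakeDocument([
    FakeImplementation("local/1", "backbone"),
    FakeImplementation("local/2", "unknown"),
])
local_audit = indexer.index_sbol_document(local_doc)
```

Because index_sbol_document never calls the fetcher, a local document can be indexed with no SynBioHub access at all.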

Non-goals

  • Do not change biological classification semantics unless tests expose a bug.
  • Do not implement the full future inventory/collection_indexer.py architecture unless it is the smallest clean path.
  • Do not remove existing constructor behavior yet.

Acceptance criteria

  • Existing SynBioHub URI constructor path still works.
  • Local document indexing can call the same SBOL document indexing helper.
  • Fetch failures are distinguishable from indexing failures.
  • Indexing helpers are individually testable or at least covered through targeted integration tests.
  • No hidden PUDU/Opentrons dependency is introduced.
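
One possible way to make fetch failures distinguishable from indexing failures is to wrap each step in its own exception type. This is a sketch, not the required design: CollectionFetchError, DocumentIndexingError, and pull_then_index are hypothetical names, and the fetch and index callables stand in for the two extracted helpers.

```python
class CollectionFetchError(RuntimeError):
    """Raised when pulling a collection URI fails (hypothetical name)."""


class DocumentIndexingError(RuntimeError):
    """Raised when indexing an already-loaded document fails (hypothetical name)."""


def pull_then_index(uris, fetch, index):
    """Run both steps and tag any failure with the step that raised it.

    `fetch` and `index` are injected callables standing in for
    pull_collection_uris and index_sbol_document.
    """
    try:
        doc = fetch(uris)
    except Exception as exc:
        raise CollectionFetchError(f"fetch failed for {uris!r}") from exc
    try:
        return index(doc)
    except Exception as exc:
        raise DocumentIndexingError("indexing failed after a successful fetch") from exc


def classify_failure(fetch, index):
    """Report which step failed, or "ok" when both succeed."""
    try:
        pull_then_index(["https://example.org/collection/1"], fetch, index)
    except CollectionFetchError:
        return "fetch"
    except DocumentIndexingError:
        return "index"
    return "ok"


def broken_fetch(uris):
    raise ConnectionError("SynBioHub unreachable")


def broken_index(doc):
    raise KeyError("missing role")
```

The `from exc` chaining keeps the original traceback, so the underlying cause is still visible when debugging.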

Verification

Run:

pytest -k "index or collection or synbiohub"
ruff check .
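
The -k expression selects tests by substring match on test names (and their parent module or class names), so new tests need "index", "collection", or "synbiohub" in their names to be picked up by the command above. A hedged sketch of what targeted tests could look like follows. TinyIndexer and its dict-based document are hypothetical stand-ins, not the real class.

```python
class TinyIndexer:
    """Hypothetical stand-in exposing only the two extracted steps."""

    def __init__(self, fetch):
        self._fetch = fetch

    def pull_collection_uris(self, uris):
        return {"implementations": [self._fetch(uri) for uri in uris]}

    def index_sbol_document(self, doc, source="local"):
        return {"source": source, "count": len(doc["implementations"])}


def test_index_local_document_does_not_fetch():
    calls = []
    indexer = TinyIndexer(fetch=calls.append)
    audit = indexer.index_sbol_document({"implementations": ["impl_1", "impl_2"]})
    assert audit == {"source": "local", "count": 2}
    assert calls == []  # local indexing must never touch the fetcher


def test_synbiohub_collection_path_fetches_each_uri():
    calls = []

    def fetch(uri):
        calls.append(uri)
        return f"impl_from_{uri}"

    indexer = TinyIndexer(fetch)
    doc = indexer.pull_collection_uris(["uri_a", "uri_b"])
    audit = indexer.index_sbol_document(doc, source="synbiohub")
    assert calls == ["uri_a", "uri_b"]
    assert audit == {"source": "synbiohub", "count": 2}
```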

Codex implementation notes

  • Keep PR size moderate; extract functions without rewriting every caller.
  • Preserve current list-based state until a later sprint introduces a richer inventory object.
  • Avoid broad renames unless they simplify tests and reduce duplicated code.
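
To illustrate the first two notes: the old entry point can stay as a thin wrapper that delegates to the extracted steps, so the constructor and existing callers are untouched and the list-based state is preserved. A sketch under stated assumptions: LegacyCompatibleIndexer, the dict items, and the placeholder fetch are hypothetical, and only _index_collections, indexed_plasmids, and indexed_backbones are names taken from this issue.

```python
class LegacyCompatibleIndexer:
    """Hypothetical class showing only the delegation pattern."""

    def __init__(self, collections=None):
        # List-based state stays as-is for existing callers.
        self.indexed_plasmids = []
        self.indexed_backbones = []
        if collections:
            self._index_collections(collections)

    def _index_collections(self, collections):
        # Old entry point, now a thin wrapper over the two extracted steps.
        doc = self.pull_collection_uris(collections)
        return self.index_sbol_document(doc, source="synbiohub")

    def pull_collection_uris(self, uris):
        # Placeholder fetch: the real method would pull from SynBioHub.
        return [{"uri": uri, "kind": "plasmid"} for uri in uris]

    def index_sbol_document(self, doc, source="local"):
        for item in doc:
            if item["kind"] == "plasmid":
                self.indexed_plasmids.append(item["uri"])
            else:
                self.indexed_backbones.append(item["uri"])
        return {"source": source, "count": len(doc)}


# Constructor path: behaves as before from the caller's point of view.
from_uris = LegacyCompatibleIndexer(["uri_a"])

# Local path: skips the fetch and reuses the same indexing helper.
from_local = LegacyCompatibleIndexer()
local_audit = from_local.index_sbol_document([{"uri": "b1", "kind": "backbone"}])
```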
