
Feature/scripts and api optimizations #215

Draft
SimoneAriens wants to merge 2 commits into main from feature/scripts-and-api-optimizations

SimoneAriens (Collaborator)

Changes to support running 600k+ comparisons efficiently:

  • Skip plots: added a skip_plots flag to the score endpoints and the client. Matplotlib plot generation accounted for 96% of per-call API time (~3s); skipping it brings a call down to ~0.1s.
  • Upfront existence check: replaced 600k individual exists() calls with a single os.walk scan that collects already-completed results.
  • Two-level output folders: results go into {i // 1000:04d}/{i:06d}, so no single directory holds 600k entries.
  • Vault cleanup: _cleanup_vault now runs in a try/finally, so /tmp/scratch_api no longer grows unboundedly (it was reaching 83GB+), even when a download partially fails.
  • 404 filtering: when skip_plots=True, plot URLs that don't exist (they would return 404) are skipped instead of downloaded.
  • Producer/consumer pattern in convert_marks.py: API results are fetched in parallel and written to disk sequentially.
  • Profiling instrumentation: timing logs in the score endpoints and in calculate_score.
  • Resilient error handling: all three scripts catch and log per-item failures instead of aborting the batch; calculate_score returns a ScoreStatus enum that feeds a structured summary report.
  • Connection reuse: a requests.Session pools connections across downloads.
  • Lazy cross-product sampling: _different_source_pairs samples pairs via cumulative indexing instead of materializing the full pool.
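
A minimal sketch of the skip_plots flag combined with per-stage timing logs, assuming a scoring function that returns the score plus any plot files. `score_endpoint`, `timed`, the placeholder metric and the `comparison.png` name are hypothetical stand-ins, not the PR's code; the point is only that the expensive plotting stage sits behind the flag and each stage's duration is logged.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

logger = logging.getLogger("score_profiling")


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock duration of a stage in `timings` and log it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = elapsed
        logger.info("%s took %.3fs", stage, elapsed)


def score_endpoint(profile_a: List[float], profile_b: List[float],
                   skip_plots: bool = False) -> dict:
    """Hypothetical endpoint body: always score, only render plots when asked to."""
    timings: Dict[str, float] = {}
    with timed("calculate_score", timings):
        # Placeholder metric standing in for the real comparison.
        value = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
    plot_files: List[str] = []
    if not skip_plots:
        with timed("plots", timings):
            # The matplotlib rendering (the dominant cost) would happen here.
            plot_files.append("comparison.png")
    return {"score": value, "plots": plot_files, "timings": timings}
```

With `skip_plots=True` the plotting stage never runs, so it also never shows up in the timing log.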
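
The upfront existence check could look roughly like this, assuming each finished comparison leaves a marker file (here a hypothetical `result.json`) in a leaf folder named after its index. `find_completed` and `pending_indices` are illustrative names, not the PR's functions.

```python
import os
from typing import List, Set


def find_completed(root: str, marker: str = "result.json") -> Set[int]:
    """Walk the output tree once and collect indices whose leaf folder holds `marker`."""
    completed: Set[int] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker in filenames:
            leaf = os.path.basename(dirpath)
            if leaf.isdigit():  # leaf folders are named after the comparison index
                completed.add(int(leaf))
    return completed


def pending_indices(total: int, root: str) -> List[int]:
    """One scan up front, then a set lookup per item instead of an exists() call."""
    done = find_completed(root)
    return [i for i in range(total) if i not in done]
```

Folders without the marker, for example left behind by an interrupted run, are treated as not done and get recomputed.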
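
The two-level layout itself is a one-liner; `result_dir` and `ensure_result_dir` are hypothetical helper names. With indices 0 to 599,999 the first level runs from 0000 to 0599, and each bucket holds at most 1,000 result folders.

```python
import os


def result_dir(root: str, i: int) -> str:
    """Bucket index i into root/{i // 1000:04d}/{i:06d}: at most 1000 leaves per bucket."""
    return os.path.join(root, f"{i // 1000:04d}", f"{i:06d}")


def ensure_result_dir(root: str, i: int) -> str:
    """Create the folder for index i (and its bucket) if needed and return its path."""
    path = result_dir(root, i)
    os.makedirs(path, exist_ok=True)
    return path
```

For example, index 123456 lands in `out/0123/123456` and index 7 in `out/0000/000007` (POSIX separators).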
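
One way to guarantee the vault cleanup, assuming each request downloads into its own scratch folder under the vault root. `run_in_vault` and the `_cleanup_vault(vault_dir)` signature are assumptions; only the try/finally structure comes from the description above.

```python
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def _cleanup_vault(vault_dir: str) -> None:
    """Delete one request's scratch folder; never raise, so the original error survives."""
    shutil.rmtree(vault_dir, ignore_errors=True)


def run_in_vault(scratch_root: str, work: Callable[[str], T]) -> T:
    """Give `work` a fresh folder under scratch_root and always remove it afterwards."""
    os.makedirs(scratch_root, exist_ok=True)
    vault_dir = tempfile.mkdtemp(dir=scratch_root)
    try:
        return work(vault_dir)
    finally:
        # Runs on success, on exceptions and after partial downloads alike.
        _cleanup_vault(vault_dir)
```

`ignore_errors=True` keeps a failing cleanup from replacing the exception that actually interrupted the download.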
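
A sketch of a download loop with both the plot filtering and a shared session. It assumes plot artifacts can be recognised by a `.png` suffix (hypothetical `is_plot_url`) and drops them before any request is made; the PR may instead react to the 404 status. The session is typed as a small protocol, so a `requests.Session` can be passed in directly. `download_outputs` is an illustrative name.

```python
import os
from typing import Iterable, List, Protocol


class ResponseLike(Protocol):
    content: bytes

    def raise_for_status(self) -> None: ...


class SessionLike(Protocol):
    def get(self, url: str, timeout: float) -> ResponseLike: ...


def is_plot_url(url: str) -> bool:
    """Assumption: plot artifacts are the .png files."""
    return url.lower().endswith(".png")


def download_outputs(session: SessionLike, urls: Iterable[str], dest_dir: str,
                     skip_plots: bool) -> List[str]:
    """Fetch result files over one shared session, skipping plots that were never rendered."""
    saved: List[str] = []
    for url in urls:
        if skip_plots and is_plot_url(url):
            continue  # would 404: the server skipped plot generation
        response = session.get(url, timeout=30)
        response.raise_for_status()
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        with open(path, "wb") as fh:
            fh.write(response.content)
        saved.append(path)
    return saved
```

In use: `with requests.Session() as session: download_outputs(session, urls, dest_dir, skip_plots=True)`. Reusing one session lets requests keep connections to the API open between files instead of reconnecting for every download.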
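
The producer/consumer split in convert_marks.py can be sketched with the standard library alone, assuming `fetch(item)` does the API call and `write(item, data)` does the disk write (both hypothetical callables; `run_pipeline` is an illustrative name). Worker threads run the fetches concurrently; only the calling thread writes, so files are written one at a time. Both queues are bounded, so a slow disk throttles the fetchers instead of buffering every result in memory.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

_STOP = object()  # sentinel marking the end of a stream


def run_pipeline(items: Iterable[Any], fetch: Callable[[Any], Any],
                 write: Callable[[Any, Any], None], workers: int = 8,
                 max_pending: int = 64) -> List[Tuple[Any, BaseException]]:
    """Fetch in `workers` threads; call `write` only on this thread, one result at a time."""
    tasks: queue.Queue = queue.Queue(maxsize=max_pending)
    results: queue.Queue = queue.Queue(maxsize=max_pending)

    def feeder() -> None:
        for item in items:
            tasks.put(item)  # blocks while workers are behind
        for _ in range(workers):
            tasks.put(_STOP)  # one stop signal per worker

    def worker() -> None:
        while (item := tasks.get()) is not _STOP:
            try:
                outcome = (item, fetch(item), None)
            except Exception as exc:  # record the failure and keep going
                outcome = (item, None, exc)
            results.put(outcome)
        results.put(_STOP)

    threads = [threading.Thread(target=feeder, daemon=True)]
    threads += [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()

    failures: List[Tuple[Any, BaseException]] = []
    finished = 0
    while finished < workers:  # stop once every worker has signalled completion
        entry = results.get()
        if entry is _STOP:
            finished += 1
            continue
        item, data, exc = entry
        if exc is not None:
            failures.append((item, exc))
            continue
        try:
            write(item, data)  # the only place that touches the disk
        except Exception as write_exc:
            failures.append((item, write_exc))
    for t in threads:
        t.join()
    return failures
```

Threads fit here because the fetch is I/O-bound (waiting on HTTP). Fetch and write failures are collected and returned instead of stopping the run.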
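
The per-item error handling and the ScoreStatus summary fit together roughly as follows. The enum members, `run_batch` and `format_summary` are assumptions for illustration; the PR's calculate_score and enum may carry different statuses.

```python
import logging
from collections import Counter
from enum import Enum
from typing import Callable, Iterable, TypeVar

logger = logging.getLogger("batch")
T = TypeVar("T")


class ScoreStatus(Enum):
    # Hypothetical members; the real enum may differ.
    OK = "ok"
    SKIPPED = "skipped"
    FAILED = "failed"


def run_batch(items: Iterable[T], calculate_score: Callable[[T], ScoreStatus]) -> Counter:
    """Score every item; a failure is logged and counted instead of aborting the batch."""
    summary: Counter = Counter()
    for item in items:
        try:
            status = calculate_score(item)
        except Exception:
            logger.exception("Scoring failed for %r", item)
            status = ScoreStatus.FAILED
        summary[status] += 1
    return summary


def format_summary(summary: Counter) -> str:
    """One line per batch, in enum order, e.g. for the end-of-run log."""
    return ", ".join(f"{status.value}={summary[status]}" for status in ScoreStatus)
```

Returning a status rather than raising lets the script distinguish "skipped" from "failed" in the final report without parsing log output.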
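
The cumulative-indexing idea can be sketched as follows, assuming marks are grouped by source and a valid pair takes one mark from each of two different sources. `sample_different_source_pairs` is a hypothetical stand-in for `_different_source_pairs`, not its actual code. Each pair of sources owns a contiguous block of flat indices whose size is the product of the two group sizes; a prefix sum over those sizes plus a binary search maps a sampled flat index back to its mark pair, so the full cross product is never built.

```python
import bisect
import itertools
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_different_source_pairs(groups: Sequence[Sequence[T]], k: int,
                                  rng: Optional[random.Random] = None) -> List[Tuple[T, T]]:
    """Draw k distinct pairs whose members come from different groups, without listing all pairs."""
    rng = rng if rng is not None else random.Random()
    # One block per pair of non-empty groups; its size is the size of their cross product.
    blocks = [(a, b) for a, b in itertools.combinations(range(len(groups)), 2)
              if len(groups[a]) and len(groups[b])]
    offsets = [0]  # offsets[n] = number of pairs in blocks[:n]
    for a, b in blocks:
        offsets.append(offsets[-1] + len(groups[a]) * len(groups[b]))
    total = offsets[-1]
    if k > total:
        raise ValueError(f"requested {k} pairs but only {total} exist")

    pairs: List[Tuple[T, T]] = []
    # Sampling from a range yields distinct flat indices; for k much smaller than
    # total this avoids building the full index list.
    for flat in rng.sample(range(total), k):
        block = bisect.bisect_right(offsets, flat) - 1  # block containing this index
        a, b = blocks[block]
        i, j = divmod(flat - offsets[block], len(groups[b]))  # row-major inside the block
        pairs.append((groups[a][i], groups[b][j]))
    return pairs
```

The list of source pairs is still built in full, so this pays off when sources hold many marks each and the mark-level cross product is what would otherwise explode.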

SimoneAriens requested review from cfs-data and vergep on April 3, 2026, 13:38
SimoneAriens marked this pull request as draft on April 3, 2026, 13:38
github-actions bot commented on Apr 3, 2026

Diff Coverage

Diff: origin/main..HEAD, staged and unstaged changes

  • src/processors/router.py (100%)
  • src/processors/schemas.py (100%)

Summary

  • Total: 21 lines
  • Missing: 0 lines
  • Coverage: 100%

github-actions bot commented on Apr 3, 2026

Code Coverage

Package                                          Line Rate           Branch Rate
.                                                96%                 92%
computations                                     94%                 67%
container_models                                 99%                 100%
conversion                                       96%                 89%
conversion.export                                99%                 93%
conversion.filter                                97%                 89%
conversion.leveling                              100%                100%
conversion.leveling.solver                       100%                75%
conversion.plots                                 99%                 88%
conversion.preprocess_impression                 99%                 91%
conversion.preprocess_striation                  90%                 62%
conversion.profile_correlator                    96%                 82%
conversion.surface_comparison                    99%                 89%
conversion.surface_comparison.cell_registration  100%                90%
extractors                                       97%                 75%
mutations                                        100%                100%
parsers                                          97%                 50%
parsers.patches                                  89%                 60%
preprocessors                                    100%                100%
processors                                       100%                75%
renders                                          99%                 50%
utils                                            71%                 100%
Summary                                          98% (3261 / 3331)   87% (341 / 394)

Minimum allowed line rate is 50%
