feat(api): Phase 7b foundation — init, datetime-TZ, /readiness, status cache, proxy widening#75
Bite-sized TDD plan derived from the merged Phase 7b spec, covering all 9 clusters in dependency order: datetime-TZ → printer identity → lifespan init-order → alembic verify → /readiness → status cache → frontend proxy → README → verification. Refs #22
Go frontend oapi-codegen rejects naive datetimes. Helper normalises any datetime to a timezone-aware ISO string before serialisation. Refs #22
Go oapi-codegen client rejected naive datetimes from /api/templates with `parsing time "..." cannot parse "" as "Z07:00"`. Apply the new serialize_datetime_utc helper via @field_serializer. Refs #22
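A minimal, standard-library-only sketch of what a helper like `serialize_datetime_utc` might look like. The function name comes from the commit above; the body is an assumption, in particular the choice to treat naive values as UTC and to render the offset as `Z`. In the PR it is attached to the schema via `@field_serializer`, which is left out here to keep the example dependency-free.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def serialize_datetime_utc(value: datetime | None) -> str | None:
    """Return an RFC 3339 string in UTC with a trailing 'Z', or None."""
    if value is None:
        # Nullable fields (e.g. started_at/finished_at) pass through unchanged.
        return None
    if value.tzinfo is None:
        # Assumption: legacy naive values were written as UTC wall-clock time.
        value = value.replace(tzinfo=timezone.utc)
    else:
        # Aware values are converted so the output is always UTC.
        value = value.astimezone(timezone.utc)
    # isoformat() renders UTC as '+00:00'; swap in the 'Z' designator.
    return value.isoformat().replace("+00:00", "Z")


naive = datetime(2026, 5, 17, 12, 0, 0)
berlin_summer = datetime(2026, 5, 17, 14, 0, 0, tzinfo=timezone(timedelta(hours=2)))
print(serialize_datetime_utc(naive))          # 2026-05-17T12:00:00Z
print(serialize_datetime_utc(berlin_summer))  # 2026-05-17T12:00:00Z
```

Either form parses with Go's RFC3339 layout, since the offset part is never empty.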
Centralises the API integration test fixture so Phase 7b Task B3 (PrinterRead and JobRead) can reuse it without duplication or cross-file imports. Refs #22
Same Go-oapi-codegen contract fix as TemplateRead. JobRead.started_at and finished_at each get their own serializer that handles the nullable case. conftest.py re-discovers IntegrationRegistry after lifespan shutdown so the api_client_with_seed fixture works for all tests in sequence, not just the first one. Refs #22
Every datetime column on the models (templates/printers/jobs/presets/printer_state/printer_status_cache) now uses DateTime(timezone=True) with default_factory=lambda: datetime.now(UTC). Fresh inserts write tz-aware values that survive the SQLite roundtrip. Existing rows are migrated by the Phase 7b alembic data migration in Task B5. Refs #22
Existing rows from Phase 5 inserts contain naive datetimes that break the Go frontend's RFC3339 parser. Migration appends '+00:00' to any value without an explicit TZ marker across templates/printers/jobs/presets/printer_state/printer_status_cache. Idempotent via WHERE NOT LIKE '%+%' AND NOT LIKE '%Z'. SQLite is dynamically typed, so no ALTER TABLE is needed; the new column types from the previous commit only affect new inserts via the SQLAlchemy layer. Refs #22
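The idempotency claim can be illustrated with plain `sqlite3`. This is only a sketch of the UPDATE pattern described in the commit message, not the actual Alembic migration; the helper name `normalise_naive_timestamps`, the table layout, and the sample rows are made up for the demonstration.

```python
import sqlite3


def normalise_naive_timestamps(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Append '+00:00' to values lacking a TZ marker; return rows changed.

    Table and column names are interpolated into the SQL, so only pass
    trusted identifiers (a migration uses a hard-coded list).
    """
    cur = conn.execute(
        f"UPDATE {table} SET {column} = {column} || '+00:00' "
        f"WHERE {column} IS NOT NULL "
        f"AND {column} NOT LIKE '%+%' AND {column} NOT LIKE '%Z'"
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE templates (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO templates (created_at) VALUES (?)",
    [
        ("2026-05-01 10:00:00.000000",),  # naive legacy row
        ("2026-05-02 10:00:00+00:00",),   # already has an offset
        ("2026-05-03T10:00:00Z",),        # already has 'Z'
        (None,),                          # nullable column stays NULL
    ],
)

first_run = normalise_naive_timestamps(conn, "templates", "created_at")
second_run = normalise_naive_timestamps(conn, "templates", "created_at")
rows = [r[0] for r in conn.execute("SELECT created_at FROM templates ORDER BY id")]
print(first_run, second_run)
print(rows)
```

Only the naive row is touched on the first run, and the second run changes nothing because every non-NULL value now carries a `+` or a trailing `Z`.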
…store caplog
Alembic's command.upgrade() calls logging.config.fileConfig(), which defaults to disable_existing_loggers=True. This sets disabled=True on every logger not explicitly named in alembic.ini, including app.integrations. Any _logger.error()/_logger.exception() call on a disabled logger silently drops the record, which broke caplog assertions in test_discovery.py tests that ran after the migration tests. The fix mirrors the guard already present in app/db/lifespan.py: set cfg.attributes["configure_logger"] = False so alembic skips its logging reconfiguration entirely. The four previously failing caplog assertions now pass in all orderings. Refs #22
Lifespan can now compute a stable printer.id from env config (model, host, port) so the runtime printer and the DB row share the same id across restarts. Phase 7b Cluster 1b prep work. Refs #22
Creates or refreshes one DB Printer row from env config, keyed by the deterministic UUIDv5 from derive_printer_id(model, host, port). Returns None for the mock backend so the lifespan can no-op when no printer is configured. Idempotent across restarts. Refs #22
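For readers unfamiliar with name-based UUIDs, here is a hedged sketch of how `derive_printer_id(model, host, port)` could be built on `uuid.uuid5`. The namespace value, the `|` separator, and the lower-casing are assumptions made for illustration; the PR does not show the real ones.

```python
import uuid

# Hypothetical namespace: the project's actual namespace UUID is not shown in the PR.
PRINTER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "printers.example.invalid")


def derive_printer_id(model: str, host: str, port: int) -> uuid.UUID:
    """Derive a stable UUIDv5 from the printer's env config."""
    # Assumption: normalise case and whitespace so 'PT-P750W ' and 'pt-p750w'
    # map to the same identity.
    name = f"{model.strip().lower()}|{host.strip().lower()}|{port}"
    return uuid.uuid5(PRINTER_NAMESPACE, name)


first_boot = derive_printer_id("PT-P750W", "192.0.2.10", 9100)
after_restart = derive_printer_id("PT-P750W", "192.0.2.10", 9100)
other_port = derive_printer_id("PT-P750W", "192.0.2.10", 9101)
print(first_boot == after_restart)  # True: same config, same id
print(first_boot == other_port)     # False: any config change yields a new id
```

Because UUIDv5 is a pure function of namespace and name, the upsert can key on the result without storing any extra lookup state.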
Lifespan can now hand the DB-deterministic UUID (from upsert_runtime_printer) to the in-memory queue printer so app.state.printer_id matches the DB row. Backwards compatible — omitting the parameter falls back to uuid4(). _PrinterLike.id and Job.printer_id promoted from str to UUID throughout the in-memory queue stack (print_queue, job_lifecycle, print_service) to maintain type consistency end-to-end. Refs #22
…nt no-op
Catches the Phase 7a bug pattern where lifespan called seed_templates before TemplateLoader.load_dir(): the cache was empty, 0 rows were upserted, no error was raised, and the UI showed no templates. The defensive RuntimeError surfaces the misordering at startup so it cannot reach production silently. Refs #22
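The guard itself is small. A rough sketch follows, with a plain dict standing in for the loader cache; the real `seed_templates` signature and error message are not shown in this PR page, so both are hypothetical here.

```python
def seed_templates(loaded: dict[str, dict]) -> int:
    """Hypothetical stand-in: 'upsert' the loaded templates, return the row count."""
    if not loaded:
        # An empty cache almost certainly means load_dir() has not run yet,
        # so fail loudly instead of upserting zero rows.
        raise RuntimeError(
            "seed_templates called with an empty template cache; "
            "TemplateLoader.load_dir() must run first"
        )
    return len(loaded)


try:
    seed_templates({})
except RuntimeError as exc:
    print(f"startup aborted: {exc}")

print(seed_templates({"address-label": {"width_mm": 12}}))  # 1
```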
… printer
Calls plugin discovery and TemplateLoader.load_dir() before seed_templates(), and adds upsert_runtime_printer(s, settings) between seed_templates and ensure_printer_state. Hands the resulting DB UUID to driver.make_queue_printer so app.state.printer_id matches the DB row. Closes the Phase 7a bug where a fresh deploy showed 0 templates and 0 printers in the UI. Removes the now-unnecessary D1 monkey-patches in test fixtures. Refs #22
Lifespan calls verify_alembic_at_head(settings) right after run_migrations(). If the DB revision deviates from the script head (e.g. partial migration, downgrade, missing script file) the lifespan raises with a clear message before any ORM query runs. Takes settings explicitly (C2/D2 testability pattern) so unit tests can verify against ad-hoc DBs without monkey-patching get_settings(). Sync alembic work runs inside asyncio.to_thread to keep the event loop unblocked. configure_logger=False prevents alembic from clobbering pytest caplog handlers (Phase 7b B6 learning). Fixtures in test_lifespan.py and tests/integration/conftest.py extended to patch verify_alembic_at_head to a no-op alongside run_migrations, because create_all() does not populate alembic_version. Refs #22
Frozen Pydantic models for the new /readiness deep-check endpoint introduced by Phase 7b Cluster 1e. Refs #22
…runtime
First four checks for the /readiness deep-check endpoint plus the ready/degraded/not-ready aggregation. Endpoint wiring lands in F4; remaining 4 checks (printer_db_sync, snmp_discovery, print_queue, sse_bus) land in F3. Refs #22
printer_db_sync, snmp_discovery (<90s ok / <600s stale / else fail), print_queue worker liveness, sse_bus subscriber capacity. Completes Cluster 1e aggregator. F4 wires the FastAPI route. Refs #22
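The `snmp_discovery` thresholds quoted above translate into a three-way classification. A sketch, assuming the check works on the age in seconds of the last discovery run; the function name and the handling of "never ran" are assumptions.

```python
from __future__ import annotations


def classify_discovery_age(age_s: float | None) -> str:
    """Map the age of the last SNMP discovery run to ok / stale / fail."""
    if age_s is None:
        # Assumption: a discovery that never ran counts as a failure.
        return "fail"
    if age_s < 90:
        return "ok"
    if age_s < 600:
        return "stale"
    return "fail"


for age in (12.0, 90.0, 599.9, 600.0, None):
    print(age, classify_discovery_age(age))
```

Both boundaries are exclusive on the lower bucket: exactly 90 s is already `stale`, and exactly 600 s is already `fail`.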
Returns HTTP 200 with body.status in {ready, degraded} when the
critical checks pass; 503 with status=not-ready when database/
alembic/template_seed fail. Pangolin can switch its healthcheck.path
to /readiness — Docker keeps polling /healthz for liveness-only.
Refs #22
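The status-code contract can be sketched as a pure function over per-check results. The set of critical checks and the three outcomes come from this PR; the result shape (a name-to-state dict), the function name, and treating a missing critical check as a failure are assumptions.

```python
CRITICAL_CHECKS = frozenset({"database", "alembic", "template_seed"})


def aggregate_readiness(results: dict[str, str]) -> tuple[int, str]:
    """Return (http_status, body_status) for a mapping of check name -> state."""
    # Assumption: a critical check that is missing from the results is
    # treated the same as one that failed.
    if any(results.get(name) != "ok" for name in CRITICAL_CHECKS):
        return 503, "not-ready"
    # Any non-ok state on a non-critical check (e.g. 'stale' or 'fail')
    # degrades the service but keeps it routable.
    if any(state != "ok" for state in results.values()):
        return 200, "degraded"
    return 200, "ready"


all_ok = {name: "ok" for name in (
    "database", "alembic", "template_seed", "printer_runtime",
    "printer_db_sync", "snmp_discovery", "print_queue", "sse_bus",
)}
print(aggregate_readiness(all_ok))                                   # (200, 'ready')
print(aggregate_readiness({**all_ok, "snmp_discovery": "stale"}))    # (200, 'degraded')
print(aggregate_readiness({**all_ok, "alembic": "fail"}))            # (503, 'not-ready')
```

Keeping the aggregation free of I/O makes it easy to unit-test separately from the FastAPI route.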
…roken
Locks in the Cluster 1e contract: liveness probe is restart-relevant (must NOT touch the DB), readiness probe owns the deep checks. Prevents accidental DB queries sneaking back into /healthz. Refs #22
Every probe success writes parsed JSON + captured_at; SNMP timeouts persist online=False + last_error while preserving the prior parsed snapshot. No schema change — uses Phase 5 columns. Refs #22
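To make the "preserve the prior snapshot" behaviour concrete, here is an in-memory sketch. `CacheRow` is a hypothetical stand-in for the `printer_status_cache` row, and the two functions only mirror the described semantics of the producer; whether a timeout also touches `captured_at`, and whether a success clears `last_error`, are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CacheRow:
    """Hypothetical in-memory stand-in for one printer_status_cache row."""
    online: bool = False
    parsed: dict | None = None
    captured_at: datetime | None = None
    last_error: str | None = None


def record_success(row: CacheRow, parsed: dict, now: datetime) -> None:
    # A successful probe replaces the snapshot and its timestamp.
    row.online = True
    row.parsed = parsed
    row.captured_at = now
    row.last_error = None  # assumption: success clears the previous error


def record_timeout(row: CacheRow, error: str) -> None:
    # A timeout flips the row offline but leaves parsed/captured_at alone,
    # so the UI can still show the last known state and how old it is.
    row.online = False
    row.last_error = error


row = CacheRow()
t0 = datetime(2026, 5, 17, 12, 0, tzinfo=timezone.utc)
record_success(row, {"loaded_tape_mm": 12}, t0)
record_timeout(row, "SNMP timeout")
print(row.online, row.parsed, row.last_error)
```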
Adds captured_at, last_probe_age_s, last_error, note to the response
of /api/printers/{id}/status so the UI can render staleness and offline
reasons instead of guessing.
Refs #22
Eliminates the 5-second block when the printer is offline. The probe worker keeps printer_status_cache fresh in the background; this endpoint returns whatever is there in <10ms. Refs #22
Swagger UI and the raw OpenAPI document are now reachable behind the public domain (which sits behind Pangolin SSO + the Basic-Auth bypass). Closes the 404 reported in the hhdocker02 production smoke test. Refs #22
Explains the liveness/readiness split introduced in Phase 7b Cluster 1e and links to the spec for the full check list. Recommends using /readiness for reverse-proxy routing checks while keeping /healthz on Docker container healthchecks. Refs #22
Summary of Changes
This pull request implements the Phase 7b foundation design, focusing on improving system reliability, observability, and data consistency. Key changes include robust datetime handling, stable printer identity management, a new deep-check readiness probe, and optimized status retrieval via caching. These improvements address production deployment gaps and ensure better integration between the backend and the Go-based frontend.
The real EventBus exposes distinct_subscriber_count() (no zero-arg subscriber_count) and reads its cap from settings.sse_max_subscribers (no max_subscribers attribute). Probe both surfaces so production and unit-test fakes both report correct subscriber counts and caps. Refs #22
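A sketch of what "probe both surfaces" might look like with `getattr`. The attribute names are the ones named in the commit message; the helper name `subscriber_stats`, the order of preference, and the demo classes are assumptions.

```python
from __future__ import annotations


def subscriber_stats(bus: object, settings: object) -> tuple[int, int | None]:
    """Return (subscriber_count, cap) for either the real bus or a test fake."""
    # Prefer the real EventBus method, fall back to the fake's zero-arg counter.
    counter = getattr(bus, "distinct_subscriber_count", None) or getattr(
        bus, "subscriber_count", None
    )
    count = counter() if callable(counter) else 0
    # The real cap lives on settings; fakes may carry it on the bus itself.
    cap = getattr(settings, "sse_max_subscribers", None)
    if cap is None:
        cap = getattr(bus, "max_subscribers", None)
    return count, cap


class RealishBus:  # shaped like the production EventBus described above
    def distinct_subscriber_count(self) -> int:
        return 3


class RealishSettings:
    sse_max_subscribers = 10


class FakeBus:  # shaped like a unit-test fake
    max_subscribers = 5

    def subscriber_count(self) -> int:
        return 1


class EmptySettings:
    pass


print(subscriber_stats(RealishBus(), RealishSettings()))  # (3, 10)
print(subscriber_stats(FakeBus(), EmptySettings()))       # (1, 5)
```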
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #75 +/- ##
==========================================
+ Coverage 91.85% 91.93% +0.07%
==========================================
Files 66 70 +4
Lines 2788 3038 +250
Branches 234 259 +25
==========================================
+ Hits 2561 2793 +232
- Misses 167 181 +14
- Partials 60 64 +4
Pull request overview
Phase 7b foundation work hardening the backend lifespan, datetime serialization, health surface, printer identity, and frontend proxy after the first production deploy uncovered latent bugs. The change is large (9 clusters, 24 commits) and touches DB schema/migrations, lifespan ordering, every datetime-bearing schema, the /api/printers/{id}/status contract, and the frontend router.
Changes:
- Adds UTC-aware `DateTime(timezone=True)` columns + a `serialize_datetime_utc` Pydantic helper so all API datetimes emit RFC3339 with `Z`, plus an idempotent Alembic data migration.
- Introduces deterministic UUIDv5 printer identity, lifespan auto-upsert, post-migration `verify_alembic_at_head`, and a new `/readiness` deep-check endpoint (critical → 503, non-critical → 200/degraded).
- Switches `GET /api/printers/{id}/status` to cache-only reads written by `StatusProbeProducer`, and widens the Go frontend proxy to forward `/docs`, `/openapi.json`, `/redoc`.
Reviewed changes
Copilot reviewed 56 out of 59 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| README.md | Documents /healthz vs /readiness contract and links the spec. |
| frontend/cmd/server/main.go | Adds r.Handle routes for /docs, /openapi.json, /redoc so chi preserves the full path when proxying. |
| frontend/cmd/server/main_test.go | Adds parallel subtests verifying the three new doc-route proxies forward to the backend. |
| docs/superpowers/plans/2026-05-17-phase-7b-foundation.md | New 2580-line task-granular implementation plan for the phase. |
| backend/app/api/routes/printers.py | Rewrites get_printer_status to read from printer_status_cache only; leaves tape_loaded/error_state unpopulated despite still being in the schema. |
| backend/app/schemas/printer.py | Extends PrinterStatus with captured_at, last_probe_age_s, last_error, note. |
| backend/app/services/producers/status_probe_producer.py | Persists parsed probe + offline state into the status cache. |
| backend/app/services/readiness.py | 8-check aggregator with critical/non-critical classification. |
| backend/app/db/lifespan.py | Adds verify_alembic_at_head, upsert_runtime_printer, defensive guard in seed_templates. |
| backend/app/main.py | Reorders lifespan (load_dir before seed_templates), wires derived printer UUID, mounts /readiness. |
Code Review
This pull request implements the Phase 7b foundation, focusing on improving datetime handling, printer identity stability, and system observability. Key changes include migrating naive datetimes to UTC-aware ISO strings, implementing a deterministic UUIDv5 printer identity, re-ordering the lifespan to ensure correct initialization, and adding a new /readiness endpoint for deep health checks. Additionally, the printer status endpoint was refactored to read from a cache rather than performing synchronous SNMP probes, and the frontend proxy was updated to expose documentation routes. I have identified an issue where the PrinterStatus response model fields are not being correctly populated from the cache, which needs to be addressed to ensure consistency with the schema.
Bot reviews (Copilot + Gemini, identical HIGH-priority finding on PR #75) flagged that the G3 endpoint rewrite stopped populating the schema's tape_loaded and error_state fields — they were always null. Map the cache JSON: loaded_tape_mm=12 → tape_loaded="12mm", error_flags=[...] → error_state="flag1, flag2". Existing test test_status_endpoint_returns_cached_tape_data extended to lock the contract. Also sanitises two private hostname references in the plan file that tripped the Privacy / secret scan workflow. Refs #22
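The mapping described in this fix can be written as a small pure function. The two input keys and the output formats are taken from the commit message; the function name and the handling of missing or empty values (returning `None`) are assumptions.

```python
from __future__ import annotations


def map_cached_status(parsed: dict | None) -> dict[str, str | None]:
    """Derive the schema's tape_loaded / error_state from the cached probe JSON."""
    parsed = parsed or {}
    tape_mm = parsed.get("loaded_tape_mm")
    flags = parsed.get("error_flags") or []
    return {
        # loaded_tape_mm=12 -> "12mm"; assumption: missing value stays None.
        "tape_loaded": f"{tape_mm}mm" if tape_mm is not None else None,
        # error_flags=[...] -> "flag1, flag2"; assumption: empty list -> None.
        "error_state": ", ".join(str(flag) for flag in flags) if flags else None,
    }


print(map_cached_status({"loaded_tape_mm": 12, "error_flags": ["flag1", "flag2"]}))
# {'tape_loaded': '12mm', 'error_state': 'flag1, flag2'}
print(map_cached_status(None))
# {'tape_loaded': None, 'error_state': None}
```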
Summary
Implements the merged Phase 7b spec (`docs/superpowers/specs/2026-05-17-phase-7b-foundation-design.md`, PR #74) across nine clusters. Closes the foundation gaps surfaced by the first hhdocker02 production deploy on labels.strausmann.cloud.
24 commits, all on green tests, all bodies end with `Refs #22`.
Cluster-by-cluster
- Cluster 1c (Datetime-TZ, B1-B5): `serialize_datetime_utc` Pydantic helper, TemplateRead/PrinterRead/JobRead emit RFC3339 with `Z` suffix, every SQLAlchemy model column upgraded to `DateTime(timezone=True)` with UTC `default_factory`, idempotent Alembic data migration normalises legacy naive rows. Closes the Go oapi-codegen "cannot parse "" as Z07:00" failures.
- Cluster 1b (Printer identity, C1-C3): Deterministic UUIDv5 from `(model, host, port)` (`derive_printer_id`), lifespan helper `upsert_runtime_printer` materialises one DB row, driver `make_queue_printer(...)` accepts `printer_id` so `app.state.printer_id` matches the DB row across restarts. Cascading type promotion `str → UUID` through the in-memory queue.
- Cluster 1a (Lifespan init-order, D1-D2): Defensive `RuntimeError` in `seed_templates` if `TemplateLoader._cache` is empty (catches the Phase 7a regression at startup); lifespan re-ordered so `load_dir()` runs BEFORE `seed_templates()`. `upsert_runtime_printer` and the C3-derived `printer_id` are wired in.
- Cluster 1d (Alembic verify, E1): `verify_alembic_at_head(settings)` fails fast on revision drift, called immediately after `run_migrations()`.
- Cluster 1e (/readiness deep check, F1-F5): New endpoint with 8 checks: `database`, `alembic`, `template_seed`, `printer_runtime`, `printer_db_sync`, `snmp_discovery` (90s/600s freshness thresholds), `print_queue`, `sse_bus`. Critical failure (database/alembic/template_seed) → HTTP 503 + `status=not-ready`; non-critical → HTTP 200 + `status=degraded`; all ok → HTTP 200 + `status=ready`. `/healthz` stays minimal; a regression test locks the contract.
- Cluster 1f (Status cache, G1-G3): `StatusProbeProducer._upsert_cache` and `_mark_offline` persist `parsed JSON` + `captured_at` into `printer_status_cache`; offline state preserves prior parsed snapshot. `PrinterStatus` schema gains `captured_at`, `last_probe_age_s`, `last_error`, `note`. `GET /api/printers/{id}/status` now reads cache exclusively, no sync SNMP, giving sub-100ms responses even when the printer is offline.
- Cluster 3 (Frontend proxy widening, H1): `r.Handle("/docs", prx)`, `/openapi.json`, `/redoc` mounted in the chi router (Mount strips prefix; Handle preserves). Closes the 404 from the hhdocker02 smoke test.
- Cluster 2 (Documentation, I1): README documents the `/healthz` vs `/readiness` contract and links to the spec.

Regression-fix side-quest
Mid-Phase-B the integration tests' caplog assertions broke for unrelated discovery tests. Root cause: Alembic's `logging.config.fileConfig()` runs with `disable_existing_loggers=True` and silently disabled `app.integrations`. Fixed by setting `cfg.attributes["configure_logger"] = False` in the test, mirroring the guard already present in `app/db/lifespan.py`. Same fix applied in B5's migration test, C2's printer-upsert test, and the new readiness builder tests.

Test plan
- `uv run pytest -q`: 676 passed, 3 skipped, 0 failed
- `uv run pytest --cov=app`: coverage 92.62% (threshold 80%)
- `uv run ruff check . && uv run ruff format --check . && uv run mypy app`: clean
- `go test ./...` (frontend): all packages OK
- `go vet ./...`: clean
- `/healthz`, `/readiness`, `/docs`, `/openapi.json`, `/api/printers/{id}/status` via the `claude-automation` Basic-Auth bypass; verify UI shows templates + printer with live status

Follow-ups (out-of-scope, separate issues)
- `api_client_with_seed` lacks PT-P750W env; a fixture variant that sets the host would unblock them
- `_PrinterResumeResponse.printer_id: UUID | str` (C3): narrow once all callers pass UUID
- `StatusProbeProducer` UUID guard (G1): temporary workaround for pre-C3 tests; remove once those tests use real UUIDs
- `api_client_with_seed` fixture has a `TODO(#22): simplify after D2 lands` marker (D2 landed in this PR; simplification can happen in a follow-up cleanup)

Refs #22