feat(api): Phase 7b foundation — init, datetime-TZ, /readiness, status cache, proxy widening#75

Merged
strausmann merged 26 commits into main from feat/phase-7b-foundation on May 17, 2026

@strausmann
Owner

Summary

Implements the merged Phase 7b spec (docs/superpowers/specs/2026-05-17-phase-7b-foundation-design.md, PR #74) across nine clusters. Closes the foundation gaps surfaced by the first hhdocker02 production deploy on labels.strausmann.cloud.

24 commits, all on green tests, all bodies end with Refs #22.

Cluster-by-cluster

Cluster 1c — Datetime-TZ (B1-B5): serialize_datetime_utc Pydantic helper, TemplateRead/PrinterRead/JobRead emit RFC3339 with Z suffix, every SQLAlchemy model column upgraded to DateTime(timezone=True) with UTC default_factory, idempotent Alembic data migration normalises legacy naive rows. Closes the Go oapi-codegen "cannot parse "" as Z07:00" failures.
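
A minimal sketch of what a helper like serialize_datetime_utc could look like, using only the standard library. The PR wires the real helper into the schemas via Pydantic's @field_serializer, which is left out here; treating naive values as UTC is an assumption of this sketch, not something the PR text confirms.

```python
from datetime import datetime, timedelta, timezone


def serialize_datetime_utc(value: datetime) -> str:
    """Render a datetime as an RFC 3339 string in UTC with a trailing 'Z'."""
    if value.tzinfo is None:
        # Assumption: naive values were written as UTC.
        value = value.replace(tzinfo=timezone.utc)
    else:
        # Aware values are converted so the output is always UTC.
        value = value.astimezone(timezone.utc)
    return value.isoformat().replace("+00:00", "Z")


naive = datetime(2026, 5, 17, 14, 19, 0)
offset = datetime(2026, 5, 17, 16, 19, 0, tzinfo=timezone(timedelta(hours=2)))

print(serialize_datetime_utc(naive))
print(serialize_datetime_utc(offset))
```

Both calls describe the same instant, so both render the same string, which is the shape a Go RFC3339 parser accepts.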

Cluster 1b — Printer identity (C1-C3): Deterministic UUIDv5 from (model, host, port) (derive_printer_id), lifespan helper upsert_runtime_printer materialises one DB row, driver make_queue_printer(...) accepts printer_id so app.state.printer_id matches the DB row across restarts. Cascading type promotion str → UUID through the in-memory queue.
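
The deterministic identity can be sketched with the standard uuid module. The namespace value, the separator in the name string, and the example host and port are hypothetical; only the (model, host, port) inputs and the use of UUIDv5 come from the description above.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
PRINTER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "printers.example.invalid")


def derive_printer_id(model: str, host: str, port: int) -> uuid.UUID:
    """Derive a stable UUIDv5 from the printer's configuration."""
    name = f"{model}|{host}|{port}"
    return uuid.uuid5(PRINTER_NAMESPACE, name)


first = derive_printer_id("PT-P750W", "192.0.2.10", 9100)
again = derive_printer_id("PT-P750W", "192.0.2.10", 9100)
other = derive_printer_id("PT-P750W", "192.0.2.11", 9100)

print(first == again)  # same config, same id across restarts
print(first == other)  # different host, different id
```

Because the id is a pure function of the configuration, the runtime printer and the DB row agree without any stored state.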

Cluster 1a — Lifespan init-order (D1-D2): Defensive RuntimeError in seed_templates if TemplateLoader._cache is empty (catches the Phase 7a regression at startup); lifespan re-ordered so load_dir() runs BEFORE seed_templates(). upsert_runtime_printer and the C3-derived printer_id are wired in.
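
The guard and the corrected ordering can be illustrated with stand-in classes. The TemplateLoader below is a simplified fake that takes templates directly instead of reading a directory, and the dict playing the role of the database is hypothetical.

```python
class TemplateLoader:
    """Stand-in loader; the real class reads template files from disk."""

    def __init__(self) -> None:
        self._cache: dict[str, dict] = {}

    def load_dir(self, templates: dict[str, dict]) -> None:
        # Simplification: accept templates directly instead of scanning a path.
        self._cache.update(templates)


def seed_templates(loader: TemplateLoader, db_rows: dict[str, dict]) -> int:
    """Upsert cached templates, refusing to run against an empty cache."""
    if not loader._cache:
        raise RuntimeError(
            "seed_templates called with an empty template cache; "
            "run load_dir() first"
        )
    db_rows.update(loader._cache)
    return len(loader._cache)


loader = TemplateLoader()
rows: dict[str, dict] = {}

try:
    seed_templates(loader, rows)  # wrong order: nothing loaded yet
except RuntimeError as exc:
    print(f"startup aborted: {exc}")

loader.load_dir({"address-label": {"width_mm": 12}})
print(seed_templates(loader, rows))  # correct order: load, then seed
```

Without the guard, the first call would upsert zero rows and return quietly, which matches the silent Phase 7a failure described above.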

Cluster 1d — Alembic verify (E1): verify_alembic_at_head(settings) fails fast on revision drift, called immediately after run_migrations().

Cluster 1e — /readiness deep check (F1-F5): New endpoint with 8 checks: database, alembic, template_seed, printer_runtime, printer_db_sync, snmp_discovery (90s/600s freshness thresholds), print_queue, sse_bus. Critical failure (database/alembic/template_seed) → HTTP 503 + status=not-ready; non-critical → HTTP 200 + status=degraded; all ok → HTTP 200 + status=ready. /healthz stays minimal — regression test locks the contract.
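
The aggregation rule reads as a small pure function. This is a sketch with a boolean per check; the CheckResult type and function name are hypothetical, and the real checks also carry intermediate states such as stale.

```python
from dataclasses import dataclass

CRITICAL_CHECKS = frozenset({"database", "alembic", "template_seed"})


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool


def aggregate(checks: list[CheckResult]) -> tuple[int, str]:
    """Return (HTTP status code, body status) for a set of check results."""
    failed = {check.name for check in checks if not check.ok}
    if failed & CRITICAL_CHECKS:
        return 503, "not-ready"
    if failed:
        return 200, "degraded"
    return 200, "ready"


all_ok = [CheckResult("database", True), CheckResult("sse_bus", True)]
bus_down = [CheckResult("database", True), CheckResult("sse_bus", False)]
db_down = [CheckResult("database", False), CheckResult("sse_bus", False)]

print(aggregate(all_ok), aggregate(bus_down), aggregate(db_down))
```

A critical failure wins over any number of non-critical ones, so a reverse proxy only stops routing when the service cannot serve at all.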

Cluster 1f — Status cache (G1-G3): StatusProbeProducer._upsert_cache and _mark_offline persist parsed JSON + captured_at into printer_status_cache; offline state preserves prior parsed snapshot. PrinterStatus schema gains captured_at, last_probe_age_s, last_error, note. GET /api/printers/{id}/status now reads cache exclusively, no sync SNMP — sub-100ms responses even when the printer is offline.
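
A rough sketch of the cache behaviour, with plain dicts standing in for the printer_status_cache row. The function names and row layout are hypothetical, and keeping captured_at at the last successful probe when the printer goes offline is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def mark_offline(row: Optional[dict[str, Any]], error: str) -> dict[str, Any]:
    """Record a failed probe but keep the last parsed snapshot, if any."""
    previous = row or {"parsed": None, "captured_at": None}
    return {**previous, "online": False, "last_error": error}


def read_status(row: dict[str, Any], now: datetime) -> dict[str, Any]:
    """Build the response from the cache row only; no SNMP call happens here."""
    captured_at = row["captured_at"]
    age = None if captured_at is None else (now - captured_at).total_seconds()
    return {
        "online": row["online"],
        "captured_at": captured_at,
        "last_probe_age_s": age,
        "last_error": row["last_error"],
        "parsed": row["parsed"],
    }


probe_time = datetime(2026, 5, 17, 14, 0, tzinfo=timezone.utc)
cached = {
    "online": True,
    "parsed": {"loaded_tape_mm": 12},
    "captured_at": probe_time,
    "last_error": None,
}
cached = mark_offline(cached, "SNMP timeout")
status = read_status(cached, now=probe_time + timedelta(seconds=45))
print(status["online"], status["last_probe_age_s"], status["parsed"])
```

The read path does no network I/O, which is why the endpoint stays fast even when the printer is unreachable.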

Cluster 3 — Frontend proxy widening (H1): r.Handle("/docs", prx), plus /openapi.json and /redoc, registered in the chi router with Handle rather than Mount (Mount strips the prefix; Handle preserves the full path). Closes the 404 from the hhdocker02 smoke test.

Cluster 2 — Documentation (I1): README documents the /healthz vs /readiness contract and links to the spec.

Regression-fix side-quest

Mid-Phase-B the integration tests' caplog assertions broke for unrelated discovery tests. Root cause: Alembic's logging.config.fileConfig() runs with disable_existing_loggers=True and silently disabled app.integrations. Fixed by setting cfg.attributes["configure_logger"] = False in the test, mirroring the guard already present in app/db/lifespan.py. Same fix applied in B5's migration test, C2's printer-upsert test, and the new readiness builder tests.
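
The root cause can be reproduced without Alembic. This sketch swaps in logging.config.dictConfig for fileConfig, since both apply the same disable_existing_loggers handling to loggers that the configuration does not name; the last line undoes the side effect.

```python
import logging
import logging.config

logger = logging.getLogger("app.integrations")
enabled_before = logger.isEnabledFor(logging.ERROR)

# A config that does not mention "app.integrations" marks it as disabled.
logging.config.dictConfig({"version": 1, "disable_existing_loggers": True})

disabled_after = logger.disabled
enabled_after = logger.isEnabledFor(logging.ERROR)

print(enabled_before, disabled_after, enabled_after)

logger.disabled = False  # restore the logger for anything that runs later
```

Records sent to a disabled logger are dropped before any handler sees them, so caplog has nothing to capture and the assertion fails without an error message.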

Test plan

  • uv run pytest -q → 676 passed, 3 skipped, 0 failed
  • uv run pytest --cov=app → coverage 92.62% (threshold 80%)
  • uv run ruff check . && uv run ruff format --check . && uv run mypy app — clean
  • go test ./... (frontend) — all packages OK
  • go vet ./... — clean
  • Production smoke against labels.strausmann.cloud after merge + deploy — hit /healthz, /readiness, /docs, /openapi.json, /api/printers/{id}/status via the claude-automation Basic-Auth bypass; verify UI shows templates + printer with live status

Follow-ups (out-of-scope, separate issues)

  • 3 B3 tests (PrinterRead/JobRead TZ-suffix) skip when api_client_with_seed lacks PT-P750W env — a fixture variant that sets the host would unblock them
  • _PrinterResumeResponse.printer_id: UUID | str (C3) — narrow once all callers pass UUID
  • StatusProbeProducer UUID guard (G1) — temporary workaround for pre-C3 tests; remove once those tests use real UUIDs
  • B3 api_client_with_seed fixture has a TODO(#22): simplify after D2 lands marker (D2 landed in this PR — simplification can happen in a follow-up cleanup)

Refs #22

strausmann added 24 commits May 17, 2026 14:19
Bite-sized TDD plan derived from the merged Phase 7b spec, covering
all 9 clusters in dependency order: datetime-TZ → printer identity →
lifespan init-order → alembic verify → /readiness → status cache →
frontend proxy → README → verification.

Refs #22
Go frontend oapi-codegen rejects naive datetimes. Helper normalises any
datetime to a timezone-aware ISO string before serialisation.

Refs #22
Go oapi-codegen client rejected naive datetimes from /api/templates
with `parsing time "..." cannot parse "" as "Z07:00"`. Apply the new
serialize_datetime_utc helper via @field_serializer.

Refs #22
Centralises the API integration test fixture so Phase 7b Task B3 (PrinterRead
and JobRead) can reuse it without duplication or cross-file imports.

Refs #22
Same Go-oapi-codegen contract fix as TemplateRead. JobRead.started_at
and finished_at each get their own serializer that handles the nullable
case. conftest.py re-discovers IntegrationRegistry after lifespan
shutdown so the api_client_with_seed fixture works for all tests in
sequence, not just the first one.

Refs #22
Every model column (templates/printers/jobs/presets/printer_state/
printer_status_cache) now uses DateTime(timezone=True) with
default_factory=lambda: datetime.now(UTC). Fresh inserts
write tz-aware values that survive the SQLite roundtrip.

Existing rows are migrated by the Phase 7b alembic data migration
in Task B5.

Refs #22
Existing rows from Phase 5 inserts contain naive datetimes that break
the Go frontend's RFC3339 parser. Migration appends '+00:00' to any
value without an explicit TZ marker across templates/printers/jobs/
presets/printer_state/printer_status_cache. Idempotent via WHERE
NOT LIKE '%+%' AND NOT LIKE '%Z'.

SQLite is dynamically typed so no ALTER TABLE is needed — the new
column types from the previous commit only affect new inserts via
the SQLAlchemy layer.

Refs #22
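
The idempotent normalisation from the commit above can be tried against an in-memory SQLite database. The single created_at column is a hypothetical stand-in; the real migration covers several tables and is run through Alembic rather than raw sqlite3.

```python
import sqlite3

NORMALISE = """
UPDATE templates
SET created_at = created_at || '+00:00'
WHERE created_at NOT LIKE '%+%' AND created_at NOT LIKE '%Z'
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE templates (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO templates (created_at) VALUES (?)",
    [
        ("2026-05-17 14:19:00",),        # legacy naive row
        ("2026-05-17T14:19:00+00:00",),  # already has an offset
        ("2026-05-17T14:19:00Z",),       # already has a Z suffix
    ],
)

conn.execute(NORMALISE)
first_pass = [
    row[0] for row in conn.execute("SELECT created_at FROM templates ORDER BY id")
]

# Running the same statement again must not touch any row.
changed_on_rerun = conn.execute(NORMALISE).rowcount

print(first_pass)
print(changed_on_rerun)
```

Only the naive row gains a suffix, and the second run changes nothing. The WHERE clause only recognises '+' and 'Z' markers, so a value stored with a negative offset would need separate handling.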
…store caplog

Alembic's command.upgrade() calls logging.config.fileConfig() which, by
default, uses disable_existing_loggers=True.  This marks every logger not
explicitly named in alembic.ini — including app.integrations — as
logger.disabled=True.  Any _logger.error()/_logger.exception() call on a
disabled logger silently drops the record, breaking caplog assertions in
test_discovery.py tests that ran after the migration tests.

The fix mirrors the guard already present in app/db/lifespan.py:
set cfg.attributes["configure_logger"] = False so alembic skips its
logging reconfiguration entirely.  The four previously failing caplog
assertions now pass in all orderings.

Refs #22
Lifespan can now compute a stable printer.id from env config
(model, host, port) so the runtime printer and the DB row share
the same id across restarts. Phase 7b Cluster 1b prep work.

Refs #22
Creates or refreshes one DB Printer row from env config, keyed by the
deterministic UUIDv5 from derive_printer_id(model, host, port). Returns
None for the mock backend so the lifespan can no-op when no printer is
configured. Idempotent across restarts.

Refs #22
Lifespan can now hand the DB-deterministic UUID (from upsert_runtime_printer)
to the in-memory queue printer so app.state.printer_id matches the DB row.
Backwards compatible — omitting the parameter falls back to uuid4().

_PrinterLike.id and Job.printer_id promoted from str to UUID throughout
the in-memory queue stack (print_queue, job_lifecycle, print_service) to
maintain type consistency end-to-end.

Refs #22
…nt no-op

Catches the Phase 7a bug pattern where lifespan called seed_templates
before TemplateLoader.load_dir() — cache empty, 0 rows upserted, no
error, UI shows no templates. The defensive RuntimeError surfaces the
misordering at startup so it cannot reach production silently.

Refs #22
… printer

Calls plugin discovery and TemplateLoader.load_dir() before
seed_templates(), and adds upsert_runtime_printer(s, settings) between
seed_templates and ensure_printer_state. Hands the resulting DB UUID to
driver.make_queue_printer so app.state.printer_id matches the DB row.

Closes the Phase 7a bug where a fresh deploy showed 0 templates and 0
printers in the UI. Removes the now-unnecessary D1 monkey-patches in
test fixtures.

Refs #22
Lifespan calls verify_alembic_at_head(settings) right after
run_migrations(). If the DB revision deviates from the script head
(e.g. partial migration, downgrade, missing script file) the lifespan
raises with a clear message before any ORM query runs.

Takes settings explicitly (C2/D2 testability pattern) so unit tests
can verify against ad-hoc DBs without monkey-patching get_settings().
Sync alembic work runs inside asyncio.to_thread to keep the event loop
unblocked. configure_logger=False prevents alembic from clobbering
pytest caplog handlers (Phase 7b B6 learning).

Fixtures in test_lifespan.py and tests/integration/conftest.py
extended to patch verify_alembic_at_head to a no-op alongside
run_migrations, because create_all() does not populate alembic_version.

Refs #22
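
A sketch of the fail-fast pattern from the commit above, with the Alembic lookups replaced by plain arguments. The function names and revision ids are hypothetical; only the use of asyncio.to_thread and the raise-on-drift behaviour come from the commit message.

```python
import asyncio
from typing import Optional


def check_at_head(current: Optional[str], head: str) -> None:
    """Blocking comparison; stands in for the synchronous Alembic lookups."""
    if current != head:
        raise RuntimeError(
            f"database revision {current!r} does not match script head {head!r}"
        )


async def verify_at_head(current: Optional[str], head: str) -> None:
    # Run the blocking part in a worker thread so the event loop stays free.
    await asyncio.to_thread(check_at_head, current, head)


def startup_outcome(current: Optional[str], head: str) -> str:
    """Mimic a lifespan that either proceeds or aborts before any ORM query."""
    try:
        asyncio.run(verify_at_head(current, head))
    except RuntimeError as exc:
        return f"startup aborted: {exc}"
    return "ok"


print(startup_outcome("a1b2", "a1b2"))
print(startup_outcome(None, "a1b2"))  # e.g. alembic_version was never populated
```

The second case is also why test fixtures built with create_all() have to patch the check out: they never write a revision.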
Frozen Pydantic models for the new /readiness deep-check endpoint
introduced by Phase 7b Cluster 1e.

Refs #22
…runtime

First four checks for the /readiness deep-check endpoint plus the
ready/degraded/not-ready aggregation. Endpoint wiring lands in F4;
remaining 4 checks (printer_db_sync, snmp_discovery, print_queue,
sse_bus) land in F3.

Refs #22
printer_db_sync, snmp_discovery (<90s ok / <600s stale / else fail),
print_queue worker liveness, sse_bus subscriber capacity. Completes
Cluster 1e aggregator. F4 wires the FastAPI route.

Refs #22
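
The snmp_discovery freshness thresholds from the commit above map to a three-way classification. The function name is hypothetical; the 90s and 600s boundaries are taken from the commit message.

```python
def classify_discovery_age(age_s: float) -> str:
    """Classify the age of the last SNMP discovery run."""
    if age_s < 90:
        return "ok"
    if age_s < 600:
        return "stale"
    return "fail"


for age in (5, 90, 599, 600):
    print(age, classify_discovery_age(age))
```

Both boundaries are exclusive on the lower side, so an age of exactly 90 seconds already counts as stale and exactly 600 seconds as fail.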
Returns HTTP 200 with body.status in {ready, degraded} when the
critical checks pass; 503 with status=not-ready when database/
alembic/template_seed fail. Pangolin can switch its healthcheck.path
to /readiness — Docker keeps polling /healthz for liveness-only.

Refs #22
…roken

Locks in the Cluster 1e contract: liveness probe is restart-relevant
(must NOT touch the DB), readiness probe owns the deep checks.
Prevents accidental DB queries sneaking back into /healthz.

Refs #22
Every probe success writes parsed JSON + captured_at; SNMP timeouts
persist online=False + last_error while preserving the prior parsed
snapshot. No schema change — uses Phase 5 columns.

Refs #22
Adds captured_at, last_probe_age_s, last_error, note to the response
of /api/printers/{id}/status so the UI can render staleness and offline
reasons instead of guessing.

Refs #22
Eliminates the 5-second block when the printer is offline. The probe
worker keeps printer_status_cache fresh in the background; this
endpoint returns whatever is there in <10ms.

Refs #22
Swagger UI and the raw OpenAPI document are now reachable behind the
public domain (which sits behind Pangolin SSO + the Basic-Auth bypass).
Closes the 404 reported in the hhdocker02 production smoke test.

Refs #22
Explains the liveness/readiness split introduced in Phase 7b Cluster
1e and links to the spec for the full check list. Recommends using
/readiness for reverse-proxy routing checks while keeping /healthz
on Docker container healthchecks.

Refs #22
Copilot AI review requested due to automatic review settings May 17, 2026 17:11
@gemini-code-assist

Summary of Changes

This pull request implements the Phase 7b foundation design, focusing on improving system reliability, observability, and data consistency. Key changes include robust datetime handling, stable printer identity management, a new deep-check readiness probe, and optimized status retrieval via caching. These improvements address production deployment gaps and ensure better integration between the backend and the Go-based frontend.

Highlights

  • Lifespan and Init-Order: Re-ordered the application lifespan to load templates before seeding the database and introduced a defensive check to prevent silent failures. Added a fail-fast check for Alembic migration drift.
  • Printer Identity: Implemented deterministic printer identity using UUIDv5 derived from model, host, and port. The lifespan now automatically upserts a stable printer row, ensuring consistency across restarts.
  • Datetime Normalization: Upgraded all SQLAlchemy models to use timezone-aware UTC datetimes and added a Pydantic field-serializer to ensure RFC3339 compliance with 'Z' suffixes for the frontend.
  • Readiness and Health: Introduced a new /readiness endpoint performing deep checks (database, alembic, template seed, printer status, etc.) and updated the frontend proxy to expose API documentation.
  • Status Cache: Optimized printer status retrieval by reading exclusively from the printer_status_cache, ensuring sub-100ms response times even when printers are offline.

The real EventBus exposes distinct_subscriber_count() (no zero-arg
subscriber_count) and reads its cap from settings.sse_max_subscribers
(no max_subscribers attribute). Probe both surfaces so production and
unit-test fakes both report correct subscriber counts and caps.

Refs #22
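
One way to probe both surfaces is attribute lookup with a fallback. The function names are hypothetical, and passing settings as a separate argument is an assumption; the attribute names are the ones mentioned in the commit above.

```python
from types import SimpleNamespace
from typing import Any, Optional


def probe_subscriber_count(bus: Any) -> Optional[int]:
    """Prefer the real EventBus method, then fall back to the fake's name."""
    for name in ("distinct_subscriber_count", "subscriber_count"):
        method = getattr(bus, name, None)
        if callable(method):
            return method()
    return None


def probe_subscriber_cap(bus: Any, settings: Any) -> Optional[int]:
    """Prefer the settings value, then fall back to an attribute on the bus."""
    cap = getattr(settings, "sse_max_subscribers", None)
    if cap is None:
        cap = getattr(bus, "max_subscribers", None)
    return cap


real_like_bus = SimpleNamespace(distinct_subscriber_count=lambda: 3)
real_like_settings = SimpleNamespace(sse_max_subscribers=50)
fake_bus = SimpleNamespace(subscriber_count=lambda: 1, max_subscribers=10)
fake_settings = SimpleNamespace()

print(probe_subscriber_count(real_like_bus), probe_subscriber_cap(real_like_bus, real_like_settings))
print(probe_subscriber_count(fake_bus), probe_subscriber_cap(fake_bus, fake_settings))
```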
@codecov

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 94.25982% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.93%. Comparing base (c5a7964) to head (6ced8bb).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| backend/app/services/readiness.py | 92.22% | 5 Missing and 2 partials ⚠️ |
| backend/app/api/routes/printers.py | 69.23% | 2 Missing and 2 partials ⚠️ |
| backend/app/main.py | 81.81% | 4 Missing ⚠️ |
| ...nd/app/services/producers/status_probe_producer.py | 94.02% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
+ Coverage   91.85%   91.93%   +0.07%     
==========================================
  Files          66       70       +4     
  Lines        2788     3038     +250     
  Branches      234      259      +25     
==========================================
+ Hits         2561     2793     +232     
- Misses        167      181      +14     
- Partials       60       64       +4     

| Components | Coverage Δ |
| --- | --- |
| Printer Backends (transport) | 87.50% <ø> (ø) |
| Printer Models (drivers) | 91.42% <100.00%> (+0.12%) ⬆️ |
| Services | 92.09% <93.85%> (+0.13%) ⬆️ |
| REST API | 91.30% <73.33%> (-0.74%) ⬇️ |
| Pydantic Schemas | 100.00% <100.00%> (ø) |
| Integration Plugins | 100.00% <ø> (ø) |

| Files with missing lines | Coverage Δ |
| --- | --- |
| backend/app/api/routes/print.py | 97.43% <100.00%> (+0.03%) ⬆️ |
| backend/app/db/lifespan.py | 87.95% <100.00%> (+15.72%) ⬆️ |
| backend/app/models/job.py | 100.00% <100.00%> (ø) |
| backend/app/models/preset.py | 100.00% <100.00%> (ø) |
| backend/app/models/printer.py | 100.00% <100.00%> (ø) |
| backend/app/models/printer_state.py | 100.00% <100.00%> (ø) |
| backend/app/models/printer_status_cache.py | 100.00% <100.00%> (ø) |
| backend/app/models/template.py | 100.00% <100.00%> (ø) |
| backend/app/printer_models/pt.py | 96.00% <100.00%> (+0.16%) ⬆️ |
| backend/app/schemas/_datetime.py | 100.00% <100.00%> (ø) |

... and 12 more

| Flag | Coverage Δ |
| --- | --- |
| backend | 91.93% <94.25%> (+0.07%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Copilot AI left a comment


Pull request overview

Phase 7b foundation work hardening the backend lifespan, datetime serialization, health surface, printer identity, and frontend proxy after the first production deploy uncovered latent bugs. The change is large (9 clusters, 24 commits) and touches DB schema/migrations, lifespan ordering, every datetime-bearing schema, the /api/printers/{id}/status contract, and the frontend router.

Changes:

  • Adds UTC-aware DateTime(timezone=True) columns + a serialize_datetime_utc Pydantic helper so all API datetimes emit RFC3339 with Z, plus an idempotent Alembic data migration.
  • Introduces deterministic UUIDv5 printer identity, lifespan auto-upsert, post-migration verify_alembic_at_head, and a new /readiness deep-check endpoint (critical → 503, non-critical → 200/degraded).
  • Switches GET /api/printers/{id}/status to cache-only reads written by StatusProbeProducer, and widens the Go frontend proxy to forward /docs, /openapi.json, /redoc.

Reviewed changes

Copilot reviewed 56 out of 59 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| README.md | Documents /healthz vs /readiness contract and links the spec. |
| frontend/cmd/server/main.go | Adds r.Handle routes for /docs, /openapi.json, /redoc so chi preserves the full path when proxying. |
| frontend/cmd/server/main_test.go | Adds parallel subtests verifying the three new doc-route proxies forward to the backend. |
| docs/superpowers/plans/2026-05-17-phase-7b-foundation.md | New 2580-line task-granular implementation plan for the phase. |
| backend/app/api/routes/printers.py | Rewrites get_printer_status to read from printer_status_cache only; leaves tape_loaded/error_state unpopulated despite still being in the schema. |
| backend/app/schemas/printer.py | Extends PrinterStatus with captured_at, last_probe_age_s, last_error, note. |
| backend/app/services/producers/status_probe_producer.py | Persists parsed probe + offline state into the status cache. |
| backend/app/services/readiness.py | 8-check aggregator with critical/non-critical classification. |
| backend/app/db/lifespan.py | Adds verify_alembic_at_head, upsert_runtime_printer, defensive guard in seed_templates. |
| backend/app/main.py | Reorders lifespan (load_dir before seed_templates), wires derived printer UUID, mounts /readiness. |

Comment thread backend/app/api/routes/printers.py

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements the Phase 7b foundation, focusing on improving datetime handling, printer identity stability, and system observability. Key changes include migrating naive datetimes to UTC-aware ISO strings, implementing a deterministic UUIDv5 printer identity, re-ordering the lifespan to ensure correct initialization, and adding a new /readiness endpoint for deep health checks. Additionally, the printer status endpoint was refactored to read from a cache rather than performing synchronous SNMP probes, and the frontend proxy was updated to expose documentation routes. I have identified an issue where the PrinterStatus response model fields are not being correctly populated from the cache, which needs to be addressed to ensure consistency with the schema.

Comment thread backend/app/api/routes/printers.py
Bot reviews (Copilot + Gemini, identical HIGH-priority finding on PR
#75) flagged that the G3 endpoint rewrite stopped populating the
schema's tape_loaded and error_state fields — they were always null.

Map the cache JSON: loaded_tape_mm=12 → tape_loaded="12mm",
error_flags=[...] → error_state="flag1, flag2". Existing test
test_status_endpoint_returns_cached_tape_data extended to lock the
contract.

Also sanitises two private hostname references in the plan file that
tripped the Privacy / secret scan workflow.

Refs #22
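
The field mapping from the commit above can be sketched as a small function. The function name is hypothetical, and returning None when a value is absent or the flag list is empty is an assumption; the two example mappings come from the commit message.

```python
from typing import Any, Optional


def map_cached_fields(parsed: dict[str, Any]) -> dict[str, Optional[str]]:
    """Translate cached probe JSON into the schema's display fields."""
    tape_mm = parsed.get("loaded_tape_mm")
    flags = parsed.get("error_flags") or []
    return {
        "tape_loaded": f"{tape_mm}mm" if tape_mm is not None else None,
        "error_state": ", ".join(flags) if flags else None,
    }


print(map_cached_fields({"loaded_tape_mm": 12, "error_flags": ["flag1", "flag2"]}))
print(map_cached_fields({}))
```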