From 90dc3761cfca2cffa8ff8075410319091e83cc76 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 21:52:01 -0400 Subject: [PATCH 01/37] =?UTF-8?q?docs(spec):=20aippatch=20=E2=80=94=20prot?= =?UTF-8?q?o/SQL=20PATCH=20framework=20design?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A small Go library plus a sibling code generator that turn AIP-134 PATCH RPCs into safe, dynamic Postgres UPDATE statements. The proto message and FieldMask carry intent on the wire; an aippatch.yaml plus codegen produce a typed Mapping[T] per resource; a runtime Apply[T] call validates the mask, builds the SQL, executes it, and returns the post-update proto. Built first at drill/thirdparty/aippatch/ with UpdateProfile as the v0 target; designed for clean extraction into a standalone module reusable across Spanda LLC products. Codegen consumes proto FileDescriptorSet plus parsed sql/migrations via pg_query_go plus aippatch.yaml policy; output is committed *.gen.go files reviewable in PRs and guarded by a --check mode in CI. v0 codec set covers scalars, timestamps, and enums — sufficient to retire drill's hand-rolled UpdateProfile handler. JSONB, declarative validators, declarative authz, AIP-193 error mapping, and AIP-154 ETag are roadmap items; each is backwards-compatible with v0 call sites. 
--- .../specs/2026-05-07-aippatch-design.md | 690 ++++++++++++++++++ 1 file changed, 690 insertions(+) create mode 100644 docs/superpowers/specs/2026-05-07-aippatch-design.md diff --git a/docs/superpowers/specs/2026-05-07-aippatch-design.md b/docs/superpowers/specs/2026-05-07-aippatch-design.md new file mode 100644 index 00000000..3ad62090 --- /dev/null +++ b/docs/superpowers/specs/2026-05-07-aippatch-design.md @@ -0,0 +1,690 @@ +# aippatch — proto/SQL PATCH framework + +**Status:** spec +**Author:** Brian Tiger Chow +**Date:** 2026-05-07 + +## Summary + +`aippatch` is a small Go library plus a sibling code-generator that turn AIP-134 +PATCH RPCs into safe, dynamic Postgres `UPDATE` statements. The proto message +and `FieldMask` carry intent on the wire; an `aippatch.yaml` config plus +codegen produce a typed `Mapping[T]` per resource; a runtime `Apply[T]` call +validates the mask, builds the SQL, executes it, and returns the post-update +proto. Built first inside drill at `thirdparty/aippatch/`; designed to be +lifted into a standalone Go module and re-used across Spanda LLC products. + +## Goals + +- **Common case is one yaml entry.** Adding a new writable field on a resource + is: declare it in `aippatch.yaml`, regenerate, ship. +- **Hard case is possible.** Name divergences, enum codecs, and opt-outs are + expressible as overrides without escape hatches into custom Go. +- **No type casting in user code.** The public API is generic over the proto + message type; callers never see `proto.Message` erasure. +- **Schema drift is a build-time error.** When proto fields rename, columns + rename, or types diverge, the codegen tool fails CI before runtime can. +- **Replicable across projects.** Three artifacts (runtime library, codegen + binary, yaml file) port to any Go service speaking ConnectRPC + pgx. +- **AIP-134 compliant on the wire.** PATCH responses carry the updated + resource. `FieldMask` semantics are honored. 
+ +## Non-goals (v0) + +- Declarative authorization (per-field write gating). v0 keeps all authz in + handler code. +- Declarative validation (`NonEmptyTrimmed`, `LenBetween`, `URL`, …). v0 keeps + per-field validation in handler code. +- AIP-193 error-code mapping (`unique_violation` → `AlreadyExists`, etc.). v0 + returns `Internal` for unmapped pgx errors and `NotFound` for zero-row + updates. +- Optimistic concurrency / ETag (AIP-154). v0 has no version column awareness. +- JSONB, repeated, oneof, message-as-jsonb, proto3 explicit-optional / NULL + semantics. v0 codecs cover scalars, timestamps, and enums only; everything + else fails at codegen with a diagnostic. +- Replacing sqlc for SELECTs and non-PATCH UPDATEs. aippatch only owns dynamic + PATCH UPDATEs. + +## First-principles mechanics + +To turn a PATCH RPC into `UPDATE … WHERE … RETURNING *` you need nine things: + +1. **Presence detection** — which fields to apply. +2. **Proto-field → SQL-column mapping.** +3. **Value coercion** for the write side (proto value → SQL parameter). +4. **Row identity** — the primary key. +5. **Scope predicates** — tenancy / soft-delete / additional WHERE clauses. +6. **Per-field write authorization.** +7. **Per-field value validation.** +8. **Returned representation** — the post-update resource on the wire. +9. **Optimistic concurrency.** + +`aippatch` v0 owns 1, 2, 3, 4, 5, and 8. Items 6 and 7 stay in the handler; +item 9 is deferred. Read-back (item 8) requires bidirectional coercion, so the +v0 codec set covers every type the v0 target resources use. 
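The mechanics that v0 owns (presence detection, column mapping, row identity, scope predicates, and the `RETURNING *` read-back) can be sketched without any dependencies. The block below is an illustrative stand-in, not the aippatch implementation: `binding` and `buildUpdate` are hypothetical names, value coercion is skipped, and the placeholder numbering is hand-rolled where the spec uses `go-sqlbuilder`.

```go
package main

import (
	"fmt"
	"strings"
)

// binding is a hypothetical, trimmed-down stand-in for aippatch.Binding.
type binding struct {
	Column   string
	Writable bool
}

// buildUpdate turns FieldMask paths into a parameterised UPDATE statement.
// It rejects empty masks, unknown paths, and non-writable paths before any
// SQL is assembled, mirroring the deny-by-default posture of the spec.
func buildUpdate(table, pk, softDelete string, bindings map[string]binding,
	paths []string, values map[string]any, pkValue any) (string, []any, error) {
	if len(paths) == 0 {
		return "", nil, fmt.Errorf("update_mask must not be empty")
	}
	sets := make([]string, 0, len(paths))
	args := make([]any, 0, len(paths)+1)
	for _, p := range paths {
		b, ok := bindings[p]
		if !ok {
			return "", nil, fmt.Errorf("unknown field in update_mask: %q", p)
		}
		if !b.Writable {
			return "", nil, fmt.Errorf("field not writable: %q", p)
		}
		// Each value becomes the next positional parameter ($1, $2, ...).
		args = append(args, values[p])
		sets = append(sets, fmt.Sprintf("%s = $%d", b.Column, len(args)))
	}
	// Row identity, then the optional soft-delete scope predicate.
	args = append(args, pkValue)
	where := fmt.Sprintf("%s = $%d", pk, len(args))
	if softDelete != "" {
		where += " AND " + softDelete + " IS NULL"
	}
	sql := fmt.Sprintf("UPDATE %s SET %s WHERE %s RETURNING *",
		table, strings.Join(sets, ", "), where)
	return sql, args, nil
}

func main() {
	bindings := map[string]binding{
		"display_name": {Column: "display_name", Writable: true},
		"email":        {Column: "email", Writable: false},
	}
	sql, args, err := buildUpdate("users", "id", "deleted_at", bindings,
		[]string{"display_name"}, map[string]any{"display_name": "Ada"}, "user-123")
	fmt.Println(sql, args, err)
}
```

Only values travel as parameters. Table and column names come from the generated mapping, never from the request, which is why the sketch can interpolate them directly.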
+ +## Architecture + +``` +┌─────────────────────────┐ ┌──────────────────────────┐ +│ pb/drill/v1/*.proto │ │ sql/migrations/*.up.sql │ +│ (proto contracts) │ │ (logical schema) │ +└──────────┬──────────────┘ └────────────┬─────────────┘ + │ buf build → buf.binpb │ pg_query_go + │ (FileDescriptorSet) │ + ▼ ▼ + ┌─────────────────────────────────────────────────┐ + │ cmd/aippatchgen (standalone Go binary) │ + │ reads: descriptors + SQL schema + aippatch.yaml│ + │ writes: typed Mapping[T] literals (Go) │ + └─────────────────┬───────────────────────────────┘ + │ ▲ + │ │ aippatch.yaml + ▼ │ (codecs, overrides, writable) + ┌─────────────────────────────────────────────────┐ + │ internal/patches/*.gen.go (committed) │ + │ e.g. var UserPatch = aippatch.Mapping[*User]{} │ + └─────────────────┬───────────────────────────────┘ + │ imported by + ▼ + ┌─────────────────────────────────────────────────┐ + │ internal/rpc//server.go (handler) │ + │ aippatch.Apply(ctx, pool, │ + │ patches.UserPatch, Op[*User]{...}) │ + └─────────────────┬───────────────────────────────┘ + │ uses + ▼ + ┌─────────────────────────────────────────────────┐ + │ thirdparty/aippatch/ (runtime library) │ + │ • Mapping[T], Binding, Op[T], EmptyMaskPolicy │ + │ • Apply[T] — validate → build → exec → scan │ + │ • Codec dispatch: scalar / timestamp / enum │ + │ • Self-contained: no drill imports │ + └─────────────────────────────────────────────────┘ +``` + +Five components, three new: + +1. **`thirdparty/aippatch/`** — runtime library. Self-contained, no drill + imports, ready to lift into a standalone Go module. Imports: + `google.golang.org/protobuf`, `github.com/jackc/pgx/v5`, + `github.com/huandu/go-sqlbuilder`. Reads back via `pgx.Rows.Values()` and + populates the proto via reflection — no third-party row scanner is needed. +2. **`thirdparty/aippatch/cmd/aippatchgen/`** — codegen binary. Imports: + `google.golang.org/protobuf` + `github.com/pganalyze/pg_query_go/v5`. +3. 
**`aippatch.yaml`** — at the repo root. Source of truth for codecs, + resource bindings, name overrides, and writable allow-list. + +Pre-existing components shrink: + +4. **`internal/patches/*.gen.go`** — committed generated code, one file per + resource. +5. **`internal/rpc//server.go`** — handlers shrink to ~12 lines. + +### Boundary properties + +- The runtime imports nothing from drill. It speaks proto and pgx. +- `aippatchgen` imports nothing from the runtime — it produces Go literals + whose types satisfy the public runtime API. +- Handlers import nothing from `aippatchgen` — they import the runtime and the + generated `patches` package. +- sqlc still owns SELECT, INSERT, and any non-PATCH UPDATE. aippatch only + writes the dynamic PATCH UPDATE. + +## Public API (runtime) + +```go +package aippatch + +// Mapping is the static description of how a proto message round-trips +// through a SQL table. Generated by aippatchgen; never hand-edited. +type Mapping[T proto.Message] struct { + Table string + PK string // column name (PK value comes from Op) + SoftDelete string // "" if none; framework adds "AND col IS NULL" + EmptyMask EmptyMaskPolicy // ErrorOnEmpty (v0 default) + Bindings []Binding // ordered, alphabetical by Proto + + // Populated by Validate(); unexported. + bindingsByProto map[string]*Binding + bindingsByColumn map[string]*Binding +} + +// Binding pairs one proto field with one SQL column. +type Binding struct { + Proto string // proto field name, e.g. "display_name" + Column string // SQL column, e.g. "display_name" (or "created_at") + SQLType string // diagnostic: "text", "timestamptz", "uuid", "boolean", "integer", … + Writable bool // PATCH may set this column; default false (deny-by-default) + Codec string // "" (scalar pass-through) | "timestamp" | "enum:" +} + +// Op carries a single PATCH invocation's runtime data. 
+type Op[T proto.Message] struct { + Message T // input proto carrying the new values + Mask *fieldmaskpb.FieldMask // which fields to apply + PKValue any // value for the PK column (e.g. uuid.UUID) + Where map[string]any // optional extra equality predicates +} + +// DBTX is the minimal pgx interface aippatch needs (matches sqlc's DBTX). +type DBTX interface { + Query(ctx context.Context, sql string, args ...any) (pgx.Rows, error) +} + +// EmptyMaskPolicy controls behavior when Op.Mask has zero paths. +type EmptyMaskPolicy int +const ( + ErrorOnEmpty EmptyMaskPolicy = iota // v0 default + UpdateAllWritable // future opt-in (not implemented in v0) +) + +// Apply executes the PATCH described by op against m, and returns the +// updated proto message populated from the RETURNING * row. +func Apply[T proto.Message]( + ctx context.Context, db DBTX, + m *Mapping[T], op Op[T], +) (T, error) + +// Validate is called by generated package init; checks every binding's +// proto path against the descriptor of T and indexes binding maps. +// Returns error rather than panicking, per drill's no-panic-at-init rule. +func (m *Mapping[T]) Validate() error +``` + +The codec registry inside the runtime is keyed by the `Codec` string on each +Binding. v0 ships three: + +- `""` — scalar pass-through (`StringKind`, `BoolKind`, `Int32Kind`, + `Sint32Kind`, `Int64Kind`, `Sint64Kind`) +- `"timestamp"` — `google.protobuf.Timestamp` ↔ `time.Time` via `.AsTime()` / + `timestamppb.New(...)` +- `"enum:"` — proto enum number ↔ SQL text via a declared map; both + directions look up the map keyed by the enum's protoreflect name. Out-of-map + values on the read side return `Internal` (data invariant violation); on the + write side return `InvalidArgument`. 
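The enum codec's two-direction lookup and its asymmetric error handling can be illustrated with a small stdlib-only sketch. Everything here is an assumption made for illustration: `enumCodec`, `newEnumCodec`, and the two sentinel errors are hypothetical, and plain Go errors stand in for the `InvalidArgument` and `Internal` connect codes the spec describes.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors standing in for connect's InvalidArgument and Internal codes.
var (
	errInvalidArgument = errors.New("invalid argument")
	errInternal        = errors.New("internal")
)

// enumCodec is a hypothetical bidirectional map between proto enum value
// names and the text stored in the SQL column.
type enumCodec struct {
	toSQL   map[string]string
	toProto map[string]string
}

// newEnumCodec builds both lookup directions from the declared map and
// rejects declarations where two enum names share one SQL value, because
// the read side could not be reversed unambiguously.
func newEnumCodec(declared map[string]string) (*enumCodec, error) {
	c := &enumCodec{
		toSQL:   make(map[string]string, len(declared)),
		toProto: make(map[string]string, len(declared)),
	}
	for name, text := range declared {
		if _, dup := c.toProto[text]; dup {
			return nil, fmt.Errorf("SQL value %q is mapped by more than one enum name", text)
		}
		c.toSQL[name] = text
		c.toProto[text] = name
	}
	return c, nil
}

// encode is the write side: an undeclared value is the caller's mistake.
func (c *enumCodec) encode(name string) (string, error) {
	text, ok := c.toSQL[name]
	if !ok {
		return "", fmt.Errorf("%w: enum value %q not in declared map", errInvalidArgument, name)
	}
	return text, nil
}

// decode is the read side: an undeclared column value is a data invariant violation.
func (c *enumCodec) decode(text string) (string, error) {
	name, ok := c.toProto[text]
	if !ok {
		return "", fmt.Errorf("%w: column value %q not in declared map", errInternal, text)
	}
	return name, nil
}

func main() {
	role, err := newEnumCodec(map[string]string{
		"USER_ROLE_CANDIDATE": "candidate",
		"USER_ROLE_ADMIN":     "admin",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(role.encode("USER_ROLE_ADMIN"))
	fmt.Println(role.decode("candidate"))
	_, err = role.encode("USER_ROLE_UNSPECIFIED")
	fmt.Println(errors.Is(err, errInvalidArgument))
}
```

Leaving the zero value (`USER_ROLE_UNSPECIFIED` here) out of the declared map means a client that sends it gets a write-side rejection instead of silently storing a sentinel.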
+ +### Errors + +All errors returned by `Apply` are `*connect.Error` with appropriate codes: + +- `CodeInvalidArgument` — empty mask (when policy is `ErrorOnEmpty`), unknown + mask path, non-writable mask path, enum write value not in declared map. +- `CodeNotFound` — `UPDATE` matched zero rows (PK wrong or scope filter + excluded the row). +- `CodeInternal` — pgx error, codec read-side data invariant violation, or + binding/proto desync that escaped boot-time `Validate`. + +## Codegen tool: `aippatchgen` + +### CLI + +``` +aippatchgen [--check] [--config aippatch.yaml] [--out internal/patches] + [--proto buf.binpb] [--migrations sql/migrations] +``` + +Defaults: reads `./aippatch.yaml`, writes to `./internal/patches/`, reads +`./buf.binpb` and `./sql/migrations/`. `--check` exits non-zero if any +generated file would change. Wired into `Makefile`: + +``` +codegen: ; buf generate && sqlc generate && go run ./thirdparty/aippatch/cmd/aippatchgen +test: ; … && go run ./thirdparty/aippatch/cmd/aippatchgen --check && … +``` + +### Inputs + +1. **`buf.binpb`** — emitted by `buf build -o buf.binpb` as part of + `buf generate`. Unmarshaled into `*descriptorpb.FileDescriptorSet`; walked + via `protoreflect.FileDescriptor`. +2. **`sql/migrations/*.up.sql`** — read in lexical order. Each statement + parsed by `pg_query_go`. The tool accumulates a logical schema: + - `CREATE TABLE` → register table with columns `(name, type, nullable)` + - `ALTER TABLE … ADD COLUMN` → add column + - `ALTER TABLE … DROP COLUMN` → remove column + - `ALTER TABLE … ALTER COLUMN … TYPE` → change type + - `ALTER TABLE … RENAME COLUMN` → rename + - `DROP TABLE` → remove table + - Other statements (indexes, constraints, FK refs) are ignored. +3. **`aippatch.yaml`** — codecs + resource declarations (schema below). + +### Algorithm + +1. Load `buf.binpb` → `FileDescriptorSet`. +2. Replay migrations to build the logical schema state. +3. For each `resources[i]` in `aippatch.yaml`: + 1. 
Look up the proto message descriptor by full name. + 2. Look up the SQL table from the schema; resolve the PK column. + 3. For each proto field in the message, in field-number order: + - If `overrides[field].skip` is true → drop. + - If `overrides[field].column` set → use that column. + - Else → snake-case name match with the SQL column list. + - If no match → record diagnostic: "field X has no matching column; + list candidates and suggest yaml fix." + - Compatibility check between proto kind and column type (table below). + If incompatible → diagnostic. + - Determine codec: + - `MessageKind` with full name `google.protobuf.Timestamp` → `"timestamp"`. + - Enum kind → `"enum:" + name` from `overrides[field].codec`. If missing + → diagnostic: "enum field requires explicit codec in overrides." + - Scalar kind → `""`. + - Anything else → diagnostic: "unsupported in v0; mark `skip: true`." + - `Writable` = field name is in `resources[i].writable`. + 4. Sort bindings alphabetically by `Proto` for stable output. +4. Emit one Go file per resource. +5. If `--check`: byte-compare to existing files; exit 1 on any diff. + +### Type compatibility (v0) + +| Proto kind | SQL types accepted | Codec | +|---|---|---| +| `StringKind` | `text`, `varchar`, `citext`, `uuid` | `""` | +| `BoolKind` | `boolean` | `""` | +| `Int32Kind`, `Sint32Kind` | `integer`, `smallint` | `""` | +| `Int64Kind`, `Sint64Kind` | `bigint` | `""` | +| `MessageKind`: `google.protobuf.Timestamp` | `timestamptz`, `timestamp` | `"timestamp"` | +| `EnumKind` | `text`, `varchar` | `"enum:"` (declared) | +| anything else | — | codegen error | + +`uuid`-as-string is special-cased: a proto `string` field maps to a `uuid` +column when the column type is `uuid`, with `[16]byte`↔string conversion in +the runtime. 
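The `[16]byte`↔string special case for `uuid` columns can be sketched with the standard library alone. This is a hedged illustration: `formatUUID` and `parseUUID` are hypothetical helper names, and the spec's own read-side table uses `uuid.UUID(v).String()` from a UUID package instead of hand-rolled hex handling.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// formatUUID renders the 16 raw bytes of a uuid column in the canonical
// 8-4-4-4-12 lowercase hex form used for the proto string field.
func formatUUID(b [16]byte) string {
	var buf [36]byte
	hex.Encode(buf[0:8], b[0:4])
	buf[8] = '-'
	hex.Encode(buf[9:13], b[4:6])
	buf[13] = '-'
	hex.Encode(buf[14:18], b[6:8])
	buf[18] = '-'
	hex.Encode(buf[19:23], b[8:10])
	buf[23] = '-'
	hex.Encode(buf[24:36], b[10:16])
	return string(buf[:])
}

// parseUUID is the write-side inverse: it accepts only the canonical
// 36-character form and returns the 16 raw bytes.
func parseUUID(s string) ([16]byte, error) {
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return [16]byte{}, fmt.Errorf("invalid uuid %q", s)
	}
	compact := strings.ReplaceAll(s, "-", "")
	if len(compact) != 32 {
		return [16]byte{}, fmt.Errorf("invalid uuid %q", s)
	}
	var out [16]byte
	if _, err := hex.Decode(out[:], []byte(compact)); err != nil {
		return [16]byte{}, fmt.Errorf("invalid uuid %q: %w", s, err)
	}
	return out, nil
}

func main() {
	id, err := parseUUID("123e4567-e89b-12d3-a456-426614174000")
	if err != nil {
		panic(err)
	}
	fmt.Println(formatUUID(id))
}
```

A malformed string on the write side is a client error, so a real codec would presumably surface the `parseUUID` failure as `InvalidArgument`.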
+ +### Diagnostics + +Codegen failures are clear and actionable: + +``` +aippatchgen: drill.v1.User: field "create_time" has no matching column + proto field type: google.protobuf.Timestamp + candidate columns: [created_at, updated_at, deleted_at] + hint: add to aippatch.yaml: + overrides: + create_time: { column: created_at } + +aippatchgen: drill.v1.User: field "role" has unsupported type without codec + proto field kind: enum (drill.v1.UserRole) + hint: declare codec and override: + codecs: + enum_role: { proto_enum: drill.v1.UserRole, map: { ... } } + resources: + - message: drill.v1.User + overrides: { role: { codec: enum_role } } + +aippatchgen: drill.v1.User: writable field "display_name" not present in proto descriptor +``` + +## Configuration: `aippatch.yaml` + +```yaml +codecs: + enum_role: + proto_enum: drill.v1.UserRole + map: + USER_ROLE_CANDIDATE: candidate + USER_ROLE_ADMIN: admin + enum_plan: + proto_enum: drill.v1.UserPlan + map: + USER_PLAN_FREE: free + USER_PLAN_PRO: pro + +resources: + - message: drill.v1.User + table: users + pk: id + soft_delete: deleted_at + empty_mask: error # error (default) | update_writable + writable: [display_name] # AIP-203 deny-by-default + overrides: + create_time: { column: created_at } + role: { codec: enum_role } + plan: { codec: enum_plan } +``` + +One file per repo. `~10–20` lines per resource. Reviewers see policy and +mapping deltas in a single diff. Adding a writable field is one line. + +## Generated file shape + +`internal/patches/user.gen.go`: + +```go +// Code generated by aippatchgen. DO NOT EDIT. +package patches + +import ( + drillv1 "github.com/btc/drill/internal/pb/drill/v1" + "github.com/btc/drill/thirdparty/aippatch" + "google.golang.org/protobuf/proto" +) + +// UserPatch is the PATCH mapping for drill.v1.User → users. 
+var UserPatch = mustValidate(&aippatch.Mapping[*drillv1.User]{ + Table: "users", + PK: "id", + SoftDelete: "deleted_at", + EmptyMask: aippatch.ErrorOnEmpty, + Bindings: []aippatch.Binding{ + {Proto: "create_time", Column: "created_at", SQLType: "timestamptz", Writable: false, Codec: "timestamp"}, + {Proto: "display_name", Column: "display_name", SQLType: "text", Writable: true, Codec: ""}, + {Proto: "email", Column: "email", SQLType: "text", Writable: false, Codec: ""}, + {Proto: "email_verified", Column: "email_verified", SQLType: "boolean", Writable: false, Codec: ""}, + {Proto: "id", Column: "id", SQLType: "uuid", Writable: false, Codec: ""}, + {Proto: "plan", Column: "plan", SQLType: "text", Writable: false, Codec: "enum:enum_plan"}, + {Proto: "role", Column: "role", SQLType: "text", Writable: false, Codec: "enum:enum_role"}, + }, +}) + +func mustValidate[T proto.Message](m *aippatch.Mapping[T]) *aippatch.Mapping[T] { + if err := m.Validate(); err != nil { + // Per drill's no-panic-at-init rule, the generated package exposes + // an Init() that returns the error; main wires it up. The + // mustValidate helper is only used in tests; production wiring uses + // an explicit constructor that returns (mappings, error). + panic(err) + } + return m +} +``` + +**Init wiring:** to comply with drill's no-panic-at-init rule, the production +build does *not* use the `mustValidate` helper. Instead `aippatchgen` emits an +`InitPatches() error` function that calls `Validate()` on every generated +mapping and returns the first error. `cmd/server/main.go` calls it during +startup and propagates the error normally. The `mustValidate` helper exists +only for test-side use where panicking is acceptable. + +## Runtime: `Apply` walkthrough + +```go +func Apply[T proto.Message]( + ctx context.Context, db DBTX, + m *Mapping[T], op Op[T], +) (T, error) { + var zero T + + // 1. 
Mask validation + paths := op.Mask.GetPaths() + if len(paths) == 0 { + if m.EmptyMask == ErrorOnEmpty { + return zero, connectInvalidArg("update_mask must not be empty") + } + // Future: collect all writable bindings as paths. + } + + // 2. Resolve paths to bindings; reject unknown / non-writable. + sets := make([]string, 0, len(paths)) // go-sqlbuilder Assign exprs + desc := op.Message.ProtoReflect().Descriptor() + ub := sqlbuilder.PostgreSQL.NewUpdateBuilder() + ub.Update(m.Table) + + for _, p := range paths { + b, ok := m.bindingsByProto[p] + if !ok { + return zero, connectInvalidArg("unknown field in update_mask: %q", p) + } + if !b.Writable { + return zero, connectInvalidArg("field not writable: %q", p) + } + fd := desc.Fields().ByName(protoreflect.Name(p)) + if fd == nil { + return zero, connectInternal("binding/proto desync: %q", p) + } + v, err := encode(op.Message, fd, b.Codec, m.codecs) // proto value → SQL parameter + if err != nil { return zero, err } + sets = append(sets, ub.Assign(b.Column, v)) + } + ub.Set(sets...) + + // 3. WHERE clauses. + ub.Where(ub.Equal(m.PK, op.PKValue)) + if m.SoftDelete != "" { + ub.Where(m.SoftDelete + " IS NULL") + } + for col, v := range op.Where { + ub.Where(ub.Equal(col, v)) + } + ub.SQL("RETURNING *") + + sqlStr, args := ub.Build() + + // 4. Execute and read back via RETURNING *. + rows, err := db.Query(ctx, sqlStr, args...) + if err != nil { return zero, connectInternal("query: %w", err) } + defer rows.Close() + + if !rows.Next() { + if err := rows.Err(); err != nil { + return zero, connectInternal("query: %w", err) + } + return zero, connectNotFound("resource not found or filtered out") + } + cols := rows.FieldDescriptions() + vals, err := rows.Values() + if err != nil { return zero, connectInternal("scan: %w", err) } + + // 5. Build result proto from input message + RETURNING values. 
+ result := proto.Clone(op.Message).(T) + msg := result.ProtoReflect() + for i, c := range cols { + b, ok := m.bindingsByColumn[string(c.Name)] + if !ok { continue } // unmapped column → ignore + fd := msg.Descriptor().Fields().ByName(protoreflect.Name(b.Proto)) + if err := decode(msg, fd, vals[i], b.Codec, m.codecs); err != nil { + return zero, connectInternal("decode %s: %w", b.Proto, err) + } + } + return result, nil +} +``` + +`encode` and `decode` are bounded switches over field kind × codec. The total +runtime is approximately 300 LoC including codec dispatch, error +constructors, and `Validate`. + +### pgx native types on the read side + +| SQL type | pgx returns | Proto kind expected | Conversion | +|---|---|---|---| +| `text`, `varchar`, `citext` | `string` | `StringKind` | direct | +| `uuid` | `[16]byte` | `StringKind` | `uuid.UUID(v).String()` | +| `boolean` | `bool` | `BoolKind` | direct | +| `integer` | `int32` | `Int32Kind`, `Sint32Kind` | direct | +| `smallint` | `int16` | `Int32Kind`, `Sint32Kind` | widen to `int32` | +| `bigint` | `int64` | `Int64Kind`, `Sint64Kind` | direct | +| `timestamptz`, `timestamp` | `time.Time` | `MessageKind: Timestamp` | `timestamppb.New(v)` | +| `text` (with enum codec) | `string` | `EnumKind` | reverse-map declared codec | + +Any other pgx-native type encountered at runtime is an `Internal` error +(should have been caught by codegen's compatibility check). + +## Handler call site + +drill's existing `UpdateProfile` (currently `internal/rpc/user/server.go:100-141`) +shrinks from ~45 lines to ~12: + +```go +func (s *Server) UpdateProfile( + ctx context.Context, + req *connect.Request[drillv1.UpdateProfileRequest], +) (*connect.Response[drillv1.UpdateProfileResponse], error) { + u := auth.UserFromContext(ctx) + if u == nil { + return nil, connect.NewError(connect.CodeUnauthenticated, errors.New("authentication required")) + } + + // Per-field validation lives in the handler in v0. v2 makes it declarative. 
+ if maskHasPath(req.Msg.GetUpdateMask(), "display_name") { + trimmed := strings.TrimSpace(req.Msg.GetUser().GetDisplayName()) + if trimmed == "" { + return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("display_name must not be empty")) + } + req.Msg.GetUser().DisplayName = trimmed + } + + updated, err := aippatch.Apply(ctx, s.b.Pool, patches.UserPatch, aippatch.Op[*drillv1.User]{ + Message: req.Msg.GetUser(), + Mask: req.Msg.GetUpdateMask(), + PKValue: u.ID, + }) + if err != nil { return nil, err } + + return connect.NewResponse(&drillv1.UpdateProfileResponse{User: updated}), nil +} +``` + +The hand-rolled `implementedUserFields` allow-list and per-path validation +loop disappear. Authorization (the unauthenticated check) and per-field value +validation (the trim + non-empty check) remain in the handler. + +## Testing strategy + +Three layers: + +### Runtime unit tests (`thirdparty/aippatch/apply_test.go`) + +Table-driven against testcontainers Postgres. Drill already uses +`testcontainers-go/modules/postgres` (per `go.mod`); aippatch tests reuse the +same approach with a small fixture schema independent of drill's migrations. 
+ +Cases: +- empty mask → `InvalidArgument` +- unknown mask path → `InvalidArgument` +- non-writable mask path → `InvalidArgument` +- writable scalar (string, bool, int32, int64) write + read-back +- writable timestamp write + read-back +- writable enum write (valid & invalid) + read-back +- soft-delete WHERE filters out deleted rows → `NotFound` +- PK mismatch → `NotFound` +- extra `Op.Where` predicate excludes row → `NotFound` +- `RETURNING *` populates fields not touched by mask +- `proto.Clone` preserves input-message fields that have no binding + +### Codegen golden tests (`thirdparty/aippatch/cmd/aippatchgen/aippatchgen_test.go`) + +Fixtures under `testdata/`: +- `simple/` — proto + 2 migrations + yaml → expected `*.gen.go` +- `name_divergence/` — `create_time` ↔ `created_at` +- `enum_codec/` — proto enum + declared codec +- `unsupported_kind/` — proto with bytes field (not in v0); expects diagnostic +- `missing_column/` — proto field with no candidate; expects diagnostic +- `--check_drift/` — fixture with stale `*.gen.go`; expects exit 1 + +### Handler integration test (`internal/rpc/user/server_test.go`) + +Uses drill's existing `backendtest.SeedUser` to create a real user, then +exercises `UpdateProfile` end-to-end through the connect handler: +- valid PATCH on `display_name` → response carries updated User; DB row + updated. +- empty mask → `InvalidArgument`. +- mask with `email` (non-writable) → `InvalidArgument`. +- unauthenticated → `Unauthenticated`. + +These tests already exist for the current implementation; they should pass +unchanged after the migration. + +## Drill rollout plan + +1. Add `thirdparty/aippatch/` runtime package and `cmd/aippatchgen/` binary. +2. Add `aippatch.yaml` at repo root with the `User` resource and enum codecs. +3. Add `internal/patches/` directory; wire `aippatchgen` into `make codegen`. +4. Add `aippatchgen --check` to `make test`. +5. Generate `internal/patches/user.gen.go`. Review the diff manually. +6. 
Wire `patches.InitPatches()` into `cmd/server/main.go` startup; surface + any error from `Validate()`. +7. Replace `UpdateProfile` handler body with the shrunk version. +8. Delete the `UpdateUserDisplayName` query from `sql/queries/users.sql` and + regenerate sqlc. +9. Run `make test`; existing handler tests should pass unchanged. + +Rollback: single-commit revert. The proto wire contract is unchanged. + +## Spanda replication + +Each Spanda repo gets three artifacts: + +1. The `thirdparty/aippatch/` directory (initially copied from drill; once + stable, extracted to its own module and imported). +2. The `aippatchgen` binary — `go install ./thirdparty/aippatch/cmd/aippatchgen`. +3. A `aippatch.yaml` skeleton. + +Each project's `Makefile` wires `aippatchgen` into its `codegen` and `test` +targets. No drill-specific code is required. + +## Roadmap + +| Tier | Feature | Notes | +|---|---|---| +| v1 | JSONB codec | Marshals proto sub-messages or `[]byte` to `jsonb` columns. Likely first non-v0 demand. | +| v1 | Proto3 explicit-optional + NULL semantics | AIP-134 clearing rule (`mask path + zero value → NULL`); only meaningful for `optional` fields. | +| v2 | Declarative validators | `NonEmptyTrimmed`, `LenBetween`, `URL`, `OneOf`. Per-resource yaml + handler-side composition. | +| v2 | AIP-193 error mapping | pgx error inspection: `unique_violation` → `AlreadyExists`, `fk_violation` → `FailedPrecondition`, `not_null_violation` / `check_violation` → `InvalidArgument`. Per-resource override map. | +| v3 | Per-field declarative authz | `admin_only_fields:` in yaml; layered with handler narrowing. | +| v3 | Concurrency / ETag (AIP-154) | Resource declares version column; Apply requires inbound etag and bumps on success. | +| v4 | Buf plugin | Proto annotations replace yaml entries; same generated output. | +| Out of scope | repeated, oneof, sub-resources | AIP-134 punts these to sub-resource RPCs. 
| + +Each tier is backwards-compatible: v0 call sites do not change when later +tiers ship. New features are opt-in via `aippatch.yaml`. + +## Risks + +1. **`pg_query_go` is a CGO dependency.** It wraps `libpg_query`. drill's + production binary builds may run with `CGO_ENABLED=0` in some paths. + Mitigation: `aippatchgen` is a developer/CI tool, not part of the + production binary; CGO is only required where `aippatchgen` runs. Document + this in the package README. + +2. **pgx-native ↔ proto type drift.** New SQL types added to drill in the + future may not be in the runtime's `decode` switch. Mitigation: + `aippatchgen` rejects unknown SQL types at codegen with a clear diagnostic; + the runtime never sees a type the codegen accepted. + +3. **`EmptyMaskPolicy` is wire-affecting.** Switching from `ErrorOnEmpty` to + `UpdateAllWritable` changes behavior visible to clients. Mitigation: + document as a per-resource permanent decision; `error` is the v0 default + and recommended. + +4. **Validation duplication in v0.** Per-field validation lives in handlers + until v2. New PATCH RPCs added before v2 must hand-roll trimming / + non-empty / length checks. Mitigation: ship v2 quickly if the duplication + becomes painful; document the v0 expectation in the README. + +5. **`proto.Clone` for the result base.** `Apply` clones `op.Message` as the + starting point for the returned proto. Fields not in the RETURNING row + keep their input-message values. For PATCH this is fine (all DB-backed + fields are populated by RETURNING). For non-DB fields (rare; would only + exist if the proto carries computed-only fields), the input message's + values pass through. Mitigation: document; reject in codegen any proto + field that has no `skip: true` and no binding match. 
+ +## Decisions (locked, with rationale) + +| # | Decision | Why | +|---|---|---| +| 1 | Runtime library + standalone codegen binary; not a buf plugin | Cleaner separation from buf's plugin machinery; reusable in non-buf contexts. | +| 2 | Working name `aippatch`; lives at `drill/thirdparty/aippatch/` | Signals AIP-134 lineage; thirdparty/ prepares clean extraction. | +| 3 | Generic `Mapping[T proto.Message]` (single type parameter) | No row-type coupling; framework is sqlc-independent. No type casts in user code. | +| 4 | Codegen consumes proto FileDescriptorSet + SQL migrations + yaml | Both schemas already on disk; yaml carries policy + overrides only. | +| 5 | Generated `*.gen.go` files committed to repo | Mapping is reviewable in PRs; CI checks for drift via `--check`. | +| 6 | SQL builder: `huandu/go-sqlbuilder` (private to package) | Mature; `PostgreSQL.NewUpdateBuilder()` emits `$1` placeholders cleanly. | +| 7 | Row scan: direct `pgx.Rows.Values()` + proto reflection (no third-party scanner) | A scanner like `scany/v2` would target a Go row struct; we populate a proto via reflection instead. Avoids an unnecessary dependency and a proto-aware shim. | +| 8 | Empty FieldMask rejected with `InvalidArgument` (default) | Strict; matches drill's existing behavior; relax later if a use case warrants. | +| 9 | Deny-by-default writable; opt in via `writable:` list | AIP-203; security posture. | +| 10 | Codegen errors on unsupported field types | Bad fields stop at codegen; runtime never sees a type it cannot handle. | +| 11 | Framework reads back via `RETURNING *` and returns the populated proto | One round-trip; AIP-134 compliant on the wire; pulls codec set up to cover every type in target protos. | +| 12 | v0 codec set: scalars + timestamps + enum | Smallest set that covers drill's `User` and most Spanda CRUD shapes. JSONB and others in v1+. 
| +| 13 | v0 first user: drill's `UpdateProfile` | Validates the framework against an existing target; replaces the most boilerplate-heavy code path today. | +| 14 | Boot validation via `Mapping.Validate() error` propagated to `main` | drill's no-panic-at-init rule. | + +## Open questions (deferred) + +- **JSONB shape** — for v1: do we marshal proto sub-messages as JSON via + `protojson`, or accept opaque `[]byte` from the handler? Trade-offs around + schema evolution. +- **AIP-154 ETag column type** — `bigint` counter, `uuid` token, or + per-resource choice? Defer to v3 when the use case is concrete. +- **Buf plugin migration** — when (and if) v4 ships, the yaml format remains + the source of truth for policy; only mappings move to proto annotations. + Migration path TBD. + From 780ef4ea15995f1cafa9528e6313bfff997620f5 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 22:21:24 -0400 Subject: [PATCH 02/37] spec: idle auto-cancel design (3 review rounds applied) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Auto-cancel a Pro subscription via cancel_at_period_end=true when the user has been idle for two consecutive billing periods. No settings toggle — this is the default Sabermatic behavior. Establishes a Spanda tenet: earn money for ongoing value delivered. Trigger fires at Stripe invoice.upcoming (~T-7) when MAX(last_active) across auth_sessions falls before the previous period's start, with launch-day grandfathering via users.idle_eligible_after. Reversal via email-link click (signed token, single-use) or auto-reverse on any authenticated activity during the cancel window. sub_cancel_is_auto distinguishes our auto-cancels from user-initiated portal cancels. Migrations 015–018: auth_sessions soft-delete, users sub-state cache, stripe_webhook_dedup, keep_link_token_uses. Feature package home at internal/feat/idleunsub/. 
--- .../2026-05-07-idle-auto-cancel-design.md | 656 ++++++++++++++++++ 1 file changed, 656 insertions(+) create mode 100644 docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md diff --git a/docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md b/docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md new file mode 100644 index 00000000..7b930555 --- /dev/null +++ b/docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md @@ -0,0 +1,656 @@ +# Idle Auto-Cancel — Design Spec + +**Date**: 2026-05-07 +**Status**: Draft (rounds 1 + 2 + 3 review applied) +**Scope**: Auto-cancel a Pro subscription when the user has been idle for two consecutive billing periods. No settings toggle — this is the default Sabermatic behavior. Establishes a Spanda, LLC product tenet: earn money for ongoing value delivered. + +--- + +## 1. Motivation + +A subscription that bills a user month after month for a service they haven't touched generates negative sentiment — remorse, distrust, "why am I still paying for this?" Industry convention is to keep collecting until the customer notices and cancels manually. We reject that convention. + +This feature implements the inverse: if a Pro user goes idle for a *full* billing period, and the next period also looks idle as it nears renewal, we set `cancel_at_period_end=true` *before* the next charge fires. The customer is informed, given a frictionless way to keep the subscription, and given access through their already-paid period regardless. They are never billed for a period they're about to skip too. + +This is a tenet for all Spanda, LLC products — Sabermatic[.DEV] is the first to implement it. + +--- + +## 2. Decisions + +| Decision | Choice | Rationale | +|---|---|---| +| Activity definition | Any authenticated request — touches `auth_sessions.last_active` | Matches user intent: "visiting the site while logged in counts." Drill activity is a strict subset (drill RPCs are authenticated). 
| +| Activity SQL source | `MAX(last_active)` from `auth_sessions` for the user, **across all rows including soft-revoked ones** | Single-table query. The activity question is *historical* ("when did this user last act?"), not *current* ("is this session valid?"), so it intentionally ignores `revoked_at`. | +| `auth_sessions` retention | Soft delete: add `revoked_at`; logout = `UPDATE`, account-delete = hard `DELETE` | Preserves rows through logout so the activity query can read across history. Audit trail. Consistent with existing `users.deleted_at` pattern. | +| Trigger rule | `MAX(last_active) < (current_period.start − 1 interval)` AND `current_period.start ≥ users.idle_eligible_after` | Two periods of evidence, evaluated near renewal. Second clause grandfathers periods that began before this feature shipped. | +| Evaluation moment | Stripe `invoice.upcoming` webhook (~7 days before next renewal) | Event-driven; no cron; Stripe carries the period boundaries. | +| Period boundaries source | `stripe-go/v82` puts these on `SubscriptionItem`, not `Subscription`. We read `sub.Items.Data[0].CurrentPeriodStart` and `Price.Recurring.Interval` from a fresh `subscription.Get(...)` at evaluation time. | The `invoice.upcoming` event's `Invoice.Lines.Data[0].Period` covers the *upcoming* period (after renewal). To get the period currently ending, we fetch the subscription. | +| JSONB timestamp format for period anchors in `user_events.metadata` | RFC3339 strings, written via `json.Marshal` of a Go `time.Time` field on a typed metadata struct. Read via `(metadata->>'current_period_start')::timestamptz`. | Postgres can cast RFC3339 text to `timestamptz` reliably across versions and timezones. Avoids the int64-Unix vs ISO ambiguity. The Go writer is the only producer; using a typed struct prevents drift. | +| Cancel mechanic | `cancel_at_period_end=true` | User keeps already-paid period. No refund. Reversible. 
| +| Distinguishing auto vs manual cancel | `users.sub_cancel_is_auto` BOOLEAN | Set true *only* when our handler initiates the cancel. Stripe's `cancel_at_period_end` is shared with manual portal cancels; without this distinction, our auto-reverse path would silently undo user-initiated cancellations. | +| Notification | Single email at decision moment with `[Keep my subscription]` link | One transparent message; the link reverses the cancel via signed token. | +| Reversal triggers | (a) clicking the email link, (b) any authed activity during the cancel window — but only if `sub_cancel_is_auto=true` | Activity in the current period invalidates the trigger condition; auto-reverse is rule-consistent. We never reverse a user's deliberate cancellation. | +| Reversal feedback | Banner on next page load + confirmation email on every reversal | Transparent and communicative. | +| Settings toggle | None | This is the default behavior; making it opt-in dilutes the tenet. | +| Code home | `internal/feat/idleunsub/` | New `internal/feat/` convention for cohesive feature packages. First entry. | +| Idempotency | New `stripe_webhook_dedup` table with `event_id TEXT PRIMARY KEY`. Insert-then-act pattern: ON CONFLICT DO NOTHING; if the insert returns no row, the event was already handled. | Stripe retries webhooks. Same event ID = same logical event. A separate dedup table is cleaner than embedding `stripe_event_id` in `user_events.metadata` (which can't be UNIQUE-constrained without expression indexes). | +| Multi-firing of `invoice.upcoming` for same period | State-aware predicate before acting: skip if Stripe says `cancel_at_period_end=true` already; skip if a `subscription_kept` event exists more recently than the most recent `subscription_auto_canceled` for the same `(user_id, subscription_id, current_period_start)`. | Stripe re-fires `invoice.upcoming` (with a fresh `event.ID`) when the upcoming invoice changes (proration, plan switch, coupon edit). 
Event-ID dedup alone would re-act on each firing. | +| Skip evaluation when sub is `trialing`, `past_due`, `unpaid`, `incomplete` | Early return | Trial: no prior period to evaluate. Past due / unpaid: Stripe's dunning logic owns the lifecycle; we don't fight it. | +| Period storage | Cache `sub_current_period_start` and `stripe_subscription_id` on `users` for hot-path reads (auto-reverse middleware gate). Authoritative read for cancel decisions = fresh Stripe `subscription.Get`. | Cache supports the cheap "is this user's sub auto-canceled and they just acted?" check. Authoritative reads avoid stale-cache risk on the cancel decision itself. | +| Multiple subscriptions per user | Out of scope | Drill assumes one subscription per user (`users.plan` is single-valued). If/when multi-sub is added, this feature is revisited. | + +--- + +## 3. Trigger Rule + +**Rule:** at the moment Stripe fires `invoice.upcoming` for subscription S of user U, evaluate: + +``` +// 1. Early returns +fetch S = subscription.Get(subID) +if S.Status not in {active}: return // skip trial/past_due/unpaid +if S.CancelAtPeriodEnd is already true: return // already canceled (by us or user) +if NOT TryClaimWebhookEvent(event.ID): return // retry-storm dedup + // (insert-on-conflict-do-nothing; + // no-row return = already handled) +if mostRecentPeriodDecisionWasKeep(U, subID, period): return // post-keep multi-firing dedup +if alreadyAutoCanceledThisPeriod(U, subID, period): return // pre-keep multi-firing dedup + +// 2. Read state +last_active = SELECT MAX(last_active) + FROM auth_sessions + WHERE user_id = U.id + -- (no revoked_at filter; activity is across history) + +cur_period_start = S.Items.Data[0].CurrentPeriodStart // see Data Sources below +interval = S.Items.Data[0].Price.Recurring.Interval // e.g., "month" +threshold = cur_period_start - 1 interval // start of period N-1 + +// 3. 
Trigger +if last_active < threshold AND cur_period_start >= U.idle_eligible_after: + set cancel_at_period_end = true on S + persist user_events row, cache update, queue email (see §4.1) +``` + +### Data Sources + +The Stripe Go SDK v82 splits subscription period fields between `Subscription` and `SubscriptionItem`. The fields the rule cares about live here: + +- `S.Items.Data[0].CurrentPeriodStart` (`int64` Unix seconds) — start of the period currently ending. +- `S.Items.Data[0].CurrentPeriodEnd` — end of the period currently ending; equals the next renewal moment. +- `S.Items.Data[0].Price.Recurring.Interval` — `"month"`, `"year"`, etc. + +The `invoice.upcoming` event payload (`stripe.Invoice`) carries `Invoice.Lines.Data[0].Period.Start` and `.End` for the *upcoming* period (after renewal) — not what we want. We fetch the subscription explicitly at evaluation time. + +### Example timeline + +User signs up Feb 1 with monthly billing. The trigger fires for each upcoming invoice: + +| When | Event | `last_active` | `cur_period_start` | `threshold` | Decision | +|---|---|---|---|---|---| +| Feb 22 | invoice.upcoming for Mar 1 renewal | Feb 8 (signup + early use) | Feb 1 | Jan 1 | last_active ≥ threshold → no cancel | +| Mar 22 | invoice.upcoming for Apr 1 renewal | Feb 8 (no Mar activity) | Mar 1 | Feb 1 | last_active ≥ threshold → no cancel (Feb activity counts) | +| Apr 22 | invoice.upcoming for May 1 renewal | Feb 8 (no Mar, no Apr activity) | Apr 1 | Mar 1 | last_active < threshold → **cancel_at_period_end=true** | + +The user pays for **two** unused periods (March and April in this example) before the cancel kicks in — the first while we're establishing the idle baseline, the second while we're confirming sustained idleness. The cancel prevents the *next* charge (May 1) and every charge after. 
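The timeline above can be replayed with a small Go sketch of the §3 predicate. This is an illustration under stated assumptions, not the feature's implementation: `shouldAutoCancel`, `subtractInterval`, and `date` are hypothetical helper names, the year 2026 is picked arbitrarily, and Stripe's `interval_count` is assumed to be 1.

```go
package main

import (
	"fmt"
	"time"
)

// date is a hypothetical convenience constructor for UTC midnight.
func date(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// subtractInterval steps back one billing interval. The second return
// value is false for intervals this sketch does not recognize, so the
// caller can decline to cancel rather than guess.
func subtractInterval(t time.Time, interval string) (time.Time, bool) {
	switch interval {
	case "day":
		return t.AddDate(0, 0, -1), true
	case "week":
		return t.AddDate(0, 0, -7), true
	case "month":
		return t.AddDate(0, -1, 0), true
	case "year":
		return t.AddDate(-1, 0, 0), true
	}
	return time.Time{}, false
}

// shouldAutoCancel mirrors the rule from section 3: the last activity
// must predate the start of period N-1, and the current period must
// begin on or after the grandfathering anchor.
func shouldAutoCancel(lastActive, curPeriodStart, idleEligibleAfter time.Time, interval string) bool {
	threshold, ok := subtractInterval(curPeriodStart, interval)
	if !ok {
		return false // unknown interval: never cancel on a guess
	}
	return lastActive.Before(threshold) && !curPeriodStart.Before(idleEligibleAfter)
}

func main() {
	lastActive := date(2026, 2, 8)
	eligibleAfter := date(2026, 1, 1)
	// One row per invoice.upcoming evaluation in the example timeline.
	for _, start := range []time.Time{date(2026, 2, 1), date(2026, 3, 1), date(2026, 4, 1)} {
		fmt.Println(start.Format("Jan 2"), "cancel:", shouldAutoCancel(lastActive, start, eligibleAfter, "month"))
	}
}
```

One caveat with this sketch: `time.AddDate` normalizes overflowing dates, so a period starting on the 31st can produce a threshold a few days later than Stripe's own billing anchor for the prior period. A real implementation might prefer period boundaries recorded from Stripe over calendar arithmetic.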
+ +A user who signs up Feb 1 and never returns has their cancel triggered at the Apr ~22 evaluation and loses access at May 1 — about 90 days after signup, after paying 3 monthly bills (Feb, Mar, Apr) of which two were unused. + +This two-billed-but-unused-periods cost is the deliberate price of the two-period rule. The alternative (one-period rule, which would prevent the second charge) was rejected because it can't be evaluated reliably at `invoice.upcoming` (only ~23 of 30 days are observed at that moment) and because a single quiet month is a noisy signal. + +### Why two periods, not one + +A one-period rule would have to evaluate at `invoice.upcoming` (~T-7 days before period_end), but that means the current period isn't actually fully observed yet — we'd be calling 23 of 30 days "the full period." Inconsistent. The two-period rule resolves it: at evaluation time, period N−1 is provably 100% complete and idle, regardless of what happens in the remaining 7 days of period N. + +A one-period rule is also too aggressive: a user who happens to take a single quiet month (vacation, busy quarter, surgery, parental leave) gets canceled. Two periods is meaningful evidence; one is noise. + +### Plan changes, pause/resume + +Stripe shifts `current_period_start` to the change date on plan changes. The same is true for portal-driven pause + resume (`pause_collection` flag + later resume). We accept both: a user paying enough attention to upgrade or resume is by definition not idle, so resetting the idle clock is correct. Documented as a known property, not a bug. + +Note: paused subscriptions retain `Status='active'` (`pause_collection` is a separate field). They pass our status early-return and are evaluated normally. An idle paused sub is still idle from this feature's perspective. If we later decide we want to *exclude* paused subs from evaluation, the gate is `S.PauseCollection != nil`. Out of scope for v1. + +--- + +## 4. Behavior + +### 4.1 Cancel path + +1. 
**T-7**: Stripe fires `invoice.upcoming`. Webhook handler in `internal/backend/billing.go` delegates to `idleunsub.HandleInvoiceUpcoming`. +2. Fetch the subscription from Stripe (`subscription.Get(subID, &SubscriptionParams{Expand: ["items"]})`). +3. Run early returns from §3 (status, already-canceled). +4. **Begin DB transaction:** + a. `SELECT id FROM users WHERE id = $1 FOR UPDATE`. **This is the per-user mutex**: two concurrent `invoice.upcoming` deliveries for the same user cannot both pass the dedup queries below before either has committed. Without this lock, two transactions starting under READ COMMITTED can both see no prior `subscription_auto_canceled` row, both proceed, and both fire Stripe + email. The lock serializes evaluations per user; webhook concurrency is low so contention is negligible. + b. `TryClaimWebhookEvent(event.ID, 'invoice.upcoming')` — `INSERT ... ON CONFLICT DO NOTHING RETURNING event_id`. If no row returned, ROLLBACK and return (event already handled). + c. Run period-keyed dedup queries (`mostRecentPeriodDecisionWasKeep`, `alreadyAutoCanceledThisPeriod`). If either signals a prior decision for this period, ROLLBACK and return. Both queries are kept (rather than collapsing to just `alreadyAutoCanceledThisPeriod`) for log/observability clarity — `cancel.skipped{reason=already_kept_this_period}` vs `cancel.skipped{reason=already_canceled_this_period}` distinguish two operationally interesting cases. + d. Read `last_active`, compute `threshold`. If trigger does not fire, ROLLBACK and return (no point claiming the event — let a future re-fire of `invoice.upcoming` for this same period have a fresh shot if state changes). + e. **Trigger fires:** insert a `user_events` row with `event_type='subscription_auto_canceled'`, metadata `{subscription_id, stripe_event_id, current_period_start, current_period_end}` (timestamps as RFC3339 strings via the typed metadata struct, canonicalized to second precision via `time.Unix(stripeInt64, 0).UTC()`). 
+ f. Update `users` cache: `sub_cancel_at_period_end = true`, `sub_cancel_is_auto = true`, `sub_current_period_start = ...`, `stripe_subscription_id = ...`. + g. **COMMIT.** The lock, the dedup row, the audit row, and the cache update are now durable together. If the transaction rolls back at any point, the next webhook retry sees an unclaimed event and starts fresh. +5. Call Stripe: `subscription.Update(id, cancel_at_period_end=true)` with `Idempotency-Key: `. **Outside the transaction.** Stripe `Update` is naturally idempotent. +6. Queue the cancel email via `SendEmailJob` (River) using the `current_period_end` from step 4e's metadata. +7. **T-0** (period_end): Stripe naturally lets the subscription end. The existing `customer.subscription.deleted` handler resets `users.plan = 'free'` and clears the cache columns. + +**Failure-mode reasoning:** +- DB transaction fails before COMMIT → no Stripe call attempted, dedup not claimed, retry will see unclaimed event and start fresh. ✓ +- COMMIT succeeds, Stripe `Update` fails → cache says canceled, Stripe says not canceled. **This is the partial-failure window.** It must be handled in two places: + 1. **Server-side**: emit `idleunsub.cancel.error{reason=stripe_update_failed}` counter, page ops. Ops manually verifies Stripe state and either re-issues the cancel via Stripe API or clears the cache flag (`UPDATE users SET sub_cancel_at_period_end=false, sub_cancel_is_auto=false WHERE id=$1`). + 2. **Client-side**: `AutoReverse` (§4.3) verifies Stripe state via `subscription.Get` before calling `subscription.Update`. If Stripe says `cancel_at_period_end` is already `false`, AutoReverse silently clears the cache flag, logs a warning, and skips the email/`subscription_kept` event row. This prevents a spurious "subscription kept" email and audit row when no real reversal happened. Cost: one extra Stripe API call on the cold path that only fires for users whose cache flag is true. 
+- COMMIT succeeds, Stripe `Update` succeeds, email enqueue fails → cache and Stripe agree; user sees the banner on next visit and Stripe's own renewal-prevention behavior. Logged. +- Concurrent webhook deliveries for the same user → serialized by the `SELECT ... FOR UPDATE` in step 4a; only one transaction proceeds, the other waits and then sees the prior decision via the dedup queries. + +### 4.2 Reversal — email link click + +1. User receives the cancel email containing a signed token URL: `https://sabermatic.dev/sub/keep?t=`. +2. The token is HMAC-signed JSON serialized via a typed Go struct with explicit field order: `{user_id, subscription_id, action: "keep_subscription", current_period_end, iat, exp: current_period_end}`. The token carries `current_period_end` so the confirmation page (and replay) has the date without re-fetching from Stripe or relying on a possibly-stale cache. Encoded as URL-safe base64. +3. **Single-use enforcement**: the public endpoint `GET /sub/keep?t=...` (in `internal/handler/keep.go`): + a. Verifies signature. + - Bad signature / malformed token → **400** with a generic "this link is invalid" page. Distinguished from expiry to avoid leaking whether a given (user, sub, period) tuple ever existed. + - Valid signature but expired (current time > `exp`) → **200** with a friendly "your subscription period has already ended" page (no Stripe call, no email). The token doesn't grant any access; expiry just means the keep window is over. A bare 400 here is user-hostile per the tenet — distinguish from tampering. + - Spec also enforces invariant: `claims.ExpiresAt == claims.CurrentPeriodEnd.Unix()`. If they disagree (signer bug or struct drift), reject as malformed. + b. Valid signature + not expired: computes `token_hash = sha256(token)`. + c. INSERTs into `keep_link_token_uses (token_hash PRIMARY KEY, used_at)` with `ON CONFLICT DO NOTHING RETURNING token_hash`. 
If no row returned, the token was already used — render the same confirmation page using `claims.CurrentPeriodEnd` from the (still-valid-signature) token. No Stripe call, no email. + d. On first use: calls `idleunsub.KeepSubscription(ctx, claims)`, which: + - Confirms the subscription is still in `cancel_at_period_end=true` and `sub_cancel_is_auto=true` state (refuses to reverse a manual cancel). + - Calls `subscription.Update(id, cancel_at_period_end=false)`. Captures the returned `subscription` object for its updated period info. + - Updates `users` cache: `sub_cancel_at_period_end = false`, `sub_cancel_is_auto = false`, `pending_kept_banner = true`, `sub_current_period_start = `. + - Sends the "subscription kept" confirmation email; uses the returned subscription's `CurrentPeriodEnd` for the renewal date in the body. + - Inserts a `user_events` row with `event_type='subscription_kept'`, metadata `{subscription_id, via: 'link', current_period_start: }`. The `current_period_start` here matches the period the keep applies to and is what `mostRecentPeriodDecisionWasKeep` queries against. + e. Renders a confirmation page ("You're all set — your Sabermatic subscription is still active. Next renewal: {claims.CurrentPeriodEnd}."). +4. The endpoint is mounted publicly (no `RequireAuth`); the token IS the authentication. + +### 4.3 Reversal — auto-reverse on activity + +While `cancel_at_period_end=true` AND `sub_cancel_is_auto=true`, any authenticated request from the user invalidates the trigger condition. 
The hook lives in `Backend.AuthenticateSession` alongside the existing `TouchAuthSession` fire-and-forget pattern (`internal/backend/backend.go:280-285`): + +```go +// After existing TouchAuthSession goroutine, before returning user: +if user.SubCancelAtPeriodEnd && user.SubCancelIsAuto { + go func() { + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + _ = idleunsub.AutoReverse(ctx, user.ID) // best-effort + }() +} +``` + +`AutoReverse`: +1. Re-reads `users.sub_cancel_at_period_end` and `users.sub_cancel_is_auto` (defensive: these may have flipped via webhook between the gate read and goroutine execution). If either is false, no-op. +2. **The current request itself is the activity signal.** We do *not* re-read `MAX(last_active)` — that would race against the in-flight `TouchAuthSession` goroutine, producing a stale read. The gate `SubCancelAtPeriodEnd && SubCancelIsAuto` plus the fact that this request authenticated successfully is sufficient evidence that the user is active in the current period. +3. **Verifies Stripe state** via `subscription.Get`. If Stripe says `cancel_at_period_end` is already `false`, the cache is stale (probably from a prior partial failure where COMMIT succeeded but Stripe `Update` did not). Silently clear the cache flags (`sub_cancel_at_period_end=false`, `sub_cancel_is_auto=false`), log a warning, emit `idleunsub.cancel.cache_drift_corrected`, and return without sending email or inserting `subscription_kept`. No real reversal happened. +4. Otherwise, calls `subscription.Update(id, cancel_at_period_end=false)`. Stripe call is idempotent (setting `false` when already `false` is a no-op, but step 3 already filtered that case). Captures the returned subscription for its `CurrentPeriodStart` and `CurrentPeriodEnd`. +5. Updates `users` cache: `sub_cancel_at_period_end = false`, `sub_cancel_is_auto = false`, `pending_kept_banner = true`, `sub_current_period_start = `. +6. 
Sends the confirmation email via `SendEmailJob`, using the returned subscription's `CurrentPeriodEnd` for the renewal date. +7. Inserts a `user_events` row with `event_type='subscription_kept'`, metadata `{subscription_id, via: 'auto_activity', current_period_start: }`. The `current_period_start` is what `mostRecentPeriodDecisionWasKeep` queries against — without it, multi-firing dedup would wrongly re-cancel after a keep. + +The TOCTOU window between step 1 (gate read) and step 4 (Stripe `Update`) is benign: setting `cancel_at_period_end=false` when Stripe says it's already `false` is a no-op, and the post-Get verification in step 3 short-circuits anyway. + +The cached `sub_cancel_at_period_end` flag is the gate that prevents Stripe API calls on every authed request — it's `false` for almost every user almost all the time. After AutoReverse flips it, subsequent requests skip the entire path. The `sub_cancel_is_auto` second gate prevents this code from reversing a user's manual portal cancellation. + +### 4.4 No reversal (cancel completes) + +User does nothing. At period_end, Stripe naturally deletes the subscription. The existing `customer.subscription.deleted` webhook handler runs, resets `users.plan = 'free'`, clears the cache columns (`sub_cancel_at_period_end`, `sub_cancel_is_auto`, `sub_current_period_start`, `stripe_subscription_id`). User retains anything they own (purchased grants, free-tier minutes), loses Pro features. + +### 4.5 Banner + +The `pending_kept_banner` flag on `users` is set during reversal (link or auto). The frontend reads it from the user-info RPC payload. The banner clears on **dismiss** (explicit user action — close button, a small RPC call sets the flag to `false`), not on render. This way, a user who closes the tab before noticing the banner gets it on the next page load. Auto-dismiss after 14 days as a hygiene measure. + +--- + +## 5. 
Schema Changes + +### Migration 015: `auth_sessions` soft delete + +```sql +-- 015_auth_sessions_soft_delete.up.sql +ALTER TABLE auth_sessions + ADD COLUMN revoked_at TIMESTAMPTZ; + +CREATE INDEX idx_auth_sessions_user_last_active + ON auth_sessions(user_id, last_active DESC); +``` + +```sql +-- 015_auth_sessions_soft_delete.down.sql +DROP INDEX IF EXISTS idx_auth_sessions_user_last_active; +ALTER TABLE auth_sessions DROP COLUMN revoked_at; +``` + +The new index is `DESC` on `last_active` so `MAX(last_active) WHERE user_id = $1` is a one-row scan. No `WHERE revoked_at IS NULL` filter — the activity query reads across history. + +**Write amplification note:** this index is updated on every `TouchAuthSession` (one per authed request). Cost is one additional B-tree update per request. Acceptable; the same row is hot anyway. + +**Retention:** rows are never physically deleted on logout under this migration. Account deletion still hard-deletes (§7.2). Rows are tiny (~100 bytes); we accept unbounded growth for v1 and revisit if it becomes a real cost. A future cleanup job could hard-delete `revoked_at < NOW() - 1 year` while preserving each user's most recent revoked row — out of scope here. + +### Migration 016: cache subscription state on users + +```sql +-- 016_users_sub_state.up.sql +ALTER TABLE users + ADD COLUMN stripe_subscription_id TEXT, + ADD COLUMN sub_cancel_at_period_end BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_cancel_is_auto BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_current_period_start TIMESTAMPTZ, + ADD COLUMN pending_kept_banner BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN idle_eligible_after TIMESTAMPTZ NOT NULL DEFAULT NOW(); +``` + +| Column | Purpose | Populated by | +|---|---|---| +| `stripe_subscription_id` | The sub ID we need to call `subscription.Update` on | `handleSubscriptionUpdated` (Stripe fires this on subscription create as well as update — see §7.2 note). Cleared on `customer.subscription.deleted`. 
| +| `sub_cancel_at_period_end` | Hot-path gate for auto-reverse middleware | Synced from `customer.subscription.updated` events; written directly by `idleunsub` cancel/keep paths. | +| `sub_cancel_is_auto` | Distinguishes our auto-cancel from a user's manual portal cancel | Set `true` by `idleunsub.HandleInvoiceUpcoming` only. Set `false` by `KeepSubscription`/`AutoReverse`. **Never** set by webhook sync. | +| `sub_current_period_start` | Cached so we can detect "user came back since the cancel decision" without a Stripe round-trip; also populated by reversal paths from the `subscription.Update` response | Populated by `handleSubscriptionUpdated`, the cancel path (from the fetched-at-eval-time subscription), and the keep/auto-reverse paths (from the `subscription.Update` response). | +| `pending_kept_banner` | "Welcome back" banner state | Set by reversal paths; cleared on banner dismiss; auto-cleared after 14 days by a tiny periodic job. | +| `idle_eligible_after` | Period-grandfathering anchor for launch-day wave protection | Defaults to `NOW()` at migration. Migration applies to existing users → their first eligible period_start is whatever begins after deploy. New users get `NOW()` at signup. | + +### Migration 017: webhook event idempotency + +```sql +-- 017_stripe_webhook_dedup.up.sql +CREATE TABLE stripe_webhook_dedup ( + event_id TEXT PRIMARY KEY, + event_type TEXT NOT NULL, + processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +Generic table for webhook retry-storm protection. Other webhook handlers can adopt it incrementally; the existing `grants.stripe_event_id UNIQUE` pattern continues to work for grant creation. 
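The insert-then-act control flow this table supports can be sketched in Go. The sketch below is a hedged illustration only: `eventClaimer`, `memClaimer`, and `handleOnce` are hypothetical names, and the in-memory map stands in for the `INSERT ... ON CONFLICT DO NOTHING RETURNING event_id` round-trip so the example runs without a database.

```go
package main

import (
	"fmt"
	"sync"
)

// eventClaimer abstracts the claim step. In the spec this role is played
// by an INSERT ... ON CONFLICT DO NOTHING RETURNING event_id statement;
// the interface itself is an assumption made for this sketch.
type eventClaimer interface {
	TryClaim(eventID, eventType string) bool
}

// memClaimer is an in-memory stand-in for the stripe_webhook_dedup table.
type memClaimer struct {
	mu   sync.Mutex
	seen map[string]string // event_id -> event_type
}

func newMemClaimer() *memClaimer {
	return &memClaimer{seen: map[string]string{}}
}

// TryClaim returns true only for the first claim of an event ID,
// matching "row returned" vs "no row returned" in the SQL version.
func (m *memClaimer) TryClaim(eventID, eventType string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.seen[eventID]; ok {
		return false
	}
	m.seen[eventID] = eventType
	return true
}

// handleOnce runs act only for the first delivery of eventID and reports
// whether it ran.
func handleOnce(c eventClaimer, eventID, eventType string, act func()) bool {
	if !c.TryClaim(eventID, eventType) {
		return false // retry of an already-handled event
	}
	act()
	return true
}

func main() {
	c := newMemClaimer()
	runs := 0
	// Simulate Stripe delivering the same event three times.
	for i := 0; i < 3; i++ {
		handleOnce(c, "evt_123", "invoice.upcoming", func() { runs++ })
	}
	fmt.Println("side effects executed:", runs) // prints "side effects executed: 1"
}
```

Unlike this stand-in, the cancel path in §4.1 performs the claim inside a database transaction, so a rollback releases the claim and a later retry can start fresh. The in-memory version has no equivalent of that rollback.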
+ +### Migration 018: keep-link token single-use + +```sql +-- 018_keep_link_token_uses.up.sql +CREATE TABLE keep_link_token_uses ( + token_hash BYTEA PRIMARY KEY, + user_id UUID NOT NULL REFERENCES users(id), + used_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +`token_hash` is `sha256(raw_token)`; storing the hash, not the token, prevents leak risk if the table is ever exposed. + +### sqlc query changes (`sql/queries/auth_sessions.sql`) + +| Existing | Change | +|---|---| +| `GetAuthSessionByToken` | Add `AND s.revoked_at IS NULL` to WHERE; project the new `users` cache columns into the returned row. | +| `DeleteAuthSession` (single logout — sole call site is `internal/backend/auth.go:261`) | Rename to `RevokeAuthSession`; becomes `UPDATE auth_sessions SET revoked_at = NOW() WHERE id = $1`. | +| `DeleteUserAuthSessions` (logout-everywhere) | Rename to `RevokeUserAuthSessions`; becomes `UPDATE auth_sessions SET revoked_at = NOW() WHERE user_id = $1 AND revoked_at IS NULL`. | +| `users.sql` `DELETE FROM auth_sessions WHERE user_id = @id` (account deletion) | Stays as hard DELETE, renamed for clarity to `HardDeleteUserAuthSessions`. | + +### New sqlc queries + +```sql +-- name: GetUserLastActive :one +-- Reads across all history (including revoked sessions) — this is a +-- "when did the user last act, ever?" query, not a session-validity check. +SELECT MAX(last_active)::timestamptz +FROM auth_sessions +WHERE user_id = $1; + +-- name: SetUserAutoCancelState :exec +UPDATE users +SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + sub_current_period_start = $2 +WHERE id = $1; + +-- name: ClearUserAutoCancelState :exec +UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + pending_kept_banner = TRUE +WHERE id = $1; + +-- name: SyncSubStateFromWebhook :exec +-- Used by handleSubscriptionUpdated. Does NOT touch sub_cancel_is_auto: +-- that flag is only set by our handler, never by webhook sync. 
+UPDATE users +SET stripe_subscription_id = $2, + sub_cancel_at_period_end = $3, + sub_current_period_start = $4 +WHERE id = $1; + +-- name: ClearSubStateOnDeletion :exec +-- Used by handleSubscriptionDeleted. +UPDATE users +SET stripe_subscription_id = NULL, + sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = NULL, + plan = 'free' +WHERE id = $1; + +-- name: ClearKeptBanner :exec +UPDATE users SET pending_kept_banner = FALSE WHERE id = $1; + +-- name: TryClaimWebhookEvent :one +-- Returns the event_id on first claim, no row on subsequent claims. +INSERT INTO stripe_webhook_dedup (event_id, event_type) +VALUES ($1, $2) +ON CONFLICT (event_id) DO NOTHING +RETURNING event_id; + +-- name: TryClaimKeepToken :one +INSERT INTO keep_link_token_uses (token_hash, user_id) +VALUES ($1, $2) +ON CONFLICT (token_hash) DO NOTHING +RETURNING token_hash; + +-- name: HasAutoCanceledThisPeriod :one +-- Returns true if a subscription_auto_canceled event already exists for this period. +-- metadata->>'current_period_start' is RFC3339 text written by the typed metadata struct. +SELECT EXISTS ( + SELECT 1 FROM user_events + WHERE user_id = $1 + AND event_type = 'subscription_auto_canceled' + AND (metadata->>'subscription_id') = $2 + AND (metadata->>'current_period_start')::timestamptz = $3 +); + +-- name: GetMostRecentKeptOrCanceledForPeriod :one +-- For multi-firing dedup: did a 'subscription_kept' event arrive after the +-- most recent 'subscription_auto_canceled' for this period? If yes, the user +-- already kept their sub for this period; don't re-cancel. +-- Both event types include current_period_start in their metadata (RFC3339 string). 
+SELECT event_type +FROM user_events +WHERE user_id = $1 + AND event_type IN ('subscription_auto_canceled', 'subscription_kept') + AND (metadata->>'subscription_id') = $2 + AND (metadata->>'current_period_start')::timestamptz = $3 +ORDER BY created_at DESC +LIMIT 1; +``` + +**Metadata schema** (the canonical Go writer struct, ensuring RFC3339 timestamps): + +```go +package idleunsub + +// All timestamps are canonicalized to second precision via +// time.Unix(stripeInt64, 0).UTC() before assignment. Stripe period fields +// arrive as int64 Unix seconds (no sub-second component); explicit second +// precision and UTC guard against future drift if a code path ever +// constructs a time.Time from a different source. JSON encoding via +// encoding/json produces RFC3339 ("...Z") which Postgres ::timestamptz +// parses reliably. + +type cancelMetadata struct { + SubscriptionID string `json:"subscription_id"` + StripeEventID string `json:"stripe_event_id"` + CurrentPeriodStart time.Time `json:"current_period_start"` + CurrentPeriodEnd time.Time `json:"current_period_end"` +} + +type keptMetadata struct { + SubscriptionID string `json:"subscription_id"` + Via string `json:"via"` // "link" | "auto_activity" + CurrentPeriodStart time.Time `json:"current_period_start"` // identifies the period kept; + // load-bearing for mostRecentPeriodDecisionWasKeep dedup +} +``` + +--- + +## 6. Emails + +Two messages, both sent via the existing `email.Sender` (Mailgun in prod, log-only in test). Templates live in `internal/feat/idleunsub/templates/`. The phrase "the last two billing periods" is composed at send time so it works for non-monthly intervals (annual: "the last two annual billing periods"; weekly: "the last two weeks"). + +### 6.1 Cancel decision email + +``` +Subject: We won't charge you for the next period + +Hi {first_name}, + +We noticed you haven't been around Sabermatic in the last two billing +periods, so we've stopped your auto-renewal. 
You'll keep access through +{current_period_end} — we won't charge you for the next period. + +If you'd like to keep your subscription active, one click does it: + + [Keep my subscription] + +If you're done for now, no action needed. We'll be here whenever you +want to come back. + +— Sabermatic +``` + +### 6.2 Subscription kept (confirmation) + +``` +Subject: Your subscription is still active + +Hi {first_name}, + +You're all set — your Sabermatic subscription will renew normally on +{next_renewal_date}. + +Welcome back. + +— Sabermatic +``` + +A "period has ended" email is intentionally out of scope. The existing `customer.subscription.deleted` handler does not need a new message under this feature; if we want one, it lives in the existing subscription-deletion path, not here. + +--- + +## 7. Implementation Surface + +### 7.1 Feature package — `internal/feat/idleunsub/` + +``` +internal/feat/idleunsub/ + idleunsub.go // public API + decision/action logic + email.go // compose the cancel + kept emails + templates/ + cancel.html.tmpl + cancel.txt.tmpl + kept.html.tmpl + kept.txt.tmpl + token.go // HMAC sign/verify for the keep-link + metrics.go // OTEL counter wrappers (see §9) + idleunsub_test.go // domain tests +``` + +Public API (Go): + +```go +package idleunsub + +// HandleInvoiceUpcoming evaluates the trigger and, if it fires, +// sets cancel_at_period_end on Stripe and sends the cancel email. +// Idempotent: safe to call multiple times with the same event. +func (s *Service) HandleInvoiceUpcoming(ctx context.Context, event stripe.Event) error + +// KeepSubscription reverses cancel_at_period_end. Called from the +// keep-link endpoint after token verification + single-use claim. +// Refuses to reverse if sub_cancel_is_auto is false (manual cancel). +// Takes the verified token claims (which carry CurrentPeriodEnd for the +// confirmation email) so we don't need an extra Stripe round-trip just +// for the date. 
+func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) error + +// AutoReverse reverses cancel_at_period_end when activity is detected +// in the current period. Looks up subscription state from cached users row. +// No-op if cache says cancel is off, or if cancel is not the auto kind. +func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error + +// SignKeepToken returns a signed token for embedding in the cancel email. +// Token is a Go struct serialized via canonical JSON (typed fields, fixed +// order); HMAC-SHA256; URL-safe base64. +func (s *Service) SignKeepToken(claims KeepTokenClaims) string + +// VerifyKeepToken parses and validates a token from the keep-link URL. +// Does NOT consume single-use; that's the endpoint's responsibility. +func (s *Service) VerifyKeepToken(token string) (KeepTokenClaims, error) + +type KeepTokenClaims struct { + UserID uuid.UUID `json:"user_id"` + SubscriptionID string `json:"subscription_id"` + Action string `json:"action"` // "keep_subscription" + CurrentPeriodEnd time.Time `json:"current_period_end"` // RFC3339; for confirmation email + replay page + IssuedAt int64 `json:"iat"` // Unix seconds (JWT-style) + ExpiresAt int64 `json:"exp"` // Unix seconds; INVARIANT: == CurrentPeriodEnd.Unix() +} + +// Invariant enforced by both Sign and Verify: +// ExpiresAt == CurrentPeriodEnd.Unix() +// SignKeepToken populates ExpiresAt from CurrentPeriodEnd; VerifyKeepToken +// rejects tokens where the two disagree (struct drift / forged token). +``` + +`Service` is constructed once at startup with: a Stripe client, the database, the email sender, the HMAC signing key (per §7.3 this is the dedicated `KEEP_TOKEN_HMAC_KEY`, not the session-token key; see existing tokens.go), a clock, and an `slog.Logger`. + +### 7.2 Integration adapters (live in conventional places) + +| File | Change | +|---|---| +| `internal/backend/billing.go` `HandleStripeWebhook` | Add `case "invoice.upcoming"` → delegate to `b.idleunsub.HandleInvoiceUpcoming`. 
Also handle `customer.subscription.created` (currently unhandled) → route to `handleSubscriptionUpdated` (same body works for both create and update — Stripe sends both events on subscription creation, with identical payload shape). | +| `internal/backend/billing.go` `handleSubscriptionUpdated` | **Extends the existing function body** (currently calls only `UpdatePlanByStripeCustomer`). Add a `SyncSubStateFromWebhook` call that reads `id`, `current_period_start`, and `cancel_at_period_end` from the event's Subscription object and writes them to `users`. This becomes the sole population path for `users.stripe_subscription_id` and `sub_current_period_start` — Stripe fires `customer.subscription.updated` (and `.created`) immediately after checkout completion, on every period renewal, on plan changes, on portal cancel/resume, so this single hook covers all cases. **Does not touch `sub_cancel_is_auto`** (only `idleunsub.HandleInvoiceUpcoming` sets it). The existing `handleCheckoutCompleted` is unchanged — its `mode != "payment"` early return still applies; subscription-mode checkouts populate state via `customer.subscription.created`/`.updated` instead. **Round-trip note:** when our cancel path sets both `sub_cancel_at_period_end=true` and `sub_cancel_is_auto=true`, Stripe then fires `customer.subscription.updated` with `cancel_at_period_end=true` in the payload. `SyncSubStateFromWebhook` overwrites our `sub_cancel_at_period_end` (with the same `true` value) but does *not* touch `sub_cancel_is_auto`, which stays `true`. The flag round-trip is correct by construction. | +| `internal/backend/billing.go` `handleSubscriptionDeleted` | Use `ClearSubStateOnDeletion`. | +| `internal/backend/backend.go` | Wire the `idleunsub.Service` into `Backend` at construction. | +| `internal/handler/keep.go` (new) | `GET /sub/keep` endpoint: verify token, claim single-use, call `idleunsub.KeepSubscription`, render confirmation page. Mounted publicly (no `RequireAuth`). 
| +| `internal/backend/backend.go` `AuthenticateSession` | After the existing `TouchAuthSession` goroutine, add a second fire-and-forget goroutine that calls `idleunsub.AutoReverse(ctx, user.ID)` iff `user.SubCancelAtPeriodEnd && user.SubCancelIsAuto`. Same 5s timeout pattern. | +| `internal/auth/user_context.go` | Extend `AuthUser` struct with `SubCancelAtPeriodEnd bool`, `SubCancelIsAuto bool`, `PendingKeptBanner bool`. | +| `internal/backend/auth.go:261` (Logout) | Switch from `DeleteAuthSession` to `RevokeAuthSession` (the renamed query). Sole caller. | +| `web/src/components/KeptBanner.tsx` (new) | One-time banner. Reads `pending_kept_banner` from the user-info RPC. Cleared on user dismiss via small ack RPC. Auto-dismisses after 14 days server-side. | +| `internal/rpc/user/server.go` | Include `pending_kept_banner` in the user-info RPC response. Add an `AckKeptBanner` RPC method that calls `ClearKeptBanner`. (ConnectRPC handlers live under `internal/rpc/{service}/` per project convention.) | +| `sql/queries/auth_sessions.sql`, `sql/queries/users.sql` | Updates per §5. | +| `sql/migrations/015_*.sql`, `016_*.sql`, `017_*.sql`, `018_*.sql` | New migrations per §5. | + +### 7.3 Email link / token + +- Token payload: typed `KeepTokenClaims` struct serialized with `encoding/json` over a fixed field order. Canonical because the struct is the only writer. +- Signing: HMAC-SHA256 with a separate `KEEP_TOKEN_HMAC_KEY` env var, derived at startup. Separating from the session-token signing key prevents cross-purpose token forgery. +- Encoding: URL-safe base64. +- Expiry: `exp = current_period_end`. After period end, the cancel either took effect or was reversed; reusing the token has no effect. +- **Replay protection**: server-side single-use via `keep_link_token_uses` (token-hash PRIMARY KEY). The first click "consumes" the token; subsequent clicks render the same confirmation page without side effects. +- **Leak surface**: tokens land in our own server access logs. 
Mitigation: the token is single-use, so a logged token cannot be replayed for action. The token does not grant any other access (no session, no API auth) — it specifically and only triggers the keep action for the one subscription it was minted for. + +--- + +## 8. Testing + +### 8.1 Domain tests (in `idleunsub_test.go`) + +- **Trigger rule** — table-driven cases: + - `last_active` exactly equals threshold → not idle (`<` is strict). + - `last_active` one second before threshold → idle, cancel. + - `last_active` is `NULL` (no rows for user — should be unreachable for a Pro user, but defensive) → no cancel; no evidence to act on. + - First-period evaluation (user signed up in current period) → threshold predates signup → trivially not idle → no cancel. + - `cur_period_start < idle_eligible_after` → period grandfathered → no cancel even if idle. + - Sub status `trialing` / `past_due` / `unpaid` / `incomplete` → early return. + - Sub already `cancel_at_period_end=true` → early return (no duplicate email). + - Same `event_id` re-delivered → second call is a no-op (dedup table claim fails). + - Same period, different event_id (multi-firing) → second call sees prior `subscription_auto_canceled` row → no-op. + - Same period, prior `subscription_kept` more recent than prior `subscription_auto_canceled` → no-op (don't re-cancel after keep). +- **KeepSubscription**: + - Refuses to reverse when `sub_cancel_is_auto=false` (manual cancel scenario). + - Idempotent: token already claimed → no Stripe call, no email. +- **AutoReverse**: + - Gate: `sub_cancel_at_period_end=false` → no-op. + - Gate: `sub_cancel_is_auto=false` → no-op (don't reverse manual cancels). + - Both true → reverses, flips cache, sends email, sets banner flag. + - **Race-safety:** does not depend on `MAX(last_active)` — the request itself is the activity signal. +- **Token**: tampered token → verify fails. Expired token → verify fails. Single-use enforced. 
+ +### 8.2 Integration tests (live DB + stub Stripe) + +Following the existing `internal/backendtest/` convention (use `backendtest.SeedUser` and the production `Signup` path; never raw SQL inserts). + +- **End-to-end happy path** — seed a Pro user with `last_active` in Feb, fire `invoice.upcoming` for May renewal, assert: Stripe `Update` called with `cancel_at_period_end=true`, `users.sub_cancel_is_auto=true` set, email sent, `user_events` row inserted with `event_type='subscription_auto_canceled'`, `stripe_webhook_dedup` row created. +- **End-to-end reversal via link** — given an auto-canceled sub, hit `GET /sub/keep?t=...` with a valid token, assert reversal + flag flips + confirmation email + `subscription_kept` row + `keep_link_token_uses` row. Re-click → idempotent confirmation page, no second email. +- **End-to-end auto-reverse via activity** — given an auto-canceled sub, simulate an authed request, assert reversal + cache flips + confirmation email + banner flag set. +- **Manual cancel guard** — user manually cancels via portal (simulate via webhook); `sub_cancel_at_period_end=true` but `sub_cancel_is_auto=false`; subsequent authed request → AutoReverse is a no-op; user's choice respected. +- **Idempotent webhook re-delivery** — fire same `invoice.upcoming` event twice → one cancel call, one email, one dedup row. +- **Multi-firing of `invoice.upcoming`** — fire two distinct events for same period → second one is no-op via period dedup. +- **Launch wave** — seed an existing Pro user with `idle_eligible_after = NOW() + 1 month`, fire `invoice.upcoming` with `cur_period_start < idle_eligible_after`, assert no cancel. +- **Token tampering** — modified token → 400 with generic "invalid link" page. No DB or Stripe state change. +- **Token expired** — past-period token (valid signature, expired) → 200 with friendly "your subscription period has already ended" page. No DB or Stripe state change. Distinguished from tampering on purpose. 
+- **Token signer/verifier invariant** — token where `claims.ExpiresAt != claims.CurrentPeriodEnd.Unix()` → rejected as malformed. +- **Token replay** — valid token used twice → first works, second returns the same confirmation page idempotently with no Stripe call. +- **Concurrent webhook race** — two simultaneous `invoice.upcoming` deliveries for the same user → `SELECT FOR UPDATE` serializes them; one runs to completion, the other waits and then sees the prior decision via `alreadyAutoCanceledThisPeriod`. +- **Cache drift correction** — seed a Pro user with `sub_cancel_at_period_end=true` (cache) but Stripe-side `cancel_at_period_end=false` (simulating partial-failure window) → next authed request triggers AutoReverse → AutoReverse's `subscription.Get` detects drift → silently clears cache, emits `idleunsub.cancel.cache_drift_corrected`, no email, no `subscription_kept` row. + +--- + +## 9. Operational Notes + +### Observability + +Each decision emits an `slog` record AND an OTEL counter: + +| Counter | Tags | When | +|---|---|---| +| `idleunsub.cancel.fired` | `sub_id` | Trigger fires, cancel set on Stripe | +| `idleunsub.cancel.skipped` | `reason ∈ {trialing, past_due, already_canceled, dedup, grandfathered}` | Early-return in handler | +| `idleunsub.reverse.link` | `sub_id` | Reversal via email-link click | +| `idleunsub.reverse.activity` | `sub_id` | Reversal via auto-activity middleware | +| `idleunsub.email.enqueue` | `kind ∈ {cancel, kept}, status ∈ {ok, error}` | Email enqueued (or enqueue itself failed). Delivery success/failure is the email worker's concern, not this counter. | +| `idleunsub.cancel.error` | `reason ∈ {stripe_update_failed, db_commit_failed}` | Stripe `Update` returned error after DB commit, or DB transaction failed mid-cancel. Pages ops. | +| `idleunsub.cancel.cache_drift_corrected` | `sub_id` | AutoReverse detected the local cache disagreed with Stripe (cache said canceled, Stripe said not). 
Cache silently corrected; no email/event row. Should be near-zero in steady state — sustained nonzero indicates a Stripe-failure pattern worth investigating. | + +A dashboard panel for `cancel.fired` minus `reverse.{link,activity}` shows net cancellations per period. Sustained large drift in one direction is worth investigating (regression where `last_active` isn't being updated, misconfigured `idle_eligible_after`, etc.). + +### Email deliverability + +§4.1 sequences: DB transaction (dedup-claim + user_events + cache) → COMMIT → Stripe call → email enqueue. If the Stripe call succeeds but email enqueue fails, the cancel still applies (Stripe and our cache agree). The user discovers via the in-app banner on next visit, or via Stripe's own renewal-prevention behavior. Email failures are logged and counted via `idleunsub.email.enqueue{status=error}`. + +### HMAC key rotation + +Rotating `KEEP_TOKEN_HMAC_KEY` invalidates all outstanding keep-link tokens (max age = period length, typically ≤ 30 days from email send for monthly subs). Mitigations, in order of preference: + +1. **Don't rotate routinely.** This key has a small surface (only signs keep-link tokens) and is not in the request hot path. +2. **Two-key verification window.** During rotation, accept tokens signed with either the old or new key for one period; retire the old key after. +3. **Accept the breakage.** Users with broken links can still use AutoReverse on next login (they just won't have the email link path during the rotation window). + +Default policy for v1: option 1 (no scheduled rotation). Document option 2 as the path forward if a security incident requires rotation. + +### Backfill + +- `auth_sessions.revoked_at` defaults to NULL — no backfill. +- `users.idle_eligible_after` defaults to `NOW()` at migration — every existing Pro user is grandfathered for at least their current period (and for their prior period, since `cur_period_start < idle_eligible_after` until the next period rolls over). 
+- `users.stripe_subscription_id` will be NULL for existing users until their next `customer.subscription.updated` webhook fires (which Stripe re-fires on any subscription state change). For users who don't change anything, the field stays NULL and AutoReverse is a no-op via its existing gates. Optional one-time backfill: query Stripe for every user with `stripe_customer_id IS NOT NULL` and a current sub, populate the new field. Non-blocking; can ship without. + +### Rollback + +- Migration down-scripts restore prior state. Dropping `auth_sessions.revoked_at` reverts the table; the only data loss is "knowing when sessions were revoked." +- The feature can be turned off without rolling back schema by removing `case "invoice.upcoming"` from `HandleStripeWebhook`. The package stays in place dormant. +- If a bad cancel slips through, `KeepSubscription` is the per-user remedy; for a wider issue, a one-shot ops script can scan `user_events` for `subscription_auto_canceled` rows in a time window and call `subscription.Update(cancel_at_period_end=false)` for each. + +### Launch wave protection + +The `idle_eligible_after` column (default `NOW()` at migration) ensures no existing Pro user is canceled on day 1 of deploy. Their first eligible period is the first one whose `current_period_start ≥ idle_eligible_after`. For monthly subs, this means earliest cancel for an existing user is ~60 days post-deploy (if they were already idle and continue to be). + +--- + +## 10. Out of Scope / Open Questions + +- **Multiple subscriptions per user.** Drill assumes one. If a user ever has multiple, this feature needs revisiting. +- **Annual subscriptions.** Rule generalizes — "previous period start" is just `current_period_start − 1 interval`. Annual means a user signing up and going idle wouldn't be canceled until ~24 months later. Probably correct (annual buyers are presumably committed); revisit if Sabermatic ever offers annual. 
+- **Toggle later?** If we ever discover users want to opt *out* of this protection (e.g., enterprise customers with shared seats where idleness doesn't reflect intent), we revisit. Default-on holds until then. +- **Spanda portability.** The *tenet* is portable; the *code* is not — each Spanda product implements its own version against its own data model. A future cross-product library might extract the trigger evaluator if it pays for itself. +- **Cancel-window UI surface beyond the banner.** Should the billing page show "your subscription will end on {date} unless you log in or click here"? Current spec says no; easy add-on if we want it. +- **Package name.** Round 1 review: both Opus and Sonnet flagged `idleunsub` as awkward (reads like newsletter-unsub). Author chose it explicitly. Open: rename to `idlecancel` or `autocancel` later if the awkwardness compounds during implementation. +- **Keep semantics for future periods.** A click on the email keep-link reverses *this* period's cancel but does not update `last_active`. If the user clicks keep but then never logs in, the rule re-fires next period (after another full idle period N+1 passes). v1 treats this as correct: clicking keep is "I want this period," not "I want to keep paying forever even without using." Revisit if real users hit a re-cancel loop. +- **Auto-prune of `auth_sessions`.** Soft-delete leaves rows forever. v1 accepts the bloat; revisit when storage cost becomes real. +- **Auto-prune of `keep_link_token_uses`.** Rows are only created on first click. After `exp = period_end` passes, the token is expired regardless and the row's no-replay protection becomes moot. A periodic `DELETE FROM keep_link_token_uses WHERE used_at < NOW() - INTERVAL '60 days'` keeps the table trim. Boy-scout cleanup; not blocking. +- **Generic webhook dedup adoption.** `stripe_webhook_dedup` is generic; other handlers (`invoice.paid`, `customer.subscription.*`) could adopt it for consistency. Out of scope for this feature. 
+ +--- + +## 11. Glossary + +- **Period N** (a.k.a. the *current period*): the billing period the user is currently in (read from `SubscriptionItem.CurrentPeriodStart` / `.CurrentPeriodEnd` in stripe-go v82). +- **Period N−1**: the previous period (`current_period_start − 1 interval` to `current_period_start`). +- **Threshold**: `current_period_start − 1 interval` — the moment before which `last_active` must fall for the trigger to fire. +- **invoice.upcoming**: Stripe webhook fired ~7 days before each renewal (default; configurable per Stripe account). +- **`cancel_at_period_end`**: Stripe subscription flag. When `true`, Stripe completes the current paid period and then deletes the subscription instead of renewing. Reversible until period ends. +- **`sub_cancel_is_auto`**: our boolean distinguishing an auto-cancel set by this feature from a manual cancel set by the user via Stripe portal. Only the former is reversible by AutoReverse. +- **Keep-link**: signed-token URL embedded in the cancel email; clicking it reverses `cancel_at_period_end`. Single-use enforced server-side. +- **`idle_eligible_after`**: per-user timestamp before which `current_period_start` values are grandfathered (no cancel). Used to prevent a wave of cancellations at deploy time. From c5454a1f627fbc6833e090b50df7e4dbdaf4fa87 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 22:33:54 -0400 Subject: [PATCH 03/37] =?UTF-8?q?docs(spec):=20aippatch=20=E2=80=94=20appl?= =?UTF-8?q?y=20round-1=20review=20fixes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both opus and sonnet review agents returned ~30 findings. Major changes: HIGH - Acknowledge AIP-134 empty-mask deviation explicitly (Wire conformance note); ErrorOnEmpty stays as v0 default. - Fix s.b.Pool() — it is a method, not a field. 
- Drop "matches sqlc's DBTX" claim; document it is a strict subset satisfied by *pgxpool.Pool, *pgx.Conn, and pgx.Tx. - Remove mustValidate panic helper from generated file; show clean Mapping literal plus generated InitPatches() error per drill's no-panic-at-init rule. - Replace ub.SQL("RETURNING *") with ub.Returning(boundCols...) — first-class API, sends only mapped columns, excludes unmapped sensitive columns from the wire. - Add AutoSet mechanism (yaml auto_set: { updated_at: NOW() }) so PATCH bumps updated_at like the existing sqlc UPDATEs do; without this every PATCH would leave updated_at stale. - Use proto.CloneOf instead of proto.Clone(...).(T); add nil op.Message guard. - Add buf build -o buf.binpb to make codegen — current buf generate does not emit a descriptor set. MEDIUM - Add Transactions subsection: Apply takes any DBTX including pgx.Tx. - Add audit-logging hooks (returned diff) to v1 roadmap. - Document CGO requirement for aippatchgen explicitly. - Add NOT NULL extraction to migration replay; nullable columns deferred to v1. - Define module-extraction trigger as "second project consumes aippatch in production" (v0.5). - Audit existing test wording in rollout plan. - Split smallint into separate type-table row with int16-to-int32 widening on decode. - Add validated bool to Mapping; Apply rejects unvalidated mappings with Internal. LOW - Document v0 nested-mask-path constraint and emit codegen diagnostic. - List github.com/google/uuid in runtime imports. - Add bytes/float/double to deferred-types table. - State codecs are global and reusable across resources. - Make algorithm step explicit that unmatched proto fields without skip:true are diagnostic. - Add go get step to rollout plan. - Note RETURNING uses bound-columns list (privacy benefit). - Correct codec key description (yaml name, not protoreflect name). - Add grep-for-callers step before deleting UpdateDisplayName. NIT - Use op.Mask.GetPaths() throughout. 
- Clarify two-phase ordering (field-number processing, alphabetical emit). - Spell out "foreign-key constraints". - Document yaml string-to-Go-enum mapping for empty_mask. - Diagram label uses thirdparty/aippatch/cmd/aippatchgen. - Spanda replication wording: runtime/codegen are drill-free, not the yaml. Spec grew from 690 to ~770 lines. --- .../specs/2026-05-07-aippatch-design.md | 618 +++++++++++++----- 1 file changed, 439 insertions(+), 179 deletions(-) diff --git a/docs/superpowers/specs/2026-05-07-aippatch-design.md b/docs/superpowers/specs/2026-05-07-aippatch-design.md index 3ad62090..8db1bf7b 100644 --- a/docs/superpowers/specs/2026-05-07-aippatch-design.md +++ b/docs/superpowers/specs/2026-05-07-aippatch-design.md @@ -18,16 +18,18 @@ lifted into a standalone Go module and re-used across Spanda LLC products. - **Common case is one yaml entry.** Adding a new writable field on a resource is: declare it in `aippatch.yaml`, regenerate, ship. -- **Hard case is possible.** Name divergences, enum codecs, and opt-outs are - expressible as overrides without escape hatches into custom Go. +- **Hard case is possible.** Name divergences, enum codecs, always-set columns + (e.g. `updated_at`), and opt-outs are expressible as overrides without + escape hatches into custom Go. - **No type casting in user code.** The public API is generic over the proto message type; callers never see `proto.Message` erasure. - **Schema drift is a build-time error.** When proto fields rename, columns rename, or types diverge, the codegen tool fails CI before runtime can. - **Replicable across projects.** Three artifacts (runtime library, codegen binary, yaml file) port to any Go service speaking ConnectRPC + pgx. -- **AIP-134 compliant on the wire.** PATCH responses carry the updated - resource. `FieldMask` semantics are honored. +- **AIP-134-aligned with documented divergences.** Wire shape (resource + + FieldMask in, full updated resource out) follows AIP-134. 
Empty-mask + semantics deviate intentionally; see *Wire conformance note* below. ## Non-goals (v0) @@ -42,12 +44,40 @@ lifted into a standalone Go module and re-used across Spanda LLC products. - JSONB, repeated, oneof, message-as-jsonb, proto3 explicit-optional / NULL semantics. v0 codecs cover scalars, timestamps, and enums only; everything else fails at codegen with a diagnostic. +- **Nullable bound columns.** v0 bindings must reference `NOT NULL` columns. + Nullable columns require `pgtype.*` decode handling and are deferred to v1. +- **`bytes`, `float`, `double` proto kinds.** Common but unused in drill v0 + resources; deferred to v1. +- **Nested mask paths.** v0 supports top-level proto fields only. Mask paths + with dots (`address.street`) are rejected by codegen with a diagnostic. +- **Audit logging hooks.** v0 has no pre/post hook for emitting audit events. + Callers wrap `Apply` themselves until v1 adds returned-diff or hooks. - Replacing sqlc for SELECTs and non-PATCH UPDATEs. aippatch only owns dynamic PATCH UPDATEs. +## Wire conformance note + +aippatch follows AIP-134 wire shape (resource + `FieldMask` in, full updated +resource out) with two intentional divergences: + +1. **Empty `FieldMask`** is rejected with `InvalidArgument` (default + `ErrorOnEmpty` policy). AIP-134 §Update specifies that an omitted mask + "MUST" be treated as an implied mask covering all populated fields. drill + prefers explicit intent over implicit broad updates; clients that need + full-field updates must enumerate paths. The `EmptyMaskPolicy` is + wire-affecting and considered permanent for any deployed service — + document the chosen policy in the resource's API documentation. +2. **Nested mask paths** (`address.street`) are not supported in v0 because + the codec set excludes nested message types. Codegen rejects yaml entries + whose proto fields would require nested support. Roadmap: v1+. 
+ +All other AIP-134 requirements (return the updated resource, honor mask paths +that are valid, reject unknown paths) are upheld. + ## First-principles mechanics -To turn a PATCH RPC into `UPDATE … WHERE … RETURNING *` you need nine things: +To turn a PATCH RPC into `UPDATE … WHERE … RETURNING ` you need nine +things: 1. **Presence detection** — which fields to apply. 2. **Proto-field → SQL-column mapping.** @@ -59,9 +89,14 @@ To turn a PATCH RPC into `UPDATE … WHERE … RETURNING *` you need nine things 8. **Returned representation** — the post-update resource on the wire. 9. **Optimistic concurrency.** -`aippatch` v0 owns 1, 2, 3, 4, 5, and 8. Items 6 and 7 stay in the handler; -item 9 is deferred. Read-back (item 8) requires bidirectional coercion, so the -v0 codec set covers every type the v0 target resources use. +`aippatch` v0 owns 1, 2, 3, 4, 5, and 8. Items **6** (authorization) and **7** +(validation) stay in the handler in v0; v2 promotes them to declarative. +Item **9** (concurrency) is deferred to v3. Read-back (item 8) requires +bidirectional coercion, so the v0 codec set covers every type the v0 target +resources use. + +A tenth concern — **audit logging** — is intentionally out of v0; callers wrap +`Apply` for now. v1 considers a returned-diff or hooks API. ## Architecture @@ -70,37 +105,40 @@ v0 codec set covers every type the v0 target resources use. 
│ pb/drill/v1/*.proto │ │ sql/migrations/*.up.sql │ │ (proto contracts) │ │ (logical schema) │ └──────────┬──────────────┘ └────────────┬─────────────┘ - │ buf build → buf.binpb │ pg_query_go + │ buf build -o buf.binpb │ pg_query_go │ (FileDescriptorSet) │ ▼ ▼ - ┌─────────────────────────────────────────────────┐ - │ cmd/aippatchgen (standalone Go binary) │ - │ reads: descriptors + SQL schema + aippatch.yaml│ - │ writes: typed Mapping[T] literals (Go) │ - └─────────────────┬───────────────────────────────┘ + ┌──────────────────────────────────────────────────────┐ + │ thirdparty/aippatch/cmd/aippatchgen/ │ + │ (standalone Go binary; CGO required for pg_query_go)│ + │ reads: descriptors + SQL schema + aippatch.yaml │ + │ writes: typed Mapping[T] literals + InitPatches() │ + └─────────────────┬────────────────────────────────────┘ │ ▲ │ │ aippatch.yaml - ▼ │ (codecs, overrides, writable) - ┌─────────────────────────────────────────────────┐ - │ internal/patches/*.gen.go (committed) │ - │ e.g. var UserPatch = aippatch.Mapping[*User]{} │ - └─────────────────┬───────────────────────────────┘ + ▼ │ (codecs, overrides, writable, auto_set) + ┌──────────────────────────────────────────────────────┐ + │ internal/patches/*.gen.go (committed) │ + │ e.g. 
var UserPatch = aippatch.Mapping[*User]{ … } │ + │ func InitPatches() error { Validate all mappings } │ + └─────────────────┬────────────────────────────────────┘ │ imported by ▼ - ┌─────────────────────────────────────────────────┐ - │ internal/rpc//server.go (handler) │ - │ aippatch.Apply(ctx, pool, │ - │ patches.UserPatch, Op[*User]{...}) │ - └─────────────────┬───────────────────────────────┘ + ┌──────────────────────────────────────────────────────┐ + │ internal/rpc//server.go (handler) │ + │ aippatch.Apply(ctx, pool, │ + │ patches.UserPatch, Op[*User]{...}) │ + └─────────────────┬────────────────────────────────────┘ │ uses ▼ - ┌─────────────────────────────────────────────────┐ - │ thirdparty/aippatch/ (runtime library) │ - │ • Mapping[T], Binding, Op[T], EmptyMaskPolicy │ - │ • Apply[T] — validate → build → exec → scan │ - │ • Codec dispatch: scalar / timestamp / enum │ - │ • Self-contained: no drill imports │ - └─────────────────────────────────────────────────┘ + ┌──────────────────────────────────────────────────────┐ + │ thirdparty/aippatch/ (runtime library) │ + │ • Mapping[T], Binding, AutoSetClause, Op[T], │ + │ EmptyMaskPolicy │ + │ • Apply[T] — validate → build → exec → scan │ + │ • Codec dispatch: scalar / timestamp / enum │ + │ • Self-contained: no drill imports │ + └──────────────────────────────────────────────────────┘ ``` Five components, three new: @@ -108,17 +146,21 @@ Five components, three new: 1. **`thirdparty/aippatch/`** — runtime library. Self-contained, no drill imports, ready to lift into a standalone Go module. Imports: `google.golang.org/protobuf`, `github.com/jackc/pgx/v5`, - `github.com/huandu/go-sqlbuilder`. Reads back via `pgx.Rows.Values()` and - populates the proto via reflection — no third-party row scanner is needed. + `github.com/huandu/go-sqlbuilder`, `github.com/google/uuid` (for + `[16]byte`→canonical uuid string formatting on the read side). 
Reads back + via `pgx.Rows.Values()` and populates the proto via reflection — no + third-party row scanner is needed. 2. **`thirdparty/aippatch/cmd/aippatchgen/`** — codegen binary. Imports: `google.golang.org/protobuf` + `github.com/pganalyze/pg_query_go/v5`. + Requires CGO (libpg_query); see *Risks* §1. 3. **`aippatch.yaml`** — at the repo root. Source of truth for codecs, - resource bindings, name overrides, and writable allow-list. + resource bindings, name overrides, writable allow-list, and always-set + columns. Pre-existing components shrink: 4. **`internal/patches/*.gen.go`** — committed generated code, one file per - resource. + resource, plus a single `init.gen.go` that emits `InitPatches() error`. 5. **`internal/rpc//server.go`** — handlers shrink to ~12 lines. ### Boundary properties @@ -130,6 +172,9 @@ Pre-existing components shrink: generated `patches` package. - sqlc still owns SELECT, INSERT, and any non-PATCH UPDATE. aippatch only writes the dynamic PATCH UPDATE. +- The runtime's `DBTX` interface is satisfied by `*pgxpool.Pool`, `*pgx.Conn`, + and `pgx.Tx` — `Apply` participates in a caller's transaction transparently + when a `pgx.Tx` is passed. ## Public API (runtime) @@ -143,31 +188,46 @@ type Mapping[T proto.Message] struct { PK string // column name (PK value comes from Op) SoftDelete string // "" if none; framework adds "AND col IS NULL" EmptyMask EmptyMaskPolicy // ErrorOnEmpty (v0 default) - Bindings []Binding // ordered, alphabetical by Proto + Bindings []Binding // ordered alphabetically by Proto for stable diff + AutoSet []AutoSetClause // always-set columns regardless of mask // Populated by Validate(); unexported. bindingsByProto map[string]*Binding bindingsByColumn map[string]*Binding + validated bool // guard against use before InitPatches() } // Binding pairs one proto field with one SQL column. type Binding struct { Proto string // proto field name, e.g. "display_name" Column string // SQL column, e.g. 
"display_name" (or "created_at") - SQLType string // diagnostic: "text", "timestamptz", "uuid", "boolean", "integer", … + SQLType string // diagnostic: "text", "timestamptz", "uuid", "boolean", "integer", "smallint", "bigint" Writable bool // PATCH may set this column; default false (deny-by-default) - Codec string // "" (scalar pass-through) | "timestamp" | "enum:" + Codec string // "" (scalar pass-through) | "timestamp" | "enum:" +} + +// AutoSetClause defines a SQL expression always written into the SET clause. +// Typical use: { Column: "updated_at", SQLLiteral: "NOW()" }. The literal is +// emitted as raw SQL — never sourced from user input. Codegen verifies the +// column exists in the table and is NOT NULL. +type AutoSetClause struct { + Column string + SQLLiteral string } // Op carries a single PATCH invocation's runtime data. type Op[T proto.Message] struct { - Message T // input proto carrying the new values + Message T // input proto carrying the new values; must be non-nil Mask *fieldmaskpb.FieldMask // which fields to apply PKValue any // value for the PK column (e.g. uuid.UUID) Where map[string]any // optional extra equality predicates } -// DBTX is the minimal pgx interface aippatch needs (matches sqlc's DBTX). +// DBTX is the minimal pgx interface aippatch needs. It is a strict subset of +// pgx's query surface and is satisfied by *pgxpool.Pool, *pgx.Conn, and pgx.Tx +// — Apply participates in a caller's transaction when a pgx.Tx is passed. +// (Note: this is *not* the same DBTX that sqlc generates; aippatch's is +// smaller and read-only on the connection from a control-flow perspective.) type DBTX interface { Query(ctx context.Context, sql string, args ...any) (pgx.Rows, error) } @@ -175,45 +235,68 @@ type DBTX interface { // EmptyMaskPolicy controls behavior when Op.Mask has zero paths. 
type EmptyMaskPolicy int const ( - ErrorOnEmpty EmptyMaskPolicy = iota // v0 default + ErrorOnEmpty EmptyMaskPolicy = iota // v0 default; rejects with InvalidArgument UpdateAllWritable // future opt-in (not implemented in v0) ) // Apply executes the PATCH described by op against m, and returns the -// updated proto message populated from the RETURNING * row. +// updated proto message populated from the RETURNING row. op.Message must +// be non-nil; the returned T is a clone (via proto.CloneOf) of op.Message +// with mapped columns overwritten from RETURNING. func Apply[T proto.Message]( ctx context.Context, db DBTX, m *Mapping[T], op Op[T], ) (T, error) -// Validate is called by generated package init; checks every binding's -// proto path against the descriptor of T and indexes binding maps. -// Returns error rather than panicking, per drill's no-panic-at-init rule. +// Validate is called by generated InitPatches(); checks every binding's +// proto path against the descriptor of T, verifies AutoSet columns exist, +// and indexes binding maps. Returns error rather than panicking, per drill's +// no-panic-at-init rule. Sets m.validated = true on success. func (m *Mapping[T]) Validate() error ``` The codec registry inside the runtime is keyed by the `Codec` string on each -Binding. v0 ships three: +Binding (the yaml codec name, prefixed with `"enum:"` for enum codecs): - `""` — scalar pass-through (`StringKind`, `BoolKind`, `Int32Kind`, `Sint32Kind`, `Int64Kind`, `Sint64Kind`) - `"timestamp"` — `google.protobuf.Timestamp` ↔ `time.Time` via `.AsTime()` / `timestamppb.New(...)` -- `"enum:"` — proto enum number ↔ SQL text via a declared map; both - directions look up the map keyed by the enum's protoreflect name. Out-of-map - values on the read side return `Internal` (data invariant violation); on the - write side return `InvalidArgument`. +- `"enum:"` — proto enum number ↔ SQL text via a declared map; both + directions look up the codec by yaml name. 
Out-of-map values on the read + side return `Internal` (data invariant violation); on the write side return + `InvalidArgument`. Codecs are global to the yaml file and reusable across + resources. ### Errors All errors returned by `Apply` are `*connect.Error` with appropriate codes: - `CodeInvalidArgument` — empty mask (when policy is `ErrorOnEmpty`), unknown - mask path, non-writable mask path, enum write value not in declared map. + mask path, non-writable mask path, nested mask path, enum write value not + in declared map, nil `op.Message`. - `CodeNotFound` — `UPDATE` matched zero rows (PK wrong or scope filter excluded the row). -- `CodeInternal` — pgx error, codec read-side data invariant violation, or - binding/proto desync that escaped boot-time `Validate`. +- `CodeInternal` — pgx error, codec read-side data invariant violation, + binding/proto desync that escaped boot-time `Validate`, or `Apply` called + on an unvalidated `Mapping` (`InitPatches()` not invoked). + +### Transactions + +`Apply` does not start its own transaction. `DBTX` accepts both pools and +`pgx.Tx`; passing a tx makes `Apply` participate. On error, the caller's +transaction state is the caller's responsibility — pgx aborts an open tx on +any non-nil error per its standard contract. For multi-statement atomic +operations (e.g. PATCH + audit-event insert), wrap in `pgx.BeginFunc`: + +```go +err := pgx.BeginFunc(ctx, s.b.Pool(), func(tx pgx.Tx) error { + _, err := aippatch.Apply(ctx, tx, patches.UserPatch, op) + if err != nil { return err } + _, err = tx.Exec(ctx, "INSERT INTO audit_events ...") + return err +}) +``` ## Codegen tool: `aippatchgen` @@ -226,28 +309,55 @@ aippatchgen [--check] [--config aippatch.yaml] [--out internal/patches] Defaults: reads `./aippatch.yaml`, writes to `./internal/patches/`, reads `./buf.binpb` and `./sql/migrations/`. `--check` exits non-zero if any -generated file would change. Wired into `Makefile`: +generated file would change. 
+
+**CGO requirement.** `aippatchgen` links `libpg_query` via
+`github.com/pganalyze/pg_query_go/v5`, which requires `CGO_ENABLED=1`. This
+is the default in Go's `go build`, but some shops set `CGO_ENABLED=0`
+globally; document at the top of `cmd/aippatchgen/main.go` and in
+`thirdparty/aippatch/README.md`. The runtime library has no CGO requirement.
+
+Wired into `Makefile`:
 
 ```
-codegen: ; buf generate && sqlc generate && go run ./thirdparty/aippatch/cmd/aippatchgen
-test: ; … && go run ./thirdparty/aippatch/cmd/aippatchgen --check && …
+codegen:
+	buf generate
+	buf build -o buf.binpb
+	sqlc generate
+	go run ./thirdparty/aippatch/cmd/aippatchgen
+
+test:
+	... && go run ./thirdparty/aippatch/cmd/aippatchgen --check && ...
 ```
 
+`buf.binpb` is committed to the repo (small, deterministic; CI can regenerate
+and verify if desired).
+
 ### Inputs
 
-1. **`buf.binpb`** — emitted by `buf build -o buf.binpb` as part of
-   `buf generate`. Unmarshaled into `*descriptorpb.FileDescriptorSet`; walked
-   via `protoreflect.FileDescriptor`.
+1. **`buf.binpb`** — emitted by `buf build -o buf.binpb` (added as a new step
+   in `make codegen`; the existing `buf generate` does not emit a descriptor
+   set). Unmarshaled into `*descriptorpb.FileDescriptorSet`; walked via
+   `protoreflect.FileDescriptor`.
 2. **`sql/migrations/*.up.sql`** — read in lexical order. Each statement parsed
    by `pg_query_go`. The tool accumulates a logical schema:
-   - `CREATE TABLE` → register table with columns `(name, type, nullable)`
-   - `ALTER TABLE … ADD COLUMN` → add column
-   - `ALTER TABLE … DROP COLUMN` → remove column
-   - `ALTER TABLE … ALTER COLUMN … TYPE` → change type
-   - `ALTER TABLE … RENAME COLUMN` → rename
-   - `DROP TABLE` → remove table
-   - Other statements (indexes, constraints, FK refs) are ignored.
-3. **`aippatch.yaml`** — codecs + resource declarations (schema below).
+   - `CREATE TABLE` → register table with columns `(name, type, not_null,
+     default_present)`.
+ - `ALTER TABLE … ADD COLUMN` → add column. + - `ALTER TABLE … DROP COLUMN` → remove column. + - `ALTER TABLE … ALTER COLUMN … TYPE` → change type. + - `ALTER TABLE … ALTER COLUMN … SET / DROP NOT NULL` → flip nullability. + - `ALTER TABLE … RENAME COLUMN` → rename. + - `DROP TABLE` → remove table. + - Other statements (indexes, foreign-key constraints, CHECK constraints) + are ignored in v0. CHECK constraint extraction (to validate enum codec + maps) is a v1 feature. + + **Nullability** is interpreted from `NOT NULL`, `PRIMARY KEY` (implies + NOT NULL), and `SET / DROP NOT NULL` migrations. The codegen's + compatibility check enforces v0's "bound columns must be NOT NULL" rule. +3. **`aippatch.yaml`** — codecs + resource declarations + auto_set blocks + (schema below). ### Algorithm @@ -256,40 +366,60 @@ test: ; … && go run ./thirdparty/aippatch/cmd/aippatchgen --check && … 3. For each `resources[i]` in `aippatch.yaml`: 1. Look up the proto message descriptor by full name. 2. Look up the SQL table from the schema; resolve the PK column. - 3. For each proto field in the message, in field-number order: + 3. For each proto field in the message, in field-number order (so error + messages are stable): - If `overrides[field].skip` is true → drop. - If `overrides[field].column` set → use that column. - Else → snake-case name match with the SQL column list. - - If no match → record diagnostic: "field X has no matching column; - list candidates and suggest yaml fix." + - If no match → diagnostic: "field X has no matching column; suggest + `{ skip: true }` or `{ column: }`." - Compatibility check between proto kind and column type (table below). - If incompatible → diagnostic. + On v0 also enforce: column is NOT NULL. On incompatibility → + diagnostic. - Determine codec: - - `MessageKind` with full name `google.protobuf.Timestamp` → `"timestamp"`. - - Enum kind → `"enum:" + name` from `overrides[field].codec`. 
If missing - → diagnostic: "enum field requires explicit codec in overrides." + - `MessageKind` with full name `google.protobuf.Timestamp` → + `"timestamp"`. + - Enum kind → require `overrides[field].codec` to name a declared + codec; emit `"enum:"`. If missing → diagnostic. - Scalar kind → `""`. - Anything else → diagnostic: "unsupported in v0; mark `skip: true`." - `Writable` = field name is in `resources[i].writable`. - 4. Sort bindings alphabetically by `Proto` for stable output. -4. Emit one Go file per resource. + - Reject paths with dots (nested) — diagnostic. + 4. After processing, every proto field must either have a binding or + `skip: true`. Any unmatched field is a diagnostic (this prevents the + `proto.CloneOf`-base case from silently passing through unmapped + fields with input-message values). + 5. For each `auto_set[col]` entry: verify the column exists in the table + and is NOT NULL. The literal expression is emitted verbatim as raw SQL + — never user input. Reject if column is a writable binding (would + conflict). + 6. Sort bindings alphabetically by `Proto` for stable output. +4. Emit one Go file per resource, plus one `init.gen.go` that emits + `InitPatches() error` calling `Validate()` on each mapping. 5. If `--check`: byte-compare to existing files; exit 1 on any diff. 
### Type compatibility (v0) -| Proto kind | SQL types accepted | Codec | -|---|---|---| -| `StringKind` | `text`, `varchar`, `citext`, `uuid` | `""` | -| `BoolKind` | `boolean` | `""` | -| `Int32Kind`, `Sint32Kind` | `integer`, `smallint` | `""` | -| `Int64Kind`, `Sint64Kind` | `bigint` | `""` | -| `MessageKind`: `google.protobuf.Timestamp` | `timestamptz`, `timestamp` | `"timestamp"` | -| `EnumKind` | `text`, `varchar` | `"enum:"` (declared) | -| anything else | — | codegen error | +| Proto kind | SQL types accepted | Codec | Notes | +|---|---|---|---| +| `StringKind` | `text`, `varchar`, `citext`, `uuid` | `""` | `uuid` columns: pgx returns `[16]byte`; runtime formats canonical string. | +| `BoolKind` | `boolean` | `""` | | +| `Int32Kind`, `Sint32Kind` | `integer` | `""` | pgx returns `int32`. | +| `Int32Kind`, `Sint32Kind` | `smallint` | `""` | pgx returns `int16`; runtime widens to `int32` before `protoreflect.Set`. | +| `Int64Kind`, `Sint64Kind` | `bigint` | `""` | pgx returns `int64`. | +| `MessageKind`: `google.protobuf.Timestamp` | `timestamptz`, `timestamp` | `"timestamp"` | | +| `EnumKind` | `text`, `varchar` | `"enum:"` | yaml-declared map keyed by enum value name. | +| **Deferred (v1)** | | | | +| `BytesKind` | `bytea` | (TBD) | Codegen rejects in v0. | +| `FloatKind`, `DoubleKind` | `real`, `double precision` | (TBD) | Codegen rejects in v0. | +| Any kind ↔ nullable column | | | Codegen rejects in v0; v1 adds `pgtype.*` decode. | +| `MessageKind` (non-Timestamp) | `jsonb` | `"jsonb"` | v1. | +| anything else | — | — | codegen error | `uuid`-as-string is special-cased: a proto `string` field maps to a `uuid` -column when the column type is `uuid`, with `[16]byte`↔string conversion in -the runtime. +column when the column type is `uuid`, with `[16]byte`↔canonical-string +conversion in the runtime. The runtime imports `github.com/google/uuid` for +the canonical formatter. 
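The two read-side special cases in the table (uuid as `[16]byte`, `smallint` as `int16`) can be illustrated with a small normalizer. This is a sketch under stated assumptions: `normalizeScalar` is a hypothetical name, and the UUID is formatted with `fmt` here only to keep the example standard-library-only, whereas the spec's runtime uses `github.com/google/uuid` for that step.

```go
package main

import "fmt"

// normalizeScalar converts a value as returned by the driver into the Go type
// expected before setting a proto scalar field. Hypothetical helper.
func normalizeScalar(v any) (any, error) {
	switch x := v.(type) {
	case [16]byte:
		// Canonical 8-4-4-4-12 lowercase hex form.
		return fmt.Sprintf("%x-%x-%x-%x-%x", x[0:4], x[4:6], x[6:8], x[8:10], x[10:16]), nil
	case int16:
		// smallint widens to int32 to match Int32Kind / Sint32Kind.
		return int32(x), nil
	case string, bool, int32, int64:
		return x, nil
	default:
		// Anything else should have been rejected at codegen time.
		return nil, fmt.Errorf("unsupported driver value type %T", v)
	}
}
```

The default branch corresponds to the `Internal` error path: if codegen accepted the binding, this case should be unreachable.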
### Diagnostics @@ -313,11 +443,20 @@ aippatchgen: drill.v1.User: field "role" has unsupported type without codec overrides: { role: { codec: enum_role } } aippatchgen: drill.v1.User: writable field "display_name" not present in proto descriptor + +aippatchgen: drill.v1.User.password_hash → users.password_hash: nullable column not supported in v0 + hint: mark { skip: true } or wait for v1 nullable support. + +aippatchgen: drill.v1.User.address.street: nested mask paths not supported in v0 + +aippatchgen: drill.v1.User: auto_set column "updated_at" not found in table users + hint: ensure migrations have run and column exists. ``` ## Configuration: `aippatch.yaml` ```yaml +# Codecs are global and reusable across resources. codecs: enum_role: proto_enum: drill.v1.UserRole @@ -335,17 +474,26 @@ resources: table: users pk: id soft_delete: deleted_at - empty_mask: error # error (default) | update_writable + empty_mask: error # error → ErrorOnEmpty (default) | update_writable → UpdateAllWritable writable: [display_name] # AIP-203 deny-by-default + auto_set: + updated_at: NOW() # raw SQL, applied to every PATCH overrides: create_time: { column: created_at } role: { codec: enum_role } plan: { codec: enum_plan } + # password_hash, stripe_customer_id are nullable in users; not in proto; + # not in writable; codegen drops them with "no matching proto field" — no + # action needed. + free_full_educators_used: { skip: true } # column exists; no proto field; explicitly skipped ``` One file per repo. `~10–20` lines per resource. Reviewers see policy and mapping deltas in a single diff. Adding a writable field is one line. +The yaml-string `error` maps to the Go enum `ErrorOnEmpty`; +`update_writable` maps to `UpdateAllWritable` (v0 unsupported). + ## Generated file shape `internal/patches/user.gen.go`: @@ -360,7 +508,8 @@ import ( ) // UserPatch is the PATCH mapping for drill.v1.User → users. 
-var UserPatch = mustValidate(&aippatch.Mapping[*drillv1.User]{ +// Validate() is called from InitPatches() at startup, never at package init. +var UserPatch = &aippatch.Mapping[*drillv1.User]{ Table: "users", PK: "id", SoftDelete: "deleted_at", @@ -374,26 +523,42 @@ var UserPatch = mustValidate(&aippatch.Mapping[*drillv1.User]{ {Proto: "plan", Column: "plan", SQLType: "text", Writable: false, Codec: "enum:enum_plan"}, {Proto: "role", Column: "role", SQLType: "text", Writable: false, Codec: "enum:enum_role"}, }, -}) + AutoSet: []aippatch.AutoSetClause{ + {Column: "updated_at", SQLLiteral: "NOW()"}, + }, +} +``` -func mustValidate[T proto.Message](m *aippatch.Mapping[T]) *aippatch.Mapping[T] { - if err := m.Validate(); err != nil { - // Per drill's no-panic-at-init rule, the generated package exposes - // an Init() that returns the error; main wires it up. The - // mustValidate helper is only used in tests; production wiring uses - // an explicit constructor that returns (mappings, error). - panic(err) +`internal/patches/init.gen.go`: + +```go +// Code generated by aippatchgen. DO NOT EDIT. +package patches + +// InitPatches validates every generated mapping. Call once at server startup. +// Returns the first validation error encountered; never panics. +func InitPatches() error { + for _, fn := range []func() error{ + UserPatch.Validate, + // …one entry per resource… + } { + if err := fn(); err != nil { + return err + } } - return m + return nil } ``` -**Init wiring:** to comply with drill's no-panic-at-init rule, the production -build does *not* use the `mustValidate` helper. Instead `aippatchgen` emits an -`InitPatches() error` function that calls `Validate()` on every generated -mapping and returns the first error. `cmd/server/main.go` calls it during -startup and propagates the error normally. The `mustValidate` helper exists -only for test-side use where panicking is acceptable. 
+**Init wiring.** `cmd/server/main.go` calls `patches.InitPatches()` during
+startup, propagating the error normally per drill's no-panic-at-init rule. No
+package-level `mustValidate` helper is generated — `Validate()` is invoked
+explicitly. This eliminates init-time panics and makes the validation point
+greppable.
+
+`Apply` checks `m.validated` on every call; calling `Apply` before
+`InitPatches()` returns `Internal` ("Mapping not initialized; call
+InitPatches()") rather than a misleading "unknown field" error.
 
 ## Runtime: `Apply` walkthrough
 
@@ -404,7 +569,15 @@ func Apply[T proto.Message](
 ) (T, error) {
 	var zero T
 
-	// 1. Mask validation
+	// 0. Sanity guards.
+	if m == nil || !m.validated {
+		return zero, connectInternal("aippatch.Mapping not initialized; call patches.InitPatches() during startup")
+	}
+	if any(op.Message) == nil || !op.Message.ProtoReflect().IsValid() {
+		return zero, connectInvalidArg("op.Message must not be nil")
+	}
+
+	// 1. Mask validation.
 	paths := op.Mask.GetPaths()
 	if len(paths) == 0 {
 		if m.EmptyMask == ErrorOnEmpty {
@@ -413,13 +586,16 @@ func Apply[T proto.Message](
 		// Future: collect all writable bindings as paths.
 	}
 
-	// 2. Resolve paths to bindings; reject unknown / non-writable.
-	sets := make([]string, 0, len(paths)) // go-sqlbuilder Assign exprs
+	// 2. Resolve paths to bindings; reject unknown / non-writable / nested.
desc := op.Message.ProtoReflect().Descriptor() ub := sqlbuilder.PostgreSQL.NewUpdateBuilder() ub.Update(m.Table) + sets := make([]string, 0, len(paths)+len(m.AutoSet)) for _, p := range paths { + if strings.Contains(p, ".") { + return zero, connectInvalidArg("nested mask path not supported in v0: %q", p) + } b, ok := m.bindingsByProto[p] if !ok { return zero, connectInvalidArg("unknown field in update_mask: %q", p) @@ -431,13 +607,18 @@ func Apply[T proto.Message]( if fd == nil { return zero, connectInternal("binding/proto desync: %q", p) } - v, err := encode(op.Message, fd, b.Codec, m.codecs) // proto value → SQL parameter + v, err := encode(op.Message, fd, b.Codec, m.codecs) if err != nil { return zero, err } sets = append(sets, ub.Assign(b.Column, v)) } + + // 3. AutoSet columns: append raw SQL fragments unconditionally. + for _, a := range m.AutoSet { + sets = append(sets, fmt.Sprintf("%s = %s", a.Column, a.SQLLiteral)) + } ub.Set(sets...) - // 3. WHERE clauses. + // 4. WHERE clauses. ub.Where(ub.Equal(m.PK, op.PKValue)) if m.SoftDelete != "" { ub.Where(m.SoftDelete + " IS NULL") @@ -445,11 +626,15 @@ func Apply[T proto.Message]( for col, v := range op.Where { ub.Where(ub.Equal(col, v)) } - ub.SQL("RETURNING *") + + // 5. Returning explicit bound columns (not RETURNING *). + boundCols := make([]string, len(m.Bindings)) + for i, b := range m.Bindings { boundCols[i] = b.Column } + ub.Returning(boundCols...) sqlStr, args := ub.Build() - // 4. Execute and read back via RETURNING *. + // 6. Execute and read back. rows, err := db.Query(ctx, sqlStr, args...) if err != nil { return zero, connectInternal("query: %w", err) } defer rows.Close() @@ -464,12 +649,12 @@ func Apply[T proto.Message]( vals, err := rows.Values() if err != nil { return zero, connectInternal("scan: %w", err) } - // 5. Build result proto from input message + RETURNING values. - result := proto.Clone(op.Message).(T) + // 7. Build result proto from input message clone + RETURNING values. 
+ result := proto.CloneOf(op.Message) msg := result.ProtoReflect() for i, c := range cols { b, ok := m.bindingsByColumn[string(c.Name)] - if !ok { continue } // unmapped column → ignore + if !ok { continue } fd := msg.Descriptor().Fields().ByName(protoreflect.Name(b.Proto)) if err := decode(msg, fd, vals[i], b.Codec, m.codecs); err != nil { return zero, connectInternal("decode %s: %w", b.Proto, err) @@ -479,8 +664,23 @@ func Apply[T proto.Message]( } ``` +Notable details: + +- **`proto.CloneOf`** (added in protobuf-go v1.36.6; project uses v1.36.11) + is type-safe: returns `T` directly, no `.(T)` assertion. +- **`Returning(boundCols...)`** sends only mapped columns over the wire from + Postgres to Go. Unmapped columns (`password_hash`, `stripe_customer_id`, + `free_full_educators_used`) are not transmitted at all — both a privacy + benefit and a guard against future schema additions appearing silently. +- **`ub.Set(sets...)`** is variadic over assignment strings. `ub.Assign(col, + val)` returns `"col = $N"` with parameterized placeholder; raw expressions + for `AutoSet` are formatted directly (`"updated_at = NOW()"`), with codegen + guaranteeing the column and literal are safe. +- **`ub.Returning(...)`** is the library's first-class API; we do not use the + marker-position-dependent `ub.SQL(...)` for this. + `encode` and `decode` are bounded switches over field kind × codec. The total -runtime is approximately 300 LoC including codec dispatch, error +runtime is approximately 350 LoC including codec dispatch, error constructors, and `Validate`. ### pgx native types on the read side @@ -490,18 +690,22 @@ constructors, and `Validate`. 
| `text`, `varchar`, `citext` | `string` | `StringKind` | direct | | `uuid` | `[16]byte` | `StringKind` | `uuid.UUID(v).String()` | | `boolean` | `bool` | `BoolKind` | direct | -| `integer`, `smallint` | `int32` | `Int32Kind`, `Sint32Kind` | direct | +| `integer` | `int32` | `Int32Kind`, `Sint32Kind` | direct | +| `smallint` | `int16` | `Int32Kind`, `Sint32Kind` | widen: `int32(v)` | | `bigint` | `int64` | `Int64Kind`, `Sint64Kind` | direct | | `timestamptz`, `timestamp` | `time.Time` | `MessageKind: Timestamp` | `timestamppb.New(v)` | | `text` (with enum codec) | `string` | `EnumKind` | reverse-map declared codec | -Any other pgx-native type encountered at runtime is an `Internal` error -(should have been caught by codegen's compatibility check). +**Nullable columns return `pgtype.Text`/`pgtype.Timestamptz`/etc. from +`rows.Values()` rather than the underlying scalar.** v0 codegen rejects +nullable bound columns; v1 adds explicit `pgtype.*` decode. Any unexpected +pgx-native type at runtime is `Internal` (should be impossible if codegen +accepted the binding). ## Handler call site drill's existing `UpdateProfile` (currently `internal/rpc/user/server.go:100-141`) -shrinks from ~45 lines to ~12: +shrinks from ~45 lines to ~14: ```go func (s *Server) UpdateProfile( @@ -522,7 +726,7 @@ func (s *Server) UpdateProfile( req.Msg.GetUser().DisplayName = trimmed } - updated, err := aippatch.Apply(ctx, s.b.Pool, patches.UserPatch, aippatch.Op[*drillv1.User]{ + updated, err := aippatch.Apply(ctx, s.b.Pool(), patches.UserPatch, aippatch.Op[*drillv1.User]{ Message: req.Msg.GetUser(), Mask: req.Msg.GetUpdateMask(), PKValue: u.ID, @@ -533,6 +737,8 @@ func (s *Server) UpdateProfile( } ``` +(`s.b.Pool()` is a method on `*backend.Backend` returning `*pgxpool.Pool`.) + The hand-rolled `implementedUserFields` allow-list and per-path validation loop disappear. Authorization (the unauthenticated check) and per-field value validation (the trim + non-empty check) remain in the handler. 
@@ -549,16 +755,21 @@ same approach with a small fixture schema independent of drill's migrations. Cases: - empty mask → `InvalidArgument` +- nil `op.Message` → `InvalidArgument` +- unvalidated mapping (`InitPatches` not called) → `Internal` - unknown mask path → `InvalidArgument` - non-writable mask path → `InvalidArgument` -- writable scalar (string, bool, int32, int64) write + read-back +- nested mask path → `InvalidArgument` +- writable scalar (string, bool, int32 from `integer`, int32 from `smallint`, + int64) write + read-back - writable timestamp write + read-back - writable enum write (valid & invalid) + read-back +- AutoSet column is bumped on every PATCH (e.g. `updated_at` advances) - soft-delete WHERE filters out deleted rows → `NotFound` - PK mismatch → `NotFound` - extra `Op.Where` predicate excludes row → `NotFound` -- `RETURNING *` populates fields not touched by mask -- `proto.Clone` preserves input-message fields that have no binding +- `Returning(boundCols)` does not include unmapped columns +- `proto.CloneOf` preserves input-message fields that have no binding ### Codegen golden tests (`thirdparty/aippatch/cmd/aippatchgen/aippatchgen_test.go`) @@ -566,36 +777,61 @@ Fixtures under `testdata/`: - `simple/` — proto + 2 migrations + yaml → expected `*.gen.go` - `name_divergence/` — `create_time` ↔ `created_at` - `enum_codec/` — proto enum + declared codec -- `unsupported_kind/` — proto with bytes field (not in v0); expects diagnostic +- `auto_set/` — `updated_at: NOW()` and verified output +- `unsupported_kind/` — proto with bytes field; expects diagnostic +- `nullable_bound/` — writable on a nullable column; expects diagnostic +- `nested_path/` — proto with submessage, attempted writable; expects diagnostic - `missing_column/` — proto field with no candidate; expects diagnostic +- `unmatched_proto_field/` — proto field neither matched nor `skip: true`; + expects diagnostic - `--check_drift/` — fixture with stale `*.gen.go`; expects exit 1 ### 
Handler integration test (`internal/rpc/user/server_test.go`) -Uses drill's existing `backendtest.SeedUser` to create a real user, then -exercises `UpdateProfile` end-to-end through the connect handler: -- valid PATCH on `display_name` → response carries updated User; DB row - updated. +Uses drill's existing `backendtest.SeedUser` to create a real user (matching +drill's "Test production parity" rule), then exercises `UpdateProfile` +end-to-end through the connect handler: +- valid PATCH on `display_name` → response carries updated User; DB row's + `display_name` and `updated_at` both change. - empty mask → `InvalidArgument`. - mask with `email` (non-writable) → `InvalidArgument`. - unauthenticated → `Unauthenticated`. -These tests already exist for the current implementation; they should pass -unchanged after the migration. +**Audit step in rollout (see below):** existing tests assert on error +strings such as `"field not supported for update: \"email\""`. The new +handler produces `"field not writable: \"email\""`. Existing tests must be +updated where they assert on error message text; those that assert on error +codes only need no change. ## Drill rollout plan -1. Add `thirdparty/aippatch/` runtime package and `cmd/aippatchgen/` binary. -2. Add `aippatch.yaml` at repo root with the `User` resource and enum codecs. -3. Add `internal/patches/` directory; wire `aippatchgen` into `make codegen`. -4. Add `aippatchgen --check` to `make test`. -5. Generate `internal/patches/user.gen.go`. Review the diff manually. -6. Wire `patches.InitPatches()` into `cmd/server/main.go` startup; surface - any error from `Validate()`. -7. Replace `UpdateProfile` handler body with the shrunk version. -8. Delete the `UpdateUserDisplayName` query from `sql/queries/users.sql` and - regenerate sqlc. -9. Run `make test`; existing handler tests should pass unchanged. +1. 
**Dependency adds.** `go get github.com/huandu/go-sqlbuilder + github.com/pganalyze/pg_query_go/v5` and commit `go.mod`/`go.sum`. + (`github.com/google/uuid` is already in `go.mod`.) +2. **Add framework.** Land `thirdparty/aippatch/` runtime + `cmd/aippatchgen/` + binary. +3. **Wire codegen.** Add `buf build -o buf.binpb` and the `aippatchgen` step + to `make codegen`. Add `aippatchgen --check` to `make test`. +4. **Add config.** `aippatch.yaml` at repo root with the `User` resource, + enum codecs, and `auto_set: { updated_at: NOW() }`. +5. **Generate.** Create `internal/patches/`; run `make codegen`. Review the + diff manually first time, including `internal/patches/user.gen.go` and + `internal/patches/init.gen.go`. +6. **Wire init.** Call `patches.InitPatches()` in `cmd/server/main.go` + startup; surface any error from `Validate()` per drill's no-panic rule. +7. **Switch handler.** Replace `UpdateProfile` handler body with the shrunk + version. Verify the proto wire contract is unchanged with golden response + bytes if needed. +8. **Audit existing tests.** Grep `internal/rpc/user/server_test.go` (and + anywhere else) for assertions on error strings such as `"field not + supported for update"`; update to the new wording (`"field not + writable"`) or relax to error-code only. +9. **Audit consumers of `b.UpdateDisplayName`.** Grep the codebase for + callers; confirm only `UpdateProfile` calls it before deletion. +10. **Delete stale sqlc.** Remove `UpdateUserDisplayName` from + `sql/queries/users.sql` and `b.UpdateDisplayName`; regenerate sqlc. +11. **Verify.** `make test` (full CI: buf lint, codegen check, frontend + typecheck+lint+tests, backend tests with race). Rollback: single-commit revert. The proto wire contract is unchanged. @@ -604,19 +840,27 @@ Rollback: single-commit revert. The proto wire contract is unchanged. Each Spanda repo gets three artifacts: 1. 
The `thirdparty/aippatch/` directory (initially copied from drill; once - stable, extracted to its own module and imported). -2. The `aippatchgen` binary — `go install ./thirdparty/aippatch/cmd/aippatchgen`. -3. A `aippatch.yaml` skeleton. + stable, extracted to its own module — see *Roadmap v0.5*). +2. The `aippatchgen` binary — `go install + ./thirdparty/aippatch/cmd/aippatchgen`. +3. An `aippatch.yaml` skeleton. Each project's `Makefile` wires `aippatchgen` into its `codegen` and `test` -targets. No drill-specific code is required. +targets. The runtime library and codegen binary contain no drill-specific +code; the yaml file, generated `*.gen.go`, and `Makefile` wiring are +project-specific by design. ## Roadmap | Tier | Feature | Notes | |---|---|---| -| v1 | JSONB codec | Marshals proto sub-messages or `[]byte` to `jsonb` columns. Likely first non-v0 demand. | -| v1 | Proto3 explicit-optional + NULL semantics | AIP-134 clearing rule (`mask path + zero value → NULL`); only meaningful for `optional` fields. | +| v0.5 | Extract `thirdparty/aippatch/` to its own Go module | Trigger: a second Spanda project consumes aippatch in production. New module path `github.com//aippatch`; drill's `go.mod` switches from local replace to versioned import; thirdparty/ directory removed. | +| v1 | Nullable bound columns (`pgtype.*` decode) | First wave of demand; many natural settings columns are nullable. | +| v1 | JSONB codec | Marshals proto sub-messages or `[]byte` to `jsonb` columns. | +| v1 | `bytes`, `float`, `double` proto kinds | Bytea / real / double precision support. | +| v1 | Pre/post hooks (or returned diff) for audit logging | `Apply` returns `(updated T, diff Diff, err error)` where Diff carries before/after for mask paths; handler emits audit events. | +| v1 | Proto3 explicit-optional + NULL semantics | AIP-134 clearing rule (`mask path + zero value → NULL`); meaningful for `optional` fields. 
| +| v1 | CHECK constraint extraction | Validate enum codec maps against `CHECK (col IN (…))` at codegen. | | v2 | Declarative validators | `NonEmptyTrimmed`, `LenBetween`, `URL`, `OneOf`. Per-resource yaml + handler-side composition. | | v2 | AIP-193 error mapping | pgx error inspection: `unique_violation` → `AlreadyExists`, `fk_violation` → `FailedPrecondition`, `not_null_violation` / `check_violation` → `InvalidArgument`. Per-resource override map. | | v3 | Per-field declarative authz | `admin_only_fields:` in yaml; layered with handler narrowing. | @@ -632,31 +876,43 @@ tiers ship. New features are opt-in via `aippatch.yaml`. 1. **`pg_query_go` is a CGO dependency.** It wraps `libpg_query`. drill's production binary builds may run with `CGO_ENABLED=0` in some paths. Mitigation: `aippatchgen` is a developer/CI tool, not part of the - production binary; CGO is only required where `aippatchgen` runs. Document - this in the package README. + production binary; CGO is only required where `aippatchgen` runs. + Document at the top of `cmd/aippatchgen/main.go`: + `// Requires CGO (libpg_query).` Add a `README.md` next to it stating + the same. CI runners must have a C toolchain — drill's CI already does + for testcontainers. 2. **pgx-native ↔ proto type drift.** New SQL types added to drill in the future may not be in the runtime's `decode` switch. Mitigation: - `aippatchgen` rejects unknown SQL types at codegen with a clear diagnostic; - the runtime never sees a type the codegen accepted. + `aippatchgen` rejects unknown SQL types at codegen with a clear + diagnostic; the runtime never sees a type the codegen accepted. + `Returning(boundCols...)` (not `RETURNING *`) further reduces blast + radius — unmapped columns are not transmitted from Postgres at all. -3. **`EmptyMaskPolicy` is wire-affecting.** Switching from `ErrorOnEmpty` to - `UpdateAllWritable` changes behavior visible to clients. 
Mitigation: - document as a per-resource permanent decision; `error` is the v0 default - and recommended. +3. **`EmptyMaskPolicy` is wire-affecting and AIP-134-divergent.** + `ErrorOnEmpty` deviates from AIP-134 §Update's "MUST treat omitted mask as + all populated fields" — this is documented in *Wire conformance note*. + Switching policies post-deploy is a breaking change visible to clients; + document choice per resource in API docs. 4. **Validation duplication in v0.** Per-field validation lives in handlers until v2. New PATCH RPCs added before v2 must hand-roll trimming / - non-empty / length checks. Mitigation: ship v2 quickly if the duplication + non-empty / length checks. Mitigation: ship v2 quickly if duplication becomes painful; document the v0 expectation in the README. -5. **`proto.Clone` for the result base.** `Apply` clones `op.Message` as the - starting point for the returned proto. Fields not in the RETURNING row - keep their input-message values. For PATCH this is fine (all DB-backed - fields are populated by RETURNING). For non-DB fields (rare; would only - exist if the proto carries computed-only fields), the input message's - values pass through. Mitigation: document; reject in codegen any proto - field that has no `skip: true` and no binding match. +5. **Audit logging is the caller's responsibility in v0.** Wrapping `Apply` + in a transaction is the documented pattern for atomically logging audit + events. v1 adds returned-diff support to remove the wrap. Mitigation: if + audit comes due before v1 ships, the wrap-in-tx pattern is sufficient. + +6. **Test wording change is observable.** Existing tests asserting on error + message strings (`"field not supported for update"`) will break. The + rollout plan includes an explicit audit step. + +7. **`buf.binpb` drift.** If `buf.binpb` is committed and a developer regens + `pb/*.pb.go` without re-running `buf build -o buf.binpb`, the codegen + will be stale. 
Mitigation: `make codegen` runs both in order; + `aippatchgen --check` in CI catches drift. ## Decisions (locked, with rationale) @@ -665,26 +921,30 @@ tiers ship. New features are opt-in via `aippatch.yaml`. | 1 | Runtime library + standalone codegen binary; not a buf plugin | Cleaner separation from buf's plugin machinery; reusable in non-buf contexts. | | 2 | Working name `aippatch`; lives at `drill/thirdparty/aippatch/` | Signals AIP-134 lineage; thirdparty/ prepares clean extraction. | | 3 | Generic `Mapping[T proto.Message]` (single type parameter) | No row-type coupling; framework is sqlc-independent. No type casts in user code. | -| 4 | Codegen consumes proto FileDescriptorSet + SQL migrations + yaml | Both schemas already on disk; yaml carries policy + overrides only. | +| 4 | Codegen consumes proto FileDescriptorSet + SQL migrations + yaml | Both schemas already on disk; yaml carries policy + overrides + auto_set only. | | 5 | Generated `*.gen.go` files committed to repo | Mapping is reviewable in PRs; CI checks for drift via `--check`. | -| 6 | SQL builder: `huandu/go-sqlbuilder` (private to package) | Mature; `PostgreSQL.NewUpdateBuilder()` emits `$1` placeholders cleanly. | -| 7 | Row scan: direct `pgx.Rows.Values()` + proto reflection (no third-party scanner) | A scanner like `scany/v2` would target a Go row struct; we populate a proto via reflection instead. Avoids an unnecessary dependency and a proto-aware shim. | -| 8 | Empty FieldMask rejected with `InvalidArgument` (default) | Strict; matches drill's existing behavior; relax later if a use case warrants. | +| 6 | SQL builder: `huandu/go-sqlbuilder` (private to package) | Mature; `PostgreSQL.NewUpdateBuilder()` emits `$1` placeholders cleanly; `Returning(...)` is a first-class method. | +| 7 | Row scan: direct `pgx.Rows.Values()` + proto reflection (no third-party scanner) | We populate a proto via reflection rather than a Go row struct; avoids an unnecessary dependency and a proto-aware shim. 
| +| 8 | Empty FieldMask rejected with `InvalidArgument` (default) | drill prefers explicit intent; documented divergence from AIP-134; relax later if a use case warrants. | | 9 | Deny-by-default writable; opt in via `writable:` list | AIP-203; security posture. | | 10 | Codegen errors on unsupported field types | Bad fields stop at codegen; runtime never sees a type it cannot handle. | -| 11 | Framework reads back via `RETURNING *` and returns the populated proto | One round-trip; AIP-134 compliant on the wire; pulls codec set up to cover every type in target protos. | -| 12 | v0 codec set: scalars + timestamps + enum | Smallest set that covers drill's `User` and most Spanda CRUD shapes. JSONB and others in v1+. | +| 11 | Framework reads back via `RETURNING ` (not `*`) and returns the populated proto | One round-trip; AIP-134 compliant; explicit column list excludes sensitive unmapped columns from the wire. | +| 12 | v0 codec set: scalars + timestamps + enum, NOT-NULL columns only | Smallest set that covers drill's `User` and most Spanda CRUD shapes. JSONB / nullable / bytes / float in v1. | | 13 | v0 first user: drill's `UpdateProfile` | Validates the framework against an existing target; replaces the most boilerplate-heavy code path today. | -| 14 | Boot validation via `Mapping.Validate() error` propagated to `main` | drill's no-panic-at-init rule. | +| 14 | Boot validation via `Mapping.Validate() error` propagated to `main` through generated `InitPatches() error` | drill's no-panic-at-init rule. No `mustValidate` panic helper in generated code. | +| 15 | `AutoSet` clauses (e.g. `updated_at: NOW()`) declared per-resource in yaml | Replaces sqlc's hand-rolled `updated_at = NOW()` in every UPDATE; codegen-checked column existence and NOT-NULL. | +| 16 | Always commit `buf.binpb` and regenerate via `buf build -o buf.binpb` | Eliminates need for a live buf-build dependency at codegen time; CI's `aippatchgen --check` catches drift. 
| ## Open questions (deferred) -- **JSONB shape** — for v1: do we marshal proto sub-messages as JSON via - `protojson`, or accept opaque `[]byte` from the handler? Trade-offs around - schema evolution. -- **AIP-154 ETag column type** — `bigint` counter, `uuid` token, or - per-resource choice? Defer to v3 when the use case is concrete. -- **Buf plugin migration** — when (and if) v4 ships, the yaml format remains - the source of truth for policy; only mappings move to proto annotations. - Migration path TBD. - +- **JSONB shape (v1)** — for proto sub-messages, marshal via `protojson` or + accept opaque `[]byte` from the handler? Trade-offs around schema + evolution. +- **AIP-154 ETag column type (v3)** — `bigint` counter, `uuid` token, or + per-resource choice? Defer until use case is concrete. +- **Buf plugin migration path (v4)** — when (and if) v4 ships, the yaml + format remains the source of truth for policy; only mappings move to proto + annotations. Migration mechanics TBD. +- **Diff API shape (v1)** — for audit logging, is the returned `Diff` a + `map[string]struct{Before, After any}`, or a typed proto-aware structure? + Decide when v1 work begins. From 9e74abbdd2b6b178c3073865f766b561d966e584 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 22:49:31 -0400 Subject: [PATCH 04/37] =?UTF-8?q?docs(spec):=20aippatch=20=E2=80=94=20appl?= =?UTF-8?q?y=20round-2=20review=20fixes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 2 surfaced one structural bug (m.codecs referenced but missing from Mapping struct), pg_query_go v5/v6 drift, several real correctness issues, and several smaller items. Fixes: HIGH - Add Codecs map[string]EnumCodec plumbing: shared Codecs registry in generated init.gen.go, passed into Mapping.Validate(codecs) which copies the relevant subset into unexported m.codecs. Apply now references m.codecs that actually exists. 
- Add EnumCodec type to public API (ProtoEnum, ToText, FromText). - Scrub pg_query_go to v6 throughout (latest published; v6.2.2). - Tone down Returning(boundCols) "privacy" claim — bound non-writable columns ARE transmitted (e.g. email); only unmapped columns are excluded. MEDIUM - Fix typed-nil trap: any(op.Message) == nil is false for a nil *T; switch to op.Message.ProtoReflect().IsValid(). - Sort op.Where keys before iterating to keep SQL deterministic (otherwise pgx prepared-statement cache misses on every call). - Validate op.Where keys exist in m.bindingsByColumn — prevents attacker-controlled identifier injection via the map-key surface. - Tighten AutoSet conflict check: reject if column is *any* binding (writable or not), not just writable. - Add pg_query_go-based validation for AutoSet SQL literal: parse and reject multi-statement / non-expression input. - Remove faulty test-wording audit step — verified existing tests assert via connect.CodeOf(err), not on message text. - Document soft-deleted user wire change: today's UpdateProfile returns Internal; aippatch returns NotFound. Add to Wire conformance note, rollout plan, and Risks. - Fix cmd/server/main.go → cmd/drill/main.go (drill's actual binary path). LOW - UpdateAllWritable empty-mask path: explicit Internal in v0 (was silently emitting AutoSet-only UPDATE). - Fix uuid.UUID(v).String() → uuid.UUID(v.([16]byte)).String() in type table. - Switch validated bool to atomic.Bool for race safety under -race. - Replace fictional maskHasPath with slices.Contains in handler example. - Add field-number-vs-alphabetical ordering explanation to algorithm. - Add GetPaths nil-safety note. - Note PK appears in Bindings (no dedup with PK column). - AutoSet test pattern: SELECT before / Apply / SELECT after — within a tx, NOW() is constant, so the framework's separate query advances it. - Remove dead yaml override comments (free_full_educators_used skip entry was never visited). 
NIT - AIP-203 → AIP-134 §Update_Mask citation. - "stale" → "superseded" sqlc deletion wording. - Architecture diagram pg_query_go shows version. Spec grew from ~770 to ~830 lines. --- .../specs/2026-05-07-aippatch-design.md | 329 ++++++++++++------ 1 file changed, 229 insertions(+), 100 deletions(-) diff --git a/docs/superpowers/specs/2026-05-07-aippatch-design.md b/docs/superpowers/specs/2026-05-07-aippatch-design.md index 8db1bf7b..72357965 100644 --- a/docs/superpowers/specs/2026-05-07-aippatch-design.md +++ b/docs/superpowers/specs/2026-05-07-aippatch-design.md @@ -71,6 +71,14 @@ resource out) with two intentional divergences: the codec set excludes nested message types. Codegen rejects yaml entries whose proto fields would require nested support. Roadmap: v1+. +A separate behavior change is wire-visible during the drill migration: today's +`UpdateProfile` returns `Internal` when the user is soft-deleted (the sqlc +`UpdateUserDisplayName` includes `AND deleted_at IS NULL`, returns +`pgx.ErrNoRows`, the handler wraps as `Internal`). aippatch returns +`NotFound` for the same case (zero-row update). This is the more correct AIP +behavior. The rollout plan calls it out so clients depending on the prior +code can adapt. + All other AIP-134 requirements (return the updated resource, honor mask paths that are valid, reject unknown paths) are upheld. @@ -105,7 +113,7 @@ A tenth concern — **audit logging** — is intentionally out of v0; callers wr │ pb/drill/v1/*.proto │ │ sql/migrations/*.up.sql │ │ (proto contracts) │ │ (logical schema) │ └──────────┬──────────────┘ └────────────┬─────────────┘ - │ buf build -o buf.binpb │ pg_query_go + │ buf build -o buf.binpb │ pg_query_go/v6 │ (FileDescriptorSet) │ ▼ ▼ ┌──────────────────────────────────────────────────────┐ @@ -120,6 +128,7 @@ A tenth concern — **audit logging** — is intentionally out of v0; callers wr ┌──────────────────────────────────────────────────────┐ │ internal/patches/*.gen.go (committed) │ │ e.g. 
var UserPatch = aippatch.Mapping[*User]{ … } │ + │ var Codecs = map[string]aippatch.EnumCodec{ … } │ │ func InitPatches() error { Validate all mappings } │ └─────────────────┬────────────────────────────────────┘ │ imported by @@ -134,7 +143,7 @@ A tenth concern — **audit logging** — is intentionally out of v0; callers wr ┌──────────────────────────────────────────────────────┐ │ thirdparty/aippatch/ (runtime library) │ │ • Mapping[T], Binding, AutoSetClause, Op[T], │ - │ EmptyMaskPolicy │ + │ EmptyMaskPolicy, EnumCodec │ │ • Apply[T] — validate → build → exec → scan │ │ • Codec dispatch: scalar / timestamp / enum │ │ • Self-contained: no drill imports │ @@ -151,7 +160,7 @@ Five components, three new: via `pgx.Rows.Values()` and populates the proto via reflection — no third-party row scanner is needed. 2. **`thirdparty/aippatch/cmd/aippatchgen/`** — codegen binary. Imports: - `google.golang.org/protobuf` + `github.com/pganalyze/pg_query_go/v5`. + `google.golang.org/protobuf` + `github.com/pganalyze/pg_query_go/v6`. Requires CGO (libpg_query); see *Risks* §1. 3. **`aippatch.yaml`** — at the repo root. Source of truth for codecs, resource bindings, name overrides, writable allow-list, and always-set @@ -160,7 +169,9 @@ Five components, three new: Pre-existing components shrink: 4. **`internal/patches/*.gen.go`** — committed generated code, one file per - resource, plus a single `init.gen.go` that emits `InitPatches() error`. + resource, plus a single `init.gen.go` that emits the shared + `var Codecs = map[string]aippatch.EnumCodec{...}` registry and + `InitPatches() error`. 5. **`internal/rpc//server.go`** — handlers shrink to ~12 lines. ### Boundary properties @@ -181,6 +192,8 @@ Pre-existing components shrink: ```go package aippatch +import "sync/atomic" + // Mapping is the static description of how a proto message round-trips // through a SQL table. Generated by aippatchgen; never hand-edited. 
type Mapping[T proto.Message] struct { @@ -194,7 +207,8 @@ type Mapping[T proto.Message] struct { // Populated by Validate(); unexported. bindingsByProto map[string]*Binding bindingsByColumn map[string]*Binding - validated bool // guard against use before InitPatches() + codecs map[string]EnumCodec // shared codec map passed in via Validate + validated atomic.Bool // Apply rejects unvalidated mappings } // Binding pairs one proto field with one SQL column. @@ -209,18 +223,30 @@ type Binding struct { // AutoSetClause defines a SQL expression always written into the SET clause. // Typical use: { Column: "updated_at", SQLLiteral: "NOW()" }. The literal is // emitted as raw SQL — never sourced from user input. Codegen verifies the -// column exists in the table and is NOT NULL. +// column exists in the table, is NOT NULL, and is not also a binding. type AutoSetClause struct { Column string SQLLiteral string } +// EnumCodec maps a proto enum to/from a SQL text column. Generated literal +// form lives in patches/init.gen.go's Codecs registry. Validate() copies +// the relevant subset into Mapping.codecs. +type EnumCodec struct { + ProtoEnum string // e.g. "drill.v1.UserRole" + ToText map[int32]string // proto enum number → SQL text + FromText map[string]int32 // SQL text → proto enum number (built by Validate from ToText) +} + // Op carries a single PATCH invocation's runtime data. type Op[T proto.Message] struct { Message T // input proto carrying the new values; must be non-nil Mask *fieldmaskpb.FieldMask // which fields to apply PKValue any // value for the PK column (e.g. uuid.UUID) Where map[string]any // optional extra equality predicates + // KEYS MUST BE BOUND COLUMNS — runtime validates + // against m.bindingsByColumn before composing SQL. + // Values are pgx-parameterized; keys are NOT escaped. } // DBTX is the minimal pgx interface aippatch needs. 
It is a strict subset of @@ -248,25 +274,33 @@ func Apply[T proto.Message]( m *Mapping[T], op Op[T], ) (T, error) -// Validate is called by generated InitPatches(); checks every binding's -// proto path against the descriptor of T, verifies AutoSet columns exist, -// and indexes binding maps. Returns error rather than panicking, per drill's -// no-panic-at-init rule. Sets m.validated = true on success. -func (m *Mapping[T]) Validate() error +// Validate is called once at startup, by InitPatches(). It indexes the +// binding maps, copies relevant codecs into m.codecs (subset reachable from +// m.Bindings), verifies every binding's proto path exists on T, and sets +// the validated atomic flag. Returns error rather than panicking, per +// drill's no-panic-at-init rule. Idempotent up to the validated flag — +// repeated calls return nil after the first success. +func (m *Mapping[T]) Validate(codecs map[string]EnumCodec) error ``` -The codec registry inside the runtime is keyed by the `Codec` string on each -Binding (the yaml codec name, prefixed with `"enum:"` for enum codecs): +### Codec dispatch + +The codec field on each Binding selects how the proto value is encoded to +SQL and decoded back: - `""` — scalar pass-through (`StringKind`, `BoolKind`, `Int32Kind`, - `Sint32Kind`, `Int64Kind`, `Sint64Kind`) + `Sint32Kind`, `Int64Kind`, `Sint64Kind`). - `"timestamp"` — `google.protobuf.Timestamp` ↔ `time.Time` via `.AsTime()` / - `timestamppb.New(...)` -- `"enum:"` — proto enum number ↔ SQL text via a declared map; both - directions look up the codec by yaml name. Out-of-map values on the read - side return `Internal` (data invariant violation); on the write side return - `InvalidArgument`. Codecs are global to the yaml file and reusable across - resources. + `timestamppb.New(...)`. +- `"enum:"` — proto enum number ↔ SQL text via `m.codecs[]`. + Out-of-map values on the read side return `Internal` (data invariant + violation); on the write side return `InvalidArgument`. 
Codecs are global + to the yaml file and reusable across resources. + +`Validate()` populates `m.codecs` from the shared `Codecs` map by selecting +only the codec names referenced by `m.Bindings`. The resulting per-Mapping +map is read-only after `Validate` returns, eliminating any concern about +shared-state mutation. ### Errors @@ -274,9 +308,9 @@ All errors returned by `Apply` are `*connect.Error` with appropriate codes: - `CodeInvalidArgument` — empty mask (when policy is `ErrorOnEmpty`), unknown mask path, non-writable mask path, nested mask path, enum write value not - in declared map, nil `op.Message`. -- `CodeNotFound` — `UPDATE` matched zero rows (PK wrong or scope filter - excluded the row). + in declared map, nil `op.Message`, `op.Where` key not in `bindingsByColumn`. +- `CodeNotFound` — `UPDATE` matched zero rows (PK wrong, scope filter + excluded the row, or row is soft-deleted). - `CodeInternal` — pgx error, codec read-side data invariant violation, binding/proto desync that escaped boot-time `Validate`, or `Apply` called on an unvalidated `Mapping` (`InitPatches()` not invoked). @@ -340,7 +374,7 @@ and verify if desired). set). Unmarshaled into `*descriptorpb.FileDescriptorSet`; walked via `protoreflect.FileDescriptor`. 2. **`sql/migrations/*.up.sql`** — read in lexical order. Each statement - parsed by `pg_query_go`. The tool accumulates a logical schema: + parsed by `pg_query_go/v6`. The tool accumulates a logical schema: - `CREATE TABLE` → register table with columns `(name, type, not_null, default_present)`. - `ALTER TABLE … ADD COLUMN` → add column. @@ -366,8 +400,8 @@ and verify if desired). 3. For each `resources[i]` in `aippatch.yaml`: 1. Look up the proto message descriptor by full name. 2. Look up the SQL table from the schema; resolve the PK column. - 3. For each proto field in the message, in field-number order (so error - messages are stable): + 3. 
For each proto field in the message, in **field-number order** (so + diagnostic messages line up with the proto file's declaration order): - If `overrides[field].skip` is true → drop. - If `overrides[field].column` set → use that column. - Else → snake-case name match with the SQL column list. @@ -390,19 +424,25 @@ and verify if desired). `proto.CloneOf`-base case from silently passing through unmapped fields with input-message values). 5. For each `auto_set[col]` entry: verify the column exists in the table - and is NOT NULL. The literal expression is emitted verbatim as raw SQL - — never user input. Reject if column is a writable binding (would - conflict). - 6. Sort bindings alphabetically by `Proto` for stable output. -4. Emit one Go file per resource, plus one `init.gen.go` that emits - `InitPatches() error` calling `Validate()` on each mapping. + and is NOT NULL. **Reject if the column is *any* binding** (writable + or non-writable) — auto-set must own the column entirely. The literal + expression is emitted verbatim as raw SQL; codegen further validates + the literal by parsing it with `pg_query_go` and rejecting + multi-statement input or non-expression payloads. (v0 effectively + restricts callers to a few well-known forms: `NOW()`, + `CURRENT_TIMESTAMP`, integer constants, etc.) + 6. Sort bindings alphabetically by `Proto` for **stable diff output** + (different from step 3's processing order). +4. Emit one Go file per resource, plus one `init.gen.go` that emits a + shared `var Codecs = map[string]aippatch.EnumCodec{...}` registry and + `InitPatches() error` calling `Validate(Codecs)` on each mapping. 5. If `--check`: byte-compare to existing files; exit 1 on any diff. ### Type compatibility (v0) | Proto kind | SQL types accepted | Codec | Notes | |---|---|---|---| -| `StringKind` | `text`, `varchar`, `citext`, `uuid` | `""` | `uuid` columns: pgx returns `[16]byte`; runtime formats canonical string. 
| +| `StringKind` | `text`, `varchar`, `citext`, `uuid` | `""` | `uuid` columns: pgx returns `[16]byte`; runtime decodes via `uuid.UUID(v.([16]byte)).String()`. | | `BoolKind` | `boolean` | `""` | | | `Int32Kind`, `Sint32Kind` | `integer` | `""` | pgx returns `int32`. | | `Int32Kind`, `Sint32Kind` | `smallint` | `""` | pgx returns `int16`; runtime widens to `int32` before `protoreflect.Set`. | @@ -451,6 +491,9 @@ aippatchgen: drill.v1.User.address.street: nested mask paths not supported in v0 aippatchgen: drill.v1.User: auto_set column "updated_at" not found in table users hint: ensure migrations have run and column exists. + +aippatchgen: drill.v1.User: auto_set column "display_name" conflicts with binding + hint: auto_set columns must not also be bindings; remove the proto field's binding or pick a different column. ``` ## Configuration: `aippatch.yaml` @@ -475,24 +518,27 @@ resources: pk: id soft_delete: deleted_at empty_mask: error # error → ErrorOnEmpty (default) | update_writable → UpdateAllWritable - writable: [display_name] # AIP-203 deny-by-default + writable: [display_name] # deny-by-default auto_set: - updated_at: NOW() # raw SQL, applied to every PATCH + updated_at: NOW() # raw SQL, applied to every PATCH; pg_query_go-validated as expression overrides: create_time: { column: created_at } role: { codec: enum_role } plan: { codec: enum_plan } - # password_hash, stripe_customer_id are nullable in users; not in proto; - # not in writable; codegen drops them with "no matching proto field" — no - # action needed. - free_full_educators_used: { skip: true } # column exists; no proto field; explicitly skipped ``` +The `users` table has SQL-only columns (`password_hash`, `stripe_customer_id`, +`free_full_educators_used`, `updated_at`, `deleted_at`) that have no +corresponding `User` proto field. Codegen iterates *proto fields*, not SQL +columns, so SQL-only columns are simply not visited and produce no diagnostic +or binding. No yaml entry is needed for them. 
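The proto-field-driven walk described above can be sketched in a few lines of Go. This is a minimal illustration, not the actual `aippatchgen` code: the `override` struct, the `resolveBindings` function, and the rule that an unmatched proto field is an error are all assumptions made for the example. It shows why SQL-only columns such as `password_hash` never need a yaml entry: the loop ranges over proto fields, so those columns are simply never looked at.

```go
package main

import (
	"fmt"
	"sort"
)

// override mirrors one entry under `overrides:` in aippatch.yaml.
// Hypothetical type, named here only for illustration.
type override struct {
	Column string // explicit column name, if it differs from the proto field
	Skip   bool   // drop the proto field entirely
}

// resolveBindings pairs each proto field with a SQL column. It iterates proto
// fields, never SQL columns, so columns with no proto field are not visited.
// Assumption: a proto field with no matching column is a codegen error.
func resolveBindings(protoFields []string, columns map[string]bool, overrides map[string]override) (map[string]string, error) {
	bound := make(map[string]string)
	for _, field := range protoFields {
		ov := overrides[field] // zero value when there is no override
		if ov.Skip {
			continue
		}
		col := field // proto field names are already snake_case
		if ov.Column != "" {
			col = ov.Column
		}
		if !columns[col] {
			return nil, fmt.Errorf("proto field %q: no column %q in table", field, col)
		}
		bound[field] = col
	}
	return bound, nil
}

func main() {
	columns := map[string]bool{
		"id": true, "email": true, "display_name": true, "created_at": true,
		// SQL-only columns: present in the table, absent from the proto.
		"password_hash": true, "stripe_customer_id": true, "updated_at": true, "deleted_at": true,
	}
	overrides := map[string]override{"create_time": {Column: "created_at"}}

	bound, err := resolveBindings([]string{"id", "email", "display_name", "create_time"}, columns, overrides)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fields := make([]string, 0, len(bound))
	for f := range bound {
		fields = append(fields, f)
	}
	sort.Strings(fields) // stable output, like the generated file
	for _, f := range fields {
		fmt.Printf("%s -> %s\n", f, bound[f])
	}
}
```

None of the SQL-only columns appear in the result, and no diagnostic is produced for them.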
+ One file per repo. `~10–20` lines per resource. Reviewers see policy and mapping deltas in a single diff. Adding a writable field is one line. The yaml-string `error` maps to the Go enum `ErrorOnEmpty`; -`update_writable` maps to `UpdateAllWritable` (v0 unsupported). +`update_writable` maps to `UpdateAllWritable` (v0 unsupported — codegen +emits a diagnostic until v1 lands). ## Generated file shape @@ -508,7 +554,7 @@ import ( ) // UserPatch is the PATCH mapping for drill.v1.User → users. -// Validate() is called from InitPatches() at startup, never at package init. +// Validate(Codecs) is called from InitPatches() at startup, never at package init. var UserPatch = &aippatch.Mapping[*drillv1.User]{ Table: "users", PK: "id", @@ -529,20 +575,50 @@ var UserPatch = &aippatch.Mapping[*drillv1.User]{ } ``` +The `id` column appears as a non-writable binding because the proto field +`id` should round-trip in the response. The framework does not deduplicate +PK from Bindings — PK identifies the row to update via `WHERE`, while +Bindings carries the read-back representation. + `internal/patches/init.gen.go`: ```go // Code generated by aippatchgen. DO NOT EDIT. package patches +import ( + drillv1 "github.com/btc/drill/internal/pb/drill/v1" + "github.com/btc/drill/thirdparty/aippatch" +) + +// Codecs is the shared registry of enum codecs declared in aippatch.yaml. +// Validate(Codecs) selects the codecs reachable from each Mapping's Bindings. +var Codecs = map[string]aippatch.EnumCodec{ + "enum_role": { + ProtoEnum: "drill.v1.UserRole", + ToText: map[int32]string{ + int32(drillv1.UserRole_USER_ROLE_CANDIDATE): "candidate", + int32(drillv1.UserRole_USER_ROLE_ADMIN): "admin", + }, + // FromText is built by Validate from ToText; no need to emit twice. 
+ }, + "enum_plan": { + ProtoEnum: "drill.v1.UserPlan", + ToText: map[int32]string{ + int32(drillv1.UserPlan_USER_PLAN_FREE): "free", + int32(drillv1.UserPlan_USER_PLAN_PRO): "pro", + }, + }, +} + // InitPatches validates every generated mapping. Call once at server startup. // Returns the first validation error encountered; never panics. func InitPatches() error { - for _, fn := range []func() error{ + for _, fn := range []func(map[string]aippatch.EnumCodec) error{ UserPatch.Validate, // …one entry per resource… } { - if err := fn(); err != nil { + if err := fn(Codecs); err != nil { return err } } @@ -550,15 +626,18 @@ func InitPatches() error { } ``` -**Init wiring.** `cmd/server/main.go` calls `patches.InitPatches()` during +**Init wiring.** `cmd/drill/main.go` calls `patches.InitPatches()` during startup, propagating the error normally per drill's no-panic-at-init rule. No -package-level `mustValidate` helper is generated — `Validate()` is invoked -explicitly. This eliminates init-time panics and makes the validation point -greppable. +package-level `mustValidate` helper is generated — `Validate(Codecs)` is +invoked explicitly. This eliminates init-time panics and makes the validation +point greppable. -`Apply` checks `m.validated` on every call; calling `Apply` before +`Apply` reads the `validated` atomic on every call; calling `Apply` before `InitPatches()` returns `Internal` ("Mapping not initialized; call -InitPatches()") rather than a misleading "unknown field" error. +InitPatches()") rather than a misleading "unknown field" error. The atomic +also prevents data races under `-race` if `InitPatches` and `Apply` are +hypothetically interleaved (in practice `InitPatches` runs synchronously +before any RPC handler is registered). ## Runtime: `Apply` walkthrough @@ -570,20 +649,26 @@ func Apply[T proto.Message]( var zero T // 0. Sanity guards. 
- if m == nil || !m.validated { + if m == nil || !m.validated.Load() { return zero, connectInternal("aippatch.Mapping not initialized; call patches.InitPatches() during startup") } - if any(op.Message) == nil { + // ProtoReflect().IsValid() returns false for typed-nil pointers and + // un-initialized messages, sidestepping the typed-nil interface trap + // (`any(op.Message) == nil` is false for a nil *T). + if op.Message.ProtoReflect() == nil || !op.Message.ProtoReflect().IsValid() { return zero, connectInvalidArg("op.Message must not be nil") } - // 1. Mask validation. + // 1. Mask validation. GetPaths is nil-safe (returns nil for both nil + // mask and empty paths), so a single len() check covers both. paths := op.Mask.GetPaths() if len(paths) == 0 { if m.EmptyMask == ErrorOnEmpty { return zero, connectInvalidArg("update_mask must not be empty") } - // Future: collect all writable bindings as paths. + // UpdateAllWritable: not implemented in v0. Reject explicitly to + // avoid silently emitting an AutoSet-only UPDATE. + return zero, connectInternal("UpdateAllWritable is unimplemented in v0") } // 2. Resolve paths to bindings; reject unknown / non-writable / nested. @@ -618,13 +703,24 @@ func Apply[T proto.Message]( } ub.Set(sets...) - // 4. WHERE clauses. + // 4. WHERE clauses. Sort op.Where keys for deterministic SQL (so pgx's + // prepared-statement cache hits across calls with the same shape). ub.Where(ub.Equal(m.PK, op.PKValue)) if m.SoftDelete != "" { ub.Where(m.SoftDelete + " IS NULL") } - for col, v := range op.Where { - ub.Where(ub.Equal(col, v)) + if len(op.Where) > 0 { + keys := make([]string, 0, len(op.Where)) + for k := range op.Where { keys = append(keys, k) } + sort.Strings(keys) + for _, col := range keys { + // Reject columns that are not bound — avoids any chance of + // attacker-controlled identifiers reaching the SQL string. 
+ if _, ok := m.bindingsByColumn[col]; !ok { + return zero, connectInvalidArg("unknown column in op.Where: %q", col) + } + ub.Where(ub.Equal(col, op.Where[col])) + } } // 5. Returning explicit bound columns (not RETURNING *). @@ -643,7 +739,7 @@ func Apply[T proto.Message]( if err := rows.Err(); err != nil { return zero, connectInternal("query: %w", err) } - return zero, connectNotFound("resource not found or filtered out") + return zero, connectNotFound("resource not found, soft-deleted, or filtered out") } cols := rows.FieldDescriptions() vals, err := rows.Values() @@ -669,15 +765,23 @@ Notable details: - **`proto.CloneOf`** (added in protobuf-go v1.36.6; project uses v1.36.11) is type-safe: returns `T` directly, no `.(T)` assertion. - **`Returning(boundCols...)`** sends only mapped columns over the wire from - Postgres to Go. Unmapped columns (`password_hash`, `stripe_customer_id`, - `free_full_educators_used`) are not transmitted at all — both a privacy - benefit and a guard against future schema additions appearing silently. + Postgres to Go. Unmapped columns (e.g. `password_hash`, + `stripe_customer_id`) are not transmitted at all. Note: bound non-writable + columns (e.g. `email`) ARE transmitted — they appear in the response proto + per AIP-134's "return the updated resource" requirement. The benefit of + `Returning(boundCols)` over `RETURNING *` is excluding *unmapped* + columns, not all non-writable ones. - **`ub.Set(sets...)`** is variadic over assignment strings. `ub.Assign(col, val)` returns `"col = $N"` with parameterized placeholder; raw expressions for `AutoSet` are formatted directly (`"updated_at = NOW()"`), with codegen - guaranteeing the column and literal are safe. + guaranteeing the column and literal are safe (column existence + NOT NULL + + not-also-a-binding; literal parsed by `pg_query_go` as a single + expression). 
- **`ub.Returning(...)`** is the library's first-class API; we do not use the marker-position-dependent `ub.SQL(...)` for this. +- **`op.Where` keys are validated** against `m.bindingsByColumn` — keys must + be bound columns. Values are pgx-parameterized; keys are not escaped, and + the validation step is the safety guarantee. `encode` and `decode` are bounded switches over field kind × codec. The total runtime is approximately 350 LoC including codec dispatch, error @@ -688,12 +792,12 @@ constructors, and `Validate`. | SQL type | pgx returns | Proto kind expected | Conversion | |---|---|---|---| | `text`, `varchar`, `citext` | `string` | `StringKind` | direct | -| `uuid` | `[16]byte` | `StringKind` | `uuid.UUID(v).String()` | +| `uuid` | `[16]byte` | `StringKind` | `uuid.UUID(v.([16]byte)).String()` | | `boolean` | `bool` | `BoolKind` | direct | | `integer` | `int32` | `Int32Kind`, `Sint32Kind` | direct | -| `smallint` | `int16` | `Int32Kind`, `Sint32Kind` | widen: `int32(v)` | +| `smallint` | `int16` | `Int32Kind`, `Sint32Kind` | widen: `int32(v.(int16))` | | `bigint` | `int64` | `Int64Kind`, `Sint64Kind` | direct | -| `timestamptz`, `timestamp` | `time.Time` | `MessageKind: Timestamp` | `timestamppb.New(v)` | +| `timestamptz`, `timestamp` | `time.Time` | `MessageKind: Timestamp` | `timestamppb.New(v.(time.Time))` | | `text` (with enum codec) | `string` | `EnumKind` | reverse-map declared codec | **Nullable columns return `pgtype.Text`/`pgtype.Timestamptz`/etc. from @@ -718,7 +822,7 @@ func (s *Server) UpdateProfile( } // Per-field validation lives in the handler in v0. v2 makes it declarative. 
- if maskHasPath(req.Msg.GetUpdateMask(), "display_name") { + if slices.Contains(req.Msg.GetUpdateMask().GetPaths(), "display_name") { trimmed := strings.TrimSpace(req.Msg.GetUser().GetDisplayName()) if trimmed == "" { return nil, connect.NewError(connect.CodeInvalidArgument, errors.New("display_name must not be empty")) @@ -755,21 +859,28 @@ same approach with a small fixture schema independent of drill's migrations. Cases: - empty mask → `InvalidArgument` -- nil `op.Message` → `InvalidArgument` +- nil `op.Message` (typed-nil pointer) → `InvalidArgument` - unvalidated mapping (`InitPatches` not called) → `Internal` - unknown mask path → `InvalidArgument` - non-writable mask path → `InvalidArgument` - nested mask path → `InvalidArgument` +- `op.Where` key not in bindings → `InvalidArgument` - writable scalar (string, bool, int32 from `integer`, int32 from `smallint`, int64) write + read-back - writable timestamp write + read-back - writable enum write (valid & invalid) + read-back -- AutoSet column is bumped on every PATCH (e.g. `updated_at` advances) +- AutoSet column is bumped on every PATCH. Test pattern: open a + `pgx.BeginFunc`, `SELECT updated_at` before, run `Apply`, `SELECT + updated_at` after — assert post > pre. Within a single transaction `NOW()` + returns the same instant; the framework executes the UPDATE in a separate + query so `NOW()` advances. (Alternatively use `clock_timestamp()` if needed.) 
- soft-delete WHERE filters out deleted rows → `NotFound` - PK mismatch → `NotFound` -- extra `Op.Where` predicate excludes row → `NotFound` +- extra `Op.Where` predicate (valid bound column) excludes row → `NotFound` - `Returning(boundCols)` does not include unmapped columns - `proto.CloneOf` preserves input-message fields that have no binding +- map-iteration determinism: same `Op.Where` produces the same SQL string + (assert via `pgx.LogQuery` or capturing builder output) ### Codegen golden tests (`thirdparty/aippatch/cmd/aippatchgen/aippatchgen_test.go`) @@ -778,6 +889,8 @@ Fixtures under `testdata/`: - `name_divergence/` — `create_time` ↔ `created_at` - `enum_codec/` — proto enum + declared codec - `auto_set/` — `updated_at: NOW()` and verified output +- `auto_set_conflict/` — `auto_set` column also a binding; expects diagnostic +- `auto_set_bad_literal/` — yaml literal is multi-statement; expects diagnostic - `unsupported_kind/` — proto with bytes field; expects diagnostic - `nullable_bound/` — writable on a nullable column; expects diagnostic - `nested_path/` — proto with submessage, attempted writable; expects diagnostic @@ -788,25 +901,28 @@ Fixtures under `testdata/`: ### Handler integration test (`internal/rpc/user/server_test.go`) -Uses drill's existing `backendtest.SeedUser` to create a real user (matching -drill's "Test production parity" rule), then exercises `UpdateProfile` -end-to-end through the connect handler: +Existing tests in this file already assert via `connect.CodeOf(err)` (per +`internal/rpc/user/server_test.go:151`, `:169`, `:187`, etc.) — *not* on +error message text — so the migration to aippatch does **not** require +updating any test assertions. + +The existing cases continue to cover the right wire behaviors: - valid PATCH on `display_name` → response carries updated User; DB row's `display_name` and `updated_at` both change. - empty mask → `InvalidArgument`. -- mask with `email` (non-writable) → `InvalidArgument`. 
+- unknown / non-writable mask path → `InvalidArgument`. +- empty display_name → `InvalidArgument` (handler-side validation still + runs). - unauthenticated → `Unauthenticated`. -**Audit step in rollout (see below):** existing tests assert on error -strings such as `"field not supported for update: \"email\""`. The new -handler produces `"field not writable: \"email\""`. Existing tests must be -updated where they assert on error message text; those that assert on error -codes only need no change. +One new case added by the rollout: +- soft-deleted user PATCH → `NotFound` (was `Internal`; see *Wire conformance + note*). ## Drill rollout plan 1. **Dependency adds.** `go get github.com/huandu/go-sqlbuilder - github.com/pganalyze/pg_query_go/v5` and commit `go.mod`/`go.sum`. + github.com/pganalyze/pg_query_go/v6` and commit `go.mod`/`go.sum`. (`github.com/google/uuid` is already in `go.mod`.) 2. **Add framework.** Land `thirdparty/aippatch/` runtime + `cmd/aippatchgen/` binary. @@ -817,23 +933,22 @@ codes only need no change. 5. **Generate.** Create `internal/patches/`; run `make codegen`. Review the diff manually first time, including `internal/patches/user.gen.go` and `internal/patches/init.gen.go`. -6. **Wire init.** Call `patches.InitPatches()` in `cmd/server/main.go` - startup; surface any error from `Validate()` per drill's no-panic rule. +6. **Wire init.** Call `patches.InitPatches()` in `cmd/drill/main.go` + startup; surface any error from `Validate(Codecs)` per drill's no-panic + rule. 7. **Switch handler.** Replace `UpdateProfile` handler body with the shrunk - version. Verify the proto wire contract is unchanged with golden response - bytes if needed. -8. **Audit existing tests.** Grep `internal/rpc/user/server_test.go` (and - anywhere else) for assertions on error strings such as `"field not - supported for update"`; update to the new wording (`"field not - writable"`) or relax to error-code only. -9. 
**Audit consumers of `b.UpdateDisplayName`.** Grep the codebase for + version. Verify the proto wire contract is unchanged (or document the + one wire change: soft-deleted user now returns `NotFound`, was `Internal`). +8. **Audit consumers of `b.UpdateDisplayName`.** Grep the codebase for callers; confirm only `UpdateProfile` calls it before deletion. -10. **Delete stale sqlc.** Remove `UpdateUserDisplayName` from - `sql/queries/users.sql` and `b.UpdateDisplayName`; regenerate sqlc. -11. **Verify.** `make test` (full CI: buf lint, codegen check, frontend +9. **Remove superseded sqlc.** Delete `UpdateUserDisplayName` from + `sql/queries/users.sql` and `b.UpdateDisplayName`; regenerate sqlc. +10. **Verify.** `make test` (full CI: buf lint, codegen check, frontend typecheck+lint+tests, backend tests with race). -Rollback: single-commit revert. The proto wire contract is unchanged. +Rollback: single-commit revert. The proto wire contract is unchanged; only +the soft-deleted-user code differs (and clients should treat +`Internal`/`NotFound` symmetrically as transient or missing-resource). ## Spanda replication @@ -848,7 +963,9 @@ Each Spanda repo gets three artifacts: Each project's `Makefile` wires `aippatchgen` into its `codegen` and `test` targets. The runtime library and codegen binary contain no drill-specific code; the yaml file, generated `*.gen.go`, and `Makefile` wiring are -project-specific by design. +project-specific by design. (Note for the eventual extraction: scrub +drill-specific examples from `thirdparty/aippatch/README.md` before +publishing the module.) ## Roadmap @@ -861,6 +978,7 @@ project-specific by design. | v1 | Pre/post hooks (or returned diff) for audit logging | `Apply` returns `(updated T, diff Diff, err error)` where Diff carries before/after for mask paths; handler emits audit events. | | v1 | Proto3 explicit-optional + NULL semantics | AIP-134 clearing rule (`mask path + zero value → NULL`); meaningful for `optional` fields. 
| | v1 | CHECK constraint extraction | Validate enum codec maps against `CHECK (col IN (…))` at codegen. | +| v1 | `UpdateAllWritable` empty-mask policy | Implement the "all populated/writable fields" path; until then v0 returns `Internal` if the policy is set. | | v2 | Declarative validators | `NonEmptyTrimmed`, `LenBetween`, `URL`, `OneOf`. Per-resource yaml + handler-side composition. | | v2 | AIP-193 error mapping | pgx error inspection: `unique_violation` → `AlreadyExists`, `fk_violation` → `FailedPrecondition`, `not_null_violation` / `check_violation` → `InvalidArgument`. Per-resource override map. | | v3 | Per-field declarative authz | `admin_only_fields:` in yaml; layered with handler narrowing. | @@ -873,7 +991,7 @@ tiers ship. New features are opt-in via `aippatch.yaml`. ## Risks -1. **`pg_query_go` is a CGO dependency.** It wraps `libpg_query`. drill's +1. **`pg_query_go/v6` is a CGO dependency.** It wraps `libpg_query`. drill's production binary builds may run with `CGO_ENABLED=0` in some paths. Mitigation: `aippatchgen` is a developer/CI tool, not part of the production binary; CGO is only required where `aippatchgen` runs. @@ -905,15 +1023,23 @@ tiers ship. New features are opt-in via `aippatch.yaml`. events. v1 adds returned-diff support to remove the wrap. Mitigation: if audit comes due before v1 ships, the wrap-in-tx pattern is sufficient. -6. **Test wording change is observable.** Existing tests asserting on error - message strings (`"field not supported for update"`) will break. The - rollout plan includes an explicit audit step. +6. **Soft-deleted user wire change.** Today `UpdateProfile` returns + `Internal` when the user is soft-deleted; aippatch returns `NotFound`. + This is more correct AIP behavior, but is wire-visible. Mitigation: + document in *Wire conformance note* and the rollout plan. 7. 
**`buf.binpb` drift.** If `buf.binpb` is committed and a developer regens `pb/*.pb.go` without re-running `buf build -o buf.binpb`, the codegen will be stale. Mitigation: `make codegen` runs both in order; `aippatchgen --check` in CI catches drift. +8. **`op.Where` raw-identifier surface.** Keys in the map are interpolated + into the SQL string by `go-sqlbuilder` (`ub.Equal(col, val)` parameterizes + only the value). Mitigation: runtime validates every key against + `m.bindingsByColumn` before composing SQL — keys must be a bound column. + Document the constraint in the API; add a unit test for the rejection + path. + ## Decisions (locked, with rationale) | # | Decision | Why | @@ -926,14 +1052,17 @@ tiers ship. New features are opt-in via `aippatch.yaml`. | 6 | SQL builder: `huandu/go-sqlbuilder` (private to package) | Mature; `PostgreSQL.NewUpdateBuilder()` emits `$1` placeholders cleanly; `Returning(...)` is a first-class method. | | 7 | Row scan: direct `pgx.Rows.Values()` + proto reflection (no third-party scanner) | We populate a proto via reflection rather than a Go row struct; avoids an unnecessary dependency and a proto-aware shim. | | 8 | Empty FieldMask rejected with `InvalidArgument` (default) | drill prefers explicit intent; documented divergence from AIP-134; relax later if a use case warrants. | -| 9 | Deny-by-default writable; opt in via `writable:` list | AIP-203; security posture. | +| 9 | Deny-by-default writable; opt in via `writable:` list | Security posture; consistent with AIP-134 §Update_Mask "must not allow output-only fields." | | 10 | Codegen errors on unsupported field types | Bad fields stop at codegen; runtime never sees a type it cannot handle. | -| 11 | Framework reads back via `RETURNING ` (not `*`) and returns the populated proto | One round-trip; AIP-134 compliant; explicit column list excludes sensitive unmapped columns from the wire. 
| +| 11 | Framework reads back via `RETURNING ` (not `*`) and returns the populated proto | One round-trip; AIP-134 compliant; explicit column list excludes unmapped columns from the wire. (Bound non-writable columns are still returned — that is the AIP contract.) | | 12 | v0 codec set: scalars + timestamps + enum, NOT-NULL columns only | Smallest set that covers drill's `User` and most Spanda CRUD shapes. JSONB / nullable / bytes / float in v1. | | 13 | v0 first user: drill's `UpdateProfile` | Validates the framework against an existing target; replaces the most boilerplate-heavy code path today. | -| 14 | Boot validation via `Mapping.Validate() error` propagated to `main` through generated `InitPatches() error` | drill's no-panic-at-init rule. No `mustValidate` panic helper in generated code. | -| 15 | `AutoSet` clauses (e.g. `updated_at: NOW()`) declared per-resource in yaml | Replaces sqlc's hand-rolled `updated_at = NOW()` in every UPDATE; codegen-checked column existence and NOT-NULL. | +| 14 | Boot validation via `Mapping.Validate(Codecs) error` propagated to `main` through generated `InitPatches() error` | drill's no-panic-at-init rule. No `mustValidate` panic helper in generated code. | +| 15 | `AutoSet` clauses (e.g. `updated_at: NOW()`) declared per-resource in yaml | Replaces sqlc's hand-rolled `updated_at = NOW()` in every UPDATE; codegen-checked column existence, NOT-NULL, not-also-a-binding, and pg_query_go-validated literal. | | 16 | Always commit `buf.binpb` and regenerate via `buf build -o buf.binpb` | Eliminates need for a live buf-build dependency at codegen time; CI's `aippatchgen --check` catches drift. | +| 17 | Codecs declared once in yaml under `codecs:`, emitted as a shared `var Codecs` registry in `init.gen.go`, passed to each `Mapping.Validate(Codecs)` | Single source of truth; cross-resource reuse; no per-Mapping duplication; unexported `m.codecs` populated from the subset reachable from `m.Bindings`. 
| +| 18 | `op.Where` keys must be bound columns; runtime validates before composing SQL | Prevents identifier injection through the map-key surface; values are pgx-parameterized. | +| 19 | `validated` is `atomic.Bool`; `Apply` rejects unvalidated mappings with `Internal` | Defends against `Apply` calls that race ahead of `InitPatches()` under `-race`; production code never interleaves the two but the guard is cheap. | ## Open questions (deferred) From 73e32ae08d7a601d9de83e43cf1958299f2b4d31 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 23:01:35 -0400 Subject: [PATCH 05/37] =?UTF-8?q?docs(spec):=20aippatch=20=E2=80=94=20appl?= =?UTF-8?q?y=20round-3=20review=20fixes=20(final=20pass)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both opus and sonnet round-3 reviews returned "Approve with minor revisions" — no HIGH severity findings. Sonnet's "H1" was a redundant nil check that opus correctly classified NIT (the IsValid check actually works); fixed as cosmetic. MEDIUM (both reviewers) - Validate(codecs) now explicitly fails when a Binding references an enum codec name not in the supplied codecs map. Closes the silent failure mode where dangling refs would only surface at first Apply. - Validate doc spells out FromText construction, dangling-codec check, and validated-stays-false-on-error. - Pin huandu/go-sqlbuilder to @v1.36.0 in `go get` step (UpdateBuilder. Returning was added in that release). LOW - UpdateAllWritable empty-mask path now returns CodeUnimplemented (was Internal); AIP-aligned for "feature not implemented". - SET-clause iteration now walks Mapping.Bindings order (alphabetical) filtered by mask membership, instead of client-supplied path order. Makes SQL deterministic regardless of how clients order the mask paths — pgx prepared-statement cache hits, and golden tests are stable. 
- Add codegen step rejecting EnumCodec maps with duplicate ToText values (would produce a lossy FromText). - Note `sqlbuilder.PostgreSQL.NewUpdateBuilder()` is required (default flavor would emit MySQL-style `?` placeholders). - Fix AutoSet test pattern — NOW() is constant within a transaction so SELECT-before/Apply/SELECT-after inside BeginFunc would see equal values. Use clock_timestamp() OR top-level statements OR capture pre-Apply NOW() and assert post >= captured. NIT - Drop dead `op.Message.ProtoReflect() == nil` clause; ProtoReflect never returns nil for protoc-gen-go types. - Hoist ProtoReflect() to a `src` local instead of calling repeatedly. - Use `ub.IsNull(m.SoftDelete)` instead of string concatenation. - Add bullet for ALTER TABLE ADD/DROP CONSTRAINT and other unlisted AlterTableCmd kinds being ignored in v0. - Sort AutoSet alphabetically by Column for stable diff (in addition to Bindings). - Move nested-mask-path rejection out of proto-field-iteration loop (proto field names never have dots; the check belongs at yaml validation time on writable/overrides keys). - Document CGO first-build cost (~3 min) and CI cache requirement. - Align spec on Makefile's existing `generate` target (was claiming a nonexistent `codegen` target). Final spec is implementation-ready. Both reviewers gave "Approve with minor revisions"; ending the review loop after round 3 per CLAUDE.md ("up to 3 rounds, end early if clean"). --- .../specs/2026-05-07-aippatch-design.md | 111 ++++++++++++------ 1 file changed, 75 insertions(+), 36 deletions(-) diff --git a/docs/superpowers/specs/2026-05-07-aippatch-design.md b/docs/superpowers/specs/2026-05-07-aippatch-design.md index 72357965..60ac6f46 100644 --- a/docs/superpowers/specs/2026-05-07-aippatch-design.md +++ b/docs/superpowers/specs/2026-05-07-aippatch-design.md @@ -275,11 +275,14 @@ func Apply[T proto.Message]( ) (T, error) // Validate is called once at startup, by InitPatches(). 
It indexes the -// binding maps, copies relevant codecs into m.codecs (subset reachable from -// m.Bindings), verifies every binding's proto path exists on T, and sets -// the validated atomic flag. Returns error rather than panicking, per -// drill's no-panic-at-init rule. Idempotent up to the validated flag — -// repeated calls return nil after the first success. +// binding maps; copies relevant codecs into m.codecs (subset reachable from +// m.Bindings); builds each EnumCodec's FromText from ToText; verifies every +// binding's proto path exists on T; verifies every binding's "enum:" +// codec reference exists in the supplied codecs map; and sets the validated +// atomic flag. Returns error rather than panicking, per drill's +// no-panic-at-init rule. Idempotent up to the validated flag — repeated +// calls return nil after the first success. On error, validated is left +// false and the caller may retry after fixing the cause. func (m *Mapping[T]) Validate(codecs map[string]EnumCodec) error ``` @@ -351,10 +354,11 @@ is the default in Go's `go build`, but some shops set `CGO_ENABLED=0` globally; document at the top of `cmd/aippatchgen/main.go` and in `thirdparty/aippatch/README.md`. The runtime library has no CGO requirement. -Wired into `Makefile`: +Wired into `Makefile` (drill's existing `generate` target gains the +`buf build -o buf.binpb` and `aippatchgen` steps): ``` -codegen: +generate: buf generate buf build -o buf.binpb sqlc generate @@ -370,7 +374,7 @@ and verify if desired). ### Inputs 1. **`buf.binpb`** — emitted by `buf build -o buf.binpb` (added as a new step - in `make codegen`; the existing `buf generate` does not emit a descriptor + in `make generate`; the existing `buf generate` does not emit a descriptor set). Unmarshaled into `*descriptorpb.FileDescriptorSet`; walked via `protoreflect.FileDescriptor`. 2. **`sql/migrations/*.up.sql`** — read in lexical order. Each statement @@ -383,6 +387,8 @@ and verify if desired). 
- `ALTER TABLE … ALTER COLUMN … SET / DROP NOT NULL` → flip nullability. - `ALTER TABLE … RENAME COLUMN` → rename. - `DROP TABLE` → remove table. + - `ALTER TABLE … ADD/DROP CONSTRAINT`, `ADD/DROP DEFAULT`, and any other + `AlterTableCmd` kind not enumerated above: ignored in v0. - Other statements (indexes, foreign-key constraints, CHECK constraints) are ignored in v0. CHECK constraint extraction (to validate enum codec maps) is a v1 feature. @@ -418,7 +424,6 @@ and verify if desired). - Scalar kind → `""`. - Anything else → diagnostic: "unsupported in v0; mark `skip: true`." - `Writable` = field name is in `resources[i].writable`. - - Reject paths with dots (nested) — diagnostic. 4. After processing, every proto field must either have a binding or `skip: true`. Any unmatched field is a diagnostic (this prevents the `proto.CloneOf`-base case from silently passing through unmapped @@ -431,8 +436,15 @@ and verify if desired). multi-statement input or non-expression payloads. (v0 effectively restricts callers to a few well-known forms: `NOW()`, `CURRENT_TIMESTAMP`, integer constants, etc.) - 6. Sort bindings alphabetically by `Proto` for **stable diff output** - (different from step 3's processing order). + 6. Validate enum-codec yaml entries: every codec's `map` values must be + unique (no two enum values map to the same SQL text), to prevent the + derived FromText map from being lossy. Diagnostic on duplicates. + 7. Validate yaml-side names: reject any `writable` or `overrides` key + containing dots (these would imply nested-message paths, which v0 + does not support). + 8. Sort `Bindings` alphabetically by `Proto` and `AutoSet` alphabetically + by `Column` for **stable diff output** (different from step 3's + processing order). 4. Emit one Go file per resource, plus one `init.gen.go` that emits a shared `var Codecs = map[string]aippatch.EnumCodec{...}` registry and `InitPatches() error` calling `Validate(Codecs)` on each mapping. 
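The duplicate-text check described in step 6 could look roughly like the sketch below. It is a minimal illustration, assuming a codec's forward map is a plain `map[int32]string` from enum number to SQL text; `buildFromText` and the `free`/`pro` values are hypothetical names chosen for the example, not part of `aippatchgen` or the runtime.

```go
package main

import (
	"fmt"
	"sort"
)

// buildFromText inverts an enum-number -> SQL-text map (hypothetical shape).
// It returns an error if two enum numbers share the same SQL text, because
// the inverted map could then keep only one of them.
func buildFromText(toText map[int32]string) (map[string]int32, error) {
	// Visit keys in sorted order so the reported duplicate is deterministic.
	keys := make([]int32, 0, len(toText))
	for k := range toText {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	fromText := make(map[string]int32, len(toText))
	for _, k := range keys {
		text := toText[k]
		if prev, dup := fromText[text]; dup {
			return nil, fmt.Errorf("enum values %d and %d both map to SQL text %q", prev, k, text)
		}
		fromText[text] = k
	}
	return fromText, nil
}

func main() {
	inverted, err := buildFromText(map[int32]string{1: "free", 2: "pro"})
	fmt.Println(inverted, err)

	_, err = buildFromText(map[int32]string{1: "free", 2: "free"})
	fmt.Println(err)
}
```

Running the same check at codegen time and again when deriving the reverse map at startup would be redundant but cheap; which of the two the real implementation does is not settled by this sketch.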
@@ -654,8 +666,10 @@ func Apply[T proto.Message]( } // ProtoReflect().IsValid() returns false for typed-nil pointers and // un-initialized messages, sidestepping the typed-nil interface trap - // (`any(op.Message) == nil` is false for a nil *T). - if op.Message.ProtoReflect() == nil || !op.Message.ProtoReflect().IsValid() { + // (`any(op.Message) == nil` is false for a nil *T). ProtoReflect itself + // never returns a nil interface for protoc-gen-go-generated types. + src := op.Message.ProtoReflect() + if !src.IsValid() { return zero, connectInvalidArg("op.Message must not be nil") } @@ -666,31 +680,45 @@ func Apply[T proto.Message]( if m.EmptyMask == ErrorOnEmpty { return zero, connectInvalidArg("update_mask must not be empty") } - // UpdateAllWritable: not implemented in v0. Reject explicitly to - // avoid silently emitting an AutoSet-only UPDATE. - return zero, connectInternal("UpdateAllWritable is unimplemented in v0") + // UpdateAllWritable: not implemented in v0. CodeUnimplemented + // signals "feature not yet built" rather than a server bug. + return zero, connect.NewError(connect.CodeUnimplemented, + errors.New("UpdateAllWritable is unimplemented in v0")) } // 2. Resolve paths to bindings; reject unknown / non-writable / nested. - desc := op.Message.ProtoReflect().Descriptor() + // sqlbuilder.PostgreSQL.NewUpdateBuilder() is required (not the default + // NewUpdateBuilder) — only the PostgreSQL flavor emits $1 placeholders + // and a working RETURNING clause. + desc := src.Descriptor() ub := sqlbuilder.PostgreSQL.NewUpdateBuilder() ub.Update(m.Table) - sets := make([]string, 0, len(paths)+len(m.AutoSet)) + // Iterate Bindings (not client-supplied paths) in stable order so the + // emitted SQL is deterministic regardless of the order the client put + // paths in the mask. This makes pgx's prepared-statement cache hit + // across calls with the same mask shape, and makes golden-test SQL + // deterministic. 
+ maskSet := make(map[string]struct{}, len(paths)) for _, p := range paths { if strings.Contains(p, ".") { return zero, connectInvalidArg("nested mask path not supported in v0: %q", p) } - b, ok := m.bindingsByProto[p] - if !ok { + if _, ok := m.bindingsByProto[p]; !ok { return zero, connectInvalidArg("unknown field in update_mask: %q", p) } - if !b.Writable { + if !m.bindingsByProto[p].Writable { return zero, connectInvalidArg("field not writable: %q", p) } - fd := desc.Fields().ByName(protoreflect.Name(p)) + maskSet[p] = struct{}{} + } + sets := make([]string, 0, len(maskSet)+len(m.AutoSet)) + for i := range m.Bindings { + b := &m.Bindings[i] + if _, ok := maskSet[b.Proto]; !ok { continue } + fd := desc.Fields().ByName(protoreflect.Name(b.Proto)) if fd == nil { - return zero, connectInternal("binding/proto desync: %q", p) + return zero, connectInternal("binding/proto desync: %q", b.Proto) } v, err := encode(op.Message, fd, b.Codec, m.codecs) if err != nil { return zero, err } @@ -707,7 +735,7 @@ func Apply[T proto.Message]( // prepared-statement cache hits across calls with the same shape). ub.Where(ub.Equal(m.PK, op.PKValue)) if m.SoftDelete != "" { - ub.Where(m.SoftDelete + " IS NULL") + ub.Where(ub.IsNull(m.SoftDelete)) // first-class helper, not string concat } if len(op.Where) > 0 { keys := make([]string, 0, len(op.Where)) @@ -869,11 +897,17 @@ Cases: int64) write + read-back - writable timestamp write + read-back - writable enum write (valid & invalid) + read-back -- AutoSet column is bumped on every PATCH. Test pattern: open a - `pgx.BeginFunc`, `SELECT updated_at` before, run `Apply`, `SELECT - updated_at` after — assert post > pre. Within a single transaction `NOW()` - returns the same instant; the framework executes the UPDATE in a separate - query so `NOW()` advances. (Alternatively use `clock_timestamp()` if needed.) +- AutoSet column is bumped on every PATCH. 
**Important:** `NOW()` returns + the transaction-start timestamp and is constant for the duration of a + single transaction, so a `BeginFunc(SELECT before; Apply; SELECT after)` + test would see equal values. Use one of: + - `clock_timestamp()` in the test fixture's `auto_set` instead of `NOW()`, + which advances within a transaction; or + - run the SELECT-before, `Apply`, SELECT-after as separate top-level + statements (no surrounding `BeginFunc`); or + - SELECT `NOW()` to capture the test's own transaction-time bound + before `Apply`, then SELECT the row's `updated_at` after `Apply`, + asserting it's >= the captured time. - soft-delete WHERE filters out deleted rows → `NotFound` - PK mismatch → `NotFound` - extra `Op.Where` predicate (valid bound column) excludes row → `NotFound` @@ -921,16 +955,18 @@ One new case added by the rollout: ## Drill rollout plan -1. **Dependency adds.** `go get github.com/huandu/go-sqlbuilder +1. **Dependency adds.** `go get github.com/huandu/go-sqlbuilder@v1.36.0 github.com/pganalyze/pg_query_go/v6` and commit `go.mod`/`go.sum`. - (`github.com/google/uuid` is already in `go.mod`.) + The `@v1.36.0` floor is required — `UpdateBuilder.Returning(...)` was + added in that release. (`github.com/google/uuid` is already in `go.mod`.) 2. **Add framework.** Land `thirdparty/aippatch/` runtime + `cmd/aippatchgen/` binary. -3. **Wire codegen.** Add `buf build -o buf.binpb` and the `aippatchgen` step - to `make codegen`. Add `aippatchgen --check` to `make test`. +3. **Wire codegen.** Extend the existing `make generate` target (line 125 of + `Makefile`) to add `buf build -o buf.binpb` and the `aippatchgen` step + after `buf generate`. Add `aippatchgen --check` to `make test`. 4. **Add config.** `aippatch.yaml` at repo root with the `User` resource, enum codecs, and `auto_set: { updated_at: NOW() }`. -5. **Generate.** Create `internal/patches/`; run `make codegen`. Review the +5. **Generate.** Create `internal/patches/`; run `make generate`. 
Review the diff manually first time, including `internal/patches/user.gen.go` and `internal/patches/init.gen.go`. 6. **Wire init.** Call `patches.InitPatches()` in `cmd/drill/main.go` @@ -998,7 +1034,10 @@ tiers ship. New features are opt-in via `aippatch.yaml`. Document at the top of `cmd/aippatchgen/main.go`: `// Requires CGO (libpg_query).` Add a `README.md` next to it stating the same. CI runners must have a C toolchain — drill's CI already does - for testcontainers. + for testcontainers. **First-build cost:** `pg_query_go/v6` compiles part + of the PostgreSQL parser from C source on first use; on a cold build + cache this can take ~3 minutes. CI runners should preserve `GOCACHE` + and `GOMODCACHE` across runs (drill's CI already does). 2. **pgx-native ↔ proto type drift.** New SQL types added to drill in the future may not be in the runtime's `decode` switch. Mitigation: @@ -1030,7 +1069,7 @@ tiers ship. New features are opt-in via `aippatch.yaml`. 7. **`buf.binpb` drift.** If `buf.binpb` is committed and a developer regens `pb/*.pb.go` without re-running `buf build -o buf.binpb`, the codegen - will be stale. Mitigation: `make codegen` runs both in order; + will be stale. Mitigation: `make generate` runs both in order; `aippatchgen --check` in CI catches drift. 8. 
**`op.Where` raw-identifier surface.** Keys in the map are interpolated

From c79c7df207f5ee9e196b0822ba77fa5dab99feda Mon Sep 17 00:00:00 2001
From: Brian Tiger Chow <734339+btc@users.noreply.github.com>
Date: Thu, 7 May 2026 23:12:39 -0400
Subject: [PATCH 06/37] plan: idle auto-cancel implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

15 tasks across 6 phases covering migrations 015–018, the
internal/feat/idleunsub/ feature package (token sign/verify, trigger
evaluation, KeepSubscription, AutoReverse, email composition), Stripe
webhook integration, the public /sub/keep endpoint, the auth-middleware
AutoReverse hook, the user-info RPC + AckKeptBanner, and the React
KeptBanner component. Acceptance criteria + manual smoke test + spec
out-of-scope list at the end.

Spec: docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md
---
 .../plans/2026-05-07-idle-auto-cancel.md | 2585 +++++++++++++++++
 1 file changed, 2585 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-07-idle-auto-cancel.md

diff --git a/docs/superpowers/plans/2026-05-07-idle-auto-cancel.md b/docs/superpowers/plans/2026-05-07-idle-auto-cancel.md
new file mode 100644
index 00000000..83af7be4
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-07-idle-auto-cancel.md
@@ -0,0 +1,2585 @@
+# Idle Auto-Cancel Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Auto-cancel a Pro subscription via `cancel_at_period_end=true` when the user has been idle for two consecutive billing periods. No settings toggle — this is the default Sabermatic behavior. Reversible via email-link click or any authenticated activity during the cancel window.
+
+**Architecture:** Event-driven via Stripe `invoice.upcoming` webhook (~T-7 days before each renewal). Cancel decision happens inside a per-user-locked DB transaction (`SELECT ... FOR UPDATE`) with a generic `stripe_webhook_dedup` table for retry-storm protection and period-keyed `user_events` rows for multi-firing dedup. Reversal paths verify Stripe state before acting (handles partial-failure cache drift). Activity signal = `MAX(last_active)` from soft-deleted `auth_sessions`.
+
+**Tech Stack:**
+- Go 1.25.x (`go.mod:5`); `github.com/stripe/stripe-go/v82` (v82.5.1)
+- Postgres 16 with sqlc-generated queries
+- River queue for async email send
+- TypeScript/React frontend with ConnectRPC
+- HMAC-SHA256 for keep-link tokens
+
+**Spec:** `docs/superpowers/specs/2026-05-07-idle-auto-cancel-design.md` (commit `780ef4ea`).
+
+**Key project conventions** (from `/Users/btc/Projects/src/drill/CLAUDE.md`):
+- Frontend typecheck: `cd web && npx tsc -b` (NEVER bare `tsc --noEmit`)
+- Full CI: `make test` (buf lint + codegen check + frontend tsc/lint/tests + backend tests)
+- Test users via `backendtest.SeedUser` (handler tests) or `seedUser` (backend internal tests); never raw SQL inserts
+- After UI-affecting commits: load in browser before declaring done
+- `_ = err` for ignored errors; never blanket `nolint`
+- After every code-modifying task, a separate review agent reads actual files and verifies against spec
+
+---
+
+## Phase 1: Schema
+
+Each migration is a standalone task with paired `up.sql` / `down.sql`. After migration files land, sqlc generation must be re-run via `sqlc generate` (the existing project script — see `Makefile` if present, else use `sqlc generate` from the repo root).
+ +### Task 1: Migration 015 — `auth_sessions` soft-delete + +**Files:** +- Create: `sql/migrations/015_auth_sessions_soft_delete.up.sql` +- Create: `sql/migrations/015_auth_sessions_soft_delete.down.sql` +- Modify: `sql/queries/auth_sessions.sql` +- Modify: `sql/queries/users.sql` (rename existing logout-everywhere query for clarity) +- Modify: `internal/backend/auth.go:261` (sole caller of `DeleteAuthSession`) +- Test: `internal/backend/auth_test.go` (verify logout now soft-deletes; sessions persist for activity reads) + +- [ ] **Step 1: Write the up migration** + +```sql +-- 015_auth_sessions_soft_delete.up.sql +ALTER TABLE auth_sessions + ADD COLUMN revoked_at TIMESTAMPTZ; + +CREATE INDEX idx_auth_sessions_user_last_active + ON auth_sessions(user_id, last_active DESC); +``` + +- [ ] **Step 2: Write the down migration** + +```sql +-- 015_auth_sessions_soft_delete.down.sql +DROP INDEX IF EXISTS idx_auth_sessions_user_last_active; +ALTER TABLE auth_sessions DROP COLUMN revoked_at; +``` + +- [ ] **Step 3: Update `sql/queries/auth_sessions.sql`** + +Replace the file contents: + +```sql +-- name: CreateAuthSession :one +INSERT INTO auth_sessions (user_id, token_hash, expires_at, ip_address, user_agent) +VALUES ($1, $2, $3, $4, $5) +RETURNING *; + +-- name: GetAuthSessionByToken :one +-- Filters out soft-revoked sessions; only returns valid live sessions. +-- (Activity queries do NOT filter on revoked_at — see GetUserLastActive.) +SELECT s.*, u.email, u.display_name, u.role, u.plan, u.email_verified, + u.created_at AS user_created_at +FROM auth_sessions s +JOIN users u ON u.id = s.user_id +WHERE s.token_hash = $1 + AND s.expires_at > NOW() + AND s.revoked_at IS NULL + AND u.deleted_at IS NULL; + +-- name: TouchAuthSession :exec +UPDATE auth_sessions SET last_active = NOW() +WHERE id = $1; + +-- name: RevokeAuthSession :exec +-- Soft-delete: marks the row revoked but preserves it for the activity query. 
+UPDATE auth_sessions SET revoked_at = NOW() +WHERE id = $1 AND revoked_at IS NULL; + +-- name: RevokeUserAuthSessions :exec +-- Soft-delete every active session for a user (logout-everywhere). +UPDATE auth_sessions SET revoked_at = NOW() +WHERE user_id = $1 AND revoked_at IS NULL; + +-- name: GetUserLastActive :one +-- Reads across all history (including revoked sessions) — this is a +-- "when did the user last act, ever?" query, not a session-validity check. +SELECT MAX(last_active)::timestamptz +FROM auth_sessions +WHERE user_id = $1; +``` + +- [ ] **Step 4: Update `sql/queries/users.sql`** + +Locate the existing `DELETE FROM auth_sessions WHERE user_id = @id;` query (the account-deletion path). Rename for clarity: + +```sql +-- name: HardDeleteUserAuthSessions :exec +-- Used ONLY by account deletion (data minimization / GDPR). Logout uses +-- RevokeUserAuthSessions in auth_sessions.sql instead. +DELETE FROM auth_sessions WHERE user_id = @id; +``` + +- [ ] **Step 5: Update the sole caller in `internal/backend/auth.go:261`** + +Find the call to `queries.DeleteAuthSession` (Logout path) and replace with `queries.RevokeAuthSession`. The signatures are identical (`exec` query taking the session ID). + +Also locate any caller of `DeleteUserAuthSessions` (logout-everywhere paths) and rename to `RevokeUserAuthSessions`. Find with: `grep -rn "DeleteUserAuthSessions\|DeleteAuthSession" internal/`. + +For the account-deletion caller, use `HardDeleteUserAuthSessions`. + +- [ ] **Step 6: Run sqlc generate and verify** + +```bash +sqlc generate +``` + +Expected: regenerates `internal/db/auth_sessions.sql.go` and `internal/db/users.sql.go` with new query method names. + +- [ ] **Step 7: Run go build to confirm callsites compile** + +```bash +go build ./... +``` + +Expected: builds clean. Any remaining caller of the old query names will fail compilation — fix in this task before committing. 
+ +- [ ] **Step 8: Write a test in `internal/backend/auth_test.go`** + +Add this test alongside existing logout tests: + +```go +func TestLogout_SoftDeletes_PreservesActivityHistory(t *testing.T) { + t.Parallel() + b := pg.NewBackend(t) + ctx := context.Background() + userID := backendtest.SeedUser(t, b) + + // Sign in to create an auth session, then log out. + // (Use whatever existing helper your project has for sign-in; + // search for an existing test that logs in then out and mirror it.) + tok := signinHelper(t, b, userID) + require.NoError(t, b.Logout(ctx, tok)) + + // The session row should still exist (soft delete) with revoked_at set. + var revokedAt sql.NullTime + err := b.Pool().QueryRow(ctx, + `SELECT revoked_at FROM auth_sessions WHERE user_id = $1`, + userID).Scan(&revokedAt) + require.NoError(t, err) + require.True(t, revokedAt.Valid, "revoked_at should be set after logout") + + // GetUserLastActive should still return a value (across all history). + last, err := db.New(b.Pool()).GetUserLastActive(ctx, userID) + require.NoError(t, err) + require.NotZero(t, last, "last_active must be visible across revoked sessions") +} +``` + +- [ ] **Step 9: Run the test to verify it passes** + +```bash +go test ./internal/backend/ -run TestLogout_SoftDeletes_PreservesActivityHistory -race -count=1 -v +``` + +Expected: PASS. + +- [ ] **Step 10: Run the full backend test suite to catch regressions** + +```bash +go test ./internal/... ./cmd/... -race -count=1 -timeout=300s +``` + +Expected: all pass. If any existing test fails, the soft-delete change broke an assumption — fix in this task. 
+ +- [ ] **Step 11: Commit** + +```bash +git add sql/migrations/015_*.sql sql/queries/auth_sessions.sql sql/queries/users.sql \ + internal/backend/auth.go internal/backend/auth_test.go \ + internal/db/auth_sessions.sql.go internal/db/users.sql.go internal/db/models.go +git commit -m "auth_sessions: soft-delete on logout (migration 015)" +``` + +--- + +### Task 2: Migration 016 — users subscription state cache + +**Files:** +- Create: `sql/migrations/016_users_sub_state.up.sql` +- Create: `sql/migrations/016_users_sub_state.down.sql` +- Modify: `sql/queries/users.sql` (add new sub-state queries) +- Test: deferred to Task 5 where these columns are exercised end-to-end via AuthUser projection. + +- [ ] **Step 1: Write the up migration** + +```sql +-- 016_users_sub_state.up.sql +ALTER TABLE users + ADD COLUMN stripe_subscription_id TEXT, + ADD COLUMN sub_cancel_at_period_end BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_cancel_is_auto BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_current_period_start TIMESTAMPTZ, + ADD COLUMN pending_kept_banner BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN idle_eligible_after TIMESTAMPTZ NOT NULL DEFAULT NOW(); +``` + +- [ ] **Step 2: Write the down migration** + +```sql +-- 016_users_sub_state.down.sql +ALTER TABLE users + DROP COLUMN idle_eligible_after, + DROP COLUMN pending_kept_banner, + DROP COLUMN sub_current_period_start, + DROP COLUMN sub_cancel_is_auto, + DROP COLUMN sub_cancel_at_period_end, + DROP COLUMN stripe_subscription_id; +``` + +- [ ] **Step 3: Append new queries to `sql/queries/users.sql`** + +```sql +-- name: SetUserAutoCancelState :exec +-- Called by idleunsub.HandleInvoiceUpcoming when our trigger fires. +-- Sets BOTH cache flags so the auto-reverse middleware gate fires for this user. 
+UPDATE users +SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + sub_current_period_start = $2 +WHERE id = $1; + +-- name: ClearUserAutoCancelState :exec +-- Called by KeepSubscription / AutoReverse on reversal. Sets banner flag. +UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + pending_kept_banner = TRUE, + sub_current_period_start = $2 +WHERE id = $1; + +-- name: SyncSubStateFromWebhook :exec +-- Called by handleSubscriptionUpdated. Does NOT touch sub_cancel_is_auto: +-- only our handler sets that flag; webhook sync must not overwrite it. +UPDATE users +SET stripe_subscription_id = $2, + sub_cancel_at_period_end = $3, + sub_current_period_start = $4 +WHERE id = $1; + +-- name: ClearSubStateOnDeletion :exec +-- Called by handleSubscriptionDeleted. Clears all sub state and downgrades plan. +UPDATE users +SET stripe_subscription_id = NULL, + sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = NULL, + plan = 'free' +WHERE id = $1; + +-- name: ClearKeptBanner :exec +UPDATE users SET pending_kept_banner = FALSE WHERE id = $1; + +-- name: ClearStaleKeptBanners :exec +-- Periodic hygiene: clear banners that are older than 14 days. +-- (Run from a tiny daily cron; the spec calls this out as low-priority cleanup.) +UPDATE users +SET pending_kept_banner = FALSE +WHERE pending_kept_banner = TRUE + AND id IN ( + SELECT user_id FROM user_events + WHERE event_type = 'subscription_kept' + AND created_at < NOW() - INTERVAL '14 days' + ); + +-- name: GetUserAutoCancelGates :one +-- Used by AutoReverse to defensively re-read both gates. +SELECT sub_cancel_at_period_end, sub_cancel_is_auto, + stripe_subscription_id, sub_current_period_start +FROM users +WHERE id = $1; + +-- name: LockUserForSubDecision :one +-- Per-user mutex for the cancel transaction. Acquires a row-level lock that +-- serializes concurrent invoice.upcoming evaluations for the same user. 
+-- Must be inside a transaction; releases on COMMIT or ROLLBACK. +SELECT id FROM users WHERE id = $1 FOR UPDATE; +``` + +- [ ] **Step 4: Run sqlc generate** + +```bash +sqlc generate +``` + +Expected: `internal/db/users.sql.go` gains the new methods; `internal/db/models.go` `User` struct gains the new fields. + +- [ ] **Step 5: Run go build** + +```bash +go build ./... +``` + +Expected: builds clean (no callers reference the new methods yet). + +- [ ] **Step 6: Commit** + +```bash +git add sql/migrations/016_*.sql sql/queries/users.sql \ + internal/db/users.sql.go internal/db/models.go +git commit -m "users: sub state cache columns (migration 016)" +``` + +--- + +### Task 3: Migration 017 — `stripe_webhook_dedup` table + +**Files:** +- Create: `sql/migrations/017_stripe_webhook_dedup.up.sql` +- Create: `sql/migrations/017_stripe_webhook_dedup.down.sql` +- Create: `sql/queries/stripe_webhook_dedup.sql` + +- [ ] **Step 1: Write the up migration** + +```sql +-- 017_stripe_webhook_dedup.up.sql +CREATE TABLE stripe_webhook_dedup ( + event_id TEXT PRIMARY KEY, + event_type TEXT NOT NULL, + processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +- [ ] **Step 2: Write the down migration** + +```sql +-- 017_stripe_webhook_dedup.down.sql +DROP TABLE IF EXISTS stripe_webhook_dedup; +``` + +- [ ] **Step 3: Create `sql/queries/stripe_webhook_dedup.sql`** + +```sql +-- name: TryClaimWebhookEvent :one +-- Returns the event_id on first claim; returns no row on subsequent claims. +-- Use the no-row return as the signal "this event was already handled". +INSERT INTO stripe_webhook_dedup (event_id, event_type) +VALUES ($1, $2) +ON CONFLICT (event_id) DO NOTHING +RETURNING event_id; +``` + +- [ ] **Step 4: Run sqlc generate and go build** + +```bash +sqlc generate && go build ./... +``` + +Expected: clean build. 
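`TryClaimWebhookEvent` signals a duplicate by returning no row rather than a dedicated error. One way a caller might fold that into a boolean is sketched below. It is illustrative only: `errNoRows` stands in for `pgx.ErrNoRows` so the example has no third-party dependency, and `claimResult` and `errTransient` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRows stands in for pgx.ErrNoRows in this dependency-free sketch.
var errNoRows = errors.New("no rows in result set")

// errTransient is an example of an unrelated failure.
var errTransient = errors.New("connection reset")

// claimResult interprets the error returned by an
// INSERT ... ON CONFLICT DO NOTHING RETURNING query:
//   - nil error: the row was inserted, so this is the first claim
//   - no-rows error: the key already existed, so the event is a duplicate
//   - any other error: a real failure the caller should surface
func claimResult(err error) (firstClaim bool, failure error) {
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNoRows): // errors.Is also matches wrapped errors
		return false, nil
	default:
		return false, err
	}
}

func main() {
	first, _ := claimResult(nil)
	dup, _ := claimResult(fmt.Errorf("scan: %w", errNoRows))
	_, failure := claimResult(errTransient)
	fmt.Println(first, dup, failure)
}
```

Using `errors.Is` rather than `==` keeps the duplicate check working if a wrapper ever adds context to the driver error.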
+ +- [ ] **Step 5: Commit** + +```bash +git add sql/migrations/017_*.sql sql/queries/stripe_webhook_dedup.sql \ + internal/db/stripe_webhook_dedup.sql.go internal/db/models.go +git commit -m "stripe_webhook_dedup: generic webhook idempotency (migration 017)" +``` + +--- + +### Task 4: Migration 018 — `keep_link_token_uses` + period-keyed dedup queries + +**Files:** +- Create: `sql/migrations/018_keep_link_token_uses.up.sql` +- Create: `sql/migrations/018_keep_link_token_uses.down.sql` +- Create: `sql/queries/keep_link_token_uses.sql` +- Modify: `sql/queries/user_events.sql` (add the two period-keyed dedup queries; create the file if it doesn't exist) + +- [ ] **Step 1: Write the up migration** + +```sql +-- 018_keep_link_token_uses.up.sql +CREATE TABLE keep_link_token_uses ( + token_hash BYTEA PRIMARY KEY, + user_id UUID NOT NULL REFERENCES users(id), + used_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); +``` + +- [ ] **Step 2: Write the down migration** + +```sql +-- 018_keep_link_token_uses.down.sql +DROP TABLE IF EXISTS keep_link_token_uses; +``` + +- [ ] **Step 3: Create `sql/queries/keep_link_token_uses.sql`** + +```sql +-- name: TryClaimKeepToken :one +-- Single-use enforcement. Returns the hash on first claim, no row otherwise. +INSERT INTO keep_link_token_uses (token_hash, user_id) +VALUES ($1, $2) +ON CONFLICT (token_hash) DO NOTHING +RETURNING token_hash; +``` + +- [ ] **Step 4: Add period-keyed dedup queries** + +If `sql/queries/user_events.sql` exists, append; else create it with these queries: + +```sql +-- name: HasAutoCanceledThisPeriod :one +-- Returns true if a subscription_auto_canceled event already exists for +-- this (user, subscription, period). Backs the multi-firing dedup in §4.1. +-- metadata->>'current_period_start' is RFC3339 text written by cancelMetadata. 
+SELECT EXISTS ( + SELECT 1 FROM user_events + WHERE user_id = $1 + AND event_type = 'subscription_auto_canceled' + AND (metadata->>'subscription_id') = $2 + AND (metadata->>'current_period_start')::timestamptz = $3 +); + +-- name: GetMostRecentKeptOrCanceledForPeriod :one +-- For multi-firing dedup: did a 'subscription_kept' event arrive after the +-- most recent 'subscription_auto_canceled' for this period? If yes, the user +-- already kept their sub for this period; don't re-cancel. +SELECT event_type +FROM user_events +WHERE user_id = $1 + AND event_type IN ('subscription_auto_canceled', 'subscription_kept') + AND (metadata->>'subscription_id') = $2 + AND (metadata->>'current_period_start')::timestamptz = $3 +ORDER BY created_at DESC +LIMIT 1; + +-- name: InsertSubscriptionAutoCanceledEvent :exec +-- Composed insert for the cancel decision. metadata is already-marshaled JSON. +INSERT INTO user_events (user_id, event_type, metadata) +VALUES ($1, 'subscription_auto_canceled', $2); + +-- name: InsertSubscriptionKeptEvent :exec +INSERT INTO user_events (user_id, event_type, metadata) +VALUES ($1, 'subscription_kept', $2); +``` + +- [ ] **Step 5: Run sqlc generate and go build** + +```bash +sqlc generate && go build ./... +``` + +- [ ] **Step 6: Commit** + +```bash +git add sql/migrations/018_*.sql sql/queries/keep_link_token_uses.sql \ + sql/queries/user_events.sql \ + internal/db/keep_link_token_uses.sql.go internal/db/user_events.sql.go \ + internal/db/models.go +git commit -m "keep_link_token_uses + user_events dedup queries (migration 018)" +``` + +--- + +## Phase 2: AuthUser extension + +### Task 5: Project new sub-state columns into `AuthUser` + +The auto-reverse hook in `Backend.AuthenticateSession` gates on `user.SubCancelAtPeriodEnd && user.SubCancelIsAuto`. The `AuthUser` struct must carry these. 
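To make the gate concrete, here is a minimal sketch. `subState` and `shouldAutoReverse` are hypothetical names that mirror only the two flags the hook reads; the real check lives in `Backend.AuthenticateSession` and may be shaped differently.

```go
package main

import "fmt"

// subState mirrors the two cached flags from migration 016 that the
// auto-reverse hook reads. Hypothetical type for illustration.
type subState struct {
	SubCancelAtPeriodEnd bool // a cancellation is pending at period end
	SubCancelIsAuto      bool // the pending cancellation was set by idle auto-cancel
}

// shouldAutoReverse reports whether authenticated activity should undo a
// pending cancellation. Both flags must be set: the first alone only says
// that some cancellation is pending, not who requested it.
func shouldAutoReverse(s subState) bool {
	return s.SubCancelAtPeriodEnd && s.SubCancelIsAuto
}

func main() {
	fmt.Println(shouldAutoReverse(subState{SubCancelAtPeriodEnd: true, SubCancelIsAuto: true}))
	fmt.Println(shouldAutoReverse(subState{SubCancelAtPeriodEnd: true}))
}
```

Because `SyncSubStateFromWebhook` never writes `sub_cancel_is_auto`, a cancellation that reaches the cache only through webhook sync leaves the second flag false, so under this reading the gate would not fire for it.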
+ +**Files:** +- Modify: `internal/auth/user_context.go` (extend `AuthUser` struct) +- Modify: `sql/queries/auth_sessions.sql` (`GetAuthSessionByToken` projects new columns) +- Modify: `internal/backend/backend.go:267-285` (`AuthenticateSession` populates new fields) +- Test: `internal/backend/auth_test.go` (verify projection) + +- [ ] **Step 1: Update `internal/auth/user_context.go`** + +Add three fields to the `AuthUser` struct: + +```go +type AuthUser struct { + ID uuid.UUID + Email string + DisplayName string + Role string + Plan string + EmailVerified bool + CreatedAt time.Time + + // Auto-cancel state (migration 016). + SubCancelAtPeriodEnd bool + SubCancelIsAuto bool + PendingKeptBanner bool +} +``` + +- [ ] **Step 2: Update `GetAuthSessionByToken` in `sql/queries/auth_sessions.sql`** + +Extend the SELECT projection: + +```sql +-- name: GetAuthSessionByToken :one +SELECT s.*, u.email, u.display_name, u.role, u.plan, u.email_verified, + u.created_at AS user_created_at, + u.sub_cancel_at_period_end, u.sub_cancel_is_auto, u.pending_kept_banner +FROM auth_sessions s +JOIN users u ON u.id = s.user_id +WHERE s.token_hash = $1 + AND s.expires_at > NOW() + AND s.revoked_at IS NULL + AND u.deleted_at IS NULL; +``` + +- [ ] **Step 3: Run sqlc generate** + +```bash +sqlc generate +``` + +Expected: `GetAuthSessionByTokenRow` struct gains the three new fields. + +- [ ] **Step 4: Update `Backend.AuthenticateSession` in `internal/backend/backend.go`** + +Find the function (around line 267). 
After `row, err := queries.GetAuthSessionByToken(...)`, populate the new fields when constructing the returned `AuthUser`: + +```go +return &auth.AuthUser{ + ID: row.UserID, + Email: row.Email, + DisplayName: row.DisplayName, + Role: row.Role, + Plan: row.Plan, + EmailVerified: row.EmailVerified, + CreatedAt: row.UserCreatedAt, + SubCancelAtPeriodEnd: row.SubCancelAtPeriodEnd, + SubCancelIsAuto: row.SubCancelIsAuto, + PendingKeptBanner: row.PendingKeptBanner, +}, nil +``` + +(Match the existing field-assignment style; the new fields are appended.) + +- [ ] **Step 5: Write a test** + +In `internal/backend/auth_test.go` add: + +```go +func TestAuthenticateSession_ProjectsSubState(t *testing.T) { + t.Parallel() + b := pg.NewBackend(t) + ctx := context.Background() + userID := backendtest.SeedUser(t, b) + + // Force the cache columns to known values. + _, err := b.Pool().Exec(ctx, ` + UPDATE users + SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + pending_kept_banner = TRUE + WHERE id = $1`, userID) + require.NoError(t, err) + + tok := signinHelper(t, b, userID) // existing helper; mirror existing tests + tokenHash := auth.HashSessionToken(tok) + + user, err := b.AuthenticateSession(ctx, tokenHash) + require.NoError(t, err) + require.True(t, user.SubCancelAtPeriodEnd) + require.True(t, user.SubCancelIsAuto) + require.True(t, user.PendingKeptBanner) +} +``` + +- [ ] **Step 6: Run the test** + +```bash +go test ./internal/backend/ -run TestAuthenticateSession_ProjectsSubState -race -count=1 -v +``` + +Expected: PASS. + +- [ ] **Step 7: Run all backend tests** + +```bash +go test ./internal/... ./cmd/... -race -count=1 -timeout=300s +``` + +Expected: all pass. 
+ +- [ ] **Step 8: Commit** + +```bash +git add internal/auth/user_context.go sql/queries/auth_sessions.sql \ + internal/backend/backend.go internal/backend/auth_test.go \ + internal/db/auth_sessions.sql.go +git commit -m "auth: project sub_cancel_at_period_end and friends into AuthUser" +``` + +--- + +## Phase 3: idleunsub package + +The feature package home. New convention: `internal/feat//` for cohesive feature packages. This is the first one. + +### Task 6: Token sign/verify + +**Files:** +- Create: `internal/feat/idleunsub/token.go` +- Create: `internal/feat/idleunsub/token_test.go` + +- [ ] **Step 1: Write the failing test first** + +Create `internal/feat/idleunsub/token_test.go`: + +```go +package idleunsub_test + +import ( + "crypto/rand" + "testing" + "time" + + "github.com/google/uuid" + "github.com/stretchr/testify/require" + + "github.com/btc/drill/internal/feat/idleunsub" +) + +func newSigner(t *testing.T) *idleunsub.TokenSigner { + t.Helper() + key := make([]byte, 32) + _, err := rand.Read(key) + require.NoError(t, err) + return idleunsub.NewTokenSigner(key) +} + +func TestSignVerify_RoundTrip(t *testing.T) { + t.Parallel() + s := newSigner(t) + now := time.Now().UTC().Truncate(time.Second) + end := now.Add(7 * 24 * time.Hour) + + claims := idleunsub.KeepTokenClaims{ + UserID: uuid.New(), + SubscriptionID: "sub_123", + Action: "keep_subscription", + CurrentPeriodEnd: end, + IssuedAt: now.Unix(), + ExpiresAt: end.Unix(), + } + + tok := s.Sign(claims) + got, err := s.Verify(tok) + require.NoError(t, err) + require.Equal(t, claims.UserID, got.UserID) + require.Equal(t, claims.SubscriptionID, got.SubscriptionID) + require.Equal(t, claims.Action, got.Action) + require.True(t, claims.CurrentPeriodEnd.Equal(got.CurrentPeriodEnd)) + require.Equal(t, claims.ExpiresAt, got.ExpiresAt) +} + +func TestVerify_RejectsTampered(t *testing.T) { + t.Parallel() + s := newSigner(t) + tok := s.Sign(idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", 
Action: "keep_subscription", + CurrentPeriodEnd: time.Now().Add(time.Hour).UTC(), ExpiresAt: time.Now().Add(time.Hour).Unix(), + }) + tampered := tok[:len(tok)-2] + "AA" // mutate last two characters + _, err := s.Verify(tampered) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) +} + +func TestVerify_RejectsExpired(t *testing.T) { + t.Parallel() + s := newSigner(t) + past := time.Now().Add(-time.Hour).UTC().Truncate(time.Second) + tok := s.Sign(idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: past, IssuedAt: past.Add(-24 * time.Hour).Unix(), ExpiresAt: past.Unix(), + }) + _, err := s.Verify(tok) + require.ErrorIs(t, err, idleunsub.ErrTokenExpired) +} + +func TestVerify_RejectsExpDriftFromPeriodEnd(t *testing.T) { + t.Parallel() + s := newSigner(t) + end := time.Now().Add(time.Hour).UTC().Truncate(time.Second) + + claims := idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: end, + IssuedAt: time.Now().Unix(), + ExpiresAt: end.Add(24 * time.Hour).Unix(), // drift! + } + tok := s.Sign(claims) + _, err := s.Verify(tok) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) +} +``` + +- [ ] **Step 2: Run the test to verify failure** + +```bash +go test ./internal/feat/idleunsub/ -run TestSignVerify -v +``` + +Expected: FAIL — package doesn't exist yet. + +- [ ] **Step 3: Implement `internal/feat/idleunsub/token.go`** + +```go +package idleunsub + +import ( + "crypto/hmac" + "crypto/sha256" + "encoding/base64" + "encoding/json" + "errors" + "strings" + "time" + + "github.com/google/uuid" +) + +var ( + ErrTokenInvalid = errors.New("idleunsub: token invalid") + ErrTokenExpired = errors.New("idleunsub: token expired") +) + +// KeepTokenClaims is the canonical signed payload. Field order is fixed by +// struct definition; encoding/json over a typed struct produces a stable +// serialization (no map iteration order ambiguity). 
+type KeepTokenClaims struct {
+	UserID           uuid.UUID `json:"user_id"`
+	SubscriptionID   string    `json:"subscription_id"`
+	Action           string    `json:"action"` // "keep_subscription"
+	CurrentPeriodEnd time.Time `json:"current_period_end"`
+	IssuedAt         int64     `json:"iat"`
+	ExpiresAt        int64     `json:"exp"` // INVARIANT: == CurrentPeriodEnd.Unix()
+}
+
+// TokenSigner signs and verifies KeepTokenClaims with HMAC-SHA256.
+type TokenSigner struct {
+	key []byte
+	now func() time.Time
+}
+
+// NewTokenSigner constructs a signer with the given HMAC key.
+func NewTokenSigner(key []byte) *TokenSigner {
+	return &TokenSigner{key: key, now: time.Now}
+}
+
+// Sign serializes and HMAC-signs the claims. The result is two unpadded
+// URL-safe base64 segments joined by a dot: the JSON body, then the
+// signature computed over the encoded body.
+func (s *TokenSigner) Sign(c KeepTokenClaims) string {
+	body, _ := json.Marshal(c) // typed struct never errors
+	bodyB64 := base64.RawURLEncoding.EncodeToString(body)
+	sig := hmac.New(sha256.New, s.key)
+	sig.Write([]byte(bodyB64))
+	sigB64 := base64.RawURLEncoding.EncodeToString(sig.Sum(nil))
+	return bodyB64 + "." + sigB64
+}
+
+// Verify parses and validates a token. Returns the claims or an error.
+// - ErrTokenInvalid: bad signature, malformed, or invariant violated +// - ErrTokenExpired: signature valid but exp < now +func (s *TokenSigner) Verify(tok string) (KeepTokenClaims, error) { + var zero KeepTokenClaims + parts := strings.SplitN(tok, ".", 2) + if len(parts) != 2 { + return zero, ErrTokenInvalid + } + bodyB64, sigB64 := parts[0], parts[1] + + expectedSig := hmac.New(sha256.New, s.key) + expectedSig.Write([]byte(bodyB64)) + givenSig, err := base64.RawURLEncoding.DecodeString(sigB64) + if err != nil || !hmac.Equal(givenSig, expectedSig.Sum(nil)) { + return zero, ErrTokenInvalid + } + + body, err := base64.RawURLEncoding.DecodeString(bodyB64) + if err != nil { + return zero, ErrTokenInvalid + } + var c KeepTokenClaims + if err := json.Unmarshal(body, &c); err != nil { + return zero, ErrTokenInvalid + } + + // Spec invariant: ExpiresAt must equal CurrentPeriodEnd.Unix(). + if c.ExpiresAt != c.CurrentPeriodEnd.Unix() { + return zero, ErrTokenInvalid + } + + if s.now().Unix() >= c.ExpiresAt { + return zero, ErrTokenExpired + } + return c, nil +} +``` + +- [ ] **Step 4: Run tests** + +```bash +go test ./internal/feat/idleunsub/ -race -count=1 -v +``` + +Expected: all four tests PASS. + +- [ ] **Step 5: Commit** + +```bash +git add internal/feat/idleunsub/token.go internal/feat/idleunsub/token_test.go +git commit -m "idleunsub: HMAC keep-token sign/verify" +``` + +--- + +### Task 7: Service skeleton + `HandleInvoiceUpcoming` (the cancel decision) + +This is the largest task. It builds the trigger evaluation and the entire cancel transaction. Tests are integration-shaped (live DB, stubbed Stripe). 
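Before reading the full handler, it may help to see the trigger rule in isolation. The sketch below is a simplified, pure restatement of the checks the handler performs after its dedup steps. It assumes a monthly billing interval and a known last-activity time; `idleTriggerFires` and `day` are hypothetical names, not part of the package.

```go
package main

import (
	"fmt"
	"time"
)

// idleTriggerFires restates the trigger rule for a monthly subscription:
// the user must have been inactive since before the start of the previous
// period, and the current period must not predate their eligibility.
func idleTriggerFires(lastActive, periodStart, idleEligibleAfter time.Time) bool {
	threshold := periodStart.AddDate(0, -1, 0) // start of the previous monthly period
	if !lastActive.Before(threshold) {
		return false // active during the previous or current period
	}
	if periodStart.Before(idleEligibleAfter) {
		return false // period began before the user became eligible
	}
	return true
}

// day builds a UTC midnight timestamp for examples.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	periodStart := day(2026, 5, 1)
	eligible := day(2026, 1, 1)
	fmt.Println(idleTriggerFires(day(2026, 3, 20), periodStart, eligible)) // idle since March
	fmt.Println(idleTriggerFires(day(2026, 4, 10), periodStart, eligible)) // active in April
}
```

One caveat with this approach: Go's `AddDate` normalizes out-of-range dates, so for a period that starts on the 31st the threshold can land a few days later than the nominal start of the prior period (one month before 2026-03-31 normalizes to 2026-03-03).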
+ +**Files:** +- Create: `internal/feat/idleunsub/idleunsub.go` (Service + HandleInvoiceUpcoming) +- Create: `internal/feat/idleunsub/metadata.go` (cancelMetadata / keptMetadata structs) +- Create: `internal/feat/idleunsub/metrics.go` (OTEL counter wrappers) +- Create: `internal/feat/idleunsub/idleunsub_test.go` +- Create: `internal/feat/idleunsub/stripestub_test.go` (in-package fake Stripe client used only by tests) + +**Stripe test pattern note:** `internal/backend/billing_test.go` imports `stripe-go/v82` directly and constructs Stripe events via fixture data. We'll mirror that: hand-construct `stripe.Event` and `stripe.Subscription` payloads in tests; the Service takes a `StripeClient` interface so tests inject a fake. + +- [ ] **Step 1: Define the StripeClient interface and metadata structs** + +Create `internal/feat/idleunsub/metadata.go`: + +```go +package idleunsub + +import "time" + +// All timestamps are canonicalized to second precision via +// time.Unix(stripeInt64, 0).UTC() before assignment. Stripe period fields +// arrive as int64 Unix seconds; explicit second precision and UTC guard +// against future drift if a code path ever constructs a time.Time from a +// different source. JSON encoding via encoding/json produces RFC3339 +// ("...Z") which Postgres ::timestamptz parses reliably. 
+ +type cancelMetadata struct { + SubscriptionID string `json:"subscription_id"` + StripeEventID string `json:"stripe_event_id"` + CurrentPeriodStart time.Time `json:"current_period_start"` + CurrentPeriodEnd time.Time `json:"current_period_end"` +} + +type keptMetadata struct { + SubscriptionID string `json:"subscription_id"` + Via string `json:"via"` // "link" | "auto_activity" + CurrentPeriodStart time.Time `json:"current_period_start"` // identifies the period kept +} +``` + +- [ ] **Step 2: Define the metrics shim** + +Create `internal/feat/idleunsub/metrics.go`: + +```go +package idleunsub + +import ( + "context" + + "go.opentelemetry.io/otel" + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" +) + +var ( + meter = otel.Meter("idleunsub") + cancelFired, _ = meter.Int64Counter("idleunsub.cancel.fired") + cancelSkipped, _ = meter.Int64Counter("idleunsub.cancel.skipped") + cancelError, _ = meter.Int64Counter("idleunsub.cancel.error") + cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") + reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") + reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") + emailEnqueue, _ = meter.Int64Counter("idleunsub.email.enqueue") +) + +func mCancelFired(ctx context.Context, subID string) { + cancelFired.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} +func mCancelSkipped(ctx context.Context, reason string) { + cancelSkipped.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) +} +func mCancelError(ctx context.Context, reason string) { + cancelError.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) +} +func mCacheDriftCorrected(ctx context.Context, subID string) { + cacheDrift.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} +func mReverseLink(ctx context.Context, subID string) { + reverseLink.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} +func 
mReverseActivity(ctx context.Context, subID string) { + reverseAct.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} +func mEmailEnqueue(ctx context.Context, kind, status string) { + emailEnqueue.Add(ctx, 1, metric.WithAttributes( + attribute.String("kind", kind), + attribute.String("status", status))) +} +``` + +- [ ] **Step 3: Define the Service struct + StripeClient interface** + +Create `internal/feat/idleunsub/idleunsub.go`: + +```go +package idleunsub + +import ( + "context" + "errors" + "fmt" + "log/slog" + "time" + + "github.com/google/uuid" + "github.com/jackc/pgx/v5" + "github.com/jackc/pgx/v5/pgxpool" + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/db" + "github.com/btc/drill/internal/email" +) + +// StripeClient is the surface idleunsub needs from Stripe. Production wires +// this to wrappers around stripe-go's package-level functions; tests inject +// an in-memory fake. +type StripeClient interface { + GetSubscription(ctx context.Context, id string) (*stripe.Subscription, error) + UpdateSubscriptionCancel(ctx context.Context, id string, cancelAtPeriodEnd bool, idempotencyKey string) (*stripe.Subscription, error) +} + +// Service owns the cancel/keep/auto-reverse logic. +type Service struct { + pool *pgxpool.Pool + stripe StripeClient + mailer email.Sender + signer *TokenSigner + now func() time.Time + log *slog.Logger +} + +func NewService(pool *pgxpool.Pool, sc StripeClient, m email.Sender, sn *TokenSigner, log *slog.Logger) *Service { + return &Service{pool: pool, stripe: sc, mailer: m, signer: sn, now: time.Now, log: log} +} +``` + +- [ ] **Step 4: Implement `HandleInvoiceUpcoming`** + +Append to `idleunsub.go`: + +```go +// HandleInvoiceUpcoming evaluates the trigger rule. Idempotent: safe to call +// multiple times for the same Stripe event. 
+func (s *Service) HandleInvoiceUpcoming(ctx context.Context, event stripe.Event) error { + // Extract subscription ID from the invoice.upcoming event payload. + subID := event.GetObjectValue("subscription") + if subID == "" { + s.log.Warn("invoice.upcoming missing subscription", "event_id", event.ID) + return nil + } + + // 1. Fetch the subscription (authoritative source for current period and status). + sub, err := s.stripe.GetSubscription(ctx, subID) + if err != nil { + mCancelError(ctx, "stripe_get_failed") + return fmt.Errorf("get subscription %s: %w", subID, err) + } + + // 2. Status / state early returns. + if sub.Status != stripe.SubscriptionStatusActive { + mCancelSkipped(ctx, "non_active_status") + return nil + } + if sub.CancelAtPeriodEnd { + mCancelSkipped(ctx, "already_canceled") + return nil + } + if len(sub.Items.Data) == 0 { + s.log.Warn("subscription has no items", "sub_id", subID) + return nil + } + item := sub.Items.Data[0] + periodStart := time.Unix(item.CurrentPeriodStart, 0).UTC() + periodEnd := time.Unix(item.CurrentPeriodEnd, 0).UTC() + if item.Price == nil || item.Price.Recurring == nil { + s.log.Warn("subscription item missing price.recurring", "sub_id", subID) + return nil + } + threshold := subtractInterval(periodStart, item.Price.Recurring.Interval) + + // 3. Look up our user by stripe customer ID. + custID := sub.Customer.ID + q := db.New(s.pool) + user, err := q.GetUserByStripeCustomer(ctx, pgxText(custID)) + if err != nil { + return fmt.Errorf("get user by stripe customer %s: %w", custID, err) + } + + // 4. Begin the cancel transaction. + tx, err := s.pool.BeginTx(ctx, pgx.TxOptions{}) + if err != nil { + mCancelError(ctx, "db_begin_failed") + return fmt.Errorf("begin tx: %w", err) + } + defer tx.Rollback(ctx) //nolint:errcheck // rollback after commit is a no-op + + qtx := db.New(tx) + + // 4a. Per-user mutex. 
+	if _, err := qtx.LockUserForSubDecision(ctx, user.ID); err != nil {
+		mCancelError(ctx, "db_lock_failed")
+		return fmt.Errorf("lock user: %w", err)
+	}
+
+	// 4b. Webhook event dedup (retry-storm protection).
+	// No row back means the event ID was already claimed.
+	_, err = qtx.TryClaimWebhookEvent(ctx, db.TryClaimWebhookEventParams{
+		EventID: event.ID, EventType: string(event.Type),
+	})
+	if errors.Is(err, pgx.ErrNoRows) {
+		mCancelSkipped(ctx, "duplicate_event")
+		return nil
+	}
+	if err != nil {
+		mCancelError(ctx, "db_dedup_failed")
+		return fmt.Errorf("claim webhook event: %w", err)
+	}
+
+	// 4c. Period-keyed dedup queries.
+	mostRecent, err := qtx.GetMostRecentKeptOrCanceledForPeriod(ctx,
+		db.GetMostRecentKeptOrCanceledForPeriodParams{
+			UserID: user.ID, SubID: subID, PeriodStart: pgxTime(periodStart),
+		})
+	if err != nil && !errors.Is(err, pgx.ErrNoRows) {
+		mCancelError(ctx, "db_dedup_query_failed")
+		return fmt.Errorf("most recent decision: %w", err)
+	}
+	if mostRecent == "subscription_kept" {
+		mCancelSkipped(ctx, "already_kept_this_period")
+		return nil
+	}
+	hasCanceled, err := qtx.HasAutoCanceledThisPeriod(ctx,
+		db.HasAutoCanceledThisPeriodParams{
+			UserID: user.ID, SubID: subID, PeriodStart: pgxTime(periodStart),
+		})
+	if err != nil {
+		mCancelError(ctx, "db_dedup_query_failed")
+		return fmt.Errorf("has auto canceled: %w", err)
+	}
+	if hasCanceled {
+		mCancelSkipped(ctx, "already_canceled_this_period")
+		return nil
+	}
+
+	// 4d. Read state and compute trigger.
+	lastActive, err := qtx.GetUserLastActive(ctx, user.ID)
+	if err != nil {
+		return fmt.Errorf("get last active: %w", err)
+	}
+	if !lastActive.Valid {
+		mCancelSkipped(ctx, "no_activity_history")
+		return nil
+	}
+	if !lastActive.Time.Before(threshold) {
+		mCancelSkipped(ctx, "active_in_window")
+		return nil
+	}
+	if periodStart.Before(user.IdleEligibleAfter) {
+		mCancelSkipped(ctx, "grandfathered")
+		return nil
+	}
+
+	// 5. Trigger fires: insert audit row + cache update inside the TX.
+ mdJSON, err := marshalCancelMetadata(subID, event.ID, periodStart, periodEnd) + if err != nil { + return fmt.Errorf("marshal cancel metadata: %w", err) + } + if err := qtx.InsertSubscriptionAutoCanceledEvent(ctx, + db.InsertSubscriptionAutoCanceledEventParams{ + UserID: user.ID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert event row: %w", err) + } + if err := qtx.SetUserAutoCancelState(ctx, db.SetUserAutoCancelStateParams{ + ID: user.ID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("set cache: %w", err) + } + if err := tx.Commit(ctx); err != nil { + mCancelError(ctx, "db_commit_failed") + return fmt.Errorf("commit: %w", err) + } + + // 6. Stripe call OUTSIDE the transaction. Idempotency key keeps retries safe. + if _, err := s.stripe.UpdateSubscriptionCancel(ctx, subID, true, event.ID); err != nil { + mCancelError(ctx, "stripe_update_failed") + s.log.Error("stripe update failed after commit", + "sub_id", subID, "event_id", event.ID, "err", err) + // Cache is now ahead of Stripe. AutoReverse re-checks Stripe state, + // so this drift will self-heal on the user's next authed request. + return fmt.Errorf("stripe update: %w", err) + } + + mCancelFired(ctx, subID) + + // 7. Enqueue cancel email. + if err := s.enqueueCancelEmail(ctx, user, subID, periodEnd); err != nil { + mEmailEnqueue(ctx, "cancel", "error") + s.log.Error("cancel email enqueue failed", "user_id", user.ID, "err", err) + } else { + mEmailEnqueue(ctx, "cancel", "ok") + } + return nil +} + +// subtractInterval returns t minus one Stripe billing interval. 
+func subtractInterval(t time.Time, interval stripe.PriceRecurringInterval) time.Time { + switch interval { + case stripe.PriceRecurringIntervalDay: + return t.AddDate(0, 0, -1) + case stripe.PriceRecurringIntervalWeek: + return t.AddDate(0, 0, -7) + case stripe.PriceRecurringIntervalMonth: + return t.AddDate(0, -1, 0) + case stripe.PriceRecurringIntervalYear: + return t.AddDate(-1, 0, 0) + default: + return t.AddDate(0, -1, 0) // safe default + } +} + +// Helpers (placeholders — implement based on existing project conventions +// for pgtype.Text / pgtype.Timestamptz): +func pgxText(s string) pgtype.Text { return pgtype.Text{String: s, Valid: true} } +func pgxTime(t time.Time) pgtype.Timestamptz { + return pgtype.Timestamptz{Time: t, Valid: true} +} +``` + +(Add the `import "github.com/jackc/pgx/v5/pgtype"` to the import block.) + +- [ ] **Step 5: Implement `marshalCancelMetadata` and email enqueue stub** + +Append to `idleunsub.go`: + +```go +import "encoding/json" + +func marshalCancelMetadata(subID, eventID string, start, end time.Time) ([]byte, error) { + return json.Marshal(cancelMetadata{ + SubscriptionID: subID, + StripeEventID: eventID, + CurrentPeriodStart: start.UTC(), + CurrentPeriodEnd: end.UTC(), + }) +} + +// enqueueCancelEmail composes and enqueues the cancel email. +// Implementation lands in Task 9 alongside the templates; for now this is a +// stub so the cancel path compiles. Task 9 replaces it with the real impl. +func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID string, periodEnd time.Time) error { + return nil // placeholder; replaced in Task 9 +} +``` + +- [ ] **Step 6: Add a sqlc query needed above (`GetUserByStripeCustomer`)** + +This may already exist; if so, skip. Otherwise add to `sql/queries/users.sql`: + +```sql +-- name: GetUserByStripeCustomer :one +SELECT * FROM users WHERE stripe_customer_id = $1 AND deleted_at IS NULL; +``` + +Run `sqlc generate` after. 
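The trigger logic in section 4d of `HandleInvoiceUpcoming` can be exercised without Stripe or Postgres. The sketch below is an illustration under stated assumptions, not the plan's implementation: `cancelDecision`, `oneIntervalBefore`, and `date` are hypothetical names, and the billing interval is passed as a plain string instead of `stripe.PriceRecurringInterval`. It also surfaces a property of `time.AddDate` that affects `subtractInterval`: subtracting one month from the 31st normalizes the overflow forward, so a period starting March 31 yields a threshold of March 3 rather than the end of February.

```go
package main

import (
	"fmt"
	"time"
)

// date builds a UTC midnight; a hypothetical helper for this example only.
func date(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

// oneIntervalBefore mirrors subtractInterval but takes the interval as a
// plain string ("day", "week", "month", "year") so the sketch has no Stripe
// dependency. Unknown intervals fall back to one month, as in the plan.
func oneIntervalBefore(t time.Time, interval string) time.Time {
	switch interval {
	case "day":
		return t.AddDate(0, 0, -1)
	case "week":
		return t.AddDate(0, 0, -7)
	case "year":
		return t.AddDate(-1, 0, 0)
	default:
		return t.AddDate(0, -1, 0)
	}
}

// cancelDecision returns "fire" or a skip reason, checking the same
// conditions in the same order as section 4d of HandleInvoiceUpcoming.
// A nil lastActive stands in for a NULL MAX(last_active).
func cancelDecision(lastActive *time.Time, periodStart, eligibleAfter time.Time, interval string) string {
	threshold := oneIntervalBefore(periodStart, interval)
	switch {
	case lastActive == nil:
		return "no_activity_history"
	case !lastActive.Before(threshold):
		return "active_in_window"
	case periodStart.Before(eligibleAfter):
		return "grandfathered"
	default:
		return "fire"
	}
}

func main() {
	// AddDate normalizes "February 31" forward to March 3.
	fmt.Println(oneIntervalBefore(date(2026, time.March, 31), "month").Format("2006-01-02"))

	lastActive := date(2026, time.January, 10)
	fmt.Println(cancelDecision(&lastActive, date(2026, time.April, 1), date(2025, time.October, 1), "month"))
}
```

Keeping the predicate pure makes the early-return tests in Step 9 cheap to cover with a table-driven unit test, alongside the database-backed integration tests.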
+ +- [ ] **Step 7: Write integration test for happy path** + +Create `internal/feat/idleunsub/stripestub_test.go`: + +```go +package idleunsub_test + +import ( + "context" + + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/feat/idleunsub" +) + +// fakeStripe is an in-memory StripeClient for tests. +type fakeStripe struct { + subs map[string]*stripe.Subscription + updateCalls []updateCall + updateErr error +} + +type updateCall struct { + ID string + CancelAtPeriodEnd bool + IdempotencyKey string +} + +func (f *fakeStripe) GetSubscription(ctx context.Context, id string) (*stripe.Subscription, error) { + if s, ok := f.subs[id]; ok { + return s, nil + } + return nil, stripe.ErrInvalidRequest +} + +func (f *fakeStripe) UpdateSubscriptionCancel(ctx context.Context, id string, cap bool, key string) (*stripe.Subscription, error) { + f.updateCalls = append(f.updateCalls, updateCall{id, cap, key}) + if f.updateErr != nil { + return nil, f.updateErr + } + if s, ok := f.subs[id]; ok { + s.CancelAtPeriodEnd = cap + return s, nil + } + return nil, stripe.ErrInvalidRequest +} + +var _ idleunsub.StripeClient = (*fakeStripe)(nil) +``` + +Then `internal/feat/idleunsub/idleunsub_test.go`: + +```go +package idleunsub_test + +import ( + "context" + "testing" + "time" + + "github.com/stretchr/testify/require" + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/backendtest" + "github.com/btc/drill/internal/email" + "github.com/btc/drill/internal/feat/idleunsub" +) + +func TestHandleInvoiceUpcoming_FiresCancel_WhenIdleTwoPeriods(t *testing.T) { + t.Parallel() + pgEnv := backendtest.NewPGEnv(t) // existing helper that gives a fresh DB + b := pgEnv.NewBackend(t) + ctx := context.Background() + userID := backendtest.SeedUser(t, b) + + // Set the user's Stripe customer ID and idle_eligible_after far in the past.
+ _, err := b.Pool().Exec(ctx, ` + UPDATE users + SET stripe_customer_id = $1, + idle_eligible_after = NOW() - INTERVAL '6 months', + plan = 'pro' + WHERE id = $2`, "cus_test", userID) + require.NoError(t, err) + + // Force MAX(last_active) to far in the past (3 months ago). + _, err = b.Pool().Exec(ctx, ` + UPDATE auth_sessions SET last_active = NOW() - INTERVAL '3 months' + WHERE user_id = $1`, userID) + require.NoError(t, err) + + // Fake Stripe sub: monthly, current period started 23 days ago. + now := time.Now().UTC().Truncate(time.Second) + periodStart := now.AddDate(0, 0, -23) + periodEnd := periodStart.AddDate(0, 1, 0) + subID := "sub_test" + fake := &fakeStripe{ + subs: map[string]*stripe.Subscription{ + subID: { + ID: subID, + Status: stripe.SubscriptionStatusActive, + CancelAtPeriodEnd: false, + Customer: &stripe.Customer{ID: "cus_test"}, + Items: &stripe.SubscriptionItemList{Data: []*stripe.SubscriptionItem{{ + CurrentPeriodStart: periodStart.Unix(), + CurrentPeriodEnd: periodEnd.Unix(), + Price: &stripe.Price{Recurring: &stripe.PriceRecurring{ + Interval: stripe.PriceRecurringIntervalMonth, + }}, + }}}, + }, + }, + } + + svc := idleunsub.NewService(b.Pool(), fake, &nullMailer{}, nil, b.Logger()) + + event := stripe.Event{ + ID: "evt_1", + Type: "invoice.upcoming", + Data: &stripe.EventData{Object: map[string]interface{}{ + "subscription": subID, + }}, + } + + require.NoError(t, svc.HandleInvoiceUpcoming(ctx, event)) + + // Stripe Update was called with cancel_at_period_end=true. + require.Len(t, fake.updateCalls, 1) + require.True(t, fake.updateCalls[0].CancelAtPeriodEnd) + require.Equal(t, "evt_1", fake.updateCalls[0].IdempotencyKey) + + // user_events row exists. + var count int + err = b.Pool().QueryRow(ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, + userID).Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) + + // Cache flags flipped. 
+ var cap, isAuto bool + err = b.Pool().QueryRow(ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto FROM users WHERE id = $1`, + userID).Scan(&cap, &isAuto) + require.NoError(t, err) + require.True(t, cap) + require.True(t, isAuto) + + // Webhook dedup row claimed. + err = b.Pool().QueryRow(ctx, + `SELECT COUNT(*) FROM stripe_webhook_dedup WHERE event_id = 'evt_1'`).Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) +} + +type nullMailer struct{} +func (nullMailer) Send(ctx context.Context, m email.Message) error { return nil } +``` + +(`backendtest.NewPGEnv` may need to be added if not present; if the project uses a different convention for spinning up a test DB, mirror what `internal/backend/billing_test.go` uses — typically `pg.NewBackend(t)`.) + +- [ ] **Step 8: Run the test to verify it passes** + +```bash +go test ./internal/feat/idleunsub/ -race -count=1 -v +``` + +Expected: PASS. + +- [ ] **Step 9: Add the early-return tests** + +Append to `idleunsub_test.go`: + +```go +func TestHandleInvoiceUpcoming_SkipsTrialing(t *testing.T) { + // Same setup but sub.Status = trialing → no Stripe Update call, no event row. + // Pattern: copy TestHandleInvoiceUpcoming_FiresCancel_WhenIdleTwoPeriods, + // change sub.Status, assert len(fake.updateCalls) == 0 and no user_events row. + // (Spec §3 early returns.) +} + +func TestHandleInvoiceUpcoming_SkipsAlreadyCanceled(t *testing.T) { + // sub.CancelAtPeriodEnd = true at fetch time → skip. +} + +func TestHandleInvoiceUpcoming_SkipsGrandfathered(t *testing.T) { + // idle_eligible_after = NOW() + 1 month (future) → skip even when idle. +} + +func TestHandleInvoiceUpcoming_DedupesRetryStorm(t *testing.T) { + // Call twice with same event.ID → second call is a no-op (one update call total). +} + +func TestHandleInvoiceUpcoming_DedupesPostKeep(t *testing.T) { + // Insert a subscription_kept user_events row for this period BEFORE calling + // HandleInvoiceUpcoming with a different event.ID → no re-cancel. 
+} + +func TestHandleInvoiceUpcoming_NoActivityHistory(t *testing.T) { + // User with no auth_sessions rows → MAX(last_active) is NULL → defensive skip. + // (Spec §8.1: "should be unreachable for a Pro user, but defensive".) +} + +func TestHandleInvoiceUpcoming_NotIdleStill(t *testing.T) { + // Activity within current period → trigger predicate false → skip. +} + +func TestHandleInvoiceUpcoming_FirstPeriodGrace(t *testing.T) { + // User just signed up; threshold predates their first activity → no cancel. +} +``` + +Implement each test by mirroring the happy-path test with the relevant precondition adjusted. Run all of them: + +```bash +go test ./internal/feat/idleunsub/ -race -count=1 -v +``` + +Expected: all PASS. + +- [ ] **Step 10: Commit** + +```bash +git add internal/feat/idleunsub/idleunsub.go \ + internal/feat/idleunsub/metadata.go \ + internal/feat/idleunsub/metrics.go \ + internal/feat/idleunsub/idleunsub_test.go \ + internal/feat/idleunsub/stripestub_test.go \ + sql/queries/users.sql internal/db/users.sql.go +git commit -m "idleunsub: HandleInvoiceUpcoming + transactional cancel decision" +``` + +--- + +### Task 8: `KeepSubscription` and `AutoReverse` + +**Files:** +- Modify: `internal/feat/idleunsub/idleunsub.go` (add two methods) +- Modify: `internal/feat/idleunsub/idleunsub_test.go` (new tests) + +- [ ] **Step 1: Implement `KeepSubscription`** + +Append to `idleunsub.go`: + +```go +// KeepSubscription reverses cancel_at_period_end after a verified, single-use +// keep-link click. The endpoint is responsible for verifying the token AND +// claiming single-use BEFORE invoking this method. Refuses to act if the +// stored cache says the cancel is NOT our auto-cancel (manual portal cancel). 
+func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) error { + q := db.New(s.pool) + gates, err := q.GetUserAutoCancelGates(ctx, claims.UserID) + if err != nil { + return fmt.Errorf("read gates: %w", err) + } + if !gates.SubCancelAtPeriodEnd || !gates.SubCancelIsAuto { + // Either the cancel was already reversed, or it's a manual portal cancel. + // Don't touch Stripe; render confirmation page idempotently. + return nil + } + updated, err := s.stripe.UpdateSubscriptionCancel(ctx, claims.SubscriptionID, false, "") + if err != nil { + return fmt.Errorf("stripe reverse: %w", err) + } + periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() + + if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + ID: claims.UserID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear gates: %w", err) + } + + mdJSON, err := marshalKeptMetadata(claims.SubscriptionID, "link", periodStart) + if err != nil { + return fmt.Errorf("marshal kept metadata: %w", err) + } + if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + UserID: claims.UserID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert kept event: %w", err) + } + + mReverseLink(ctx, claims.SubscriptionID) + + if err := s.enqueueKeptEmail(ctx, claims.UserID, claims.SubscriptionID, claims.CurrentPeriodEnd); err != nil { + mEmailEnqueue(ctx, "kept", "error") + s.log.Error("kept email enqueue failed", "user_id", claims.UserID, "err", err) + } else { + mEmailEnqueue(ctx, "kept", "ok") + } + return nil +} + +func marshalKeptMetadata(subID, via string, periodStart time.Time) ([]byte, error) { + return json.Marshal(keptMetadata{ + SubscriptionID: subID, + Via: via, + CurrentPeriodStart: periodStart.UTC(), + }) +} + +func (s *Service) enqueueKeptEmail(ctx context.Context, userID uuid.UUID, subID string, periodEnd time.Time) error { + return nil // placeholder; Task 9 +} +``` + +- [ ] 
**Step 2: Implement `AutoReverse`** + +Append: + +```go +// AutoReverse is invoked by the auth middleware when an authenticated request +// arrives from a user whose cached gates are SubCancelAtPeriodEnd && SubCancelIsAuto. +// The current request itself is the activity signal; we do NOT re-read +// last_active (would race against TouchAuthSession). +// +// AutoReverse verifies Stripe state before acting. If Stripe says the sub is +// NOT canceled (cache drift from a prior partial failure), AutoReverse silently +// clears the cache and returns without sending email or inserting an event row. +func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { + q := db.New(s.pool) + gates, err := q.GetUserAutoCancelGates(ctx, userID) + if err != nil { + return fmt.Errorf("read gates: %w", err) + } + if !gates.SubCancelAtPeriodEnd || !gates.SubCancelIsAuto || !gates.StripeSubscriptionID.Valid { + return nil // cache says off, or no sub — nothing to do + } + subID := gates.StripeSubscriptionID.String + + // Verify Stripe state — handles the partial-failure window where our cache + // says canceled but Stripe never confirmed. + stripeSub, err := s.stripe.GetSubscription(ctx, subID) + if err != nil { + return fmt.Errorf("get sub: %w", err) + } + if !stripeSub.CancelAtPeriodEnd { + // Cache drift. Silently correct and return without email/event. + mCacheDriftCorrected(ctx, subID) + periodStart := time.Unix(stripeSub.Items.Data[0].CurrentPeriodStart, 0).UTC() + if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + ID: userID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear cache (drift): %w", err) + } + // Note: ClearUserAutoCancelState sets pending_kept_banner=true even on drift + // correction. That's wrong for this case — we don't want to celebrate a + // reversal that didn't happen. Fix: use a dedicated clear-without-banner + // query, or split ClearUserAutoCancelState into two variants. 
+ if err := q.ClearKeptBanner(ctx, userID); err != nil { + return fmt.Errorf("clear banner (drift): %w", err) + } + return nil + } + + // Real reversal. + updated, err := s.stripe.UpdateSubscriptionCancel(ctx, subID, false, "") + if err != nil { + return fmt.Errorf("stripe reverse: %w", err) + } + periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() + periodEnd := time.Unix(updated.Items.Data[0].CurrentPeriodEnd, 0).UTC() + + if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + ID: userID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear gates: %w", err) + } + + mdJSON, err := marshalKeptMetadata(subID, "auto_activity", periodStart) + if err != nil { + return fmt.Errorf("marshal kept metadata: %w", err) + } + if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + UserID: userID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert kept event: %w", err) + } + + mReverseActivity(ctx, subID) + + if err := s.enqueueKeptEmail(ctx, userID, subID, periodEnd); err != nil { + mEmailEnqueue(ctx, "kept", "error") + s.log.Error("kept email enqueue failed", "user_id", userID, "err", err) + } else { + mEmailEnqueue(ctx, "kept", "ok") + } + return nil +} +``` + +- [ ] **Step 3: Write tests** + +Append to `idleunsub_test.go`: + +```go +func TestKeepSubscription_HappyPath(t *testing.T) { + // Seed an auto-canceled state. Call KeepSubscription. Assert: + // - Stripe Update called with false + // - cache flipped (sub_cancel_at_period_end=false, sub_cancel_is_auto=false) + // - subscription_kept user_events row inserted with current_period_start + // - pending_kept_banner = true +} + +func TestKeepSubscription_RefusesManualCancel(t *testing.T) { + // Seed sub_cancel_at_period_end=true, sub_cancel_is_auto=FALSE. + // Call KeepSubscription. Assert: no Stripe call, no event row, no cache change. 
+} + +func TestKeepSubscription_Idempotent(t *testing.T) { + // Call twice. Second call is a no-op (cache already cleared after first). + // One Stripe call, one event row, one email enqueued. +} + +func TestAutoReverse_GateOff(t *testing.T) { + // User with sub_cancel_at_period_end=false → AutoReverse is no-op. + // (No Stripe call, no event row.) +} + +func TestAutoReverse_HappyPath(t *testing.T) { + // Seed auto-canceled state, Stripe state agrees (CancelAtPeriodEnd=true). + // Call AutoReverse. Assert: Stripe Update called, cache cleared, banner set, + // email enqueued, subscription_kept event with via='auto_activity'. +} + +func TestAutoReverse_CacheDriftCorrected(t *testing.T) { + // Seed auto-canceled state in DB, but Stripe says CancelAtPeriodEnd=false + // (simulating partial-failure window). + // Call AutoReverse. Assert: NO Stripe Update call, NO email, + // NO subscription_kept event, cache silently cleared, banner NOT set. +} + +func TestAutoReverse_RefusesManualCancel(t *testing.T) { + // Seed sub_cancel_at_period_end=true but sub_cancel_is_auto=false. + // AutoReverse must be a no-op — we don't touch a manual portal cancel. +} +``` + +Implement each test by setting up the appropriate state and calling the methods. + +- [ ] **Step 4: Run tests** + +```bash +go test ./internal/feat/idleunsub/ -race -count=1 -v +``` + +Expected: all PASS. 
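The branching that the `AutoReverse` tests above cover reduces to a three-way decision. A minimal sketch of that decision as a pure function follows; `gates`, `action`, and `nextAction` are hypothetical names introduced for illustration, and the real method also performs the Stripe and database calls described in Step 2.

```go
package main

import "fmt"

// gates is a hypothetical stand-in for the row returned by
// GetUserAutoCancelGates.
type gates struct {
	CancelAtPeriodEnd bool
	IsAuto            bool
	HasSubscription   bool
}

type action string

const (
	actionNoop         action = "noop"          // cache says off, manual cancel, or no sub
	actionCorrectDrift action = "correct_drift" // cache says canceled, Stripe disagrees
	actionReverse      action = "reverse"       // both agree: undo the auto-cancel
)

// nextAction mirrors the order of checks in AutoReverse. stripeCanceled is
// the CancelAtPeriodEnd value read back from Stripe; it only matters once
// every cached gate is set.
func nextAction(g gates, stripeCanceled bool) action {
	if !g.CancelAtPeriodEnd || !g.IsAuto || !g.HasSubscription {
		return actionNoop
	}
	if !stripeCanceled {
		return actionCorrectDrift
	}
	return actionReverse
}

func main() {
	all := gates{CancelAtPeriodEnd: true, IsAuto: true, HasSubscription: true}
	manual := gates{CancelAtPeriodEnd: true, IsAuto: false, HasSubscription: true}

	fmt.Println(nextAction(all, true))
	fmt.Println(nextAction(all, false))
	fmt.Println(nextAction(manual, true))
}
```

Each test in Step 3 maps to one row of this function's truth table, which can serve as a checklist when filling in the test bodies.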
+ +- [ ] **Step 5: Commit** + +```bash +git add internal/feat/idleunsub/idleunsub.go internal/feat/idleunsub/idleunsub_test.go +git commit -m "idleunsub: KeepSubscription + AutoReverse with cache-drift correction" +``` + +--- + +### Task 9: Email composition + templates + +**Files:** +- Create: `internal/feat/idleunsub/email.go` +- Create: `internal/feat/idleunsub/templates/cancel.html.tmpl` +- Create: `internal/feat/idleunsub/templates/cancel.txt.tmpl` +- Create: `internal/feat/idleunsub/templates/kept.html.tmpl` +- Create: `internal/feat/idleunsub/templates/kept.txt.tmpl` +- Create: `internal/feat/idleunsub/email_test.go` +- Modify: `internal/feat/idleunsub/idleunsub.go` (replace the placeholder enqueue functions) + +- [ ] **Step 1: Create the cancel templates** + +`internal/feat/idleunsub/templates/cancel.txt.tmpl`: + +``` +Subject: We won't charge you for the next period + +Hi {{.FirstName}}, + +We noticed you haven't been around Sabermatic in the last two billing +periods, so we've stopped your auto-renewal. You'll keep access through +{{.CurrentPeriodEnd}} — we won't charge you for the next period. + +If you'd like to keep your subscription active, one click does it: + + {{.KeepLink}} + +If you're done for now, no action needed. We'll be here whenever you +want to come back. + +— Sabermatic +``` + +`cancel.html.tmpl`: same content, marked up minimally (one `<p>` per paragraph; `<a href="{{.KeepLink}}">Keep my subscription</a>` for the link). Keep it simple — no styling beyond what's in the existing wrapper template (`internal/email/templates/wrapper.html`). + +- [ ] **Step 2: Create the kept templates** + +`internal/feat/idleunsub/templates/kept.txt.tmpl`: + +``` +Subject: Your subscription is still active + +Hi {{.FirstName}}, + +You're all set — your Sabermatic subscription will renew normally on +{{.NextRenewalDate}}. + +Welcome back. + +— Sabermatic +``` + +`kept.html.tmpl`: same, minimal markup. + +- [ ] **Step 3: Implement `email.go`** + +```go +package idleunsub + +import ( + "bytes" + "embed" + "fmt" + "html/template" + texttemplate "text/template" + "time" + + "github.com/btc/drill/internal/email" +) + +//go:embed templates/* +var templatesFS embed.FS + +type cancelEmailData struct { + FirstName string + CurrentPeriodEnd string // formatted like "March 1, 2026" + KeepLink string +} + +type keptEmailData struct { + FirstName string + NextRenewalDate string +} + +// composeCancelEmail renders the cancel email and returns an email.Message.
+func composeCancelEmail(toEmail, firstName, keepURL string, periodEnd time.Time) (email.Message, error) { + data := cancelEmailData{ + FirstName: firstName, + CurrentPeriodEnd: periodEnd.Format("January 2, 2006"), + KeepLink: keepURL, + } + return renderEmail("cancel", toEmail, "We won't charge you for the next period", data) +} + +func composeKeptEmail(toEmail, firstName string, nextRenewal time.Time) (email.Message, error) { + data := keptEmailData{ + FirstName: firstName, + NextRenewalDate: nextRenewal.Format("January 2, 2006"), + } + return renderEmail("kept", toEmail, "Your subscription is still active", data) +} + +func renderEmail(name, to, subject string, data interface{}) (email.Message, error) { + htmlT, err := template.ParseFS(templatesFS, "templates/"+name+".html.tmpl") + if err != nil { + return email.Message{}, fmt.Errorf("parse %s.html: %w", name, err) + } + txtT, err := texttemplate.ParseFS(templatesFS, "templates/"+name+".txt.tmpl") + if err != nil { + return email.Message{}, fmt.Errorf("parse %s.txt: %w", name, err) + } + var htmlBuf, txtBuf bytes.Buffer + if err := htmlT.Execute(&htmlBuf, data); err != nil { + return email.Message{}, fmt.Errorf("execute %s.html: %w", name, err) + } + if err := txtT.Execute(&txtBuf, data); err != nil { + return email.Message{}, fmt.Errorf("execute %s.txt: %w", name, err) + } + return email.Message{ + To: to, Subject: subject, + HTMLBody: htmlBuf.String(), TextBody: txtBuf.String(), + }, nil +} +``` + +(Note: the import block needs both `html/template` and `text/template` — alias them as the snippet shows: `template` for HTML, `texttemplate` for text. If your project already has a different rendering helper, mirror it.) 
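The aliasing point in the note above can be checked in isolation. The following is a minimal sketch, assuming inline template strings instead of the embedded `templates/` directory; `greeting` and `renderBoth` are hypothetical names. It imports `html/template` under its default name and `text/template` as `texttemplate`, then renders the same template body through both to show that only the HTML variant escapes the data.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
	texttemplate "text/template"
)

// greeting is a hypothetical data type standing in for cancelEmailData.
type greeting struct {
	FirstName string
}

const body = "Hi {{.FirstName}},"

// renderBoth executes the same template source with html/template and
// text/template and returns the HTML result followed by the text result.
func renderBoth(data greeting) (string, string, error) {
	htmlT, err := template.New("html").Parse(body)
	if err != nil {
		return "", "", fmt.Errorf("parse html: %w", err)
	}
	txtT, err := texttemplate.New("txt").Parse(body)
	if err != nil {
		return "", "", fmt.Errorf("parse text: %w", err)
	}
	var htmlBuf, txtBuf bytes.Buffer
	if err := htmlT.Execute(&htmlBuf, data); err != nil {
		return "", "", fmt.Errorf("execute html: %w", err)
	}
	if err := txtT.Execute(&txtBuf, data); err != nil {
		return "", "", fmt.Errorf("execute text: %w", err)
	}
	return htmlBuf.String(), txtBuf.String(), nil
}

func main() {
	h, t, err := renderBoth(greeting{FirstName: "Jane <Doe>"})
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // Hi Jane &lt;Doe&gt;,
	fmt.Println(t) // Hi Jane <Doe>,
}
```

Because the two packages share the identifier `template`, at least one of them must be aliased for the file to compile; escaping is also the reason display names should go through the HTML template and not be concatenated into markup.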
+ +- [ ] **Step 4: Replace the placeholder enqueue functions in `idleunsub.go`** + +```go +// Replace `enqueueCancelEmail` placeholder with: +func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID string, periodEnd time.Time) error { + keepURL, err := s.buildKeepURL(user.ID, subID, periodEnd) + if err != nil { + return fmt.Errorf("build keep url: %w", err) + } + msg, err := composeCancelEmail(user.Email, user.FirstName(), keepURL, periodEnd) + if err != nil { + return fmt.Errorf("compose cancel: %w", err) + } + return s.mailer.Send(ctx, msg) +} + +// Replace `enqueueKeptEmail`: +func (s *Service) enqueueKeptEmail(ctx context.Context, userID uuid.UUID, subID string, periodEnd time.Time) error { + q := db.New(s.pool) + user, err := q.GetUser(ctx, userID) + if err != nil { + return fmt.Errorf("get user: %w", err) + } + msg, err := composeKeptEmail(user.Email, user.FirstName(), periodEnd) + if err != nil { + return fmt.Errorf("compose kept: %w", err) + } + return s.mailer.Send(ctx, msg) +} + +// buildKeepURL constructs the public keep-link URL with a signed token. +func (s *Service) buildKeepURL(userID uuid.UUID, subID string, periodEnd time.Time) (string, error) { + if s.signer == nil { + return "", fmt.Errorf("token signer not configured") + } + periodEnd = periodEnd.UTC().Truncate(time.Second) + tok := s.signer.Sign(KeepTokenClaims{ + UserID: userID, + SubscriptionID: subID, + Action: "keep_subscription", + CurrentPeriodEnd: periodEnd, + IssuedAt: s.now().Unix(), + ExpiresAt: periodEnd.Unix(), + }) + return fmt.Sprintf("https://sabermatic.dev/sub/keep?t=%s", tok), nil +} +``` + +(`db.User.FirstName()` may not exist. If display_name is "Jane Doe", split on first space; or use display_name directly. Inspect `internal/db/models.go` for the User struct and pick a sensible field.) 
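If `db.User` has no `FirstName()` method, one option is a small helper that takes the first whitespace-separated word of the display name. This is a sketch only: `firstNameFrom` and its `fallback` parameter are hypothetical, and whether the user record exposes a display name at all should be confirmed in `internal/db/models.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNameFrom returns the first whitespace-separated word of displayName,
// or fallback when the display name is empty or only whitespace.
// Hypothetical helper; not part of the existing db.User type.
func firstNameFrom(displayName, fallback string) string {
	fields := strings.Fields(displayName)
	if len(fields) == 0 {
		return fallback
	}
	return fields[0]
}

func main() {
	fmt.Println(firstNameFrom("Jane Doe", "there"))
	fmt.Println(firstNameFrom("   ", "there"))
}
```

A fallback such as "there" keeps the greeting readable ("Hi there,") for accounts created without a name.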
+ +- [ ] **Step 5: Write tests** + +`internal/feat/idleunsub/email_test.go`: + +```go +package idleunsub + +import ( + "strings" + "testing" + "time" + + "github.com/stretchr/testify/require" +) + +func TestComposeCancelEmail_Renders(t *testing.T) { + t.Parallel() + end := time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC) + msg, err := composeCancelEmail("user@example.com", "Jane", + "https://sabermatic.dev/sub/keep?t=abc.def", end) + require.NoError(t, err) + require.Equal(t, "user@example.com", msg.To) + require.Contains(t, msg.TextBody, "Hi Jane,") + require.Contains(t, msg.TextBody, "March 1, 2026") + require.Contains(t, msg.TextBody, "https://sabermatic.dev/sub/keep?t=abc.def") + require.Contains(t, msg.HTMLBody, "Jane") +} + +func TestComposeKeptEmail_Renders(t *testing.T) { + t.Parallel() + end := time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC) + msg, err := composeKeptEmail("user@example.com", "Jane", end) + require.NoError(t, err) + require.Contains(t, msg.TextBody, "March 1, 2026") +} +``` + +- [ ] **Step 6: Run tests** + +```bash +go test ./internal/feat/idleunsub/ -race -count=1 -v +``` + +Expected: all PASS, including the previously written cancel/keep happy-path tests now exercising real email composition. 
+ +- [ ] **Step 7: Commit** + +```bash +git add internal/feat/idleunsub/email.go \ + internal/feat/idleunsub/email_test.go \ + internal/feat/idleunsub/templates/ \ + internal/feat/idleunsub/idleunsub.go +git commit -m "idleunsub: email composition + templates" +``` + +--- + +## Phase 4: Integration + +### Task 10: Stripe webhook integration + +**Files:** +- Modify: `internal/backend/billing.go` (add `case "invoice.upcoming"` and `case "customer.subscription.created"`; extend `handleSubscriptionUpdated` to call `SyncSubStateFromWebhook`; add `handleSubscriptionDeleted` to call `ClearSubStateOnDeletion`) +- Modify: `internal/backend/backend.go` (wire `idleunsub.Service` into the `Backend` struct constructor) +- Test: `internal/backend/billing_test.go` (add tests for the new dispatch cases and the sync behavior) + +- [ ] **Step 1: Wire `idleunsub.Service` into `Backend`** + +In `internal/backend/backend.go`, find the `Backend` struct definition and add a field: + +```go +type Backend struct { + // ... existing fields ... + idleunsub *idleunsub.Service +} +``` + +Find the `NewBackend` (or equivalent) constructor and accept/construct the service. The exact signature depends on existing setup; add the parameter at the end of the existing argument list. In `cmd/.../main.go` (or wherever Backend is constructed), instantiate `idleunsub.NewService` with the pool, a real Stripe client wrapper, the email sender, the token signer, and the logger. 
+ +- [ ] **Step 2: Add `case "invoice.upcoming"` to `HandleStripeWebhook`** + +In `internal/backend/billing.go`, find the switch: + +```go +switch event.Type { +case "checkout.session.completed": + return b.handleCheckoutCompleted(ctx, event) +case "invoice.paid": + return b.handleInvoicePaid(ctx, event) +case "invoice.upcoming": // NEW + return b.idleunsub.HandleInvoiceUpcoming(ctx, event) +case "customer.subscription.created": // NEW — same body as updated + return b.handleSubscriptionUpdated(ctx, event) +case "customer.subscription.deleted": + return b.handleSubscriptionDeleted(ctx, event) +case "customer.subscription.updated": + return b.handleSubscriptionUpdated(ctx, event) +default: + slog.Info("unhandled stripe event", "type", event.Type, "id", event.ID) + return nil +} +``` + +- [ ] **Step 3: Extend `handleSubscriptionUpdated` to call `SyncSubStateFromWebhook`** + +After the existing body (which calls `UpdatePlanByStripeCustomer`), parse the subscription from the event and call: + +```go +sub := &stripe.Subscription{} +if err := json.Unmarshal(event.Data.Raw, sub); err != nil { + return fmt.Errorf("unmarshal subscription: %w", err) +} +custID := sub.Customer.ID +user, err := b.queries.GetUserByStripeCustomer(ctx, pgxText(custID)) +if err != nil { + return fmt.Errorf("get user by customer: %w", err) +} +var periodStart pgtype.Timestamptz +if len(sub.Items.Data) > 0 { + periodStart = pgxTime(time.Unix(sub.Items.Data[0].CurrentPeriodStart, 0).UTC()) +} +if err := b.queries.SyncSubStateFromWebhook(ctx, db.SyncSubStateFromWebhookParams{ + ID: user.ID, + StripeSubscriptionID: pgxText(sub.ID), + SubCancelAtPeriodEnd: sub.CancelAtPeriodEnd, + SubCurrentPeriodStart: periodStart, +}); err != nil { + return fmt.Errorf("sync sub state: %w", err) +} +``` + +The order matters: keep existing plan-update behavior, then sync. Both can fail independently — if plan update succeeds and sync fails, retry of the webhook will re-attempt sync (idempotent UPDATE). 
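Why a retried webhook is harmless can be seen in a small in-memory model of the cache columns. The sketch below is illustrative: `subCache`, `applyAutoCancel`, and `syncFromWebhook` are hypothetical names, and it assumes `SetUserAutoCancelState` sets both cancel flags, as the happy-path test in Task 7 asserts. The sync function writes only the fields that appear in `SyncSubStateFromWebhookParams`, so applying it twice gives the same state as applying it once, and the auto flag is left alone.

```go
package main

import "fmt"

// subCache models the cached subscription columns on the users row.
type subCache struct {
	CancelAtPeriodEnd bool
	CancelIsAuto      bool
	PeriodStart       int64 // Unix seconds
}

// applyAutoCancel models the cancel path: both flags are set together.
func applyAutoCancel(c subCache, periodStart int64) subCache {
	c.CancelAtPeriodEnd = true
	c.CancelIsAuto = true
	c.PeriodStart = periodStart
	return c
}

// syncFromWebhook models the webhook sync: it overwrites the Stripe-derived
// fields and deliberately leaves CancelIsAuto untouched.
func syncFromWebhook(c subCache, cancelAtPeriodEnd bool, periodStart int64) subCache {
	c.CancelAtPeriodEnd = cancelAtPeriodEnd
	c.PeriodStart = periodStart
	return c
}

func main() {
	state := applyAutoCancel(subCache{}, 1_700_000_000)
	once := syncFromWebhook(state, true, 1_700_000_000)
	twice := syncFromWebhook(once, true, 1_700_000_000)

	fmt.Println(once == twice)      // a retry changes nothing
	fmt.Println(twice.CancelIsAuto) // the auto flag survives the sync
}
```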
+ +**Note** the SyncSubStateFromWebhook query intentionally does NOT touch `sub_cancel_is_auto`. This means after our cancel path sets both flags, a subsequent webhook with `cancel_at_period_end=true` overwrites only `sub_cancel_at_period_end` (with the same `true` value) and `sub_cancel_is_auto` stays `true`. Round-trip correct. + +- [ ] **Step 4: Add `handleSubscriptionDeleted` body (or extend existing)** + +If a `handleSubscriptionDeleted` already exists, extend it; otherwise add: + +```go +func (b *Backend) handleSubscriptionDeleted(ctx context.Context, event stripe.Event) error { + sub := &stripe.Subscription{} + if err := json.Unmarshal(event.Data.Raw, sub); err != nil { + return fmt.Errorf("unmarshal subscription: %w", err) + } + user, err := b.queries.GetUserByStripeCustomer(ctx, pgxText(sub.Customer.ID)) + if err != nil { + return fmt.Errorf("get user by customer: %w", err) + } + return b.queries.ClearSubStateOnDeletion(ctx, user.ID) +} +``` + +- [ ] **Step 5: Tests** + +In `internal/backend/billing_test.go`, add: + +```go +func TestHandleSubscriptionUpdated_SyncsCacheColumns(t *testing.T) { + // Fire an updated event with cancel_at_period_end=true and a known + // current_period_start. Assert users row reflects both. +} + +func TestHandleSubscriptionUpdated_DoesNotTouchSubCancelIsAuto(t *testing.T) { + // Pre-set users.sub_cancel_is_auto=true. Fire updated event. + // Assert users.sub_cancel_is_auto is STILL true after sync. +} + +func TestHandleSubscriptionDeleted_ClearsCache(t *testing.T) { + // Pre-set all cache columns. Fire deleted event. + // Assert users plan='free' and all cache columns cleared. +} + +func TestWebhookSwitch_InvoiceUpcoming_RoutesToIdleunsub(t *testing.T) { + // Construct a minimal invoice.upcoming event. Call HandleStripeWebhook. + // Assert idleunsub.HandleInvoiceUpcoming was invoked (use a recorder + // version of Service or fake Stripe and check the side effects). 
+} + +func TestWebhookSwitch_SubscriptionCreated_RoutesToUpdated(t *testing.T) { + // Construct subscription.created event. Assert SyncSubStateFromWebhook ran. +} +``` + +- [ ] **Step 6: Run all tests** + +```bash +go test ./internal/... -race -count=1 -timeout=300s +``` + +Expected: all pass. + +- [ ] **Step 7: Commit** + +```bash +git add internal/backend/billing.go internal/backend/backend.go \ + internal/backend/billing_test.go cmd/ +git commit -m "billing: route invoice.upcoming and subscription.created to idleunsub" +``` + +--- + +### Task 11: `/sub/keep` endpoint + +**Files:** +- Create: `internal/handler/keep.go` +- Modify: `cmd/.../main.go` (or wherever HTTP routes are mounted) to register the new endpoint +- Create: `internal/handler/keep_test.go` + +- [ ] **Step 1: Implement `keep.go`** + +```go +package handler + +import ( + "errors" + "fmt" + "html/template" + "net/http" + "time" + + "github.com/jackc/pgx/v5" + "github.com/jackc/pgx/v5/pgxpool" + + "github.com/btc/drill/internal/db" + "github.com/btc/drill/internal/feat/idleunsub" +) + +// KeepHandler serves GET /sub/keep?t=<token>. Mounted publicly (no RequireAuth). +type KeepHandler struct { + signer *idleunsub.TokenSigner + svc *idleunsub.Service + pool *pgxpool.Pool // for the single-use claim query +} + +func NewKeepHandler(signer *idleunsub.TokenSigner, svc *idleunsub.Service, pool *pgxpool.Pool) *KeepHandler { + return &KeepHandler{signer: signer, svc: svc, pool: pool} +} + +func (h *KeepHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { + tok := r.URL.Query().Get("t") + claims, err := h.signer.Verify(tok) + switch { + case errors.Is(err, idleunsub.ErrTokenInvalid): + renderInvalidLink(w) + return + case errors.Is(err, idleunsub.ErrTokenExpired): + renderPeriodEnded(w) + return + case err != nil: + http.Error(w, "internal error", http.StatusInternalServerError) + return + } + + // Single-use claim.
+	tokenHash := sha256Of(tok) // returns []byte
+	// Use the request context; no standalone ctx is in scope in ServeHTTP.
+	claimed, err := db.New(h.pool).TryClaimKeepToken(r.Context(), db.TryClaimKeepTokenParams{
+		TokenHash: tokenHash,
+		UserID:    claims.UserID,
+	})
+	if err != nil && !errors.Is(err, pgx.ErrNoRows) {
+		http.Error(w, "internal error", http.StatusInternalServerError)
+		return
+	}
+	if errors.Is(err, pgx.ErrNoRows) {
+		// Token was already used — render the same confirmation page using
+		// claims.CurrentPeriodEnd (still in the verified token).
+		renderKeptConfirmation(w, claims.CurrentPeriodEnd)
+		return
+	}
+	_ = claimed
+
+	if err := h.svc.KeepSubscription(r.Context(), claims); err != nil {
+		http.Error(w, "internal error", http.StatusInternalServerError)
+		return
+	}
+	renderKeptConfirmation(w, claims.CurrentPeriodEnd)
+}
+
+// Render functions (use html/template; templates can live in
+// internal/handler/templates/ or be inline strings — match existing convention).
+func renderInvalidLink(w http.ResponseWriter) { /* 400 */ }
+func renderPeriodEnded(w http.ResponseWriter) { /* 200, "your sub already ended" */ }
+func renderKeptConfirmation(w http.ResponseWriter, end time.Time) { /* 200, "you're all set, next renewal: <date>" */ }
+func sha256Of(s string) []byte { /* sha256 helper */ }
+```
+
+(Fill in the rendering helpers using minimal HTML; the spec doesn't require pretty pages but they should not be empty.)
+
+- [ ] **Step 2: Mount the route**
+
+In `cmd/.../main.go` (search for where other handlers like `/api/auth/...` are mounted), add:
+
+```go
+mux.Handle("GET /sub/keep", handler.NewKeepHandler(tokenSigner, idleunsubSvc, pool))
+```
+
+The route is mounted publicly — do NOT wrap in `RequireAuth`. The token IS the auth.
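The single-use claim in Step 1 can be illustrated in isolation. The sketch below is a minimal, self-contained approximation, not the handler's real storage: `claimStore` and `tryClaim` are hypothetical in-memory stand-ins for the `keep_link_token_uses` table and the `TryClaimKeepToken` query, and `sha256Of` is one plausible body for the stubbed helper. It shows the intended semantics: only the token's digest is stored, the first presentation of a token wins the claim, and a replay is detected without error.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// sha256Of hashes the raw token so that only its digest is ever stored.
// This is one plausible body for the stubbed helper in keep.go.
func sha256Of(s string) []byte {
	sum := sha256.Sum256([]byte(s))
	return sum[:]
}

// claimStore is a hypothetical in-memory stand-in for the
// keep_link_token_uses table; the real handler would rely on a
// database query (TryClaimKeepToken) for the same guarantee.
type claimStore struct {
	mu   sync.Mutex
	used map[string]bool
}

// tryClaim reports true only for the first caller presenting a given hash.
// Later callers with the same hash get false, which the handler treats as
// "already used" and answers with the same confirmation page.
func (c *claimStore) tryClaim(hash []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := hex.EncodeToString(hash)
	if c.used[key] {
		return false
	}
	c.used[key] = true
	return true
}

func main() {
	store := &claimStore{used: map[string]bool{}}
	h := sha256Of("example-token")
	fmt.Println("first click claimed:", store.tryClaim(h))
	fmt.Println("replay claimed:", store.tryClaim(h))
}
```

In the real handler the uniqueness guarantee would presumably come from the database (for example a unique constraint on the token hash), so that concurrent clicks across processes are serialized by Postgres rather than by a mutex.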
+ +- [ ] **Step 3: Tests** + +`internal/handler/keep_test.go`: + +```go +func TestKeep_HappyPath(t *testing.T) // valid token, first click → reverses + renders confirmation +func TestKeep_TamperedToken(t *testing.T) // bad signature → 400 +func TestKeep_ExpiredToken(t *testing.T) // expired but valid signature → 200 with period-ended page +func TestKeep_Replay(t *testing.T) // valid token used twice → second renders same page, no Stripe call +func TestKeep_RefusesManualCancel(t *testing.T) // sub_cancel_is_auto=false → no Stripe call (KeepSubscription handles this) +``` + +- [ ] **Step 4: Run tests** + +```bash +go test ./internal/handler/ -race -count=1 -v +``` + +- [ ] **Step 5: Commit** + +```bash +git add internal/handler/keep.go internal/handler/keep_test.go cmd/ +git commit -m "handler: GET /sub/keep with single-use token claim" +``` + +--- + +### Task 12: AutoReverse hook in `Backend.AuthenticateSession` + +**Files:** +- Modify: `internal/backend/backend.go:267-285` (add second fire-and-forget goroutine) +- Modify: `internal/backend/auth_test.go` (test that AutoReverse fires when gates are set) + +- [ ] **Step 1: Add the goroutine** + +In `Backend.AuthenticateSession`, immediately after the existing `TouchAuthSession` goroutine: + +```go +// Existing TouchAuthSession goroutine — leave unchanged. +go func() { + touchCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + queries.TouchAuthSession(touchCtx, row.ID) +}() + +// NEW: AutoReverse hook for users with our auto-cancel set. 
+if row.SubCancelAtPeriodEnd && row.SubCancelIsAuto { + go func() { + ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + defer cancel() + if err := b.idleunsub.AutoReverse(ctx, row.UserID); err != nil { + b.log.Warn("auto-reverse failed", "user_id", row.UserID, "err", err) + } + }() +} +``` + +- [ ] **Step 2: Test** + +```go +func TestAuthenticateSession_FiresAutoReverseWhenGatesSet(t *testing.T) { + // Seed user with sub_cancel_at_period_end=true, sub_cancel_is_auto=true, + // and a fake Stripe that says CancelAtPeriodEnd=true. + // Authenticate. Wait briefly for the goroutine. Assert cache cleared, + // banner set, subscription_kept event row exists. +} + +func TestAuthenticateSession_DoesNotFireAutoReverseForManualCancel(t *testing.T) { + // sub_cancel_at_period_end=true but sub_cancel_is_auto=false. + // Authenticate. Assert no Stripe call, no event row, cache unchanged. +} +``` + +(Goroutine timing in tests: a brief sleep or a `require.Eventually` polling assertion.) + +- [ ] **Step 3: Run tests** + +```bash +go test ./internal/backend/ -race -count=1 -v +``` + +- [ ] **Step 4: Commit** + +```bash +git add internal/backend/backend.go internal/backend/auth_test.go +git commit -m "auth: fire idleunsub.AutoReverse on authed requests when gates set" +``` + +--- + +## Phase 5: RPC + Frontend + +### Task 13: User-info RPC + `AckKeptBanner` RPC + +**Files:** +- Modify: `pb/drill/v1/user.proto` (add `pending_kept_banner` to user-info response; add `AckKeptBanner` RPC method) +- Run: `buf generate` after proto changes +- Modify: `internal/rpc/user/server.go` (project the new field; implement `AckKeptBanner`) +- Modify: `web/src/pb/...` (regenerated by buf — should be automatic) +- Test: existing user-info RPC tests + a new `TestAckKeptBanner` test + +- [ ] **Step 1: Update the proto** + +In `pb/drill/v1/user.proto`, find the user-info message (search for `GetUser`, `WhoAmI`, or similar) and add: + +```proto +message User { + // ... 
existing fields ...
+  bool pending_kept_banner = 20; // placeholder tag: pick the next free number
+}
+
+// Add the new RPC:
+service UserService {
+  // ... existing methods ...
+  rpc AckKeptBanner(AckKeptBannerRequest) returns (AckKeptBannerResponse);
+}
+
+message AckKeptBannerRequest {}
+message AckKeptBannerResponse {}
+```
+
+(Use the actual existing service / message names — search the file.)
+
+- [ ] **Step 2: Run buf generate**
+
+```bash
+buf generate
+```
+
+Expected: regenerates `internal/pb/drill/v1/user.pb.go` and the corresponding ConnectRPC service files, plus `web/src/pb/drill/v1/user_pb.ts` and `_connect.ts`.
+
+- [ ] **Step 3: Project `pending_kept_banner` into the user-info response**
+
+In `internal/rpc/user/server.go`, find the user-info handler and add the field to the response:
+
+```go
+return &drillv1.User{
+	// ... existing fields ...
+	PendingKeptBanner: user.PendingKeptBanner,
+}, nil
+```
+
+- [ ] **Step 4: Implement `AckKeptBanner`**
+
+```go
+func (s *Server) AckKeptBanner(ctx context.Context, req *connect.Request[drillv1.AckKeptBannerRequest]) (*connect.Response[drillv1.AckKeptBannerResponse], error) {
+	// ctx is already a context.Context; pass it through directly.
+	user := auth.UserFromContext(ctx)
+	if user == nil {
+		return nil, connect.NewError(connect.CodeUnauthenticated, errors.New("not authenticated"))
+	}
+	if err := s.queries.ClearKeptBanner(ctx, user.ID); err != nil {
+		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("clear banner: %w", err))
+	}
+	return connect.NewResponse(&drillv1.AckKeptBannerResponse{}), nil
+}
+```
+
+- [ ] **Step 5: Tests**
+
+```go
+func TestAckKeptBanner_ClearsFlag(t *testing.T) {
+	// Seed a user with pending_kept_banner=true.
+	// Call AckKeptBanner.
+	// Assert: pending_kept_banner=false in DB.
+}
+
+func TestAckKeptBanner_RequiresAuth(t *testing.T) {
+	// Call without auth context → CodeUnauthenticated.
+}
+```
+
+- [ ] **Step 6: Run all tests**
+
+```bash
+make test
+```
+
+Expected: full pipeline (buf lint, codegen check, frontend tsc, lint, vitest, backend tests) passes.
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add pb/drill/v1/user.proto internal/pb/ web/src/pb/ \
+  internal/rpc/user/server.go
+git commit -m "user: pending_kept_banner field and AckKeptBanner RPC"
+```
+
+---
+
+### Task 14: Frontend KeptBanner component
+
+**Files:**
+- Create: `web/src/components/KeptBanner.tsx`
+- Modify: app shell (e.g., `web/src/App.tsx` or wherever the top-level layout lives) to mount `<KeptBanner />` in a place visible after auth
+
+- [ ] **Step 1: Implement the banner component**
+
+```tsx
+import { useState } from "react";
+import { useUserStore } from "../store/user"; // existing convention; adjust import
+import { userClient } from "../api/client"; // existing ConnectRPC client
+
+export function KeptBanner() {
+  const user = useUserStore((s) => s.user);
+  const [dismissed, setDismissed] = useState(false);
+
+  if (!user?.pendingKeptBanner || dismissed) return null;
+
+  const dismiss = async () => {
+    setDismissed(true); // optimistic
+    try {
+      await userClient.ackKeptBanner({});
+    } catch (e) {
+      console.warn("ackKeptBanner failed", e);
+      // banner stays dismissed in this session; next page load reads server state
+    }
+  };
+
+  return (

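+    <div role="status">
+      <span>Welcome back — we kept your subscription active.</span>
+      {/* href: point at the app's existing billing/settings route */}
+      <a href="...">Manage subscription</a>
+      <button onClick={dismiss} aria-label="Dismiss">×</button>
+    </div>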
+  );
+}
+```
+
+(Match existing styling conventions — inspect a sibling component to see whether the project uses CSS modules, Tailwind, or plain CSS.)
+
+- [ ] **Step 2: Mount it**
+
+In the top-level layout component (likely `web/src/App.tsx` or `web/src/components/AppShell.tsx`), import and render `<KeptBanner />` somewhere users see it on next page load — typically just inside the authenticated-routes wrapper.
+
+- [ ] **Step 3: Frontend typecheck**
+
+```bash
+cd web && npx tsc -b
+```
+
+Expected: no errors.
+
+- [ ] **Step 4: Frontend lint**
+
+```bash
+cd web && npm run lint
+```
+
+Expected: no errors. (If your project uses a different lint command, mirror it.)
+
+- [ ] **Step 5: Browser smoke test**
+
+Per project convention (CLAUDE.md: "After any UI-affecting commit, load the page in the browser and verify before moving on"):
+
+```bash
+make dev # starts overmind: vite + air on :8080
+```
+
+Manually:
+1. Sign in as a test user.
+2. Set `users.pending_kept_banner = TRUE` for that user via psql:
+   ```sql
+   UPDATE users SET pending_kept_banner = TRUE WHERE email = 'your-test@example.com';
+   ```
+3. Reload the page. Banner appears.
+4. Click "×" — banner disappears.
+5. Reload again — banner stays gone (server-side flag was cleared via AckKeptBanner).
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add web/src/components/KeptBanner.tsx web/src/App.tsx
+git commit -m "web: KeptBanner component for post-reverse welcome-back UX"
+```
+
+---
+
+## Phase 6: Verification
+
+### Task 15: End-to-end integration test + full `make test` cleanup
+
+**Files:**
+- Create: `internal/feat/idleunsub/integration_test.go` (an end-to-end test that exercises the full path: webhook → cancel → email → keep-link click → reverse)
+- Possibly: cleanup of any test leftovers from prior tasks
+
+- [ ] **Step 1: Write the end-to-end happy-path test**
+
+```go
+//go:build integration
+
+package idleunsub_test
+
+import (
+	"context"
+	"net/http" // needed for http.NewServeMux, http.Get, http.StatusOK below
+	"net/http/httptest"
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/require"
+	stripe "github.com/stripe/stripe-go/v82"
+
+	"github.com/btc/drill/internal/backendtest"
+	"github.com/btc/drill/internal/feat/idleunsub"
+	"github.com/btc/drill/internal/handler"
+)
+
+func TestE2E_CancelAndKeepViaLink(t *testing.T) {
+	t.Parallel()
+	b := pg.NewBackend(t)
+	ctx := context.Background()
+	userID := backendtest.SeedUser(t, b)
+	subID := "sub_e2e"
+
+	// Set up the user as a Pro subscriber idle for 2+ periods.
+	_, err := b.Pool().Exec(ctx, `
+		UPDATE users
+		SET stripe_customer_id = $1,
+		    stripe_subscription_id = $2,
+		    plan = 'pro',
+		    idle_eligible_after = NOW() - INTERVAL '6 months'
+		WHERE id = $3`, "cus_e2e", subID, userID)
+	require.NoError(t, err)
+	_, err = b.Pool().Exec(ctx, `
+		UPDATE auth_sessions SET last_active = NOW() - INTERVAL '3 months'
+		WHERE user_id = $1`, userID)
+	require.NoError(t, err)
+
+	// Fake Stripe with a sub that's mid-monthly-cycle.
+ now := time.Now().UTC().Truncate(time.Second) + periodStart := now.AddDate(0, 0, -23) + periodEnd := periodStart.AddDate(0, 1, 0) + fake := newFakeStripeWithSub(subID, "cus_e2e", periodStart, periodEnd) + + signer := idleunsub.NewTokenSigner([]byte("test-key")) + mailer := newRecordingMailer() + svc := idleunsub.NewService(b.Pool(), fake, mailer, signer, b.Logger()) + + // 1. Fire invoice.upcoming → sub gets canceled. + require.NoError(t, svc.HandleInvoiceUpcoming(ctx, makeUpcomingEvent("evt_1", subID))) + require.True(t, fake.subs[subID].CancelAtPeriodEnd) + require.Len(t, mailer.sent, 1) // cancel email enqueued + require.Contains(t, mailer.sent[0].Subject, "next period") + + // 2. Extract the keep link from the email body. + keepURL := extractKeepURL(t, mailer.sent[0].TextBody) + + // 3. Click the link → /sub/keep endpoint. + mux := http.NewServeMux() + mux.Handle("GET /sub/keep", handler.NewKeepHandler(signer, svc, b.Pool())) + server := httptest.NewServer(mux) + defer server.Close() + + resp, err := http.Get(server.URL + "/sub/keep?t=" + extractToken(keepURL)) + require.NoError(t, err) + require.Equal(t, http.StatusOK, resp.StatusCode) + + // 4. Verify reversal happened. + require.False(t, fake.subs[subID].CancelAtPeriodEnd) + require.Len(t, mailer.sent, 2) // kept email enqueued + require.Contains(t, mailer.sent[1].Subject, "still active") + + // 5. Verify cache and event rows. + var cap, isAuto, banner bool + err = b.Pool().QueryRow(ctx, ` + SELECT sub_cancel_at_period_end, sub_cancel_is_auto, pending_kept_banner + FROM users WHERE id = $1`, userID).Scan(&cap, &isAuto, &banner) + require.NoError(t, err) + require.False(t, cap) + require.False(t, isAuto) + require.True(t, banner) + + // 6. Replay the keep link → idempotent (still 200, no second email). 
+ resp2, err := http.Get(server.URL + "/sub/keep?t=" + extractToken(keepURL)) + require.NoError(t, err) + require.Equal(t, http.StatusOK, resp2.StatusCode) + require.Len(t, mailer.sent, 2) // STILL 2; no new email +} +``` + +(Helpers `newFakeStripeWithSub`, `newRecordingMailer`, `makeUpcomingEvent`, `extractKeepURL`, `extractToken`: small constructors. Implement at the bottom of the test file.) + +- [ ] **Step 2: Add the auto-reverse e2e test** + +```go +func TestE2E_AutoReverseOnLogin(t *testing.T) { + // Same setup, but instead of clicking the link, simulate an authed request + // by calling Backend.AuthenticateSession with a valid session token. + // Wait briefly for the goroutine. Assert reversal + banner set + email sent. +} +``` + +- [ ] **Step 3: Run all tests** + +```bash +go test -tags=integration ./internal/feat/idleunsub/ -race -count=1 -v +make test +``` + +Expected: full pipeline passes. + +- [ ] **Step 4: Manually verify metrics and logs** + +Run `make dev` and trigger a cancel by hand: + +1. Use `stripe trigger invoice.upcoming` (Stripe CLI) against the local webhook, OR +2. Insert a fixture event into the dev DB manually and call the webhook handler. + +Watch for: +- `idleunsub.cancel.fired` counter increments +- `subscription_auto_canceled` row in `user_events` +- Cancel email visible in the local mail log (Mailgun stub or `LogSender`) + +- [ ] **Step 5: Final commit** + +```bash +git add internal/feat/idleunsub/integration_test.go +git commit -m "idleunsub: end-to-end integration tests for cancel + reverse" +``` + +--- + +## Acceptance Criteria + +The feature is shippable when: + +1. **Schema migrations 015–018 apply cleanly** on a fresh DB and on a DB seeded with the current production schema (no data loss, no broken FKs). +2. **`make test` is green** end-to-end (buf lint, codegen check, frontend typecheck/lint/vitest, backend tests with `-race`). +3. 
**All spec test cases pass** (§8 of the design doc): + - Trigger rule edge cases (idle/active/grandfathered/no-history/first-period) + - Idempotency (retry storm + multi-firing) + - Reversal paths (link, auto-activity, manual cancel guard) + - Token (sign/verify/tampering/expired/replay/invariant) + - Concurrent webhook race (`SELECT FOR UPDATE`) + - Cache drift correction +4. **Manual smoke test** (browser, per `CLAUDE.md`): banner appears on auto-reverse, dismisses cleanly, doesn't reappear on next load. +5. **Observability**: counters from §9 of the spec are emitted on each path. Verify via local OTEL exporter or by inspecting the metrics handler output. + +## Out of Scope (Per Spec §10) + +These are explicitly *not* implemented in this plan; revisit later if needed: +- Multiple subscriptions per user +- Annual subscriptions (rule generalizes; no special handling shipped) +- Settings toggle to opt out +- Cross-Spanda-product code library +- Cancel-window UI surface beyond the banner (e.g., billing-page indicator) +- Auto-prune of `auth_sessions` and `keep_link_token_uses` (boy-scout cleanup, not blocking) +- Generic webhook dedup adoption by other handlers (`invoice.paid`, etc.) +- `sub_current_period_end` cache column (we read end from `subscription.Update` response) From 7f87d0b4bce23761188f1172ad15c704e1caa2bc Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 23:13:02 -0400 Subject: [PATCH 07/37] =?UTF-8?q?docs(spec):=20aippatch=20=E2=80=94=20appl?= =?UTF-8?q?y=20round-4=20review=20fixes?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 4: opus + sonnet both rated "Approve with minor revisions". Both flagged the same M1 (UpdateAllWritable error-code inconsistency across four sections). Opus added M2 (codegen column-uniqueness) and several LOWs/NITs; sonnet added decode fd-nil-guard. 
MEDIUM - Reconcile UpdateAllWritable across all four sections that mentioned it: Errors section now lists CodeUnimplemented; Roadmap aligns with Apply walkthrough's defense-in-depth check; yaml-mapping prose says codegen is the primary rejection point. - Codegen step 10 added: verify column uniqueness across emitted bindings to catch overlapping override.column entries that would produce duplicate RETURNING and SET clauses Postgres rejects. - Codegen step 9 added explicitly: yaml empty_mask: update_writable is rejected at codegen, not just at runtime. LOW - decode() in Apply step 7 now guards fd == nil before passing to decode, symmetric with encode() in step 2. - AutoSet NOW() gotcha explained inline in the canonical yaml example (before, only in Testing strategy — readers who scrolled past would write a flaky test). - Op.Where doc strings now state values must be scalar / pgx-bindable; list comparators (IN, !=, <) deferred to v1+. - Generalize "call patches.InitPatches()" error message to "your generated InitPatches()" — the runtime is consumer-agnostic. - PK-without-proto-field case explained: column simply doesn't appear in Bindings; WHERE uses m.PK for identity regardless. NIT - Decisions row 8 now matches the wire-conformance "permanent per resource" claim (was contradictorily saying "relax later"). - "drop" in algorithm step 3.iii reworded to "emit no binding; field is registered as intentionally skipped" so the relationship to step 4 is unambiguous. - Risks #1 first-build cost wording softened from "~3 minutes" to "several minutes (varies by runner)". - Validate concurrency note added: callers serialize startup; sync.Once is left to the caller if needed. 
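
The column-uniqueness check added as codegen step 10 can be sketched roughly as follows. This is an illustration under assumptions, not the generator's actual code: the `binding` struct is a hypothetical, trimmed-down stand-in for an emitted binding (only the `Proto` and `Column` names come from the spec), and `duplicateColumns` is an invented helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// binding is a hypothetical, reduced form of an emitted binding:
// one proto field mapped to one SQL column.
type binding struct {
	Proto  string
	Column string
}

// duplicateColumns returns one diagnostic per SQL column that is bound by
// more than one proto field. Output is sorted so diagnostics are stable
// across runs despite Go's randomized map iteration order.
func duplicateColumns(bs []binding) []string {
	byColumn := map[string][]string{}
	for _, b := range bs {
		byColumn[b.Column] = append(byColumn[b.Column], b.Proto)
	}
	var diags []string
	for col, protos := range byColumn {
		if len(protos) > 1 {
			sort.Strings(protos)
			diags = append(diags, fmt.Sprintf("column %q is bound by multiple fields: %v", col, protos))
		}
	}
	sort.Strings(diags)
	return diags
}

func main() {
	bs := []binding{
		{Proto: "create_time", Column: "created_at"},
		{Proto: "created", Column: "created_at"}, // overlapping override
		{Proto: "display_name", Column: "display_name"},
	}
	for _, d := range duplicateColumns(bs) {
		fmt.Println(d)
	}
}
```

Failing at codegen with a message that names both fields is friendlier than letting Postgres reject the generated statement at runtime with "multiple assignments to column".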
--- .../specs/2026-05-07-aippatch-design.md | 69 +++++++++++++++---- 1 file changed, 56 insertions(+), 13 deletions(-) diff --git a/docs/superpowers/specs/2026-05-07-aippatch-design.md b/docs/superpowers/specs/2026-05-07-aippatch-design.md index 60ac6f46..4fd29261 100644 --- a/docs/superpowers/specs/2026-05-07-aippatch-design.md +++ b/docs/superpowers/specs/2026-05-07-aippatch-design.md @@ -243,10 +243,14 @@ type Op[T proto.Message] struct { Message T // input proto carrying the new values; must be non-nil Mask *fieldmaskpb.FieldMask // which fields to apply PKValue any // value for the PK column (e.g. uuid.UUID) - Where map[string]any // optional extra equality predicates + Where map[string]any // optional extra equality predicates. // KEYS MUST BE BOUND COLUMNS — runtime validates // against m.bindingsByColumn before composing SQL. - // Values are pgx-parameterized; keys are NOT escaped. + // Keys are NOT escaped; values ARE pgx-parameterized. + // Values must be scalar / pgx-bindable; v0 framework + // supports equality only. List comparators (IN, !=, + // <, etc.) are deferred — passing a slice value + // produces SQL like `col = ARRAY[…]`, not `col IN (…)`. } // DBTX is the minimal pgx interface aippatch needs. It is a strict subset of @@ -283,6 +287,13 @@ func Apply[T proto.Message]( // no-panic-at-init rule. Idempotent up to the validated flag — repeated // calls return nil after the first success. On error, validated is left // false and the caller may retry after fixing the cause. +// +// Concurrency: callers should invoke Validate before serving any RPCs (the +// generated InitPatches() runs synchronously at startup). Two concurrent +// Validate calls on the same Mapping that both observe validated == false +// would each rebuild the indexes (harmless but redundant). v0 does not use +// sync.Once because real callers serialize startup; if strict-once +// semantics are needed, wrap InitPatches() in a sync.Once at the call site. 
func (m *Mapping[T]) Validate(codecs map[string]EnumCodec) error ``` @@ -314,6 +325,10 @@ All errors returned by `Apply` are `*connect.Error` with appropriate codes: in declared map, nil `op.Message`, `op.Where` key not in `bindingsByColumn`. - `CodeNotFound` — `UPDATE` matched zero rows (PK wrong, scope filter excluded the row, or row is soft-deleted). +- `CodeUnimplemented` — `EmptyMaskPolicy` is `UpdateAllWritable`. Codegen is + the primary defense (rejects `update_writable` in yaml in v0); this is a + defense-in-depth runtime check that fires only if a Mapping is constructed + by hand or by a future codegen version. - `CodeInternal` — pgx error, codec read-side data invariant violation, binding/proto desync that escaped boot-time `Validate`, or `Apply` called on an unvalidated `Mapping` (`InitPatches()` not invoked). @@ -408,7 +423,9 @@ and verify if desired). 2. Look up the SQL table from the schema; resolve the PK column. 3. For each proto field in the message, in **field-number order** (so diagnostic messages line up with the proto file's declaration order): - - If `overrides[field].skip` is true → drop. + - If `overrides[field].skip` is true → emit no binding; the field is + registered as intentionally skipped and is not flagged by step 4 + below. - If `overrides[field].column` set → use that column. - Else → snake-case name match with the SQL column list. - If no match → diagnostic: "field X has no matching column; suggest @@ -445,6 +462,16 @@ and verify if desired). 8. Sort `Bindings` alphabetically by `Proto` and `AutoSet` alphabetically by `Column` for **stable diff output** (different from step 3's processing order). + 9. Reject `empty_mask: update_writable` with a diagnostic — v0 does not + implement this policy. The runtime carries a defense-in-depth check + that returns `CodeUnimplemented`, but the yaml is the primary + enforcement point. + 10. 
Verify column uniqueness across emitted bindings: if two proto fields + (after applying overrides) bind to the same SQL column, emit a + diagnostic identifying both fields and the shared column. Without + this check, a misconfigured yaml could produce a duplicate + `RETURNING` list and a `SET` clause that Postgres rejects with + "multiple assignments to column". 4. Emit one Go file per resource, plus one `init.gen.go` that emits a shared `var Codecs = map[string]aippatch.EnumCodec{...}` registry and `InitPatches() error` calling `Validate(Codecs)` on each mapping. @@ -529,9 +556,15 @@ resources: table: users pk: id soft_delete: deleted_at - empty_mask: error # error → ErrorOnEmpty (default) | update_writable → UpdateAllWritable + empty_mask: error # error → ErrorOnEmpty (default; v0 only valid value) writable: [display_name] # deny-by-default auto_set: + # NOTE: NOW() is constant within a transaction. If a test runs + # SELECT-before, Apply, SELECT-after inside a single BeginFunc, the + # before/after timestamps will be equal. Test patterns that need to + # observe the bump must use clock_timestamp() instead, run the SELECTs + # outside the surrounding transaction, or compare against a captured + # NOW() bound (post >= captured). See *Testing strategy*. updated_at: NOW() # raw SQL, applied to every PATCH; pg_query_go-validated as expression overrides: create_time: { column: created_at } @@ -548,9 +581,11 @@ or binding. No yaml entry is needed for them. One file per repo. `~10–20` lines per resource. Reviewers see policy and mapping deltas in a single diff. Adding a writable field is one line. -The yaml-string `error` maps to the Go enum `ErrorOnEmpty`; -`update_writable` maps to `UpdateAllWritable` (v0 unsupported — codegen -emits a diagnostic until v1 lands). +The yaml-string `error` maps to the Go enum `ErrorOnEmpty`. 
The yaml-string +`update_writable` would map to `UpdateAllWritable`, but v0 codegen rejects +this value with a diagnostic (the runtime path is defense-in-depth only — +see *Errors* and Algorithm step 9). v1 will implement the policy and remove +the codegen rejection. ## Generated file shape @@ -590,7 +625,11 @@ var UserPatch = &aippatch.Mapping[*drillv1.User]{ The `id` column appears as a non-writable binding because the proto field `id` should round-trip in the response. The framework does not deduplicate PK from Bindings — PK identifies the row to update via `WHERE`, while -Bindings carries the read-back representation. +Bindings carries the read-back representation. If a future resource has a +PK column without a corresponding proto field (e.g. a synthetic table key +that's not exposed via the API), that column simply doesn't appear in +Bindings; `Returning(boundCols...)` won't include it; the WHERE clause still +uses `m.PK` for row identity. `internal/patches/init.gen.go`: @@ -662,7 +701,7 @@ func Apply[T proto.Message]( // 0. Sanity guards. if m == nil || !m.validated.Load() { - return zero, connectInternal("aippatch.Mapping not initialized; call patches.InitPatches() during startup") + return zero, connectInternal("aippatch.Mapping not initialized; call your generated InitPatches() (or Mapping.Validate) during startup") } // ProtoReflect().IsValid() returns false for typed-nil pointers and // un-initialized messages, sidestepping the typed-nil interface trap @@ -780,6 +819,9 @@ func Apply[T proto.Message]( b, ok := m.bindingsByColumn[string(c.Name)] if !ok { continue } fd := msg.Descriptor().Fields().ByName(protoreflect.Name(b.Proto)) + if fd == nil { + return zero, connectInternal("binding/proto desync on read: %q", b.Proto) + } if err := decode(msg, fd, vals[i], b.Codec, m.codecs); err != nil { return zero, connectInternal("decode %s: %w", b.Proto, err) } @@ -1014,7 +1056,7 @@ publishing the module.) 
| v1 | Pre/post hooks (or returned diff) for audit logging | `Apply` returns `(updated T, diff Diff, err error)` where Diff carries before/after for mask paths; handler emits audit events. | | v1 | Proto3 explicit-optional + NULL semantics | AIP-134 clearing rule (`mask path + zero value → NULL`); meaningful for `optional` fields. | | v1 | CHECK constraint extraction | Validate enum codec maps against `CHECK (col IN (…))` at codegen. | -| v1 | `UpdateAllWritable` empty-mask policy | Implement the "all populated/writable fields" path; until then v0 returns `Internal` if the policy is set. | +| v1 | `UpdateAllWritable` empty-mask policy | Implement the "all populated/writable fields" path. v0 codegen rejects `update_writable` in yaml; the runtime defense-in-depth check returns `CodeUnimplemented` if a Mapping is somehow constructed with this policy. | | v2 | Declarative validators | `NonEmptyTrimmed`, `LenBetween`, `URL`, `OneOf`. Per-resource yaml + handler-side composition. | | v2 | AIP-193 error mapping | pgx error inspection: `unique_violation` → `AlreadyExists`, `fk_violation` → `FailedPrecondition`, `not_null_violation` / `check_violation` → `InvalidArgument`. Per-resource override map. | | v3 | Per-field declarative authz | `admin_only_fields:` in yaml; layered with handler narrowing. | @@ -1036,8 +1078,9 @@ tiers ship. New features are opt-in via `aippatch.yaml`. the same. CI runners must have a C toolchain — drill's CI already does for testcontainers. **First-build cost:** `pg_query_go/v6` compiles part of the PostgreSQL parser from C source on first use; on a cold build - cache this can take ~3 minutes. CI runners should preserve `GOCACHE` - and `GOMODCACHE` across runs (drill's CI already does). + cache this can take several minutes (varies by runner). CI runners + should preserve `GOCACHE` and `GOMODCACHE` across runs (drill's CI + already does). 2. 
**pgx-native ↔ proto type drift.** New SQL types added to drill in the future may not be in the runtime's `decode` switch. Mitigation: @@ -1090,7 +1133,7 @@ tiers ship. New features are opt-in via `aippatch.yaml`. | 5 | Generated `*.gen.go` files committed to repo | Mapping is reviewable in PRs; CI checks for drift via `--check`. | | 6 | SQL builder: `huandu/go-sqlbuilder` (private to package) | Mature; `PostgreSQL.NewUpdateBuilder()` emits `$1` placeholders cleanly; `Returning(...)` is a first-class method. | | 7 | Row scan: direct `pgx.Rows.Values()` + proto reflection (no third-party scanner) | We populate a proto via reflection rather than a Go row struct; avoids an unnecessary dependency and a proto-aware shim. | -| 8 | Empty FieldMask rejected with `InvalidArgument` (default) | drill prefers explicit intent; documented divergence from AIP-134; relax later if a use case warrants. | +| 8 | Empty FieldMask rejected with `InvalidArgument` (default) | drill prefers explicit intent; documented divergence from AIP-134; permanent per resource once deployed (see *Wire conformance note*). | | 9 | Deny-by-default writable; opt in via `writable:` list | Security posture; consistent with AIP-134 §Update_Mask "must not allow output-only fields." | | 10 | Codegen errors on unsupported field types | Bad fields stop at codegen; runtime never sees a type it cannot handle. | | 11 | Framework reads back via `RETURNING ` (not `*`) and returns the populated proto | One round-trip; AIP-134 compliant; explicit column list excludes unmapped columns from the wire. (Bound non-writable columns are still returned — that is the AIP contract.) | From a3868a11f7dd360df77e0c0592ae9d136f026fb6 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Thu, 7 May 2026 23:20:21 -0400 Subject: [PATCH 08/37] auth_sessions: soft-delete on logout (migration 015) Add revoked_at column to auth_sessions and index on (user_id, last_active DESC). 
Replace hard-delete (DeleteAuthSession, DeleteUserAuthSessions) with soft-revoke (RevokeAuthSession, RevokeUserAuthSessions) so session rows are preserved for the new GetUserLastActive activity query. Account deletion (DeleteAccount) retains hard-delete for GDPR data minimization. --- internal/backend/auth.go | 10 +- internal/backend/auth_test.go | 57 +++++++++- internal/db/auth_sessions.sql.go | 104 ++++++++++++------ internal/db/models.go | 17 +-- internal/db/querier.go | 19 +++- internal/db/users.sql.go | 6 +- .../015_auth_sessions_soft_delete.down.sql | 2 + .../015_auth_sessions_soft_delete.up.sql | 5 + sql/queries/auth_sessions.sql | 32 +++++- sql/queries/users.sql | 6 +- 10 files changed, 196 insertions(+), 62 deletions(-) create mode 100644 sql/migrations/015_auth_sessions_soft_delete.down.sql create mode 100644 sql/migrations/015_auth_sessions_soft_delete.up.sql diff --git a/internal/backend/auth.go b/internal/backend/auth.go index ef86ebec..d57c5483 100644 --- a/internal/backend/auth.go +++ b/internal/backend/auth.go @@ -258,8 +258,8 @@ func (b *Backend) Logout(ctx context.Context, sessionToken string) (err error) { session, err := queries.GetAuthSessionByToken(ctx, tokenHash) if err == nil { - if delErr := queries.DeleteAuthSession(ctx, session.ID); delErr != nil { - slog.Error("delete auth session", "error", delErr) + if revokeErr := queries.RevokeAuthSession(ctx, session.ID); revokeErr != nil { + slog.Error("revoke auth session", "error", revokeErr) } } return nil @@ -384,9 +384,9 @@ func (b *Backend) ResetPassword(ctx context.Context, p ResetPasswordParams) (err return fmt.Errorf("update password: %w", err) } - // Invalidate all sessions. - if err := queries.DeleteUserAuthSessions(ctx, userID); err != nil { - return fmt.Errorf("delete user sessions: %w", err) + // Revoke all sessions (soft-delete so activity history is preserved). 
+ if err := queries.RevokeUserAuthSessions(ctx, userID); err != nil { + return fmt.Errorf("revoke user sessions: %w", err) } return nil } diff --git a/internal/backend/auth_test.go b/internal/backend/auth_test.go index 5dff2312..3169fd30 100644 --- a/internal/backend/auth_test.go +++ b/internal/backend/auth_test.go @@ -11,6 +11,7 @@ import ( "github.com/btc/drill/internal/auth" "github.com/btc/drill/internal/backend" + "github.com/btc/drill/internal/backendtest" "github.com/btc/drill/internal/db" "github.com/btc/drill/internal/jobs" ) @@ -274,11 +275,12 @@ func TestLogout_Success(t *testing.T) { err = b.Logout(ctx, loginRes.Token) require.NoError(t, err) - // Session should be deleted -- lookup by hash should fail. + // Session should be revoked (soft-deleted) -- lookup by token should fail + // because GetAuthSessionByToken filters on revoked_at IS NULL. tokenHash := auth.HashSessionToken(loginRes.Token) queries := db.New(b.Pool()) _, err = queries.GetAuthSessionByToken(ctx, tokenHash) - require.Error(t, err) // pgx.ErrNoRows + require.Error(t, err) // pgx.ErrNoRows — revoked session is invisible } func TestLogout_NonExistentToken(t *testing.T) { @@ -291,6 +293,55 @@ func TestLogout_NonExistentToken(t *testing.T) { require.NoError(t, err) } +func TestLogout_SoftDeletes_PreservesActivityHistory(t *testing.T) { + t.Parallel() + b := pg.NewBackend(t) + ctx := context.Background() + + signupUser(t, b, "softlogout@example.com", "strongpass1", "SoftLogout") + loginRes, err := b.Login(ctx, backend.LoginParams{ + Email: "softlogout@example.com", + Password: "strongpass1", + IP: "127.0.0.1:1234", + }) + require.NoError(t, err) + + // Logout should soft-delete the session. + require.NoError(t, b.Logout(ctx, loginRes.Token)) + + // GetAuthSessionByToken should no longer find it (revoked_at IS NULL filter). 
+ tokenHash := auth.HashSessionToken(loginRes.Token) + queries := db.New(b.Pool()) + _, err = queries.GetAuthSessionByToken(ctx, tokenHash) + require.Error(t, err, "revoked session should not be returned by GetAuthSessionByToken") + + // The session row should still exist with revoked_at set (soft delete). + var revokedAt pgtype.Timestamptz + err = b.Pool().QueryRow(ctx, + `SELECT revoked_at FROM auth_sessions WHERE user_id = $1`, + loginRes.UserID).Scan(&revokedAt) + require.NoError(t, err) + require.True(t, revokedAt.Valid, "revoked_at should be set after logout") + + // GetUserLastActive should still return a value (reads across revoked sessions). + last, err := queries.GetUserLastActive(ctx, loginRes.UserID) + require.NoError(t, err) + require.True(t, last.Valid, "last_active should have a value across revoked sessions") +} + +func TestGetUserLastActive_NoSessions(t *testing.T) { + t.Parallel() + b := pg.NewBackend(t) + ctx := context.Background() + // SeedUser calls Signup only — no auth_sessions row is created, so no + // DELETE needed to reach the "zero sessions" precondition. + userID := backendtest.SeedUser(t, b) + + last, err := db.New(b.Pool()).GetUserLastActive(ctx, userID) + require.NoError(t, err) + require.False(t, last.Valid, "MAX over zero rows should be NULL") +} + // --------------------------------------------------------------------------- // VerifyEmail // --------------------------------------------------------------------------- @@ -418,7 +469,7 @@ func TestResetPassword_Success(t *testing.T) { tokenHash := auth.HashSessionToken(loginRes.Token) queries := db.New(b.Pool()) _, err = queries.GetAuthSessionByToken(ctx, tokenHash) - require.Error(t, err) // session deleted + require.Error(t, err) // session revoked (soft-deleted) — invisible to token lookup // Can login with new password. 
newLoginRes, err := b.Login(ctx, backend.LoginParams{ diff --git a/internal/db/auth_sessions.sql.go b/internal/db/auth_sessions.sql.go index 13797f27..0de91f79 100644 --- a/internal/db/auth_sessions.sql.go +++ b/internal/db/auth_sessions.sql.go @@ -17,7 +17,7 @@ import ( const createAuthSession = `-- name: CreateAuthSession :one INSERT INTO auth_sessions (user_id, token_hash, expires_at, ip_address, user_agent) VALUES ($1, $2, $3, $4, $5) -RETURNING id, user_id, token_hash, expires_at, last_active, ip_address, user_agent, created_at +RETURNING id, user_id, token_hash, expires_at, last_active, ip_address, user_agent, created_at, revoked_at ` type CreateAuthSessionParams struct { @@ -46,55 +46,42 @@ func (q *Queries) CreateAuthSession(ctx context.Context, arg CreateAuthSessionPa &i.IpAddress, &i.UserAgent, &i.CreatedAt, + &i.RevokedAt, ) return i, err } -const deleteAuthSession = `-- name: DeleteAuthSession :exec -DELETE FROM auth_sessions WHERE id = $1 -` - -func (q *Queries) DeleteAuthSession(ctx context.Context, id uuid.UUID) error { - _, err := q.db.Exec(ctx, deleteAuthSession, id) - return err -} - -const deleteUserAuthSessions = `-- name: DeleteUserAuthSessions :exec -DELETE FROM auth_sessions WHERE user_id = $1 -` - -func (q *Queries) DeleteUserAuthSessions(ctx context.Context, userID uuid.UUID) error { - _, err := q.db.Exec(ctx, deleteUserAuthSessions, userID) - return err -} - const getAuthSessionByToken = `-- name: GetAuthSessionByToken :one -SELECT s.id, s.user_id, s.token_hash, s.expires_at, s.last_active, s.ip_address, s.user_agent, s.created_at, u.email, u.display_name, u.role, u.plan, u.email_verified, +SELECT s.id, s.user_id, s.token_hash, s.expires_at, s.last_active, s.ip_address, s.user_agent, s.created_at, s.revoked_at, u.email, u.display_name, u.role, u.plan, u.email_verified, u.created_at AS user_created_at FROM auth_sessions s JOIN users u ON u.id = s.user_id WHERE s.token_hash = $1 AND s.expires_at > NOW() + AND s.revoked_at IS NULL AND 
u.deleted_at IS NULL ` type GetAuthSessionByTokenRow struct { - ID uuid.UUID `json:"id"` - UserID uuid.UUID `json:"user_id"` - TokenHash string `json:"token_hash"` - ExpiresAt time.Time `json:"expires_at"` - LastActive time.Time `json:"last_active"` - IpAddress *netip.Addr `json:"ip_address"` - UserAgent pgtype.Text `json:"user_agent"` - CreatedAt time.Time `json:"created_at"` - Email string `json:"email"` - DisplayName string `json:"display_name"` - Role string `json:"role"` - Plan string `json:"plan"` - EmailVerified bool `json:"email_verified"` - UserCreatedAt time.Time `json:"user_created_at"` + ID uuid.UUID `json:"id"` + UserID uuid.UUID `json:"user_id"` + TokenHash string `json:"token_hash"` + ExpiresAt time.Time `json:"expires_at"` + LastActive time.Time `json:"last_active"` + IpAddress *netip.Addr `json:"ip_address"` + UserAgent pgtype.Text `json:"user_agent"` + CreatedAt time.Time `json:"created_at"` + RevokedAt pgtype.Timestamptz `json:"revoked_at"` + Email string `json:"email"` + DisplayName string `json:"display_name"` + Role string `json:"role"` + Plan string `json:"plan"` + EmailVerified bool `json:"email_verified"` + UserCreatedAt time.Time `json:"user_created_at"` } +// Filters out soft-revoked sessions; only returns valid live sessions. +// (Activity queries do NOT filter on revoked_at — see GetUserLastActive.) 
func (q *Queries) GetAuthSessionByToken(ctx context.Context, tokenHash string) (GetAuthSessionByTokenRow, error) { row := q.db.QueryRow(ctx, getAuthSessionByToken, tokenHash) var i GetAuthSessionByTokenRow @@ -107,6 +94,7 @@ func (q *Queries) GetAuthSessionByToken(ctx context.Context, tokenHash string) ( &i.IpAddress, &i.UserAgent, &i.CreatedAt, + &i.RevokedAt, &i.Email, &i.DisplayName, &i.Role, @@ -117,6 +105,54 @@ func (q *Queries) GetAuthSessionByToken(ctx context.Context, tokenHash string) ( return i, err } +const getUserLastActive = `-- name: GetUserLastActive :one +SELECT a.last_active +FROM (SELECT 1) AS _placeholder +LEFT JOIN auth_sessions a ON a.user_id = $1 +ORDER BY a.last_active DESC +LIMIT 1 +` + +// Reads across all history (including revoked sessions). Returns NULL +// when the user has no auth_sessions rows. +// +// The LEFT-JOIN-from-placeholder shape (rather than the simpler MAX()) is +// a workaround: in sqlc 1.25 with pgx/v5, MAX(timestamptz_not_null) is +// generated as a non-nullable time.Time even though the SQL semantics are +// "NULL when zero rows match." LEFT JOIN forces sqlc's nullability +// inference correctly. If a future sqlc version makes MAX() nullable for +// this case, simplify back to: +// +// SELECT MAX(last_active)::timestamptz FROM auth_sessions WHERE user_id = $1; +func (q *Queries) GetUserLastActive(ctx context.Context, userID uuid.UUID) (pgtype.Timestamptz, error) { + row := q.db.QueryRow(ctx, getUserLastActive, userID) + var last_active pgtype.Timestamptz + err := row.Scan(&last_active) + return last_active, err +} + +const revokeAuthSession = `-- name: RevokeAuthSession :exec +UPDATE auth_sessions SET revoked_at = NOW() +WHERE id = $1 AND revoked_at IS NULL +` + +// Soft-delete: marks the row revoked but preserves it for the activity query. 
+func (q *Queries) RevokeAuthSession(ctx context.Context, id uuid.UUID) error { + _, err := q.db.Exec(ctx, revokeAuthSession, id) + return err +} + +const revokeUserAuthSessions = `-- name: RevokeUserAuthSessions :exec +UPDATE auth_sessions SET revoked_at = NOW() +WHERE user_id = $1 AND revoked_at IS NULL +` + +// Soft-delete every active session for a user (logout-everywhere). +func (q *Queries) RevokeUserAuthSessions(ctx context.Context, userID uuid.UUID) error { + _, err := q.db.Exec(ctx, revokeUserAuthSessions, userID) + return err +} + const touchAuthSession = `-- name: TouchAuthSession :exec UPDATE auth_sessions SET last_active = NOW() WHERE id = $1 diff --git a/internal/db/models.go b/internal/db/models.go index fde20198..40e54668 100644 --- a/internal/db/models.go +++ b/internal/db/models.go @@ -22,14 +22,15 @@ type Annotation struct { } type AuthSession struct { - ID uuid.UUID `json:"id"` - UserID uuid.UUID `json:"user_id"` - TokenHash string `json:"token_hash"` - ExpiresAt time.Time `json:"expires_at"` - LastActive time.Time `json:"last_active"` - IpAddress *netip.Addr `json:"ip_address"` - UserAgent pgtype.Text `json:"user_agent"` - CreatedAt time.Time `json:"created_at"` + ID uuid.UUID `json:"id"` + UserID uuid.UUID `json:"user_id"` + TokenHash string `json:"token_hash"` + ExpiresAt time.Time `json:"expires_at"` + LastActive time.Time `json:"last_active"` + IpAddress *netip.Addr `json:"ip_address"` + UserAgent pgtype.Text `json:"user_agent"` + CreatedAt time.Time `json:"created_at"` + RevokedAt pgtype.Timestamptz `json:"revoked_at"` } type CoachAnalysis struct { diff --git a/internal/db/querier.go b/internal/db/querier.go index c6126472..d4ff52d6 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -44,11 +44,11 @@ type Querier interface { // grants_created=0 for duplicate event. 
CreateSubscriptionGrant(ctx context.Context, arg CreateSubscriptionGrantParams) (CreateSubscriptionGrantRow, error) CreateUser(ctx context.Context, arg CreateUserParams) (User, error) - // Soft-deletes the user and wipes all auth sessions in one round-trip. - // Idempotent: re-calling on a deleted user is a no-op on the user row. + // Soft-deletes the user and hard-deletes all auth sessions in one round-trip + // (data minimization / GDPR). Logout uses RevokeAuthSession / RevokeUserAuthSessions + // in auth_sessions.sql instead. Idempotent: re-calling on a deleted user is a + // no-op on the user row. DeleteAccount(ctx context.Context, id uuid.UUID) error - DeleteAuthSession(ctx context.Context, id uuid.UUID) error - DeleteUserAuthSessions(ctx context.Context, userID uuid.UUID) error // Creates the one-time free trial grant + ledger entry atomically. Called // only at account creation (Signup, OAuthLogin). If the grant already exists // (ON CONFLICT), both the INSERT and the ledger SELECT produce zero rows — a @@ -60,6 +60,8 @@ type Querier interface { // Idempotent: returns 0 rows if session_refund ledger entries already exist. FullRefundSessionMinutes(ctx context.Context, arg FullRefundSessionMinutesParams) ([]FullRefundSessionMinutesRow, error) GetAnnotationsByEvaluation(ctx context.Context, evaluationID uuid.UUID) ([]GetAnnotationsByEvaluationRow, error) + // Filters out soft-revoked sessions; only returns valid live sessions. + // (Activity queries do NOT filter on revoked_at — see GetUserLastActive.) GetAuthSessionByToken(ctx context.Context, tokenHash string) (GetAuthSessionByTokenRow, error) // Single query to fetch all billing state needed for entitlement checks. // Fetch inside the caller's transaction to avoid TOCTOU. 
@@ -91,6 +93,11 @@ type Querier interface { GetUserByEmailIncludingDeleted(ctx context.Context, email string) (User, error) GetUserByID(ctx context.Context, id uuid.UUID) (User, error) GetUserByIDIncludingDeleted(ctx context.Context, id uuid.UUID) (User, error) + // Reads across all history (including revoked sessions) — this is a + // "when did the user last act, ever?" query, not a session-validity check. + // Returns NULL when the user has no auth_sessions rows. The LEFT JOIN + // from a single-row subquery forces sqlc to infer a nullable result type. + GetUserLastActive(ctx context.Context, userID uuid.UUID) (pgtype.Timestamptz, error) GetUserUsageSummary(ctx context.Context, userID uuid.UUID) (GetUserUsageSummaryRow, error) IncrementFreeEducatorUsed(ctx context.Context, arg IncrementFreeEducatorUsedParams) (int32, error) // Inline crash recovery for EndSession/CancelSession. @@ -123,6 +130,10 @@ type Querier interface { // Returns one row per grant debited. Returns zero rows if balance is insufficient // (all-or-nothing: no mutations occur when balance < requested). ReserveMinutes(ctx context.Context, arg ReserveMinutesParams) ([]ReserveMinutesRow, error) + // Soft-delete: marks the row revoked but preserves it for the activity query. + RevokeAuthSession(ctx context.Context, id uuid.UUID) error + // Soft-delete every active session for a user (logout-everywhere). + RevokeUserAuthSessions(ctx context.Context, userID uuid.UUID) error SetAudioURL(ctx context.Context, arg SetAudioURLParams) error SetQuestionImageURL(ctx context.Context, arg SetQuestionImageURLParams) error SoftDeleteUser(ctx context.Context, id uuid.UUID) error diff --git a/internal/db/users.sql.go b/internal/db/users.sql.go index 236059c7..6f7d2cb3 100644 --- a/internal/db/users.sql.go +++ b/internal/db/users.sql.go @@ -115,8 +115,10 @@ WITH soft_delete AS ( DELETE FROM auth_sessions WHERE user_id = $1 ` -// Soft-deletes the user and wipes all auth sessions in one round-trip. 
-// Idempotent: re-calling on a deleted user is a no-op on the user row. +// Soft-deletes the user and hard-deletes all auth sessions in one round-trip +// (data minimization / GDPR). Logout uses RevokeAuthSession / RevokeUserAuthSessions +// in auth_sessions.sql instead. Idempotent: re-calling on a deleted user is a +// no-op on the user row. func (q *Queries) DeleteAccount(ctx context.Context, id uuid.UUID) error { _, err := q.db.Exec(ctx, deleteAccount, id) return err diff --git a/sql/migrations/015_auth_sessions_soft_delete.down.sql b/sql/migrations/015_auth_sessions_soft_delete.down.sql new file mode 100644 index 00000000..a82b685f --- /dev/null +++ b/sql/migrations/015_auth_sessions_soft_delete.down.sql @@ -0,0 +1,2 @@ +DROP INDEX IF EXISTS idx_auth_sessions_user_last_active; +ALTER TABLE auth_sessions DROP COLUMN revoked_at; diff --git a/sql/migrations/015_auth_sessions_soft_delete.up.sql b/sql/migrations/015_auth_sessions_soft_delete.up.sql new file mode 100644 index 00000000..0832662f --- /dev/null +++ b/sql/migrations/015_auth_sessions_soft_delete.up.sql @@ -0,0 +1,5 @@ +ALTER TABLE auth_sessions + ADD COLUMN revoked_at TIMESTAMPTZ; + +CREATE INDEX idx_auth_sessions_user_last_active + ON auth_sessions(user_id, last_active DESC); diff --git a/sql/queries/auth_sessions.sql b/sql/queries/auth_sessions.sql index accceb58..474868fe 100644 --- a/sql/queries/auth_sessions.sql +++ b/sql/queries/auth_sessions.sql @@ -4,20 +4,44 @@ VALUES ($1, $2, $3, $4, $5) RETURNING *; -- name: GetAuthSessionByToken :one +-- Filters out soft-revoked sessions; only returns valid live sessions. +-- (Activity queries do NOT filter on revoked_at — see GetUserLastActive.) 
SELECT s.*, u.email, u.display_name, u.role, u.plan, u.email_verified, u.created_at AS user_created_at FROM auth_sessions s JOIN users u ON u.id = s.user_id WHERE s.token_hash = $1 AND s.expires_at > NOW() + AND s.revoked_at IS NULL AND u.deleted_at IS NULL; -- name: TouchAuthSession :exec UPDATE auth_sessions SET last_active = NOW() WHERE id = $1; --- name: DeleteAuthSession :exec -DELETE FROM auth_sessions WHERE id = $1; +-- name: RevokeAuthSession :exec +-- Soft-delete: marks the row revoked but preserves it for the activity query. +UPDATE auth_sessions SET revoked_at = NOW() +WHERE id = $1 AND revoked_at IS NULL; --- name: DeleteUserAuthSessions :exec -DELETE FROM auth_sessions WHERE user_id = $1; +-- name: RevokeUserAuthSessions :exec +-- Soft-delete every active session for a user (logout-everywhere). +UPDATE auth_sessions SET revoked_at = NOW() +WHERE user_id = $1 AND revoked_at IS NULL; + +-- name: GetUserLastActive :one +-- Reads across all history (including revoked sessions). Returns NULL +-- when the user has no auth_sessions rows. +-- +-- The LEFT-JOIN-from-placeholder shape (rather than the simpler MAX()) is +-- a workaround: in sqlc 1.25 with pgx/v5, MAX(timestamptz_not_null) is +-- generated as a non-nullable time.Time even though the SQL semantics are +-- "NULL when zero rows match." LEFT JOIN forces sqlc's nullability +-- inference correctly. 
If a future sqlc version makes MAX() nullable for +-- this case, simplify back to: +-- SELECT MAX(last_active)::timestamptz FROM auth_sessions WHERE user_id = $1; +SELECT a.last_active +FROM (SELECT 1) AS _placeholder +LEFT JOIN auth_sessions a ON a.user_id = $1 +ORDER BY a.last_active DESC +LIMIT 1; diff --git a/sql/queries/users.sql b/sql/queries/users.sql index b4c80254..bd166591 100644 --- a/sql/queries/users.sql +++ b/sql/queries/users.sql @@ -54,8 +54,10 @@ WHERE id = $1 AND free_full_educators_used < $2 RETURNING free_full_educators_used; -- name: DeleteAccount :exec --- Soft-deletes the user and wipes all auth sessions in one round-trip. --- Idempotent: re-calling on a deleted user is a no-op on the user row. +-- Soft-deletes the user and hard-deletes all auth sessions in one round-trip +-- (data minimization / GDPR). Logout uses RevokeAuthSession / RevokeUserAuthSessions +-- in auth_sessions.sql instead. Idempotent: re-calling on a deleted user is a +-- no-op on the user row. WITH soft_delete AS ( UPDATE users SET deleted_at = NOW(), updated_at = NOW() WHERE id = @id AND deleted_at IS NULL From bf758e3350f4ccf87fec69865b2b1a1cba9b2f1a Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:19:11 -0400 Subject: [PATCH 09/37] users: sub state cache columns (migration 016) --- internal/db/models.go | 6 + internal/db/querier.go | 34 ++- internal/db/users.sql.go | 222 +++++++++++++++++++- sql/migrations/016_users_sub_state.down.sql | 7 + sql/migrations/016_users_sub_state.up.sql | 7 + sql/queries/users.sql | 65 ++++++ 6 files changed, 328 insertions(+), 13 deletions(-) create mode 100644 sql/migrations/016_users_sub_state.down.sql create mode 100644 sql/migrations/016_users_sub_state.up.sql diff --git a/internal/db/models.go b/internal/db/models.go index 40e54668..e0757663 100644 --- a/internal/db/models.go +++ b/internal/db/models.go @@ -177,6 +177,12 @@ type User struct { UpdatedAt time.Time 
`json:"updated_at"` DeletedAt pgtype.Timestamptz `json:"deleted_at"` FreeFullEducatorsUsed int32 `json:"free_full_educators_used"` + StripeSubscriptionID pgtype.Text `json:"stripe_subscription_id"` + SubCancelAtPeriodEnd bool `json:"sub_cancel_at_period_end"` + SubCancelIsAuto bool `json:"sub_cancel_is_auto"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` + PendingKeptBanner bool `json:"pending_kept_banner"` + IdleEligibleAfter time.Time `json:"idle_eligible_after"` } type UserEvent struct { diff --git a/internal/db/querier.go b/internal/db/querier.go index d4ff52d6..37ddd3bf 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -22,6 +22,14 @@ type Querier interface { CancelSession(ctx context.Context, arg CancelSessionParams) error // Reset sessions stuck in 'generating' for too long (crash recovery). CleanupStaleGenerating(ctx context.Context) error + ClearKeptBanner(ctx context.Context, id uuid.UUID) error + // Periodic hygiene: clear banners that are older than 14 days. + // (Run from a tiny daily cron; the spec calls this out as low-priority cleanup.) + ClearStaleKeptBanners(ctx context.Context) error + // Called by handleSubscriptionDeleted. Clears all sub state and downgrades plan. + ClearSubStateOnDeletion(ctx context.Context, id uuid.UUID) error + // Called by KeepSubscription / AutoReverse on reversal. Sets banner flag. + ClearUserAutoCancelState(ctx context.Context, arg ClearUserAutoCancelStateParams) error // Batch-completes abandoned sessions that have at least one candidate message. // These are real interviews that the user forgot to end. CompleteAbandonedActiveSessions(ctx context.Context) ([]CompleteAbandonedActiveSessionsRow, error) @@ -88,15 +96,23 @@ type Querier interface { GetSessionForTurn(ctx context.Context, id uuid.UUID) (GetSessionForTurnRow, error) // Lightweight status check for EndSession/CancelSession polling. 
GetSessionStatus(ctx context.Context, id uuid.UUID) (GetSessionStatusRow, error) + // Used by AutoReverse to defensively re-read both gates. + GetUserAutoCancelGates(ctx context.Context, id uuid.UUID) (GetUserAutoCancelGatesRow, error) GetUserByEmail(ctx context.Context, email string) (User, error) GetUserByEmailForUpdate(ctx context.Context, email string) (User, error) GetUserByEmailIncludingDeleted(ctx context.Context, email string) (User, error) GetUserByID(ctx context.Context, id uuid.UUID) (User, error) GetUserByIDIncludingDeleted(ctx context.Context, id uuid.UUID) (User, error) - // Reads across all history (including revoked sessions) — this is a - // "when did the user last act, ever?" query, not a session-validity check. - // Returns NULL when the user has no auth_sessions rows. The LEFT JOIN - // from a single-row subquery forces sqlc to infer a nullable result type. + // Reads across all history (including revoked sessions). Returns NULL + // when the user has no auth_sessions rows. + // + // The LEFT-JOIN-from-placeholder shape (rather than the simpler MAX()) is + // a workaround: in sqlc 1.25 with pgx/v5, MAX(timestamptz_not_null) is + // generated as a non-nullable time.Time even though the SQL semantics are + // "NULL when zero rows match." LEFT JOIN forces sqlc's nullability + // inference correctly. 
If a future sqlc version makes MAX() nullable for + // this case, simplify back to: + // SELECT MAX(last_active)::timestamptz FROM auth_sessions WHERE user_id = $1; GetUserLastActive(ctx context.Context, userID uuid.UUID) (pgtype.Timestamptz, error) GetUserUsageSummary(ctx context.Context, userID uuid.UUID) (GetUserUsageSummaryRow, error) IncrementFreeEducatorUsed(ctx context.Context, arg IncrementFreeEducatorUsedParams) (int32, error) @@ -118,6 +134,10 @@ type Querier interface { ListQuestionsWithoutImages(ctx context.Context) ([]uuid.UUID, error) ListSeedQuestions(ctx context.Context) ([]ListSeedQuestionsRow, error) ListSessionsByUser(ctx context.Context, userID uuid.UUID) ([]ListSessionsByUserRow, error) + // Per-user mutex for the cancel transaction. Acquires a row-level lock that + // serializes concurrent invoice.upcoming evaluations for the same user. + // Must be inside a transaction; releases on COMMIT or ROLLBACK. + LockUserForSubDecision(ctx context.Context, id uuid.UUID) (uuid.UUID, error) MarkSessionCompleted(ctx context.Context, id uuid.UUID) error ReactivateUser(ctx context.Context, id uuid.UUID) error // Refunds unused minutes for a completed session based on wall-clock duration. @@ -136,7 +156,13 @@ type Querier interface { RevokeUserAuthSessions(ctx context.Context, userID uuid.UUID) error SetAudioURL(ctx context.Context, arg SetAudioURLParams) error SetQuestionImageURL(ctx context.Context, arg SetQuestionImageURLParams) error + // Called by idleunsub.HandleInvoiceUpcoming when our trigger fires. + // Sets BOTH cache flags so the auto-reverse middleware gate fires for this user. + SetUserAutoCancelState(ctx context.Context, arg SetUserAutoCancelStateParams) error SoftDeleteUser(ctx context.Context, id uuid.UUID) error + // Called by handleSubscriptionUpdated. Does NOT touch sub_cancel_is_auto: + // only our handler sets that flag; webhook sync must not overwrite it. 
+ SyncSubStateFromWebhook(ctx context.Context, arg SyncSubStateFromWebhookParams) error TouchAuthSession(ctx context.Context, id uuid.UUID) error UpdateEducatorAnalysisContent(ctx context.Context, arg UpdateEducatorAnalysisContentParams) error UpdateEducatorAnalysisStatus(ctx context.Context, arg UpdateEducatorAnalysisStatusParams) error diff --git a/internal/db/users.sql.go b/internal/db/users.sql.go index 6f7d2cb3..229e0e94 100644 --- a/internal/db/users.sql.go +++ b/internal/db/users.sql.go @@ -12,10 +12,73 @@ import ( "github.com/jackc/pgx/v5/pgtype" ) +const clearKeptBanner = `-- name: ClearKeptBanner :exec +UPDATE users SET pending_kept_banner = FALSE WHERE id = $1 +` + +func (q *Queries) ClearKeptBanner(ctx context.Context, id uuid.UUID) error { + _, err := q.db.Exec(ctx, clearKeptBanner, id) + return err +} + +const clearStaleKeptBanners = `-- name: ClearStaleKeptBanners :exec +UPDATE users +SET pending_kept_banner = FALSE +WHERE pending_kept_banner = TRUE + AND id IN ( + SELECT user_id FROM user_events + WHERE event_type = 'subscription_kept' + AND created_at < NOW() - INTERVAL '14 days' + ) +` + +// Periodic hygiene: clear banners that are older than 14 days. +// (Run from a tiny daily cron; the spec calls this out as low-priority cleanup.) +func (q *Queries) ClearStaleKeptBanners(ctx context.Context) error { + _, err := q.db.Exec(ctx, clearStaleKeptBanners) + return err +} + +const clearSubStateOnDeletion = `-- name: ClearSubStateOnDeletion :exec +UPDATE users +SET stripe_subscription_id = NULL, + sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = NULL, + plan = 'free' +WHERE id = $1 +` + +// Called by handleSubscriptionDeleted. Clears all sub state and downgrades plan. 
+func (q *Queries) ClearSubStateOnDeletion(ctx context.Context, id uuid.UUID) error { + _, err := q.db.Exec(ctx, clearSubStateOnDeletion, id) + return err +} + +const clearUserAutoCancelState = `-- name: ClearUserAutoCancelState :exec +UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + pending_kept_banner = TRUE, + sub_current_period_start = $2 +WHERE id = $1 +` + +type ClearUserAutoCancelStateParams struct { + ID uuid.UUID `json:"id"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` +} + +// Called by KeepSubscription / AutoReverse on reversal. Sets banner flag. +func (q *Queries) ClearUserAutoCancelState(ctx context.Context, arg ClearUserAutoCancelStateParams) error { + _, err := q.db.Exec(ctx, clearUserAutoCancelState, arg.ID, arg.SubCurrentPeriodStart) + return err +} + const createOAuthUser = `-- name: CreateOAuthUser :one INSERT INTO users (email, email_verified, display_name) VALUES ($1, TRUE, $2) -RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used +RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after ` type CreateOAuthUserParams struct { @@ -39,6 +102,12 @@ func (q *Queries) CreateOAuthUser(ctx context.Context, arg CreateOAuthUserParams &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } @@ -47,7 +116,7 @@ const createOAuthUserOrNoop = `-- name: CreateOAuthUserOrNoop :one INSERT INTO users (email, email_verified, display_name) VALUES ($1, TRUE, $2) ON CONFLICT (email) DO NOTHING 
-RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used +RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after ` type CreateOAuthUserOrNoopParams struct { @@ -71,6 +140,12 @@ func (q *Queries) CreateOAuthUserOrNoop(ctx context.Context, arg CreateOAuthUser &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } @@ -78,7 +153,7 @@ func (q *Queries) CreateOAuthUserOrNoop(ctx context.Context, arg CreateOAuthUser const createUser = `-- name: CreateUser :one INSERT INTO users (email, password_hash, display_name) VALUES ($1, $2, $3) -RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used +RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after ` type CreateUserParams struct { @@ -103,6 +178,12 @@ func (q *Queries) CreateUser(ctx context.Context, arg CreateUserParams) (User, e &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } @@ -124,8 +205,35 @@ func (q *Queries) DeleteAccount(ctx context.Context, id uuid.UUID) error { return err } +const 
getUserAutoCancelGates = `-- name: GetUserAutoCancelGates :one +SELECT sub_cancel_at_period_end, sub_cancel_is_auto, + stripe_subscription_id, sub_current_period_start +FROM users +WHERE id = $1 +` + +type GetUserAutoCancelGatesRow struct { + SubCancelAtPeriodEnd bool `json:"sub_cancel_at_period_end"` + SubCancelIsAuto bool `json:"sub_cancel_is_auto"` + StripeSubscriptionID pgtype.Text `json:"stripe_subscription_id"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` +} + +// Used by AutoReverse to defensively re-read both gates. +func (q *Queries) GetUserAutoCancelGates(ctx context.Context, id uuid.UUID) (GetUserAutoCancelGatesRow, error) { + row := q.db.QueryRow(ctx, getUserAutoCancelGates, id) + var i GetUserAutoCancelGatesRow + err := row.Scan( + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.StripeSubscriptionID, + &i.SubCurrentPeriodStart, + ) + return i, err +} + const getUserByEmail = `-- name: GetUserByEmail :one -SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used FROM users +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users WHERE email = $1 AND deleted_at IS NULL ` @@ -145,12 +253,18 @@ func (q *Queries) GetUserByEmail(ctx context.Context, email string) (User, error &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } const getUserByEmailForUpdate = `-- name: GetUserByEmailForUpdate :one -SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, 
updated_at, deleted_at, free_full_educators_used FROM users +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users WHERE email = $1 FOR UPDATE ` @@ -171,12 +285,18 @@ func (q *Queries) GetUserByEmailForUpdate(ctx context.Context, email string) (Us &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } const getUserByEmailIncludingDeleted = `-- name: GetUserByEmailIncludingDeleted :one -SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used FROM users +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users WHERE email = $1 ` @@ -196,12 +316,18 @@ func (q *Queries) GetUserByEmailIncludingDeleted(ctx context.Context, email stri &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } const getUserByID = `-- name: GetUserByID :one -SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used FROM users +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, 
free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users WHERE id = $1 AND deleted_at IS NULL ` @@ -221,12 +347,18 @@ func (q *Queries) GetUserByID(ctx context.Context, id uuid.UUID) (User, error) { &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } const getUserByIDIncludingDeleted = `-- name: GetUserByIDIncludingDeleted :one -SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used FROM users +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users WHERE id = $1 ` @@ -246,6 +378,12 @@ func (q *Queries) GetUserByIDIncludingDeleted(ctx context.Context, id uuid.UUID) &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } @@ -269,6 +407,19 @@ func (q *Queries) IncrementFreeEducatorUsed(ctx context.Context, arg IncrementFr return free_full_educators_used, err } +const lockUserForSubDecision = `-- name: LockUserForSubDecision :one +SELECT id FROM users WHERE id = $1 FOR UPDATE +` + +// Per-user mutex for the cancel transaction. Acquires a row-level lock that +// serializes concurrent invoice.upcoming evaluations for the same user. +// Must be inside a transaction; releases on COMMIT or ROLLBACK. 
+func (q *Queries) LockUserForSubDecision(ctx context.Context, id uuid.UUID) (uuid.UUID, error) { + row := q.db.QueryRow(ctx, lockUserForSubDecision, id) + err := row.Scan(&id) + return id, err +} + const reactivateUser = `-- name: ReactivateUser :exec UPDATE users SET deleted_at = NULL, email_verified = TRUE, updated_at = NOW() WHERE id = $1 @@ -279,6 +430,26 @@ func (q *Queries) ReactivateUser(ctx context.Context, id uuid.UUID) error { return err } +const setUserAutoCancelState = `-- name: SetUserAutoCancelState :exec +UPDATE users +SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + sub_current_period_start = $2 +WHERE id = $1 +` + +type SetUserAutoCancelStateParams struct { + ID uuid.UUID `json:"id"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` +} + +// Called by idleunsub.HandleInvoiceUpcoming when our trigger fires. +// Sets BOTH cache flags so the auto-reverse middleware gate fires for this user. +func (q *Queries) SetUserAutoCancelState(ctx context.Context, arg SetUserAutoCancelStateParams) error { + _, err := q.db.Exec(ctx, setUserAutoCancelState, arg.ID, arg.SubCurrentPeriodStart) + return err +} + const softDeleteUser = `-- name: SoftDeleteUser :exec UPDATE users SET deleted_at = NOW(), updated_at = NOW() WHERE id = $1 @@ -289,6 +460,33 @@ func (q *Queries) SoftDeleteUser(ctx context.Context, id uuid.UUID) error { return err } +const syncSubStateFromWebhook = `-- name: SyncSubStateFromWebhook :exec +UPDATE users +SET stripe_subscription_id = $2, + sub_cancel_at_period_end = $3, + sub_current_period_start = $4 +WHERE id = $1 +` + +type SyncSubStateFromWebhookParams struct { + ID uuid.UUID `json:"id"` + StripeSubscriptionID pgtype.Text `json:"stripe_subscription_id"` + SubCancelAtPeriodEnd bool `json:"sub_cancel_at_period_end"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` +} + +// Called by handleSubscriptionUpdated. 
Does NOT touch sub_cancel_is_auto: +// only our handler sets that flag; webhook sync must not overwrite it. +func (q *Queries) SyncSubStateFromWebhook(ctx context.Context, arg SyncSubStateFromWebhookParams) error { + _, err := q.db.Exec(ctx, syncSubStateFromWebhook, + arg.ID, + arg.StripeSubscriptionID, + arg.SubCancelAtPeriodEnd, + arg.SubCurrentPeriodStart, + ) + return err +} + const updatePlanByStripeCustomer = `-- name: UpdatePlanByStripeCustomer :execrows UPDATE users SET plan = $1, updated_at = NOW() WHERE stripe_customer_id = $2 AND deleted_at IS NULL @@ -310,7 +508,7 @@ func (q *Queries) UpdatePlanByStripeCustomer(ctx context.Context, arg UpdatePlan const updateUserDisplayName = `-- name: UpdateUserDisplayName :one UPDATE users SET display_name = $1, updated_at = NOW() WHERE id = $2 AND deleted_at IS NULL -RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used +RETURNING id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after ` type UpdateUserDisplayNameParams struct { @@ -334,6 +532,12 @@ func (q *Queries) UpdateUserDisplayName(ctx context.Context, arg UpdateUserDispl &i.UpdatedAt, &i.DeletedAt, &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, ) return i, err } diff --git a/sql/migrations/016_users_sub_state.down.sql b/sql/migrations/016_users_sub_state.down.sql new file mode 100644 index 00000000..b4ec8b68 --- /dev/null +++ b/sql/migrations/016_users_sub_state.down.sql @@ -0,0 +1,7 @@ +ALTER TABLE users + DROP COLUMN idle_eligible_after, + DROP COLUMN pending_kept_banner, + DROP COLUMN 
sub_current_period_start, + DROP COLUMN sub_cancel_is_auto, + DROP COLUMN sub_cancel_at_period_end, + DROP COLUMN stripe_subscription_id; diff --git a/sql/migrations/016_users_sub_state.up.sql b/sql/migrations/016_users_sub_state.up.sql new file mode 100644 index 00000000..b7e8c97e --- /dev/null +++ b/sql/migrations/016_users_sub_state.up.sql @@ -0,0 +1,7 @@ +ALTER TABLE users + ADD COLUMN stripe_subscription_id TEXT, + ADD COLUMN sub_cancel_at_period_end BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_cancel_is_auto BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN sub_current_period_start TIMESTAMPTZ, + ADD COLUMN pending_kept_banner BOOLEAN NOT NULL DEFAULT FALSE, + ADD COLUMN idle_eligible_after TIMESTAMPTZ NOT NULL DEFAULT NOW(); diff --git a/sql/queries/users.sql b/sql/queries/users.sql index bd166591..44097d21 100644 --- a/sql/queries/users.sql +++ b/sql/queries/users.sql @@ -79,3 +79,68 @@ INSERT INTO users (email, email_verified, display_name) VALUES (@email, TRUE, @display_name) ON CONFLICT (email) DO NOTHING RETURNING *; + +-- name: SetUserAutoCancelState :exec +-- Called by idleunsub.HandleInvoiceUpcoming when our trigger fires. +-- Sets BOTH cache flags so the auto-reverse middleware gate fires for this user. +UPDATE users +SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + sub_current_period_start = $2 +WHERE id = $1; + +-- name: ClearUserAutoCancelState :exec +-- Called by KeepSubscription / AutoReverse on reversal. Sets banner flag. +UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + pending_kept_banner = TRUE, + sub_current_period_start = $2 +WHERE id = $1; + +-- name: SyncSubStateFromWebhook :exec +-- Called by handleSubscriptionUpdated. Does NOT touch sub_cancel_is_auto: +-- only our handler sets that flag; webhook sync must not overwrite it. 
+UPDATE users +SET stripe_subscription_id = $2, + sub_cancel_at_period_end = $3, + sub_current_period_start = $4 +WHERE id = $1; + +-- name: ClearSubStateOnDeletion :exec +-- Called by handleSubscriptionDeleted. Clears all sub state and downgrades plan. +UPDATE users +SET stripe_subscription_id = NULL, + sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = NULL, + plan = 'free' +WHERE id = $1; + +-- name: ClearKeptBanner :exec +UPDATE users SET pending_kept_banner = FALSE WHERE id = $1; + +-- name: ClearStaleKeptBanners :exec +-- Periodic hygiene: clear banners that are older than 14 days. +-- (Run from a tiny daily cron; the spec calls this out as low-priority cleanup.) +UPDATE users +SET pending_kept_banner = FALSE +WHERE pending_kept_banner = TRUE + AND id IN ( + SELECT user_id FROM user_events + WHERE event_type = 'subscription_kept' + AND created_at < NOW() - INTERVAL '14 days' + ); + +-- name: GetUserAutoCancelGates :one +-- Used by AutoReverse to defensively re-read both gates. +SELECT sub_cancel_at_period_end, sub_cancel_is_auto, + stripe_subscription_id, sub_current_period_start +FROM users +WHERE id = $1; + +-- name: LockUserForSubDecision :one +-- Per-user mutex for the cancel transaction. Acquires a row-level lock that +-- serializes concurrent invoice.upcoming evaluations for the same user. +-- Must be inside a transaction; releases on COMMIT or ROLLBACK. 
+SELECT id FROM users WHERE id = $1 FOR UPDATE; From 93e402a3f2fa50995f56feb7ab9e9e7f4d77d4cb Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:25:52 -0400 Subject: [PATCH 10/37] stripe_webhook_dedup: generic webhook idempotency (migration 017) --- internal/db/models.go | 6 ++++ internal/db/querier.go | 3 ++ internal/db/stripe_webhook_dedup.sql.go | 31 +++++++++++++++++++ .../017_stripe_webhook_dedup.down.sql | 1 + .../017_stripe_webhook_dedup.up.sql | 5 +++ sql/queries/stripe_webhook_dedup.sql | 7 +++++ 6 files changed, 53 insertions(+) create mode 100644 internal/db/stripe_webhook_dedup.sql.go create mode 100644 sql/migrations/017_stripe_webhook_dedup.down.sql create mode 100644 sql/migrations/017_stripe_webhook_dedup.up.sql create mode 100644 sql/queries/stripe_webhook_dedup.sql diff --git a/internal/db/models.go b/internal/db/models.go index e0757663..8b73a0fe 100644 --- a/internal/db/models.go +++ b/internal/db/models.go @@ -164,6 +164,12 @@ type Question struct { FeaturedOrder pgtype.Int4 `json:"featured_order"` } +type StripeWebhookDedup struct { + EventID string `json:"event_id"` + EventType string `json:"event_type"` + ProcessedAt time.Time `json:"processed_at"` +} + type User struct { ID uuid.UUID `json:"id"` Email string `json:"email"` diff --git a/internal/db/querier.go b/internal/db/querier.go index 37ddd3bf..0fbe78ff 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -164,6 +164,9 @@ type Querier interface { // only our handler sets that flag; webhook sync must not overwrite it. SyncSubStateFromWebhook(ctx context.Context, arg SyncSubStateFromWebhookParams) error TouchAuthSession(ctx context.Context, id uuid.UUID) error + // Returns the event_id on first claim; returns no row on subsequent claims. + // Use the no-row return as the signal "this event was already handled". 
+ TryClaimWebhookEvent(ctx context.Context, arg TryClaimWebhookEventParams) (string, error) UpdateEducatorAnalysisContent(ctx context.Context, arg UpdateEducatorAnalysisContentParams) error UpdateEducatorAnalysisStatus(ctx context.Context, arg UpdateEducatorAnalysisStatusParams) error UpdatePlanByStripeCustomer(ctx context.Context, arg UpdatePlanByStripeCustomerParams) (int64, error) diff --git a/internal/db/stripe_webhook_dedup.sql.go b/internal/db/stripe_webhook_dedup.sql.go new file mode 100644 index 00000000..88c62422 --- /dev/null +++ b/internal/db/stripe_webhook_dedup.sql.go @@ -0,0 +1,31 @@ +// Code generated by sqlc. DO NOT EDIT. +// versions: +// sqlc v1.25.0 +// source: stripe_webhook_dedup.sql + +package db + +import ( + "context" +) + +const tryClaimWebhookEvent = `-- name: TryClaimWebhookEvent :one +INSERT INTO stripe_webhook_dedup (event_id, event_type) +VALUES ($1, $2) +ON CONFLICT (event_id) DO NOTHING +RETURNING event_id +` + +type TryClaimWebhookEventParams struct { + EventID string `json:"event_id"` + EventType string `json:"event_type"` +} + +// Returns the event_id on first claim; returns no row on subsequent claims. +// Use the no-row return as the signal "this event was already handled". 
+func (q *Queries) TryClaimWebhookEvent(ctx context.Context, arg TryClaimWebhookEventParams) (string, error) { + row := q.db.QueryRow(ctx, tryClaimWebhookEvent, arg.EventID, arg.EventType) + var event_id string + err := row.Scan(&event_id) + return event_id, err +} diff --git a/sql/migrations/017_stripe_webhook_dedup.down.sql b/sql/migrations/017_stripe_webhook_dedup.down.sql new file mode 100644 index 00000000..25d99421 --- /dev/null +++ b/sql/migrations/017_stripe_webhook_dedup.down.sql @@ -0,0 +1 @@ +DROP TABLE IF EXISTS stripe_webhook_dedup; diff --git a/sql/migrations/017_stripe_webhook_dedup.up.sql b/sql/migrations/017_stripe_webhook_dedup.up.sql new file mode 100644 index 00000000..0209f84c --- /dev/null +++ b/sql/migrations/017_stripe_webhook_dedup.up.sql @@ -0,0 +1,5 @@ +CREATE TABLE stripe_webhook_dedup ( + event_id TEXT PRIMARY KEY, + event_type TEXT NOT NULL, + processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); diff --git a/sql/queries/stripe_webhook_dedup.sql b/sql/queries/stripe_webhook_dedup.sql new file mode 100644 index 00000000..fe20a224 --- /dev/null +++ b/sql/queries/stripe_webhook_dedup.sql @@ -0,0 +1,7 @@ +-- name: TryClaimWebhookEvent :one +-- Returns the event_id on first claim; returns no row on subsequent claims. +-- Use the no-row return as the signal "this event was already handled". 
+INSERT INTO stripe_webhook_dedup (event_id, event_type) +VALUES ($1, $2) +ON CONFLICT (event_id) DO NOTHING +RETURNING event_id; From f6541c9ac22cfc60820aa56b6c3da2398846d468 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:31:49 -0400 Subject: [PATCH 11/37] keep_link_token_uses + user_events dedup queries (migration 018) --- internal/db/keep_link_token_uses.sql.go | 32 ++++++ internal/db/models.go | 6 ++ internal/db/querier.go | 13 +++ internal/db/user_events.sql.go | 97 +++++++++++++++++++ .../018_keep_link_token_uses.down.sql | 1 + .../018_keep_link_token_uses.up.sql | 5 + sql/queries/keep_link_token_uses.sql | 6 ++ sql/queries/user_events.sql | 33 +++++++ 8 files changed, 193 insertions(+) create mode 100644 internal/db/keep_link_token_uses.sql.go create mode 100644 internal/db/user_events.sql.go create mode 100644 sql/migrations/018_keep_link_token_uses.down.sql create mode 100644 sql/migrations/018_keep_link_token_uses.up.sql create mode 100644 sql/queries/keep_link_token_uses.sql create mode 100644 sql/queries/user_events.sql diff --git a/internal/db/keep_link_token_uses.sql.go b/internal/db/keep_link_token_uses.sql.go new file mode 100644 index 00000000..c2ff5987 --- /dev/null +++ b/internal/db/keep_link_token_uses.sql.go @@ -0,0 +1,32 @@ +// Code generated by sqlc. DO NOT EDIT. +// versions: +// sqlc v1.25.0 +// source: keep_link_token_uses.sql + +package db + +import ( + "context" + + "github.com/google/uuid" +) + +const tryClaimKeepToken = `-- name: TryClaimKeepToken :one +INSERT INTO keep_link_token_uses (token_hash, user_id) +VALUES ($1, $2) +ON CONFLICT (token_hash) DO NOTHING +RETURNING token_hash +` + +type TryClaimKeepTokenParams struct { + TokenHash []byte `json:"token_hash"` + UserID uuid.UUID `json:"user_id"` +} + +// Single-use enforcement. Returns the hash on first claim, no row otherwise. 
+func (q *Queries) TryClaimKeepToken(ctx context.Context, arg TryClaimKeepTokenParams) ([]byte, error) { + row := q.db.QueryRow(ctx, tryClaimKeepToken, arg.TokenHash, arg.UserID) + var token_hash []byte + err := row.Scan(&token_hash) + return token_hash, err +} diff --git a/internal/db/models.go b/internal/db/models.go index 8b73a0fe..d265a2f8 100644 --- a/internal/db/models.go +++ b/internal/db/models.go @@ -99,6 +99,12 @@ type InterviewSession struct { GeneratingSince pgtype.Timestamptz `json:"generating_since"` } +type KeepLinkTokenUse struct { + TokenHash []byte `json:"token_hash"` + UserID uuid.UUID `json:"user_id"` + UsedAt time.Time `json:"used_at"` +} + type LedgerEntry struct { ID uuid.UUID `json:"id"` UserID uuid.UUID `json:"user_id"` diff --git a/internal/db/querier.go b/internal/db/querier.go index 0fbe78ff..9ffb79cd 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -83,6 +83,10 @@ type Querier interface { GetMessagesBySessionAfterSeq(ctx context.Context, arg GetMessagesBySessionAfterSeqParams) ([]Message, error) // Return messages after the given offset (for known_message_count cursor). GetMessagesBySessionOffset(ctx context.Context, arg GetMessagesBySessionOffsetParams) ([]Message, error) + // For multi-firing dedup: did a 'subscription_kept' event arrive after the + // most recent 'subscription_auto_canceled' for this period? If yes, the user + // already kept their sub for this period; don't re-cancel. 
+ GetMostRecentKeptOrCanceledForPeriod(ctx context.Context, arg GetMostRecentKeptOrCanceledForPeriodParams) (string, error) GetOAuthAccount(ctx context.Context, arg GetOAuthAccountParams) (OauthAccount, error) GetOAuthAccountsByUser(ctx context.Context, userID uuid.UUID) ([]OauthAccount, error) GetQuestion(ctx context.Context, id uuid.UUID) (Question, error) @@ -115,6 +119,10 @@ type Querier interface { // SELECT MAX(last_active)::timestamptz FROM auth_sessions WHERE user_id = $1; GetUserLastActive(ctx context.Context, userID uuid.UUID) (pgtype.Timestamptz, error) GetUserUsageSummary(ctx context.Context, userID uuid.UUID) (GetUserUsageSummaryRow, error) + // Returns true if a subscription_auto_canceled event already exists for + // this (user, subscription, period). Backs the multi-firing dedup in §4.1. + // metadata->>'current_period_start' is RFC3339 text written by cancelMetadata. + HasAutoCanceledThisPeriod(ctx context.Context, arg HasAutoCanceledThisPeriodParams) (bool, error) IncrementFreeEducatorUsed(ctx context.Context, arg IncrementFreeEducatorUsedParams) (int32, error) // Inline crash recovery for EndSession/CancelSession. // Conditional WHERE makes this idempotent and race-free. @@ -127,6 +135,9 @@ type Querier interface { InsertLLMCallContent(ctx context.Context, arg InsertLLMCallContentParams) error InsertMessage(ctx context.Context, arg InsertMessageParams) (Message, error) InsertQuestion(ctx context.Context, arg InsertQuestionParams) (uuid.UUID, error) + // Composed insert for the cancel decision. metadata is already-marshaled JSON. 
+ InsertSubscriptionAutoCanceledEvent(ctx context.Context, arg InsertSubscriptionAutoCanceledEventParams) error + InsertSubscriptionKeptEvent(ctx context.Context, arg InsertSubscriptionKeptEventParams) error LinkOAuthAccount(ctx context.Context, arg LinkOAuthAccountParams) (OauthAccount, error) ListActiveGrants(ctx context.Context, userID uuid.UUID) ([]ListActiveGrantsRow, error) ListFeaturedQuestions(ctx context.Context) ([]ListFeaturedQuestionsRow, error) @@ -164,6 +175,8 @@ type Querier interface { // only our handler sets that flag; webhook sync must not overwrite it. SyncSubStateFromWebhook(ctx context.Context, arg SyncSubStateFromWebhookParams) error TouchAuthSession(ctx context.Context, id uuid.UUID) error + // Single-use enforcement. Returns the hash on first claim, no row otherwise. + TryClaimKeepToken(ctx context.Context, arg TryClaimKeepTokenParams) ([]byte, error) // Returns the event_id on first claim; returns no row on subsequent claims. // Use the no-row return as the signal "this event was already handled". TryClaimWebhookEvent(ctx context.Context, arg TryClaimWebhookEventParams) (string, error) diff --git a/internal/db/user_events.sql.go b/internal/db/user_events.sql.go new file mode 100644 index 00000000..75ad011c --- /dev/null +++ b/internal/db/user_events.sql.go @@ -0,0 +1,97 @@ +// Code generated by sqlc. DO NOT EDIT. 
+// versions: +// sqlc v1.25.0 +// source: user_events.sql + +package db + +import ( + "context" + "time" + + "github.com/google/uuid" +) + +const getMostRecentKeptOrCanceledForPeriod = `-- name: GetMostRecentKeptOrCanceledForPeriod :one +SELECT event_type +FROM user_events +WHERE user_id = $1 + AND event_type IN ('subscription_auto_canceled', 'subscription_kept') + AND (metadata->>'subscription_id') = $2::text + AND (metadata->>'current_period_start')::timestamptz = $3::timestamptz +ORDER BY created_at DESC +LIMIT 1 +` + +type GetMostRecentKeptOrCanceledForPeriodParams struct { + UserID uuid.UUID `json:"user_id"` + SubscriptionID string `json:"subscription_id"` + CurrentPeriodStart time.Time `json:"current_period_start"` +} + +// For multi-firing dedup: did a 'subscription_kept' event arrive after the +// most recent 'subscription_auto_canceled' for this period? If yes, the user +// already kept their sub for this period; don't re-cancel. +func (q *Queries) GetMostRecentKeptOrCanceledForPeriod(ctx context.Context, arg GetMostRecentKeptOrCanceledForPeriodParams) (string, error) { + row := q.db.QueryRow(ctx, getMostRecentKeptOrCanceledForPeriod, arg.UserID, arg.SubscriptionID, arg.CurrentPeriodStart) + var event_type string + err := row.Scan(&event_type) + return event_type, err +} + +const hasAutoCanceledThisPeriod = `-- name: HasAutoCanceledThisPeriod :one +SELECT EXISTS ( + SELECT 1 FROM user_events + WHERE user_id = $1 + AND event_type = 'subscription_auto_canceled' + AND (metadata->>'subscription_id') = $2::text + AND (metadata->>'current_period_start')::timestamptz = $3::timestamptz +) +` + +type HasAutoCanceledThisPeriodParams struct { + UserID uuid.UUID `json:"user_id"` + SubscriptionID string `json:"subscription_id"` + CurrentPeriodStart time.Time `json:"current_period_start"` +} + +// Returns true if a subscription_auto_canceled event already exists for +// this (user, subscription, period). Backs the multi-firing dedup in §4.1. 
+// metadata->>'current_period_start' is RFC3339 text written by cancelMetadata. +func (q *Queries) HasAutoCanceledThisPeriod(ctx context.Context, arg HasAutoCanceledThisPeriodParams) (bool, error) { + row := q.db.QueryRow(ctx, hasAutoCanceledThisPeriod, arg.UserID, arg.SubscriptionID, arg.CurrentPeriodStart) + var exists bool + err := row.Scan(&exists) + return exists, err +} + +const insertSubscriptionAutoCanceledEvent = `-- name: InsertSubscriptionAutoCanceledEvent :exec +INSERT INTO user_events (user_id, event_type, metadata) +VALUES ($1, 'subscription_auto_canceled', $2) +` + +type InsertSubscriptionAutoCanceledEventParams struct { + UserID uuid.UUID `json:"user_id"` + Metadata []byte `json:"metadata"` +} + +// Composed insert for the cancel decision. metadata is already-marshaled JSON. +func (q *Queries) InsertSubscriptionAutoCanceledEvent(ctx context.Context, arg InsertSubscriptionAutoCanceledEventParams) error { + _, err := q.db.Exec(ctx, insertSubscriptionAutoCanceledEvent, arg.UserID, arg.Metadata) + return err +} + +const insertSubscriptionKeptEvent = `-- name: InsertSubscriptionKeptEvent :exec +INSERT INTO user_events (user_id, event_type, metadata) +VALUES ($1, 'subscription_kept', $2) +` + +type InsertSubscriptionKeptEventParams struct { + UserID uuid.UUID `json:"user_id"` + Metadata []byte `json:"metadata"` +} + +func (q *Queries) InsertSubscriptionKeptEvent(ctx context.Context, arg InsertSubscriptionKeptEventParams) error { + _, err := q.db.Exec(ctx, insertSubscriptionKeptEvent, arg.UserID, arg.Metadata) + return err +} diff --git a/sql/migrations/018_keep_link_token_uses.down.sql b/sql/migrations/018_keep_link_token_uses.down.sql new file mode 100644 index 00000000..5aea5905 --- /dev/null +++ b/sql/migrations/018_keep_link_token_uses.down.sql @@ -0,0 +1 @@ +DROP TABLE IF EXISTS keep_link_token_uses; diff --git a/sql/migrations/018_keep_link_token_uses.up.sql b/sql/migrations/018_keep_link_token_uses.up.sql new file mode 100644 index 
00000000..07b91837 --- /dev/null +++ b/sql/migrations/018_keep_link_token_uses.up.sql @@ -0,0 +1,5 @@ +CREATE TABLE keep_link_token_uses ( + token_hash BYTEA PRIMARY KEY, + user_id UUID NOT NULL REFERENCES users(id), + used_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); diff --git a/sql/queries/keep_link_token_uses.sql b/sql/queries/keep_link_token_uses.sql new file mode 100644 index 00000000..837cd948 --- /dev/null +++ b/sql/queries/keep_link_token_uses.sql @@ -0,0 +1,6 @@ +-- name: TryClaimKeepToken :one +-- Single-use enforcement. Returns the hash on first claim, no row otherwise. +INSERT INTO keep_link_token_uses (token_hash, user_id) +VALUES ($1, $2) +ON CONFLICT (token_hash) DO NOTHING +RETURNING token_hash; diff --git a/sql/queries/user_events.sql b/sql/queries/user_events.sql new file mode 100644 index 00000000..8ba4b0c0 --- /dev/null +++ b/sql/queries/user_events.sql @@ -0,0 +1,33 @@ +-- name: HasAutoCanceledThisPeriod :one +-- Returns true if a subscription_auto_canceled event already exists for +-- this (user, subscription, period). Backs the multi-firing dedup in §4.1. +-- metadata->>'current_period_start' is RFC3339 text written by cancelMetadata. +SELECT EXISTS ( + SELECT 1 FROM user_events + WHERE user_id = @user_id + AND event_type = 'subscription_auto_canceled' + AND (metadata->>'subscription_id') = @subscription_id::text + AND (metadata->>'current_period_start')::timestamptz = @current_period_start::timestamptz +); + +-- name: GetMostRecentKeptOrCanceledForPeriod :one +-- For multi-firing dedup: did a 'subscription_kept' event arrive after the +-- most recent 'subscription_auto_canceled' for this period? If yes, the user +-- already kept their sub for this period; don't re-cancel. 
+SELECT event_type +FROM user_events +WHERE user_id = @user_id + AND event_type IN ('subscription_auto_canceled', 'subscription_kept') + AND (metadata->>'subscription_id') = @subscription_id::text + AND (metadata->>'current_period_start')::timestamptz = @current_period_start::timestamptz +ORDER BY created_at DESC +LIMIT 1; + +-- name: InsertSubscriptionAutoCanceledEvent :exec +-- Composed insert for the cancel decision. metadata is already-marshaled JSON. +INSERT INTO user_events (user_id, event_type, metadata) +VALUES (@user_id, 'subscription_auto_canceled', @metadata); + +-- name: InsertSubscriptionKeptEvent :exec +INSERT INTO user_events (user_id, event_type, metadata) +VALUES (@user_id, 'subscription_kept', @metadata); From 7b321d1a062c32eca638fcba611531b03bad4f6c Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:41:19 -0400 Subject: [PATCH 12/37] auth: project sub_cancel_at_period_end and friends into AuthUser --- internal/auth/user_context.go | 5 ++++ internal/backend/auth_test.go | 35 ++++++++++++++++++++++++++++ internal/backend/backend.go | 17 ++++++++------ internal/db/auth_sessions.sql.go | 39 +++++++++++++++++++------------- sql/queries/auth_sessions.sql | 3 ++- 5 files changed, 75 insertions(+), 24 deletions(-) diff --git a/internal/auth/user_context.go b/internal/auth/user_context.go index 8ec28cac..5e44074f 100644 --- a/internal/auth/user_context.go +++ b/internal/auth/user_context.go @@ -22,6 +22,11 @@ type AuthUser struct { Plan string EmailVerified bool CreatedAt time.Time + + // Auto-cancel state (migration 016). 
+ SubCancelAtPeriodEnd bool + SubCancelIsAuto bool + PendingKeptBanner bool } // SessionAuthenticator validates a hashed session token and returns the diff --git a/internal/backend/auth_test.go b/internal/backend/auth_test.go index 3169fd30..b93767a3 100644 --- a/internal/backend/auth_test.go +++ b/internal/backend/auth_test.go @@ -487,6 +487,41 @@ func TestResetPassword_Success(t *testing.T) { require.ErrorIs(t, err, backend.ErrInvalidCredentials) } +// --------------------------------------------------------------------------- +// AuthenticateSession +// --------------------------------------------------------------------------- + +func TestAuthenticateSession_ProjectsSubState(t *testing.T) { + t.Parallel() + b := pg.NewBackend(t) + ctx := context.Background() + + signupRes := signupUser(t, b, "substate@example.com", "testpassword123", "SubState") + + // Force the cache columns to known values. + _, err := b.Pool().Exec(ctx, ` + UPDATE users + SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + pending_kept_banner = TRUE + WHERE id = $1`, signupRes.UserID) + require.NoError(t, err) + + loginRes, err := b.Login(ctx, backend.LoginParams{ + Email: "substate@example.com", + Password: "testpassword123", + }) + require.NoError(t, err) + + tokenHash := auth.HashSessionToken(loginRes.Token) + + user, err := b.AuthenticateSession(ctx, tokenHash) + require.NoError(t, err) + require.True(t, user.SubCancelAtPeriodEnd) + require.True(t, user.SubCancelIsAuto) + require.True(t, user.PendingKeptBanner) +} + func TestResetPassword_InvalidToken(t *testing.T) { t.Parallel() b := pg.NewBackend(t) diff --git a/internal/backend/backend.go b/internal/backend/backend.go index ee6d08dc..cbfb61e3 100644 --- a/internal/backend/backend.go +++ b/internal/backend/backend.go @@ -285,13 +285,16 @@ func (b *Backend) AuthenticateSession(ctx context.Context, tokenHash string) (_ }() return &auth.AuthUser{ - ID: row.UserID, - Email: row.Email, - DisplayName: row.DisplayName, - Role: 
row.Role, - Plan: row.Plan, - EmailVerified: row.EmailVerified, - CreatedAt: row.UserCreatedAt, + ID: row.UserID, + Email: row.Email, + DisplayName: row.DisplayName, + Role: row.Role, + Plan: row.Plan, + EmailVerified: row.EmailVerified, + CreatedAt: row.UserCreatedAt, + SubCancelAtPeriodEnd: row.SubCancelAtPeriodEnd, + SubCancelIsAuto: row.SubCancelIsAuto, + PendingKeptBanner: row.PendingKeptBanner, }, nil } diff --git a/internal/db/auth_sessions.sql.go b/internal/db/auth_sessions.sql.go index 0de91f79..a7a03be7 100644 --- a/internal/db/auth_sessions.sql.go +++ b/internal/db/auth_sessions.sql.go @@ -53,7 +53,8 @@ func (q *Queries) CreateAuthSession(ctx context.Context, arg CreateAuthSessionPa const getAuthSessionByToken = `-- name: GetAuthSessionByToken :one SELECT s.id, s.user_id, s.token_hash, s.expires_at, s.last_active, s.ip_address, s.user_agent, s.created_at, s.revoked_at, u.email, u.display_name, u.role, u.plan, u.email_verified, - u.created_at AS user_created_at + u.created_at AS user_created_at, + u.sub_cancel_at_period_end, u.sub_cancel_is_auto, u.pending_kept_banner FROM auth_sessions s JOIN users u ON u.id = s.user_id WHERE s.token_hash = $1 @@ -63,21 +64,24 @@ WHERE s.token_hash = $1 ` type GetAuthSessionByTokenRow struct { - ID uuid.UUID `json:"id"` - UserID uuid.UUID `json:"user_id"` - TokenHash string `json:"token_hash"` - ExpiresAt time.Time `json:"expires_at"` - LastActive time.Time `json:"last_active"` - IpAddress *netip.Addr `json:"ip_address"` - UserAgent pgtype.Text `json:"user_agent"` - CreatedAt time.Time `json:"created_at"` - RevokedAt pgtype.Timestamptz `json:"revoked_at"` - Email string `json:"email"` - DisplayName string `json:"display_name"` - Role string `json:"role"` - Plan string `json:"plan"` - EmailVerified bool `json:"email_verified"` - UserCreatedAt time.Time `json:"user_created_at"` + ID uuid.UUID `json:"id"` + UserID uuid.UUID `json:"user_id"` + TokenHash string `json:"token_hash"` + ExpiresAt time.Time `json:"expires_at"` + 
LastActive time.Time `json:"last_active"` + IpAddress *netip.Addr `json:"ip_address"` + UserAgent pgtype.Text `json:"user_agent"` + CreatedAt time.Time `json:"created_at"` + RevokedAt pgtype.Timestamptz `json:"revoked_at"` + Email string `json:"email"` + DisplayName string `json:"display_name"` + Role string `json:"role"` + Plan string `json:"plan"` + EmailVerified bool `json:"email_verified"` + UserCreatedAt time.Time `json:"user_created_at"` + SubCancelAtPeriodEnd bool `json:"sub_cancel_at_period_end"` + SubCancelIsAuto bool `json:"sub_cancel_is_auto"` + PendingKeptBanner bool `json:"pending_kept_banner"` } // Filters out soft-revoked sessions; only returns valid live sessions. @@ -101,6 +105,9 @@ func (q *Queries) GetAuthSessionByToken(ctx context.Context, tokenHash string) ( &i.Plan, &i.EmailVerified, &i.UserCreatedAt, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.PendingKeptBanner, ) return i, err } diff --git a/sql/queries/auth_sessions.sql b/sql/queries/auth_sessions.sql index 474868fe..29f4e424 100644 --- a/sql/queries/auth_sessions.sql +++ b/sql/queries/auth_sessions.sql @@ -7,7 +7,8 @@ RETURNING *; -- Filters out soft-revoked sessions; only returns valid live sessions. -- (Activity queries do NOT filter on revoked_at — see GetUserLastActive.) 
SELECT s.*, u.email, u.display_name, u.role, u.plan, u.email_verified, - u.created_at AS user_created_at + u.created_at AS user_created_at, + u.sub_cancel_at_period_end, u.sub_cancel_is_auto, u.pending_kept_banner FROM auth_sessions s JOIN users u ON u.id = s.user_id WHERE s.token_hash = $1 From 90aa5dff8b7b4326f8a2e636012ef4b590ef4949 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:47:34 -0400 Subject: [PATCH 13/37] auth: tighten AuthUser projection test --- internal/auth/user_context.go | 2 +- internal/backend/auth_test.go | 28 +++++++++++++++++++--------- 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/internal/auth/user_context.go b/internal/auth/user_context.go index 5e44074f..3ef062ba 100644 --- a/internal/auth/user_context.go +++ b/internal/auth/user_context.go @@ -23,7 +23,7 @@ type AuthUser struct { EmailVerified bool CreatedAt time.Time - // Auto-cancel state (migration 016). + // Auto-cancel state cache. SubCancelAtPeriodEnd bool SubCancelIsAuto bool PendingKeptBanner bool diff --git a/internal/backend/auth_test.go b/internal/backend/auth_test.go index b93767a3..0c5a8588 100644 --- a/internal/backend/auth_test.go +++ b/internal/backend/auth_test.go @@ -498,15 +498,6 @@ func TestAuthenticateSession_ProjectsSubState(t *testing.T) { signupRes := signupUser(t, b, "substate@example.com", "testpassword123", "SubState") - // Force the cache columns to known values. 
- _, err := b.Pool().Exec(ctx, ` - UPDATE users - SET sub_cancel_at_period_end = TRUE, - sub_cancel_is_auto = TRUE, - pending_kept_banner = TRUE - WHERE id = $1`, signupRes.UserID) - require.NoError(t, err) - loginRes, err := b.Login(ctx, backend.LoginParams{ Email: "substate@example.com", Password: "testpassword123", @@ -515,11 +506,30 @@ func TestAuthenticateSession_ProjectsSubState(t *testing.T) { tokenHash := auth.HashSessionToken(loginRes.Token) + // Assert schema defaults: all three booleans must be FALSE before mutation. user, err := b.AuthenticateSession(ctx, tokenHash) require.NoError(t, err) + require.False(t, user.SubCancelAtPeriodEnd) + require.False(t, user.SubCancelIsAuto) + require.False(t, user.PendingKeptBanner) + + // Raw SQL: SetUserAutoCancelState doesn't touch pending_kept_banner, + // and we want to assert all three projections at once. + _, err = b.Pool().Exec(ctx, ` + UPDATE users + SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + pending_kept_banner = TRUE + WHERE id = $1`, signupRes.UserID) + require.NoError(t, err) + + // AuthenticateSession re-reads the user row on every call — same token is valid. 
+ user, err = b.AuthenticateSession(ctx, tokenHash) + require.NoError(t, err) require.True(t, user.SubCancelAtPeriodEnd) require.True(t, user.SubCancelIsAuto) require.True(t, user.PendingKeptBanner) + require.Equal(t, "substate@example.com", user.Email) } func TestResetPassword_InvalidToken(t *testing.T) { From c9513cf79eca6dd77fd7b831201705104ddc5c04 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:49:26 -0400 Subject: [PATCH 14/37] idleunsub: HMAC keep-token sign/verify --- internal/feat/idleunsub/token.go | 90 +++++++++++++++++++++++++++ internal/feat/idleunsub/token_test.go | 85 +++++++++++++++++++++++++ 2 files changed, 175 insertions(+) create mode 100644 internal/feat/idleunsub/token.go create mode 100644 internal/feat/idleunsub/token_test.go diff --git a/internal/feat/idleunsub/token.go b/internal/feat/idleunsub/token.go new file mode 100644 index 00000000..69735e46 --- /dev/null +++ b/internal/feat/idleunsub/token.go @@ -0,0 +1,90 @@ +package idleunsub + +import ( + "crypto/hmac" + "crypto/sha256" + "encoding/base64" + "encoding/json" + "errors" + "strings" + "time" + + "github.com/google/uuid" +) + +var ( + ErrTokenInvalid = errors.New("idleunsub: token invalid") + ErrTokenExpired = errors.New("idleunsub: token expired") +) + +// KeepTokenClaims is the canonical signed payload. Field order is fixed by +// struct definition; encoding/json over a typed struct produces a stable +// serialization (no map iteration order ambiguity). +type KeepTokenClaims struct { + UserID uuid.UUID `json:"user_id"` + SubscriptionID string `json:"subscription_id"` + Action string `json:"action"` // "keep_subscription" + CurrentPeriodEnd time.Time `json:"current_period_end"` + IssuedAt int64 `json:"iat"` + ExpiresAt int64 `json:"exp"` // INVARIANT: == CurrentPeriodEnd.Unix() +} + +// TokenSigner signs and verifies KeepTokenClaims with HMAC-SHA256. 
+type TokenSigner struct { + key []byte + now func() time.Time +} + +// NewTokenSigner constructs a signer with the given HMAC key. +func NewTokenSigner(key []byte) *TokenSigner { + return &TokenSigner{key: key, now: func() time.Time { return time.Now() }} +} + +// Sign serializes and HMAC-signs the claims, returning a URL-safe base64 +// string of the form "<bodyB64>.<sigB64>". +func (s *TokenSigner) Sign(c KeepTokenClaims) string { + body, _ := json.Marshal(c) // typed struct never errors + bodyB64 := base64.RawURLEncoding.EncodeToString(body) + sig := hmac.New(sha256.New, s.key) + sig.Write([]byte(bodyB64)) + sigB64 := base64.RawURLEncoding.EncodeToString(sig.Sum(nil)) + return bodyB64 + "." + sigB64 +} + +// Verify parses and validates a token. Returns the claims or an error. +// - ErrTokenInvalid: bad signature, malformed, or invariant violated +// - ErrTokenExpired: signature valid but exp <= now +func (s *TokenSigner) Verify(tok string) (KeepTokenClaims, error) { + var zero KeepTokenClaims + parts := strings.SplitN(tok, ".", 2) + if len(parts) != 2 { + return zero, ErrTokenInvalid + } + bodyB64, sigB64 := parts[0], parts[1] + + expectedSig := hmac.New(sha256.New, s.key) + expectedSig.Write([]byte(bodyB64)) + givenSig, err := base64.RawURLEncoding.DecodeString(sigB64) + if err != nil || !hmac.Equal(givenSig, expectedSig.Sum(nil)) { + return zero, ErrTokenInvalid + } + + body, err := base64.RawURLEncoding.DecodeString(bodyB64) + if err != nil { + return zero, ErrTokenInvalid + } + var c KeepTokenClaims + if err := json.Unmarshal(body, &c); err != nil { + return zero, ErrTokenInvalid + } + + // Spec invariant: ExpiresAt must equal CurrentPeriodEnd.Unix(). 
+ if c.ExpiresAt != c.CurrentPeriodEnd.Unix() { + return zero, ErrTokenInvalid + } + + if s.now().Unix() >= c.ExpiresAt { + return zero, ErrTokenExpired + } + return c, nil +} diff --git a/internal/feat/idleunsub/token_test.go b/internal/feat/idleunsub/token_test.go new file mode 100644 index 00000000..3fc7e360 --- /dev/null +++ b/internal/feat/idleunsub/token_test.go @@ -0,0 +1,85 @@ +package idleunsub_test + +import ( + "crypto/rand" + "testing" + "time" + + "github.com/google/uuid" + "github.com/stretchr/testify/require" + + "github.com/btc/drill/internal/feat/idleunsub" +) + +func newSigner(t *testing.T) *idleunsub.TokenSigner { + t.Helper() + key := make([]byte, 32) + _, err := rand.Read(key) + require.NoError(t, err) + return idleunsub.NewTokenSigner(key) +} + +func TestSignVerify_RoundTrip(t *testing.T) { + t.Parallel() + s := newSigner(t) + now := time.Now().UTC().Truncate(time.Second) + end := now.Add(7 * 24 * time.Hour) + + claims := idleunsub.KeepTokenClaims{ + UserID: uuid.New(), + SubscriptionID: "sub_123", + Action: "keep_subscription", + CurrentPeriodEnd: end, + IssuedAt: now.Unix(), + ExpiresAt: end.Unix(), + } + + tok := s.Sign(claims) + got, err := s.Verify(tok) + require.NoError(t, err) + require.Equal(t, claims.UserID, got.UserID) + require.Equal(t, claims.SubscriptionID, got.SubscriptionID) + require.Equal(t, claims.Action, got.Action) + require.True(t, claims.CurrentPeriodEnd.Equal(got.CurrentPeriodEnd)) + require.Equal(t, claims.ExpiresAt, got.ExpiresAt) +} + +func TestVerify_RejectsTampered(t *testing.T) { + t.Parallel() + s := newSigner(t) + tok := s.Sign(idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: time.Now().Add(time.Hour).UTC(), ExpiresAt: time.Now().Add(time.Hour).Unix(), + }) + tampered := tok[:len(tok)-2] + "AA" // mutate last two characters + _, err := s.Verify(tampered) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) +} + +func 
TestVerify_RejectsExpired(t *testing.T) { + t.Parallel() + s := newSigner(t) + past := time.Now().Add(-time.Hour).UTC().Truncate(time.Second) + tok := s.Sign(idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: past, IssuedAt: past.Add(-24 * time.Hour).Unix(), ExpiresAt: past.Unix(), + }) + _, err := s.Verify(tok) + require.ErrorIs(t, err, idleunsub.ErrTokenExpired) +} + +func TestVerify_RejectsExpDriftFromPeriodEnd(t *testing.T) { + t.Parallel() + s := newSigner(t) + end := time.Now().Add(time.Hour).UTC().Truncate(time.Second) + + claims := idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: end, + IssuedAt: time.Now().Unix(), + ExpiresAt: end.Add(24 * time.Hour).Unix(), // drift! + } + tok := s.Sign(claims) + _, err := s.Verify(tok) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) +} From 80e1ff85f4d60927616924ab60bba21f89077571 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 00:54:05 -0400 Subject: [PATCH 15/37] idleunsub: tighten token verify test coverage --- internal/feat/idleunsub/token_test.go | 38 +++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/internal/feat/idleunsub/token_test.go b/internal/feat/idleunsub/token_test.go index 3fc7e360..edabe775 100644 --- a/internal/feat/idleunsub/token_test.go +++ b/internal/feat/idleunsub/token_test.go @@ -42,6 +42,44 @@ func TestSignVerify_RoundTrip(t *testing.T) { require.Equal(t, claims.Action, got.Action) require.True(t, claims.CurrentPeriodEnd.Equal(got.CurrentPeriodEnd)) require.Equal(t, claims.ExpiresAt, got.ExpiresAt) + require.Equal(t, claims.IssuedAt, got.IssuedAt) +} + +func TestVerify_RejectsMalformed(t *testing.T) { + t.Parallel() + s := newSigner(t) + cases := []struct { + name string + tok string + }{ + {"empty", ""}, + {"no_dot", "nodot"}, + {"only_dot", "."}, + 
{"both_halves_invalid_base64", "!!.@@"}, + {"sig_invalid_base64", "AAAA.!!"}, + } + for _, tc := range cases { + tc := tc + t.Run(tc.name, func(t *testing.T) { + t.Parallel() + _, err := s.Verify(tc.tok) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) + }) + } +} + +func TestVerify_RejectsWrongKey(t *testing.T) { + t.Parallel() + signerA := newSigner(t) + signerB := newSigner(t) // different random key + end := time.Now().Add(time.Hour).UTC().Truncate(time.Second) + + tok := signerA.Sign(idleunsub.KeepTokenClaims{ + UserID: uuid.New(), SubscriptionID: "sub_x", Action: "keep_subscription", + CurrentPeriodEnd: end, IssuedAt: time.Now().Unix(), ExpiresAt: end.Unix(), + }) + _, err := signerB.Verify(tok) + require.ErrorIs(t, err, idleunsub.ErrTokenInvalid) } func TestVerify_RejectsTampered(t *testing.T) { From 2dab2f38ba8403a802c7d7e4e000a1b6a4f48d06 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 01:12:30 -0400 Subject: [PATCH 16/37] idleunsub: HandleInvoiceUpcoming + transactional cancel decision MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Service skeleton plus the cancel-decision flow per spec §4.1/4.2: fetch sub from Stripe, dedup by webhook event_id, period-key dedup against prior subscription_kept/auto_canceled rows, evaluate the idle predicate (MAX(last_active) < periodStart - one interval) under a per-user row lock, write the audit row + cache flags inside a tx, then call UpdateSubscriptionCancel(idempotency_key=event.ID) outside the tx. Tests cover the happy path plus eight early-return shapes (trialing, already-canceled, grandfathered, retry-storm dedup, post-keep dedup, no-activity-history, still-active, first-period grace). They use a fake StripeClient and a per-test fresh Postgres via the package's testutil.SharedPostgres TestMain. 
Adds a GetUserByStripeCustomer query to map Stripe customers to users on the webhook path (nullable text param matches the column). --- internal/db/querier.go | 2 + internal/db/users.sql.go | 32 ++ internal/feat/idleunsub/idleunsub.go | 281 ++++++++++++++++++ internal/feat/idleunsub/idleunsub_test.go | 321 +++++++++++++++++++++ internal/feat/idleunsub/main_test.go | 14 + internal/feat/idleunsub/metadata.go | 23 ++ internal/feat/idleunsub/metrics.go | 56 ++++ internal/feat/idleunsub/stripestub_test.go | 58 ++++ sql/queries/users.sql | 5 + 9 files changed, 792 insertions(+) create mode 100644 internal/feat/idleunsub/idleunsub.go create mode 100644 internal/feat/idleunsub/idleunsub_test.go create mode 100644 internal/feat/idleunsub/main_test.go create mode 100644 internal/feat/idleunsub/metadata.go create mode 100644 internal/feat/idleunsub/metrics.go create mode 100644 internal/feat/idleunsub/stripestub_test.go diff --git a/internal/db/querier.go b/internal/db/querier.go index 9ffb79cd..aab84101 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -107,6 +107,8 @@ type Querier interface { GetUserByEmailIncludingDeleted(ctx context.Context, email string) (User, error) GetUserByID(ctx context.Context, id uuid.UUID) (User, error) GetUserByIDIncludingDeleted(ctx context.Context, id uuid.UUID) (User, error) + // Used by idleunsub.HandleInvoiceUpcoming to map a Stripe customer to our user. + GetUserByStripeCustomer(ctx context.Context, stripeCustomerID pgtype.Text) (User, error) // Reads across all history (including revoked sessions). Returns NULL // when the user has no auth_sessions rows. 
// diff --git a/internal/db/users.sql.go b/internal/db/users.sql.go index 229e0e94..a52b19cb 100644 --- a/internal/db/users.sql.go +++ b/internal/db/users.sql.go @@ -388,6 +388,38 @@ func (q *Queries) GetUserByIDIncludingDeleted(ctx context.Context, id uuid.UUID) return i, err } +const getUserByStripeCustomer = `-- name: GetUserByStripeCustomer :one +SELECT id, email, email_verified, password_hash, display_name, role, stripe_customer_id, plan, created_at, updated_at, deleted_at, free_full_educators_used, stripe_subscription_id, sub_cancel_at_period_end, sub_cancel_is_auto, sub_current_period_start, pending_kept_banner, idle_eligible_after FROM users +WHERE stripe_customer_id = $1 AND deleted_at IS NULL +` + +// Used by idleunsub.HandleInvoiceUpcoming to map a Stripe customer to our user. +func (q *Queries) GetUserByStripeCustomer(ctx context.Context, stripeCustomerID pgtype.Text) (User, error) { + row := q.db.QueryRow(ctx, getUserByStripeCustomer, stripeCustomerID) + var i User + err := row.Scan( + &i.ID, + &i.Email, + &i.EmailVerified, + &i.PasswordHash, + &i.DisplayName, + &i.Role, + &i.StripeCustomerID, + &i.Plan, + &i.CreatedAt, + &i.UpdatedAt, + &i.DeletedAt, + &i.FreeFullEducatorsUsed, + &i.StripeSubscriptionID, + &i.SubCancelAtPeriodEnd, + &i.SubCancelIsAuto, + &i.SubCurrentPeriodStart, + &i.PendingKeptBanner, + &i.IdleEligibleAfter, + ) + return i, err +} + const incrementFreeEducatorUsed = `-- name: IncrementFreeEducatorUsed :one UPDATE users SET free_full_educators_used = free_full_educators_used + 1 diff --git a/internal/feat/idleunsub/idleunsub.go b/internal/feat/idleunsub/idleunsub.go new file mode 100644 index 00000000..28a9aa87 --- /dev/null +++ b/internal/feat/idleunsub/idleunsub.go @@ -0,0 +1,281 @@ +// Package idleunsub owns the idle-auto-cancel logic: evaluate the cancel +// trigger on Stripe's invoice.upcoming webhook, sign keep-tokens, and +// auto-reverse a pending cancel when activity resumes. 
+package idleunsub + +import ( + "context" + "encoding/json" + "errors" + "fmt" + "log/slog" + "time" + + "github.com/jackc/pgx/v5" + "github.com/jackc/pgx/v5/pgtype" + "github.com/jackc/pgx/v5/pgxpool" + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/db" + "github.com/btc/drill/internal/email" +) + +// StripeClient is the surface idleunsub needs from Stripe. Production wires +// this to wrappers around stripe-go's package-level functions; tests inject +// an in-memory fake. +type StripeClient interface { + GetSubscription(ctx context.Context, id string) (*stripe.Subscription, error) + UpdateSubscriptionCancel(ctx context.Context, id string, cancelAtPeriodEnd bool, idempotencyKey string) (*stripe.Subscription, error) +} + +// Service owns the cancel/keep/auto-reverse logic. +type Service struct { + pool *pgxpool.Pool + stripe StripeClient + mailer email.Sender + signer *TokenSigner + now func() time.Time + log *slog.Logger +} + +// NewService constructs a Service. The signer may be nil for tests that +// only exercise HandleInvoiceUpcoming (Task 7); Tasks 8/9 wire it in. +func NewService(pool *pgxpool.Pool, sc StripeClient, m email.Sender, sn *TokenSigner, log *slog.Logger) *Service { + if log == nil { + log = slog.Default() + } + return &Service{pool: pool, stripe: sc, mailer: m, signer: sn, now: time.Now, log: log} +} + +// HandleInvoiceUpcoming evaluates the trigger rule. Idempotent: safe to call +// multiple times for the same Stripe event. +func (s *Service) HandleInvoiceUpcoming(ctx context.Context, event stripe.Event) error { + // Extract subscription ID from the invoice.upcoming event payload. + subID := event.GetObjectValue("subscription") + if subID == "" { + s.log.Warn("invoice.upcoming missing subscription", "event_id", event.ID) + return nil + } + + // 1. Fetch the subscription (authoritative source for current period and status). 
+ sub, err := s.stripe.GetSubscription(ctx, subID) + if err != nil { + mCancelError(ctx, "stripe_get_failed") + return fmt.Errorf("get subscription %s: %w", subID, err) + } + + // 2. Status / state early returns. + if sub.Status != stripe.SubscriptionStatusActive { + s.log.Debug("skip non_active_status", "sub_id", subID, "status", sub.Status) + mCancelSkipped(ctx, "non_active_status") + return nil + } + if sub.CancelAtPeriodEnd { + s.log.Debug("skip already_canceled", "sub_id", subID) + mCancelSkipped(ctx, "already_canceled") + return nil + } + if sub.Items == nil || len(sub.Items.Data) == 0 { + s.log.Warn("subscription has no items", "sub_id", subID) + return nil + } + item := sub.Items.Data[0] + if item.Price == nil || item.Price.Recurring == nil { + s.log.Warn("subscription item missing price.recurring", "sub_id", subID) + return nil + } + periodStart := time.Unix(item.CurrentPeriodStart, 0).UTC() + periodEnd := time.Unix(item.CurrentPeriodEnd, 0).UTC() + threshold := subtractInterval(periodStart, item.Price.Recurring.Interval) + + // 3. Look up our user by stripe customer ID. + if sub.Customer == nil || sub.Customer.ID == "" { + s.log.Warn("subscription missing customer", "sub_id", subID) + return nil + } + custID := sub.Customer.ID + q := db.New(s.pool) + user, err := q.GetUserByStripeCustomer(ctx, pgxText(custID)) + if err != nil { + if errors.Is(err, pgx.ErrNoRows) { + s.log.Warn("no user for stripe customer", "customer_id", custID, "sub_id", subID) + return nil + } + return fmt.Errorf("get user by stripe customer %s: %w", custID, err) + } + + // 4. Begin the cancel transaction. + tx, err := s.pool.BeginTx(ctx, pgx.TxOptions{}) + if err != nil { + mCancelError(ctx, "db_begin_failed") + return fmt.Errorf("begin tx: %w", err) + } + defer tx.Rollback(ctx) //nolint:errcheck // rollback after commit is a no-op + + qtx := db.New(tx) + + // 4a. Per-user mutex. 
+ if _, err := qtx.LockUserForSubDecision(ctx, user.ID); err != nil { + mCancelError(ctx, "db_lock_failed") + return fmt.Errorf("lock user: %w", err) + } + + // 4b. Webhook event dedup (retry-storm protection). + if _, err := qtx.TryClaimWebhookEvent(ctx, db.TryClaimWebhookEventParams{ + EventID: event.ID, EventType: string(event.Type), + }); err != nil { + if errors.Is(err, pgx.ErrNoRows) { + s.log.Debug("skip duplicate_event", "event_id", event.ID) + mCancelSkipped(ctx, "duplicate_event") + return nil + } + mCancelError(ctx, "db_dedup_failed") + return fmt.Errorf("claim webhook event: %w", err) + } + + // 4c. Period-keyed dedup queries. + mostRecent, err := qtx.GetMostRecentKeptOrCanceledForPeriod(ctx, + db.GetMostRecentKeptOrCanceledForPeriodParams{ + UserID: user.ID, + SubscriptionID: subID, + CurrentPeriodStart: periodStart, + }) + if err != nil && !errors.Is(err, pgx.ErrNoRows) { + mCancelError(ctx, "db_dedup_query_failed") + return fmt.Errorf("most recent decision: %w", err) + } + if mostRecent == "subscription_kept" { + s.log.Debug("skip already_kept_this_period", "user_id", user.ID, "sub_id", subID) + mCancelSkipped(ctx, "already_kept_this_period") + return nil + } + hasCanceled, err := qtx.HasAutoCanceledThisPeriod(ctx, + db.HasAutoCanceledThisPeriodParams{ + UserID: user.ID, + SubscriptionID: subID, + CurrentPeriodStart: periodStart, + }) + if err != nil { + mCancelError(ctx, "db_dedup_query_failed") + return fmt.Errorf("has auto canceled: %w", err) + } + if hasCanceled { + s.log.Debug("skip already_canceled_this_period", "user_id", user.ID, "sub_id", subID) + mCancelSkipped(ctx, "already_canceled_this_period") + return nil + } + + // 4d. Read state and compute trigger. 
+ lastActive, err := qtx.GetUserLastActive(ctx, user.ID) + if err != nil { + return fmt.Errorf("get last active: %w", err) + } + if !lastActive.Valid { + s.log.Debug("skip no_activity_history", "user_id", user.ID) + mCancelSkipped(ctx, "no_activity_history") + return nil + } + if !lastActive.Time.Before(threshold) { + s.log.Debug("skip active_in_window", + "user_id", user.ID, "last_active", lastActive.Time, "threshold", threshold) + mCancelSkipped(ctx, "active_in_window") + return nil + } + if periodStart.Before(user.IdleEligibleAfter) { + s.log.Debug("skip grandfathered", + "user_id", user.ID, + "period_start", periodStart, "idle_eligible_after", user.IdleEligibleAfter) + mCancelSkipped(ctx, "grandfathered") + return nil + } + + // 5. Trigger fires: insert audit row + cache update inside the TX. + mdJSON, err := marshalCancelMetadata(subID, event.ID, periodStart, periodEnd) + if err != nil { + return fmt.Errorf("marshal cancel metadata: %w", err) + } + if err := qtx.InsertSubscriptionAutoCanceledEvent(ctx, + db.InsertSubscriptionAutoCanceledEventParams{ + UserID: user.ID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert event row: %w", err) + } + if err := qtx.SetUserAutoCancelState(ctx, db.SetUserAutoCancelStateParams{ + ID: user.ID, + SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("set cache: %w", err) + } + if err := tx.Commit(ctx); err != nil { + mCancelError(ctx, "db_commit_failed") + return fmt.Errorf("commit: %w", err) + } + + // 6. Stripe call OUTSIDE the transaction. Idempotency key keeps retries safe. + if _, err := s.stripe.UpdateSubscriptionCancel(ctx, subID, true, event.ID); err != nil { + mCancelError(ctx, "stripe_update_failed") + s.log.Error("stripe update failed after commit", + "sub_id", subID, "event_id", event.ID, "err", err) + // Cache is now ahead of Stripe. AutoReverse re-checks Stripe state, + // so this drift will self-heal on the user's next authed request. 
+ return fmt.Errorf("stripe update: %w", err) + } + + mCancelFired(ctx, subID) + + // 7. Enqueue cancel email. + if err := s.enqueueCancelEmail(ctx, user, subID, periodEnd); err != nil { + mEmailEnqueue(ctx, "cancel", "error") + s.log.Error("cancel email enqueue failed", "user_id", user.ID, "err", err) + } else { + mEmailEnqueue(ctx, "cancel", "ok") + } + return nil +} + +// subtractInterval returns t minus one Stripe billing interval. +func subtractInterval(t time.Time, interval stripe.PriceRecurringInterval) time.Time { + switch interval { + case stripe.PriceRecurringIntervalDay: + return t.AddDate(0, 0, -1) + case stripe.PriceRecurringIntervalWeek: + return t.AddDate(0, 0, -7) + case stripe.PriceRecurringIntervalMonth: + return t.AddDate(0, -1, 0) + case stripe.PriceRecurringIntervalYear: + return t.AddDate(-1, 0, 0) + default: + return t.AddDate(0, -1, 0) // safe default + } +} + +func marshalCancelMetadata(subID, eventID string, start, end time.Time) ([]byte, error) { + return json.Marshal(cancelMetadata{ + SubscriptionID: subID, + StripeEventID: eventID, + CurrentPeriodStart: start.UTC(), + CurrentPeriodEnd: end.UTC(), + }) +} + +// enqueueCancelEmail composes and enqueues the cancel email. +// Implementation lands in Task 9 alongside the templates; for now this is a +// stub so the cancel path compiles. Task 9 replaces it with the real impl. +func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID string, periodEnd time.Time) error { + _ = ctx + _ = user + _ = subID + _ = periodEnd + return nil // placeholder; replaced in Task 9 +} + +// pgxText / pgxTime are local pgtype constructors. Inlined here because the +// project does not yet have a shared helper for these wrappers; if/when it +// does, swap to that and delete these. 
+func pgxText(s string) pgtype.Text { + return pgtype.Text{String: s, Valid: true} +} + +func pgxTime(t time.Time) pgtype.Timestamptz { + return pgtype.Timestamptz{Time: t, Valid: true} +} diff --git a/internal/feat/idleunsub/idleunsub_test.go b/internal/feat/idleunsub/idleunsub_test.go new file mode 100644 index 00000000..ff5ac00d --- /dev/null +++ b/internal/feat/idleunsub/idleunsub_test.go @@ -0,0 +1,321 @@ +package idleunsub_test + +import ( + "context" + "io" + "log/slog" + "os" + "testing" + "time" + + "github.com/google/uuid" + "github.com/stretchr/testify/require" + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/backend" + "github.com/btc/drill/internal/backendtest" + "github.com/btc/drill/internal/feat/idleunsub" +) + +// silentLogger discards log output. Tests assert on DB state, not logs. +// Set IDLEUNSUB_TEST_DEBUG=1 to surface internal log lines while debugging. +func silentLogger() *slog.Logger { + if os.Getenv("IDLEUNSUB_TEST_DEBUG") == "1" { + return slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug})) + } + return slog.New(slog.NewTextHandler(io.Discard, nil)) +} + +// fixture bundles the backend + Stripe stub + service for a single test. +// Tests mutate Sub or per-user DB state before calling Svc.HandleInvoiceUpcoming. +type fixture struct { + B *backend.Backend + Ctx context.Context + UserID uuid.UUID + CustID string + SubID string + PeriodStart time.Time + PeriodEnd time.Time + Sub *stripe.Subscription + Fake *fakeStripe + Svc *idleunsub.Service + Event stripe.Event +} + +// setupHappyPath spins up a fresh backend, seeds a Pro user with a Stripe +// customer mapping, force-ages their last_active to 3 months ago, and +// constructs a fake Stripe Subscription whose period started 23 days ago +// (= idle for more than one monthly interval). 
+func setupHappyPath(t *testing.T) *fixture { + t.Helper() + b := pg.NewBackend(t) + ctx := context.Background() + userID := backendtest.SeedUser(t, b) + custID := "cus_test_" + uuid.NewString()[:8] + subID := "sub_test_" + uuid.NewString()[:8] + + // Wire user to the Stripe customer ID and make them eligible for idle + // auto-cancel (idle_eligible_after far in the past, so not grandfathered). + _, err := b.Pool().Exec(ctx, ` + UPDATE users + SET stripe_customer_id = $1, + idle_eligible_after = NOW() - INTERVAL '6 months', + plan = 'pro' + WHERE id = $2`, custID, userID) + require.NoError(t, err) + + // Insert an auth_sessions row whose last_active is 3 months ago. + // SeedUser does not create one (Signup() only creates the user), so + // we need to write one explicitly. token_hash must be unique. + _, err = b.Pool().Exec(ctx, ` + INSERT INTO auth_sessions (user_id, token_hash, expires_at, last_active) + VALUES ($1, $2, NOW() + INTERVAL '30 days', NOW() - INTERVAL '3 months')`, + userID, "tok_test_"+uuid.NewString()) + require.NoError(t, err) + + // Build the fake Stripe subscription. Monthly billing; period started + // 23 days ago, ends ~7 days from now. Threshold (= periodStart - 1mo) + // sits ~53 days ago, so a 3-months-ago last_active is "before" it. 
+ now := time.Now().UTC().Truncate(time.Second) + periodStart := now.AddDate(0, 0, -23) + periodEnd := periodStart.AddDate(0, 1, 0) + sub := &stripe.Subscription{ + ID: subID, + Status: stripe.SubscriptionStatusActive, + CancelAtPeriodEnd: false, + Customer: &stripe.Customer{ID: custID}, + Items: &stripe.SubscriptionItemList{Data: []*stripe.SubscriptionItem{{ + CurrentPeriodStart: periodStart.Unix(), + CurrentPeriodEnd: periodEnd.Unix(), + Price: &stripe.Price{Recurring: &stripe.PriceRecurring{ + Interval: stripe.PriceRecurringIntervalMonth, + }}, + }}}, + } + fake := &fakeStripe{subs: map[string]*stripe.Subscription{subID: sub}} + + svc := idleunsub.NewService(b.Pool(), fake, nullMailer{}, nil, silentLogger()) + + event := stripe.Event{ + ID: "evt_" + uuid.NewString()[:8], + Type: "invoice.upcoming", + Data: &stripe.EventData{Object: map[string]interface{}{ + "subscription": subID, + }}, + } + + return &fixture{ + B: b, + Ctx: ctx, + UserID: userID, + CustID: custID, + SubID: subID, + PeriodStart: periodStart, + PeriodEnd: periodEnd, + Sub: sub, + Fake: fake, + Svc: svc, + Event: event, + } +} + +// requireNoCancel asserts no Stripe update fired and no audit row was +// written. Used by every early-return test below. 
+func requireNoCancel(t *testing.T, fx *fixture) { + t.Helper() + require.Empty(t, fx.Fake.updateCalls, "expected no Stripe update calls") + var count int + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, + fx.UserID).Scan(&count) + require.NoError(t, err) + require.Equal(t, 0, count, "expected zero subscription_auto_canceled events") + + var cancelAtEnd, isAuto bool + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto FROM users WHERE id = $1`, + fx.UserID).Scan(&cancelAtEnd, &isAuto) + require.NoError(t, err) + require.False(t, cancelAtEnd, "cache flag sub_cancel_at_period_end must remain false") + require.False(t, isAuto, "cache flag sub_cancel_is_auto must remain false") +} + +// --------------------------------------------------------------------------- +// Happy path +// --------------------------------------------------------------------------- + +func TestHandleInvoiceUpcoming_FiresCancel_WhenIdleTwoPeriods(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + + // Stripe Update was called with cancel_at_period_end=true. + require.Len(t, fx.Fake.updateCalls, 1) + require.True(t, fx.Fake.updateCalls[0].CancelAtPeriodEnd) + require.Equal(t, fx.Event.ID, fx.Fake.updateCalls[0].IdempotencyKey) + + // user_events row exists. + var count int + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, + fx.UserID).Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) + + // Cache flags flipped. 
+ var cancelAtEnd, isAuto bool + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto FROM users WHERE id = $1`, + fx.UserID).Scan(&cancelAtEnd, &isAuto) + require.NoError(t, err) + require.True(t, cancelAtEnd) + require.True(t, isAuto) + + // Webhook dedup row claimed. + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT COUNT(*) FROM stripe_webhook_dedup WHERE event_id = $1`, fx.Event.ID). + Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) +} + +// --------------------------------------------------------------------------- +// Early-return tests +// --------------------------------------------------------------------------- + +func TestHandleInvoiceUpcoming_SkipsTrialing(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + fx.Sub.Status = stripe.SubscriptionStatusTrialing + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} + +func TestHandleInvoiceUpcoming_SkipsAlreadyCanceled(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + fx.Sub.CancelAtPeriodEnd = true + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} + +func TestHandleInvoiceUpcoming_SkipsGrandfathered(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + // Override idle_eligible_after to the future: user just signed up under + // the new policy and has not yet served a full period of grandfather. 
+ _, err := fx.B.Pool().Exec(fx.Ctx, + `UPDATE users SET idle_eligible_after = NOW() + INTERVAL '1 month' WHERE id = $1`, + fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} + +func TestHandleInvoiceUpcoming_DedupesRetryStorm(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + + // Only one update call total — the second invocation hit the webhook dedup. + require.Len(t, fx.Fake.updateCalls, 1) + + // Exactly one auto-cancel event row. + var count int + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, + fx.UserID).Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) + + // One webhook dedup row. + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT COUNT(*) FROM stripe_webhook_dedup WHERE event_id = $1`, fx.Event.ID). + Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count) +} + +func TestHandleInvoiceUpcoming_DedupesPostKeep(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + // Pre-insert a subscription_kept row for THIS sub + period. The user + // has already kept their sub for this period; we must not re-cancel + // even when a fresh invoice.upcoming arrives. + mdJSON := `{` + + `"subscription_id":"` + fx.SubID + `",` + + `"via":"link",` + + `"current_period_start":"` + fx.PeriodStart.UTC().Format(time.RFC3339Nano) + `"` + + `}` + _, err := fx.B.Pool().Exec(fx.Ctx, ` + INSERT INTO user_events (user_id, event_type, metadata) + VALUES ($1, 'subscription_kept', $2::jsonb)`, fx.UserID, mdJSON) + require.NoError(t, err) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + + // No update call. + require.Empty(t, fx.Fake.updateCalls) + // No new auto-cancel rows. 
+ var count int + err = fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, + fx.UserID).Scan(&count) + require.NoError(t, err) + require.Equal(t, 0, count) +} + +func TestHandleInvoiceUpcoming_NoActivityHistory(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + // Wipe auth_sessions for this user — they signed up but never + // established a session. Defensive path per spec §8.1. + _, err := fx.B.Pool().Exec(fx.Ctx, + `DELETE FROM auth_sessions WHERE user_id = $1`, fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} + +func TestHandleInvoiceUpcoming_NotIdleStill(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + // Move last_active to NOW (well within the current period) — user is + // active. Threshold (periodStart - 1mo) is in the past, so an active + // user fails `lastActive < threshold`. + _, err := fx.B.Pool().Exec(fx.Ctx, + `UPDATE auth_sessions SET last_active = NOW() WHERE user_id = $1`, fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} + +func TestHandleInvoiceUpcoming_FirstPeriodGrace(t *testing.T) { + t.Parallel() + fx := setupHappyPath(t) + + // User signed up moments ago: idle_eligible_after = NOW(). The current + // period started 23 days ago, so periodStart < idle_eligible_after, + // triggering the grandfather skip ("first-period grace"). 
+ _, err := fx.B.Pool().Exec(fx.Ctx, + `UPDATE users SET idle_eligible_after = NOW() WHERE id = $1`, fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + requireNoCancel(t, fx) +} diff --git a/internal/feat/idleunsub/main_test.go b/internal/feat/idleunsub/main_test.go new file mode 100644 index 00000000..a10e137f --- /dev/null +++ b/internal/feat/idleunsub/main_test.go @@ -0,0 +1,14 @@ +package idleunsub_test + +import ( + "testing" + + "github.com/btc/drill/internal/testutil" +) + +var pg testutil.PG + +func TestMain(m *testing.M) { + pg = testutil.SharedPostgres() + pg.RunTests(m) +} diff --git a/internal/feat/idleunsub/metadata.go b/internal/feat/idleunsub/metadata.go new file mode 100644 index 00000000..b9b01549 --- /dev/null +++ b/internal/feat/idleunsub/metadata.go @@ -0,0 +1,23 @@ +package idleunsub + +import "time" + +// All timestamps are canonicalized to second precision via +// time.Unix(stripeInt64, 0).UTC() before assignment. Stripe period fields +// arrive as int64 Unix seconds; explicit second precision and UTC guard +// against future drift if a code path ever constructs a time.Time from a +// different source. JSON encoding via encoding/json produces RFC3339 +// ("...Z") which Postgres ::timestamptz parses reliably. 
+ +type cancelMetadata struct { + SubscriptionID string `json:"subscription_id"` + StripeEventID string `json:"stripe_event_id"` + CurrentPeriodStart time.Time `json:"current_period_start"` + CurrentPeriodEnd time.Time `json:"current_period_end"` +} + +type keptMetadata struct { + SubscriptionID string `json:"subscription_id"` + Via string `json:"via"` // "link" | "auto_activity" + CurrentPeriodStart time.Time `json:"current_period_start"` // identifies the period kept +} diff --git a/internal/feat/idleunsub/metrics.go b/internal/feat/idleunsub/metrics.go new file mode 100644 index 00000000..76630fb0 --- /dev/null +++ b/internal/feat/idleunsub/metrics.go @@ -0,0 +1,56 @@ +package idleunsub + +import ( + "context" + + "go.opentelemetry.io/otel" + "go.opentelemetry.io/otel/attribute" + "go.opentelemetry.io/otel/metric" +) + +// otel meter.Int64Counter returns (counter, error) but the contract is that +// errors are reserved for invalid arguments — name/description constants +// here are valid by construction. On error, the API returns a non-nil +// no-op counter, so the discard is safe and Add() never panics. This +// satisfies the project rule "never panic at init time" without forcing +// boilerplate error propagation through main(). 
+var ( + meter = otel.Meter("idleunsub") + cancelFired, _ = meter.Int64Counter("idleunsub.cancel.fired") + cancelSkipped, _ = meter.Int64Counter("idleunsub.cancel.skipped") + cancelError, _ = meter.Int64Counter("idleunsub.cancel.error") + cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") + reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") + reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") + emailEnqueue, _ = meter.Int64Counter("idleunsub.email.enqueue") +) + +func mCancelFired(ctx context.Context, subID string) { + cancelFired.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} + +func mCancelSkipped(ctx context.Context, reason string) { + cancelSkipped.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) +} + +func mCancelError(ctx context.Context, reason string) { + cancelError.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) +} + +func mCacheDriftCorrected(ctx context.Context, subID string) { + cacheDrift.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} + +func mReverseLink(ctx context.Context, subID string) { + reverseLink.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} + +func mReverseActivity(ctx context.Context, subID string) { + reverseAct.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) +} + +func mEmailEnqueue(ctx context.Context, kind, status string) { + emailEnqueue.Add(ctx, 1, metric.WithAttributes( + attribute.String("kind", kind), + attribute.String("status", status))) +} diff --git a/internal/feat/idleunsub/stripestub_test.go b/internal/feat/idleunsub/stripestub_test.go new file mode 100644 index 00000000..33874081 --- /dev/null +++ b/internal/feat/idleunsub/stripestub_test.go @@ -0,0 +1,58 @@ +package idleunsub_test + +import ( + "context" + "errors" + + stripe "github.com/stripe/stripe-go/v82" + + "github.com/btc/drill/internal/email" + 
"github.com/btc/drill/internal/feat/idleunsub" +) + +// errStripeNotFound is the sentinel returned by the fake when an unknown +// subscription ID is requested. The tests never assert on its identity; +// it just needs to be non-nil so callers see "fetch failed" semantics. +var errStripeNotFound = errors.New("fakeStripe: subscription not found") + +// fakeStripe is an in-memory StripeClient for tests. +type fakeStripe struct { + subs map[string]*stripe.Subscription + updateCalls []updateCall + updateErr error +} + +type updateCall struct { + ID string + CancelAtPeriodEnd bool + IdempotencyKey string +} + +func (f *fakeStripe) GetSubscription(_ context.Context, id string) (*stripe.Subscription, error) { + if s, ok := f.subs[id]; ok { + return s, nil + } + return nil, errStripeNotFound +} + +func (f *fakeStripe) UpdateSubscriptionCancel(_ context.Context, id string, cancelAtEnd bool, key string) (*stripe.Subscription, error) { + f.updateCalls = append(f.updateCalls, updateCall{id, cancelAtEnd, key}) + if f.updateErr != nil { + return nil, f.updateErr + } + if s, ok := f.subs[id]; ok { + s.CancelAtPeriodEnd = cancelAtEnd + return s, nil + } + return nil, errStripeNotFound +} + +var _ idleunsub.StripeClient = (*fakeStripe)(nil) + +// nullMailer is an email.Sender that drops all sends. Cancel emails are +// fire-and-forget in HandleInvoiceUpcoming; we only need them not to error. +type nullMailer struct{} + +func (nullMailer) Send(_ context.Context, _ email.Message) error { return nil } + +var _ email.Sender = nullMailer{} diff --git a/sql/queries/users.sql b/sql/queries/users.sql index 44097d21..b8077b8b 100644 --- a/sql/queries/users.sql +++ b/sql/queries/users.sql @@ -11,6 +11,11 @@ WHERE email = $1 AND deleted_at IS NULL; SELECT * FROM users WHERE id = $1 AND deleted_at IS NULL; +-- name: GetUserByStripeCustomer :one +-- Used by idleunsub.HandleInvoiceUpcoming to map a Stripe customer to our user. 
+SELECT * FROM users +WHERE stripe_customer_id = $1 AND deleted_at IS NULL; + -- name: VerifyUserEmail :exec UPDATE users SET email_verified = TRUE, updated_at = NOW() WHERE id = $1; From 1fd39380ec08b2b4e6be8cfee885a7b386aaaf50 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 01:28:34 -0400 Subject: [PATCH 17/37] idleunsub: address Task 7 code review feedback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add per-symbol nolint:unused directives (with Task 8 rationale) for 7 symbols in metrics.go and metadata.go that are not yet wired up - Honor Price.Recurring.IntervalCount in subtractInterval (was always subtracting exactly 1 interval regardless of billing cycle multiplier) - Add TODO comment on enqueueCancelEmail stub documenting Task 9's required test injection - Strengthen DedupesRetryStorm test: three-step scenario isolates the webhook-dedup gate from the period-dedup gate; documents that evt_2's TryClaimWebhookEvent row rolls back with the TX on period-dedup hit - Happy-path test asserts stripe_subscription_id remains NULL (documents spec §4.1 single-population-path invariant) - Hoist metadata.go's free-floating doc comment to attach to cancelMetadata --- internal/feat/idleunsub/idleunsub.go | 24 ++++++++---- internal/feat/idleunsub/idleunsub_test.go | 48 ++++++++++++++++++++--- internal/feat/idleunsub/metadata.go | 16 ++++---- internal/feat/idleunsub/metrics.go | 12 ++++-- 4 files changed, 77 insertions(+), 23 deletions(-) diff --git a/internal/feat/idleunsub/idleunsub.go b/internal/feat/idleunsub/idleunsub.go index 28a9aa87..1815d938 100644 --- a/internal/feat/idleunsub/idleunsub.go +++ b/internal/feat/idleunsub/idleunsub.go @@ -86,7 +86,7 @@ func (s *Service) HandleInvoiceUpcoming(ctx context.Context, event stripe.Event) } periodStart := time.Unix(item.CurrentPeriodStart, 0).UTC() periodEnd := time.Unix(item.CurrentPeriodEnd, 0).UTC() - threshold := 
subtractInterval(periodStart, item.Price.Recurring.Interval) + threshold := subtractInterval(periodStart, item.Price.Recurring.Interval, item.Price.Recurring.IntervalCount) // 3. Look up our user by stripe customer ID. if sub.Customer == nil || sub.Customer.ID == "" { @@ -233,19 +233,23 @@ func (s *Service) HandleInvoiceUpcoming(ctx context.Context, event stripe.Event) return nil } -// subtractInterval returns t minus one Stripe billing interval. -func subtractInterval(t time.Time, interval stripe.PriceRecurringInterval) time.Time { +// subtractInterval returns t minus count Stripe billing intervals. +func subtractInterval(t time.Time, interval stripe.PriceRecurringInterval, count int64) time.Time { + if count < 1 { + count = 1 + } + n := int(count) switch interval { case stripe.PriceRecurringIntervalDay: - return t.AddDate(0, 0, -1) + return t.AddDate(0, 0, -n) case stripe.PriceRecurringIntervalWeek: - return t.AddDate(0, 0, -7) + return t.AddDate(0, 0, -7*n) case stripe.PriceRecurringIntervalMonth: - return t.AddDate(0, -1, 0) + return t.AddDate(0, -n, 0) case stripe.PriceRecurringIntervalYear: - return t.AddDate(-1, 0, 0) + return t.AddDate(-n, 0, 0) default: - return t.AddDate(0, -1, 0) // safe default + return t.AddDate(0, -n, 0) // safe default } } @@ -261,6 +265,10 @@ func marshalCancelMetadata(subID, eventID string, start, end time.Time) ([]byte, // enqueueCancelEmail composes and enqueues the cancel email. // Implementation lands in Task 9 alongside the templates; for now this is a // stub so the cancel path compiles. Task 9 replaces it with the real impl. +// +// TODO: Task 9 implements; the error path here is currently unreachable +// (stub always returns nil). Task 9's test must inject an erroring sender +// and assert mEmailEnqueue("cancel", "error") fires + cancel still succeeds. 
func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID string, periodEnd time.Time) error { _ = ctx _ = user diff --git a/internal/feat/idleunsub/idleunsub_test.go b/internal/feat/idleunsub/idleunsub_test.go index ff5ac00d..16563097 100644 --- a/internal/feat/idleunsub/idleunsub_test.go +++ b/internal/feat/idleunsub/idleunsub_test.go @@ -9,6 +9,7 @@ import ( "time" "github.com/google/uuid" + "github.com/jackc/pgx/v5/pgtype" "github.com/stretchr/testify/require" stripe "github.com/stripe/stripe-go/v82" @@ -180,6 +181,16 @@ func TestHandleInvoiceUpcoming_FiresCancel_WhenIdleTwoPeriods(t *testing.T) { Scan(&count) require.NoError(t, err) require.Equal(t, 1, count) + + // SetUserAutoCancelState does NOT write stripe_subscription_id (spec §4.1 + // designates SyncSubStateFromWebhook as the single population path). + // In this test the webhook hasn't fired, so the column remains NULL. + var subIDCol pgtype.Text + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT stripe_subscription_id FROM users WHERE id = $1`, + fx.UserID).Scan(&subIDCol) + require.NoError(t, err) + require.False(t, subIDCol.Valid, "spec invariant: HandleInvoiceUpcoming does not write stripe_subscription_id") } // --------------------------------------------------------------------------- @@ -222,27 +233,54 @@ func TestHandleInvoiceUpcoming_DedupesRetryStorm(t *testing.T) { t.Parallel() fx := setupHappyPath(t) + // Step 1: First call with evt_1 → fires cancel (1 Stripe update call). require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + require.Len(t, fx.Fake.updateCalls, 1) + + // Step 2: Second call with evt_1 → blocked by webhook dedup (same event + // ID); still exactly 1 update call. require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) + require.Len(t, fx.Fake.updateCalls, 1) - // Only one update call total — the second invocation hit the webhook dedup. 
+ // Step 3: Third call with evt_2 (different event ID, same period) → + // TryClaimWebhookEvent succeeds (new event), but HasAutoCanceledThisPeriod + // blocks the cancel inside the TX. Still exactly 1 update call. + evt2 := fx.Event + evt2.ID = "evt_" + uuid.NewString()[:8] + require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, evt2)) require.Len(t, fx.Fake.updateCalls, 1) - // Exactly one auto-cancel event row. + // Exactly one auto-cancel user_events row (only step 1 wrote it). var count int err := fx.B.Pool().QueryRow(fx.Ctx, ` SELECT COUNT(*) FROM user_events WHERE user_id = $1 AND event_type = 'subscription_auto_canceled'`, fx.UserID).Scan(&count) require.NoError(t, err) - require.Equal(t, 1, count) + require.Equal(t, 1, count, "only one auto-cancel event row expected") - // One webhook dedup row. + // evt_1 dedup row persisted (step 1 committed). err = fx.B.Pool().QueryRow(fx.Ctx, `SELECT COUNT(*) FROM stripe_webhook_dedup WHERE event_id = $1`, fx.Event.ID). Scan(&count) require.NoError(t, err) - require.Equal(t, 1, count) + require.Equal(t, 1, count, "evt_1 dedup row persisted from step 1") + + // evt_2 dedup row is NOT persisted: TryClaimWebhookEvent runs inside the + // same TX that gets rolled back (no commit occurs) when + // HasAutoCanceledThisPeriod returns true. The dedup insert is in-flight + // only — it rolls back with the rest of the TX. + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT COUNT(*) FROM stripe_webhook_dedup WHERE event_id = $1`, evt2.ID). + Scan(&count) + require.NoError(t, err) + require.Equal(t, 0, count, "evt_2 dedup row rolled back with the TX") + + // Total dedup rows: only evt_1. 
+ err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT COUNT(*) FROM stripe_webhook_dedup`).Scan(&count) + require.NoError(t, err) + require.Equal(t, 1, count, "only one dedup row persisted total") } func TestHandleInvoiceUpcoming_DedupesPostKeep(t *testing.T) { diff --git a/internal/feat/idleunsub/metadata.go b/internal/feat/idleunsub/metadata.go index b9b01549..3054f541 100644 --- a/internal/feat/idleunsub/metadata.go +++ b/internal/feat/idleunsub/metadata.go @@ -2,13 +2,14 @@ package idleunsub import "time" -// All timestamps are canonicalized to second precision via -// time.Unix(stripeInt64, 0).UTC() before assignment. Stripe period fields -// arrive as int64 Unix seconds; explicit second precision and UTC guard -// against future drift if a code path ever constructs a time.Time from a -// different source. JSON encoding via encoding/json produces RFC3339 -// ("...Z") which Postgres ::timestamptz parses reliably. - +// cancelMetadata is stored as JSONB in user_events.metadata for +// subscription_auto_canceled rows. All timestamps are canonicalized to +// second precision via time.Unix(stripeInt64, 0).UTC() before assignment. +// Stripe period fields arrive as int64 Unix seconds; explicit second +// precision and UTC guard against future drift if a code path ever +// constructs a time.Time from a different source. JSON encoding via +// encoding/json produces RFC3339 ("...Z") which Postgres ::timestamptz +// parses reliably. 
type cancelMetadata struct { SubscriptionID string `json:"subscription_id"` StripeEventID string `json:"stripe_event_id"` @@ -16,6 +17,7 @@ type cancelMetadata struct { CurrentPeriodEnd time.Time `json:"current_period_end"` } +//nolint:unused // wired up by KeepSubscription/AutoReverse in Task 8 type keptMetadata struct { SubscriptionID string `json:"subscription_id"` Via string `json:"via"` // "link" | "auto_activity" diff --git a/internal/feat/idleunsub/metrics.go b/internal/feat/idleunsub/metrics.go index 76630fb0..a84c68e3 100644 --- a/internal/feat/idleunsub/metrics.go +++ b/internal/feat/idleunsub/metrics.go @@ -19,9 +19,12 @@ var ( cancelFired, _ = meter.Int64Counter("idleunsub.cancel.fired") cancelSkipped, _ = meter.Int64Counter("idleunsub.cancel.skipped") cancelError, _ = meter.Int64Counter("idleunsub.cancel.error") - cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") - reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") - reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") + //nolint:unused // wired up by AutoReverse in Task 8 + cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") + //nolint:unused // wired up by KeepSubscription in Task 8 + reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") + //nolint:unused // wired up by AutoReverse in Task 8 + reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") emailEnqueue, _ = meter.Int64Counter("idleunsub.email.enqueue") ) @@ -37,14 +40,17 @@ func mCancelError(ctx context.Context, reason string) { cancelError.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) } +//nolint:unused // wired up in Task 8 func mCacheDriftCorrected(ctx context.Context, subID string) { cacheDrift.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) } +//nolint:unused // wired up in Task 8 func mReverseLink(ctx context.Context, subID string) { reverseLink.Add(ctx, 1, 
metric.WithAttributes(attribute.String("sub_id", subID))) } +//nolint:unused // wired up in Task 8 func mReverseActivity(ctx context.Context, subID string) { reverseAct.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) } From ede0943c9d5ff8a043350aeccd0673d2c3bd1de6 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 01:35:29 -0400 Subject: [PATCH 18/37] idleunsub: KeepSubscription + AutoReverse with cache-drift correction --- internal/db/querier.go | 3 + internal/db/users.sql.go | 20 ++ internal/feat/idleunsub/idleunsub.go | 151 +++++++++++++ internal/feat/idleunsub/idleunsub_test.go | 249 ++++++++++++++++++++++ internal/feat/idleunsub/metadata.go | 1 - internal/feat/idleunsub/metrics.go | 12 +- sql/queries/users.sql | 9 + 7 files changed, 435 insertions(+), 10 deletions(-) diff --git a/internal/db/querier.go b/internal/db/querier.go index aab84101..6a011229 100644 --- a/internal/db/querier.go +++ b/internal/db/querier.go @@ -30,6 +30,9 @@ type Querier interface { ClearSubStateOnDeletion(ctx context.Context, id uuid.UUID) error // Called by KeepSubscription / AutoReverse on reversal. Sets banner flag. ClearUserAutoCancelState(ctx context.Context, arg ClearUserAutoCancelStateParams) error + // For cache-drift correction in AutoReverse: clear gates without setting the + // banner. We only flip the banner when an actual reversal happened. + ClearUserAutoCancelStateNoBanner(ctx context.Context, arg ClearUserAutoCancelStateNoBannerParams) error // Batch-completes abandoned sessions that have at least one candidate message. // These are real interviews that the user forgot to end. 
CompleteAbandonedActiveSessions(ctx context.Context) ([]CompleteAbandonedActiveSessionsRow, error) diff --git a/internal/db/users.sql.go b/internal/db/users.sql.go index a52b19cb..1f31056d 100644 --- a/internal/db/users.sql.go +++ b/internal/db/users.sql.go @@ -75,6 +75,26 @@ func (q *Queries) ClearUserAutoCancelState(ctx context.Context, arg ClearUserAut return err } +const clearUserAutoCancelStateNoBanner = `-- name: ClearUserAutoCancelStateNoBanner :exec +UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = $2 +WHERE id = $1 +` + +type ClearUserAutoCancelStateNoBannerParams struct { + ID uuid.UUID `json:"id"` + SubCurrentPeriodStart pgtype.Timestamptz `json:"sub_current_period_start"` +} + +// For cache-drift correction in AutoReverse: clear gates without setting the +// banner. We only flip the banner when an actual reversal happened. +func (q *Queries) ClearUserAutoCancelStateNoBanner(ctx context.Context, arg ClearUserAutoCancelStateNoBannerParams) error { + _, err := q.db.Exec(ctx, clearUserAutoCancelStateNoBanner, arg.ID, arg.SubCurrentPeriodStart) + return err +} + const createOAuthUser = `-- name: CreateOAuthUser :one INSERT INTO users (email, email_verified, display_name) VALUES ($1, TRUE, $2) diff --git a/internal/feat/idleunsub/idleunsub.go b/internal/feat/idleunsub/idleunsub.go index 1815d938..61746496 100644 --- a/internal/feat/idleunsub/idleunsub.go +++ b/internal/feat/idleunsub/idleunsub.go @@ -11,6 +11,7 @@ import ( "log/slog" "time" + "github.com/google/uuid" "github.com/jackc/pgx/v5" "github.com/jackc/pgx/v5/pgtype" "github.com/jackc/pgx/v5/pgxpool" @@ -277,6 +278,156 @@ func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID st return nil // placeholder; replaced in Task 9 } +// KeepSubscription reverses cancel_at_period_end after a verified, single-use +// keep-link click. 
The endpoint is responsible for verifying the token AND +// claiming single-use BEFORE invoking this method. Refuses to act if the +// stored cache says the cancel is NOT our auto-cancel (manual portal cancel). +func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) error { + q := db.New(s.pool) + gates, err := q.GetUserAutoCancelGates(ctx, claims.UserID) + if err != nil { + return fmt.Errorf("read gates: %w", err) + } + if !gates.SubCancelAtPeriodEnd || !gates.SubCancelIsAuto { + // Either the cancel was already reversed, or it's a manual portal cancel. + // Don't touch Stripe; render confirmation page idempotently. + return nil + } + updated, err := s.stripe.UpdateSubscriptionCancel(ctx, claims.SubscriptionID, false, "") + if err != nil { + return fmt.Errorf("stripe reverse: %w", err) + } + if updated.Items == nil || len(updated.Items.Data) == 0 { + return fmt.Errorf("stripe reverse: subscription %s missing items", claims.SubscriptionID) + } + periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() + + if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + ID: claims.UserID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear gates: %w", err) + } + + mdJSON, err := marshalKeptMetadata(claims.SubscriptionID, "link", periodStart) + if err != nil { + return fmt.Errorf("marshal kept metadata: %w", err) + } + if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + UserID: claims.UserID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert kept event: %w", err) + } + + mReverseLink(ctx, claims.SubscriptionID) + + if err := s.enqueueKeptEmail(ctx, claims.UserID, claims.SubscriptionID, claims.CurrentPeriodEnd); err != nil { + mEmailEnqueue(ctx, "kept", "error") + s.log.Error("kept email enqueue failed", "user_id", claims.UserID, "err", err) + } else { + mEmailEnqueue(ctx, "kept", "ok") + } + return nil +} + +// AutoReverse 
is invoked by the auth middleware when an authenticated request +// arrives from a user whose cached gates are SubCancelAtPeriodEnd && SubCancelIsAuto. +// The current request itself is the activity signal; we do NOT re-read +// last_active (would race against TouchAuthSession). +// +// AutoReverse verifies Stripe state before acting. If Stripe says the sub is +// NOT canceled (cache drift from a prior partial failure), AutoReverse silently +// clears the cache and returns without sending email or inserting an event row. +func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { + q := db.New(s.pool) + gates, err := q.GetUserAutoCancelGates(ctx, userID) + if err != nil { + return fmt.Errorf("read gates: %w", err) + } + if !gates.SubCancelAtPeriodEnd || !gates.SubCancelIsAuto || !gates.StripeSubscriptionID.Valid { + return nil // cache says off, or no sub — nothing to do + } + subID := gates.StripeSubscriptionID.String + + // Verify Stripe state — handles the partial-failure window where our cache + // says canceled but Stripe never confirmed. + stripeSub, err := s.stripe.GetSubscription(ctx, subID) + if err != nil { + return fmt.Errorf("get sub: %w", err) + } + if stripeSub.Items == nil || len(stripeSub.Items.Data) == 0 { + return fmt.Errorf("subscription %s missing items", subID) + } + if !stripeSub.CancelAtPeriodEnd { + // Cache drift. Silently correct via the no-banner query and return. + // We don't celebrate a reversal that didn't actually happen here — + // the user kept their sub via some other channel. + mCacheDriftCorrected(ctx, subID) + periodStart := time.Unix(stripeSub.Items.Data[0].CurrentPeriodStart, 0).UTC() + if err := q.ClearUserAutoCancelStateNoBanner(ctx, db.ClearUserAutoCancelStateNoBannerParams{ + ID: userID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear cache (drift): %w", err) + } + return nil + } + + // Real reversal. 
+ updated, err := s.stripe.UpdateSubscriptionCancel(ctx, subID, false, "") + if err != nil { + return fmt.Errorf("stripe reverse: %w", err) + } + if updated.Items == nil || len(updated.Items.Data) == 0 { + return fmt.Errorf("stripe reverse: subscription %s missing items", subID) + } + periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() + periodEnd := time.Unix(updated.Items.Data[0].CurrentPeriodEnd, 0).UTC() + + if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + ID: userID, SubCurrentPeriodStart: pgxTime(periodStart), + }); err != nil { + return fmt.Errorf("clear gates: %w", err) + } + + mdJSON, err := marshalKeptMetadata(subID, "auto_activity", periodStart) + if err != nil { + return fmt.Errorf("marshal kept metadata: %w", err) + } + if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + UserID: userID, Metadata: mdJSON, + }); err != nil { + return fmt.Errorf("insert kept event: %w", err) + } + + mReverseActivity(ctx, subID) + + if err := s.enqueueKeptEmail(ctx, userID, subID, periodEnd); err != nil { + mEmailEnqueue(ctx, "kept", "error") + s.log.Error("kept email enqueue failed", "user_id", userID, "err", err) + } else { + mEmailEnqueue(ctx, "kept", "ok") + } + return nil +} + +func marshalKeptMetadata(subID, via string, periodStart time.Time) ([]byte, error) { + return json.Marshal(keptMetadata{ + SubscriptionID: subID, + Via: via, + CurrentPeriodStart: periodStart.UTC(), + }) +} + +// enqueueKeptEmail composes and enqueues the kept email. Implementation lands +// in Task 9 alongside the templates; for now this is a stub so the keep paths +// compile. Task 9 replaces it with the real impl. +func (s *Service) enqueueKeptEmail(ctx context.Context, userID uuid.UUID, subID string, periodEnd time.Time) error { + _ = ctx + _ = userID + _ = subID + _ = periodEnd + return nil // placeholder; replaced in Task 9 +} + // pgxText / pgxTime are local pgtype constructors. 
Inlined here because the // project does not yet have a shared helper for these wrappers; if/when it // does, swap to that and delete these. diff --git a/internal/feat/idleunsub/idleunsub_test.go b/internal/feat/idleunsub/idleunsub_test.go index 16563097..5a409525 100644 --- a/internal/feat/idleunsub/idleunsub_test.go +++ b/internal/feat/idleunsub/idleunsub_test.go @@ -357,3 +357,252 @@ func TestHandleInvoiceUpcoming_FirstPeriodGrace(t *testing.T) { require.NoError(t, fx.Svc.HandleInvoiceUpcoming(fx.Ctx, fx.Event)) requireNoCancel(t, fx) } + +// --------------------------------------------------------------------------- +// KeepSubscription / AutoReverse fixtures +// --------------------------------------------------------------------------- + +// setupAutoCanceledState seeds a user in the "auto-canceled" cache state. We +// use raw SQL UPDATE per the spec's "start from real state and mutate" rule: +// SeedUser yields a real user row; we then mutate the cache columns to the +// post-cancel shape we'd see after HandleInvoiceUpcoming + the +// SyncSubStateFromWebhook follow-up. This is appropriate for unit-testing +// the reversal paths in isolation. +// +// Returns a fixture ready for KeepSubscription/AutoReverse calls. The fake +// Stripe sub mirrors the cached state (CancelAtPeriodEnd=true) by default; +// individual tests mutate fx.Sub before calling AutoReverse to simulate +// drift. +func setupAutoCanceledState(t *testing.T) *fixture { + t.Helper() + fx := setupHappyPath(t) + + // Mutate the user's cache columns to the auto-canceled shape and write + // the stripe_subscription_id (in production written by + // SyncSubStateFromWebhook on customer.subscription.updated). 
+ _, err := fx.B.Pool().Exec(fx.Ctx, ` + UPDATE users + SET sub_cancel_at_period_end = TRUE, + sub_cancel_is_auto = TRUE, + stripe_subscription_id = $1, + sub_current_period_start = $2, + pending_kept_banner = FALSE + WHERE id = $3`, fx.SubID, fx.PeriodStart, fx.UserID) + require.NoError(t, err) + + // The fake Stripe sub should match: it's in the canceled state too. + fx.Sub.CancelAtPeriodEnd = true + + return fx +} + +// readUserCache returns the auto-cancel cache columns for a user. +func readUserCache(t *testing.T, fx *fixture) (cancelAtEnd, isAuto, banner bool) { + t.Helper() + err := fx.B.Pool().QueryRow(fx.Ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto, pending_kept_banner + FROM users WHERE id = $1`, + fx.UserID).Scan(&cancelAtEnd, &isAuto, &banner) + require.NoError(t, err) + return +} + +// countKeptEvents returns the number of subscription_kept rows for a user. +func countKeptEvents(t *testing.T, fx *fixture) int { + t.Helper() + var count int + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT COUNT(*) FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_kept'`, + fx.UserID).Scan(&count) + require.NoError(t, err) + return count +} + +// keepClaims constructs a KeepTokenClaims for fx. The endpoint constructs +// these from a verified token; here we synthesize directly since +// KeepSubscription's contract is "endpoint already verified the token". 
+func keepClaims(fx *fixture) idleunsub.KeepTokenClaims { + return idleunsub.KeepTokenClaims{ + UserID: fx.UserID, + SubscriptionID: fx.SubID, + Action: "keep_subscription", + CurrentPeriodEnd: fx.PeriodEnd, + IssuedAt: fx.PeriodStart.Unix(), + ExpiresAt: fx.PeriodEnd.Unix(), + } +} + +// --------------------------------------------------------------------------- +// KeepSubscription tests +// --------------------------------------------------------------------------- + +func TestKeepSubscription_HappyPath(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + require.NoError(t, fx.Svc.KeepSubscription(fx.Ctx, keepClaims(fx))) + + // Stripe Update called once, with cancel_at_period_end=false. + require.Len(t, fx.Fake.updateCalls, 1) + require.Equal(t, fx.SubID, fx.Fake.updateCalls[0].ID) + require.False(t, fx.Fake.updateCalls[0].CancelAtPeriodEnd) + require.Empty(t, fx.Fake.updateCalls[0].IdempotencyKey, + "KeepSubscription passes empty idempotency key (Stripe treats as non-idempotent)") + + // Cache flipped, banner set. + cancelAtEnd, isAuto, banner := readUserCache(t, fx) + require.False(t, cancelAtEnd) + require.False(t, isAuto) + require.True(t, banner, "KeepSubscription is a real reversal: banner must be set") + + // One subscription_kept event row with via='link'. + require.Equal(t, 1, countKeptEvents(t, fx)) + var via, gotSubID string + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT metadata->>'via', metadata->>'subscription_id' + FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_kept'`, + fx.UserID).Scan(&via, &gotSubID) + require.NoError(t, err) + require.Equal(t, "link", via) + require.Equal(t, fx.SubID, gotSubID) +} + +func TestKeepSubscription_RefusesManualCancel(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + // Flip sub_cancel_is_auto OFF: this is now a manual portal cancel that + // happens to share the cache flag layout. KeepSubscription must refuse. 
+ _, err := fx.B.Pool().Exec(fx.Ctx, + `UPDATE users SET sub_cancel_is_auto = FALSE WHERE id = $1`, fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.KeepSubscription(fx.Ctx, keepClaims(fx))) + + // Zero Stripe calls. + require.Empty(t, fx.Fake.updateCalls, "must not touch Stripe on a manual cancel") + // Zero new event rows. + require.Equal(t, 0, countKeptEvents(t, fx)) + // Cache unchanged: still cancel_at_period_end=true, is_auto=false (we set + // it false above), banner false. + var cancelAtEnd, isAuto, banner bool + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto, pending_kept_banner + FROM users WHERE id = $1`, fx.UserID).Scan(&cancelAtEnd, &isAuto, &banner) + require.NoError(t, err) + require.True(t, cancelAtEnd, "manual-cancel cache flag must be left intact") + require.False(t, isAuto) + require.False(t, banner) +} + +func TestKeepSubscription_Idempotent(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + // First call: real reversal. + require.NoError(t, fx.Svc.KeepSubscription(fx.Ctx, keepClaims(fx))) + // Second call: gates already cleared after first → must be a no-op. + require.NoError(t, fx.Svc.KeepSubscription(fx.Ctx, keepClaims(fx))) + + // Exactly one Stripe call, exactly one event row. + require.Len(t, fx.Fake.updateCalls, 1, "second call must not re-hit Stripe") + require.Equal(t, 1, countKeptEvents(t, fx), "second call must not re-insert event") +} + +// --------------------------------------------------------------------------- +// AutoReverse tests +// --------------------------------------------------------------------------- + +func TestAutoReverse_GateOff(t *testing.T) { + t.Parallel() + // Plain happy-path user: sub_cancel_at_period_end=false. AutoReverse is + // a no-op. 
+ fx := setupHappyPath(t) + + require.NoError(t, fx.Svc.AutoReverse(fx.Ctx, fx.UserID)) + + require.Empty(t, fx.Fake.updateCalls, "gate off → no Stripe calls") + // Note: setupHappyPath does not call GetSubscription either; AutoReverse + // must early-return before any Stripe round-trip. + require.Equal(t, 0, countKeptEvents(t, fx)) +} + +func TestAutoReverse_HappyPath(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + require.NoError(t, fx.Svc.AutoReverse(fx.Ctx, fx.UserID)) + + // Exactly one Stripe Update call, with cancel_at_period_end=false. + require.Len(t, fx.Fake.updateCalls, 1) + require.Equal(t, fx.SubID, fx.Fake.updateCalls[0].ID) + require.False(t, fx.Fake.updateCalls[0].CancelAtPeriodEnd) + + // Cache cleared with banner=true. + cancelAtEnd, isAuto, banner := readUserCache(t, fx) + require.False(t, cancelAtEnd) + require.False(t, isAuto) + require.True(t, banner, "real reversal must set banner") + + // One subscription_kept row with via='auto_activity'. + require.Equal(t, 1, countKeptEvents(t, fx)) + var via string + err := fx.B.Pool().QueryRow(fx.Ctx, ` + SELECT metadata->>'via' FROM user_events + WHERE user_id = $1 AND event_type = 'subscription_kept'`, + fx.UserID).Scan(&via) + require.NoError(t, err) + require.Equal(t, "auto_activity", via) +} + +func TestAutoReverse_CacheDriftCorrected(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + // Stripe says NOT canceled (the auto-cancel webhook side never landed, + // or was reversed out-of-band). Cache says canceled. AutoReverse must + // silently reconcile without celebrating a reversal. + fx.Sub.CancelAtPeriodEnd = false + + require.NoError(t, fx.Svc.AutoReverse(fx.Ctx, fx.UserID)) + + // ZERO Stripe Update calls — only the GetSubscription read. + require.Empty(t, fx.Fake.updateCalls, "drift correction must not hit Stripe Update") + + // Cache silently cleared. 
+ cancelAtEnd, isAuto, banner := readUserCache(t, fx) + require.False(t, cancelAtEnd) + require.False(t, isAuto) + require.False(t, banner, "drift correction must NOT set the banner") + + // No subscription_kept event row (no real reversal happened). + require.Equal(t, 0, countKeptEvents(t, fx)) +} + +func TestAutoReverse_RefusesManualCancel(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + // Flip sub_cancel_is_auto OFF: this is a manual portal cancel. + // AutoReverse must early-return without any Stripe calls. + _, err := fx.B.Pool().Exec(fx.Ctx, + `UPDATE users SET sub_cancel_is_auto = FALSE WHERE id = $1`, fx.UserID) + require.NoError(t, err) + + require.NoError(t, fx.Svc.AutoReverse(fx.Ctx, fx.UserID)) + + // Zero Stripe calls of any kind. + require.Empty(t, fx.Fake.updateCalls) + // Cache unchanged. + var cancelAtEnd, isAuto, banner bool + err = fx.B.Pool().QueryRow(fx.Ctx, + `SELECT sub_cancel_at_period_end, sub_cancel_is_auto, pending_kept_banner + FROM users WHERE id = $1`, fx.UserID).Scan(&cancelAtEnd, &isAuto, &banner) + require.NoError(t, err) + require.True(t, cancelAtEnd) + require.False(t, isAuto) + require.False(t, banner) + require.Equal(t, 0, countKeptEvents(t, fx)) +} diff --git a/internal/feat/idleunsub/metadata.go b/internal/feat/idleunsub/metadata.go index 3054f541..1bd0da4c 100644 --- a/internal/feat/idleunsub/metadata.go +++ b/internal/feat/idleunsub/metadata.go @@ -17,7 +17,6 @@ type cancelMetadata struct { CurrentPeriodEnd time.Time `json:"current_period_end"` } -//nolint:unused // wired up by KeepSubscription/AutoReverse in Task 8 type keptMetadata struct { SubscriptionID string `json:"subscription_id"` Via string `json:"via"` // "link" | "auto_activity" diff --git a/internal/feat/idleunsub/metrics.go b/internal/feat/idleunsub/metrics.go index a84c68e3..76630fb0 100644 --- a/internal/feat/idleunsub/metrics.go +++ b/internal/feat/idleunsub/metrics.go @@ -19,12 +19,9 @@ var ( cancelFired, _ = 
meter.Int64Counter("idleunsub.cancel.fired") cancelSkipped, _ = meter.Int64Counter("idleunsub.cancel.skipped") cancelError, _ = meter.Int64Counter("idleunsub.cancel.error") - //nolint:unused // wired up by AutoReverse in Task 8 - cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") - //nolint:unused // wired up by KeepSubscription in Task 8 - reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") - //nolint:unused // wired up by AutoReverse in Task 8 - reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") + cacheDrift, _ = meter.Int64Counter("idleunsub.cancel.cache_drift_corrected") + reverseLink, _ = meter.Int64Counter("idleunsub.reverse.link") + reverseAct, _ = meter.Int64Counter("idleunsub.reverse.activity") emailEnqueue, _ = meter.Int64Counter("idleunsub.email.enqueue") ) @@ -40,17 +37,14 @@ func mCancelError(ctx context.Context, reason string) { cancelError.Add(ctx, 1, metric.WithAttributes(attribute.String("reason", reason))) } -//nolint:unused // wired up in Task 8 func mCacheDriftCorrected(ctx context.Context, subID string) { cacheDrift.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) } -//nolint:unused // wired up in Task 8 func mReverseLink(ctx context.Context, subID string) { reverseLink.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) } -//nolint:unused // wired up in Task 8 func mReverseActivity(ctx context.Context, subID string) { reverseAct.Add(ctx, 1, metric.WithAttributes(attribute.String("sub_id", subID))) } diff --git a/sql/queries/users.sql b/sql/queries/users.sql index b8077b8b..9fa52eff 100644 --- a/sql/queries/users.sql +++ b/sql/queries/users.sql @@ -103,6 +103,15 @@ SET sub_cancel_at_period_end = FALSE, sub_current_period_start = $2 WHERE id = $1; +-- name: ClearUserAutoCancelStateNoBanner :exec +-- For cache-drift correction in AutoReverse: clear gates without setting the +-- banner. We only flip the banner when an actual reversal happened. 
+UPDATE users +SET sub_cancel_at_period_end = FALSE, + sub_cancel_is_auto = FALSE, + sub_current_period_start = $2 +WHERE id = $1; + -- name: SyncSubStateFromWebhook :exec -- Called by handleSubscriptionUpdated. Does NOT touch sub_cancel_is_auto: -- only our handler sets that flag; webhook sync must not overwrite it. From 0e5f160f947de68664f65f5649c53e82764b49d9 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 01:46:54 -0400 Subject: [PATCH 19/37] idleunsub: address Task 8 code review feedback - Document empty idempotency key intent on both UpdateSubscriptionCancel reversal call sites (KeepSubscription and AutoReverse) - Wrap KeepSubscription and AutoReverse real-reversal DB writes in a single transaction so ClearUserAutoCancelState + InsertSubscriptionKeptEvent commit-or-fail atomically - Add stuck-state comment on StripeSubscriptionID.Valid early-return in AutoReverse - Add Task 9 email TODO comment in TestKeepSubscription_Idempotent - Add TestAutoReverse_StripeUpdateFails: Stripe error leaves gates ON for self-heal - Fix unlambda in NewTokenSigner (gocritic: replace lambda with time.Now directly) --- internal/feat/idleunsub/idleunsub.go | 45 +++++++++++++++++++++-- internal/feat/idleunsub/idleunsub_test.go | 26 +++++++++++++ internal/feat/idleunsub/token.go | 2 +- 3 files changed, 68 insertions(+), 5 deletions(-) diff --git a/internal/feat/idleunsub/idleunsub.go b/internal/feat/idleunsub/idleunsub.go index 61746496..dc4def73 100644 --- a/internal/feat/idleunsub/idleunsub.go +++ b/internal/feat/idleunsub/idleunsub.go @@ -293,6 +293,11 @@ func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) // Don't touch Stripe; render confirmation page idempotently. return nil } + // Empty idempotency key is intentional: KeepSubscription is gated upstream by + // keep_link_token_uses single-use enforcement; AutoReverse converges via the + // gate read on the next call. 
Unlike HandleInvoiceUpcoming (which uses event.ID + // because Stripe retries webhooks with the same ID), reversal is request-driven + // and idempotent at the gate-check layer. updated, err := s.stripe.UpdateSubscriptionCancel(ctx, claims.SubscriptionID, false, "") if err != nil { return fmt.Errorf("stripe reverse: %w", err) @@ -302,7 +307,14 @@ func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) } periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() - if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + tx, err := s.pool.BeginTx(ctx, pgx.TxOptions{}) + if err != nil { + return fmt.Errorf("begin tx: %w", err) + } + defer tx.Rollback(ctx) //nolint:errcheck // rollback after commit is a no-op + qtx := db.New(tx) + + if err := qtx.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ ID: claims.UserID, SubCurrentPeriodStart: pgxTime(periodStart), }); err != nil { return fmt.Errorf("clear gates: %w", err) @@ -312,12 +324,16 @@ func (s *Service) KeepSubscription(ctx context.Context, claims KeepTokenClaims) if err != nil { return fmt.Errorf("marshal kept metadata: %w", err) } - if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + if err := qtx.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ UserID: claims.UserID, Metadata: mdJSON, }); err != nil { return fmt.Errorf("insert kept event: %w", err) } + if err := tx.Commit(ctx); err != nil { + return fmt.Errorf("commit: %w", err) + } + mReverseLink(ctx, claims.SubscriptionID) if err := s.enqueueKeptEmail(ctx, claims.UserID, claims.SubscriptionID, claims.CurrentPeriodEnd); err != nil { @@ -343,6 +359,11 @@ func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { if err != nil { return fmt.Errorf("read gates: %w", err) } + // If StripeSubscriptionID is NULL, AutoReverse cannot self-heal here — only + // the link path (which carries the sub ID in the signed token) or 
a fresh + // customer.subscription.updated webhook (via SyncSubStateFromWebhook) can + // recover this state. This is consistent with the spec's "single sole + // population path" invariant for stripe_subscription_id. if !gates.SubCancelAtPeriodEnd || !gates.SubCancelIsAuto || !gates.StripeSubscriptionID.Valid { return nil // cache says off, or no sub — nothing to do } @@ -372,6 +393,11 @@ func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { } // Real reversal. + // Empty idempotency key is intentional: KeepSubscription is gated upstream by + // keep_link_token_uses single-use enforcement; AutoReverse converges via the + // gate read on the next call. Unlike HandleInvoiceUpcoming (which uses event.ID + // because Stripe retries webhooks with the same ID), reversal is request-driven + // and idempotent at the gate-check layer. updated, err := s.stripe.UpdateSubscriptionCancel(ctx, subID, false, "") if err != nil { return fmt.Errorf("stripe reverse: %w", err) @@ -382,7 +408,14 @@ func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { periodStart := time.Unix(updated.Items.Data[0].CurrentPeriodStart, 0).UTC() periodEnd := time.Unix(updated.Items.Data[0].CurrentPeriodEnd, 0).UTC() - if err := q.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ + tx, err := s.pool.BeginTx(ctx, pgx.TxOptions{}) + if err != nil { + return fmt.Errorf("begin tx: %w", err) + } + defer tx.Rollback(ctx) //nolint:errcheck // rollback after commit is a no-op + qtx := db.New(tx) + + if err := qtx.ClearUserAutoCancelState(ctx, db.ClearUserAutoCancelStateParams{ ID: userID, SubCurrentPeriodStart: pgxTime(periodStart), }); err != nil { return fmt.Errorf("clear gates: %w", err) @@ -392,12 +425,16 @@ func (s *Service) AutoReverse(ctx context.Context, userID uuid.UUID) error { if err != nil { return fmt.Errorf("marshal kept metadata: %w", err) } - if err := q.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ + if err := 
qtx.InsertSubscriptionKeptEvent(ctx, db.InsertSubscriptionKeptEventParams{ UserID: userID, Metadata: mdJSON, }); err != nil { return fmt.Errorf("insert kept event: %w", err) } + if err := tx.Commit(ctx); err != nil { + return fmt.Errorf("commit: %w", err) + } + mReverseActivity(ctx, subID) if err := s.enqueueKeptEmail(ctx, userID, subID, periodEnd); err != nil { diff --git a/internal/feat/idleunsub/idleunsub_test.go b/internal/feat/idleunsub/idleunsub_test.go index 5a409525..a683e5c7 100644 --- a/internal/feat/idleunsub/idleunsub_test.go +++ b/internal/feat/idleunsub/idleunsub_test.go @@ -2,6 +2,7 @@ package idleunsub_test import ( "context" + "errors" "io" "log/slog" "os" @@ -509,6 +510,10 @@ func TestKeepSubscription_Idempotent(t *testing.T) { // Exactly one Stripe call, exactly one event row. require.Len(t, fx.Fake.updateCalls, 1, "second call must not re-hit Stripe") require.Equal(t, 1, countKeptEvents(t, fx), "second call must not re-insert event") + + // TODO(Task 9): assert mEmailEnqueue counter == 1 once the real mailer wires up. + // The plan promises "exactly one email enqueued"; with the current stub this + // can't be verified without inspecting the OTEL counter directly. } // --------------------------------------------------------------------------- @@ -606,3 +611,24 @@ func TestAutoReverse_RefusesManualCancel(t *testing.T) { require.False(t, banner) require.Equal(t, 0, countKeptEvents(t, fx)) } + +func TestAutoReverse_StripeUpdateFails(t *testing.T) { + t.Parallel() + fx := setupAutoCanceledState(t) + + // Configure fakeStripe to fail on the next Update call. + fx.Fake.updateErr = errors.New("stripe down") + + err := fx.Svc.AutoReverse(fx.Ctx, fx.UserID) + require.Error(t, err) + require.Contains(t, err.Error(), "stripe down") + + // Gates must remain ON — the next call retries (self-healing). 
+ cancelAtEnd, isAuto, banner := readUserCache(t, fx) + require.True(t, cancelAtEnd, "gates must remain ON after Stripe failure for self-heal") + require.True(t, isAuto) + require.False(t, banner, "banner must NOT be set on failure") + + // No subscription_kept rows because we returned before DB writes. + require.Equal(t, 0, countKeptEvents(t, fx)) +} diff --git a/internal/feat/idleunsub/token.go b/internal/feat/idleunsub/token.go index 69735e46..0c86d496 100644 --- a/internal/feat/idleunsub/token.go +++ b/internal/feat/idleunsub/token.go @@ -37,7 +37,7 @@ type TokenSigner struct { // NewTokenSigner constructs a signer with the given HMAC key. func NewTokenSigner(key []byte) *TokenSigner { - return &TokenSigner{key: key, now: func() time.Time { return time.Now() }} + return &TokenSigner{key: key, now: time.Now} } // Sign serializes and HMAC-signs the claims, returning a URL-safe base64 From a8118664ae8a23fcde884722a0ba6a42f9cfedb9 Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 01:51:06 -0400 Subject: [PATCH 20/37] idleunsub: email composition + templates --- internal/feat/idleunsub/email.go | 68 ++++++++++++++++ internal/feat/idleunsub/email_test.go | 31 ++++++++ internal/feat/idleunsub/idleunsub.go | 78 ++++++++++++------- internal/feat/idleunsub/idleunsub_test.go | 2 +- .../feat/idleunsub/templates/cancel.html.tmpl | 14 ++++ .../feat/idleunsub/templates/cancel.txt.tmpl | 14 ++++ .../feat/idleunsub/templates/kept.html.tmpl | 8 ++ .../feat/idleunsub/templates/kept.txt.tmpl | 8 ++ 8 files changed, 195 insertions(+), 28 deletions(-) create mode 100644 internal/feat/idleunsub/email.go create mode 100644 internal/feat/idleunsub/email_test.go create mode 100644 internal/feat/idleunsub/templates/cancel.html.tmpl create mode 100644 internal/feat/idleunsub/templates/cancel.txt.tmpl create mode 100644 internal/feat/idleunsub/templates/kept.html.tmpl create mode 100644 
internal/feat/idleunsub/templates/kept.txt.tmpl diff --git a/internal/feat/idleunsub/email.go b/internal/feat/idleunsub/email.go new file mode 100644 index 00000000..c837ca92 --- /dev/null +++ b/internal/feat/idleunsub/email.go @@ -0,0 +1,68 @@ +package idleunsub + +import ( + "bytes" + "embed" + "fmt" + "html/template" + "time" + + texttemplate "text/template" + + "github.com/btc/drill/internal/email" +) + +//go:embed templates/* +var templatesFS embed.FS + +type cancelEmailData struct { + DisplayName string + CurrentPeriodEnd string + KeepLink string +} + +type keptEmailData struct { + DisplayName string + NextRenewalDate string +} + +func composeCancelEmail(toEmail, displayName, keepURL string, periodEnd time.Time) (email.Message, error) { + data := cancelEmailData{ + DisplayName: displayName, + CurrentPeriodEnd: periodEnd.Format("January 2, 2006"), + KeepLink: keepURL, + } + return renderEmail("cancel", toEmail, "We won't charge you for the next period", data) +} + +func composeKeptEmail(toEmail, displayName string, nextRenewal time.Time) (email.Message, error) { + data := keptEmailData{ + DisplayName: displayName, + NextRenewalDate: nextRenewal.Format("January 2, 2006"), + } + return renderEmail("kept", toEmail, "Your subscription is still active", data) +} + +func renderEmail(name, to, subject string, data interface{}) (email.Message, error) { + htmlT, err := template.ParseFS(templatesFS, "templates/"+name+".html.tmpl") + if err != nil { + return email.Message{}, fmt.Errorf("parse %s.html: %w", name, err) + } + txtT, err := texttemplate.ParseFS(templatesFS, "templates/"+name+".txt.tmpl") + if err != nil { + return email.Message{}, fmt.Errorf("parse %s.txt: %w", name, err) + } + var htmlBuf, txtBuf bytes.Buffer + if err := htmlT.Execute(&htmlBuf, data); err != nil { + return email.Message{}, fmt.Errorf("execute %s.html: %w", name, err) + } + if err := txtT.Execute(&txtBuf, data); err != nil { + return email.Message{}, fmt.Errorf("execute %s.txt: %w", name, 
err) + } + return email.Message{ + To: to, + Subject: subject, + HTML: htmlBuf.String(), + Text: txtBuf.String(), + }, nil +} diff --git a/internal/feat/idleunsub/email_test.go b/internal/feat/idleunsub/email_test.go new file mode 100644 index 00000000..97aeb333 --- /dev/null +++ b/internal/feat/idleunsub/email_test.go @@ -0,0 +1,31 @@ +package idleunsub + +import ( + "testing" + "time" + + "github.com/stretchr/testify/require" +) + +func TestComposeCancelEmail_Renders(t *testing.T) { + t.Parallel() + end := time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC) + msg, err := composeCancelEmail("user@example.com", "Jane", + "https://sabermatic.dev/sub/keep?t=abc.def", end) + require.NoError(t, err) + require.Equal(t, "user@example.com", msg.To) + require.Equal(t, "We won't charge you for the next period", msg.Subject) + require.Contains(t, msg.Text, "Hi Jane,") + require.Contains(t, msg.Text, "March 1, 2026") + require.Contains(t, msg.Text, "https://sabermatic.dev/sub/keep?t=abc.def") + require.Contains(t, msg.HTML, "Jane") +} + +func TestComposeKeptEmail_Renders(t *testing.T) { + t.Parallel() + end := time.Date(2026, 3, 1, 0, 0, 0, 0, time.UTC) + msg, err := composeKeptEmail("user@example.com", "Jane", end) + require.NoError(t, err) + require.Equal(t, "Your subscription is still active", msg.Subject) + require.Contains(t, msg.Text, "March 1, 2026") +} diff --git a/internal/feat/idleunsub/idleunsub.go b/internal/feat/idleunsub/idleunsub.go index dc4def73..a30112f5 100644 --- a/internal/feat/idleunsub/idleunsub.go +++ b/internal/feat/idleunsub/idleunsub.go @@ -9,6 +9,7 @@ import ( "errors" "fmt" "log/slog" + "strings" "time" "github.com/google/uuid" @@ -31,21 +32,24 @@ type StripeClient interface { // Service owns the cancel/keep/auto-reverse logic. 
type Service struct { - pool *pgxpool.Pool - stripe StripeClient - mailer email.Sender - signer *TokenSigner - now func() time.Time - log *slog.Logger + pool *pgxpool.Pool + stripe StripeClient + mailer email.Sender + signer *TokenSigner + baseURL string + now func() time.Time + log *slog.Logger } // NewService constructs a Service. The signer may be nil for tests that // only exercise HandleInvoiceUpcoming (Task 7); Tasks 8/9 wire it in. -func NewService(pool *pgxpool.Pool, sc StripeClient, m email.Sender, sn *TokenSigner, log *slog.Logger) *Service { +// baseURL is the public-facing base URL (e.g. "https://sabermatic.dev") used +// to build keep-links; pass "http://localhost:3000" in tests. +func NewService(pool *pgxpool.Pool, sc StripeClient, m email.Sender, sn *TokenSigner, baseURL string, log *slog.Logger) *Service { if log == nil { log = slog.Default() } - return &Service{pool: pool, stripe: sc, mailer: m, signer: sn, now: time.Now, log: log} + return &Service{pool: pool, stripe: sc, mailer: m, signer: sn, baseURL: baseURL, now: time.Now, log: log} } // HandleInvoiceUpcoming evaluates the trigger rule. Idempotent: safe to call @@ -263,19 +267,34 @@ func marshalCancelMetadata(subID, eventID string, start, end time.Time) ([]byte, }) } -// enqueueCancelEmail composes and enqueues the cancel email. -// Implementation lands in Task 9 alongside the templates; for now this is a -// stub so the cancel path compiles. Task 9 replaces it with the real impl. -// -// TODO: Task 9 implements; the error path here is currently unreachable -// (stub always returns nil). Task 9's test must inject an erroring sender -// and assert mEmailEnqueue("cancel", "error") fires + cancel still succeeds. +// enqueueCancelEmail composes the cancel email and sends it via the mailer. +// Called from HandleInvoiceUpcoming after the cancel decision is committed. 
func (s *Service) enqueueCancelEmail(ctx context.Context, user db.User, subID string, periodEnd time.Time) error { - _ = ctx - _ = user - _ = subID - _ = periodEnd - return nil // placeholder; replaced in Task 9 + keepURL, err := s.buildKeepURL(user.ID, subID, periodEnd) + if err != nil { + return fmt.Errorf("build keep url: %w", err) + } + msg, err := composeCancelEmail(user.Email, user.DisplayName, keepURL, periodEnd) + if err != nil { + return fmt.Errorf("compose cancel: %w", err) + } + return s.mailer.Send(ctx, msg) +} + +func (s *Service) buildKeepURL(userID uuid.UUID, subID string, periodEnd time.Time) (string, error) { + if s.signer == nil { + return "", fmt.Errorf("token signer not configured") + } + periodEnd = periodEnd.UTC().Truncate(time.Second) + tok := s.signer.Sign(KeepTokenClaims{ + UserID: userID, + SubscriptionID: subID, + Action: "keep_subscription", + CurrentPeriodEnd: periodEnd, + IssuedAt: s.now().Unix(), + ExpiresAt: periodEnd.Unix(), + }) + return fmt.Sprintf("%s/sub/keep?t=%s", strings.TrimRight(s.baseURL, "/"), tok), nil } // KeepSubscription reverses cancel_at_period_end after a verified, single-use @@ -454,15 +473,20 @@ func marshalKeptMetadata(subID, via string, periodStart time.Time) ([]byte, erro }) } -// enqueueKeptEmail composes and enqueues the kept email. Implementation lands -// in Task 9 alongside the templates; for now this is a stub so the keep paths -// compile. Task 9 replaces it with the real impl. +// enqueueKeptEmail composes the kept-confirmation email and sends it. +// Called from KeepSubscription and AutoReverse after a successful reversal. 
func (s *Service) enqueueKeptEmail(ctx context.Context, userID uuid.UUID, subID string, periodEnd time.Time) error { - _ = ctx - _ = userID _ = subID - _ = periodEnd - return nil // placeholder; replaced in Task 9 + q := db.New(s.pool) + user, err := q.GetUserByID(ctx, userID) + if err != nil { + return fmt.Errorf("get user: %w", err) + } + msg, err := composeKeptEmail(user.Email, user.DisplayName, periodEnd) + if err != nil { + return fmt.Errorf("compose kept: %w", err) + } + return s.mailer.Send(ctx, msg) } // pgxText / pgxTime are local pgtype constructors. Inlined here because the diff --git a/internal/feat/idleunsub/idleunsub_test.go b/internal/feat/idleunsub/idleunsub_test.go index a683e5c7..fff5f0aa 100644 --- a/internal/feat/idleunsub/idleunsub_test.go +++ b/internal/feat/idleunsub/idleunsub_test.go @@ -96,7 +96,7 @@ func setupHappyPath(t *testing.T) *fixture { } fake := &fakeStripe{subs: map[string]*stripe.Subscription{subID: sub}} - svc := idleunsub.NewService(b.Pool(), fake, nullMailer{}, nil, silentLogger()) + svc := idleunsub.NewService(b.Pool(), fake, nullMailer{}, nil, "http://localhost:3000", silentLogger()) event := stripe.Event{ ID: "evt_" + uuid.NewString()[:8], diff --git a/internal/feat/idleunsub/templates/cancel.html.tmpl b/internal/feat/idleunsub/templates/cancel.html.tmpl new file mode 100644 index 00000000..04ab3097 --- /dev/null +++ b/internal/feat/idleunsub/templates/cancel.html.tmpl @@ -0,0 +1,14 @@ +

Hi {{.DisplayName}},
+
+We noticed you haven't been around Sabermatic in the last two billing
+periods, so we've stopped your auto-renewal. You'll keep access through
+{{.CurrentPeriodEnd}} — we won't charge you for the next period.
+
+If you'd like to keep your subscription active, one click does it:
+
+Keep my subscription
+
+If you're done for now, no action needed. We'll be here whenever you
+want to come back.
+
+— Sabermatic
diff --git a/internal/feat/idleunsub/templates/cancel.txt.tmpl b/internal/feat/idleunsub/templates/cancel.txt.tmpl new file mode 100644 index 00000000..1797904b --- /dev/null +++ b/internal/feat/idleunsub/templates/cancel.txt.tmpl @@ -0,0 +1,14 @@ +Hi {{.DisplayName}}, + +We noticed you haven't been around Sabermatic in the last two billing +periods, so we've stopped your auto-renewal. You'll keep access through +{{.CurrentPeriodEnd}} — we won't charge you for the next period. + +If you'd like to keep your subscription active, one click does it: + + {{.KeepLink}} + +If you're done for now, no action needed. We'll be here whenever you +want to come back. + +— Sabermatic diff --git a/internal/feat/idleunsub/templates/kept.html.tmpl b/internal/feat/idleunsub/templates/kept.html.tmpl new file mode 100644 index 00000000..c16f40f0 --- /dev/null +++ b/internal/feat/idleunsub/templates/kept.html.tmpl @@ -0,0 +1,8 @@ +

Hi {{.DisplayName}},
+
+You're all set — your Sabermatic subscription will renew normally on
+{{.NextRenewalDate}}.
+
+Welcome back.
+
+— Sabermatic
diff --git a/internal/feat/idleunsub/templates/kept.txt.tmpl b/internal/feat/idleunsub/templates/kept.txt.tmpl new file mode 100644 index 00000000..8ca6be71 --- /dev/null +++ b/internal/feat/idleunsub/templates/kept.txt.tmpl @@ -0,0 +1,8 @@ +Hi {{.DisplayName}}, + +You're all set — your Sabermatic subscription will renew normally on +{{.NextRenewalDate}}. + +Welcome back. + +— Sabermatic From 81014d0cd8e69a63980edf35cebdaaeff809719d Mon Sep 17 00:00:00 2001 From: Brian Tiger Chow <734339+btc@users.noreply.github.com> Date: Fri, 8 May 2026 20:16:39 -0400 Subject: [PATCH 21/37] idleunsub: address Task 9 code review feedback - Wire real TokenSigner (32-byte random key) in setupHappyPath; signer was nil, causing enqueueCancelEmail to silently exercise the error branch - Add recordingMailer to capture sent messages; assert cancel/kept email subject, To, and body content in happy-path tests - Assert require.Empty(fx.Mailer.msgs) in skip/no-op tests - Rename enqueueKeptEmail's unused subID param to _ (removes _ = subID smell) - Add sub_id to kept-email error log lines in KeepSubscription and AutoReverse - Add TestComposeCancelEmail_HTMLEscapes: verifies DisplayName is HTML-escaped - Add assertion to TestComposeCancelEmail_Renders - Use .UTC() before .Format() in composeCancelEmail and composeKeptEmail (defensive: robust to callers that pass non-UTC time) --- internal/feat/idleunsub/email.go | 4 +- internal/feat/idleunsub/email_test.go | 11 ++++++ internal/feat/idleunsub/idleunsub.go | 11 +++--- internal/feat/idleunsub/idleunsub_test.go | 44 ++++++++++++++++++++-- internal/feat/idleunsub/stripestub_test.go | 13 +++++++ 5 files changed, 71 insertions(+), 12 deletions(-) diff --git a/internal/feat/idleunsub/email.go b/internal/feat/idleunsub/email.go index c837ca92..fdd3f779 100644 --- a/internal/feat/idleunsub/email.go +++ b/internal/feat/idleunsub/email.go @@ -29,7 +29,7 @@ type keptEmailData struct { func composeCancelEmail(toEmail, displayName, keepURL string, periodEnd 
time.Time) (email.Message, error) { data := cancelEmailData{ DisplayName: displayName, - CurrentPeriodEnd: periodEnd.Format("January 2, 2006"), + CurrentPeriodEnd: periodEnd.UTC().Format("January 2, 2006"), KeepLink: keepURL, } return renderEmail("cancel", toEmail, "We won't charge you for the next period", data) @@ -38,7 +38,7 @@ func composeCancelEmail(toEmail, displayName, keepURL string, periodEnd time.Tim func composeKeptEmail(toEmail, displayName string, nextRenewal time.Time) (email.Message, error) { data := keptEmailData{ DisplayName: displayName, - NextRenewalDate: nextRenewal.Format("January 2, 2006"), + NextRenewalDate: nextRenewal.UTC().Format("January 2, 2006"), } return renderEmail("kept", toEmail, "Your subscription is still active", data) } diff --git a/internal/feat/idleunsub/email_test.go b/internal/feat/idleunsub/email_test.go index 97aeb333..6a7cf155 100644 --- a/internal/feat/idleunsub/email_test.go +++ b/internal/feat/idleunsub/email_test.go @@ -19,6 +19,17 @@ func TestComposeCancelEmail_Renders(t *testing.T) { require.Contains(t, msg.Text, "March 1, 2026") require.Contains(t, msg.Text, "https://sabermatic.dev/sub/keep?t=abc.def") require.Contains(t, msg.HTML, "Jane") + require.Contains(t, msg.HTML, `X", + "https://x.example/y", end) + require.NoError(t, err) + require.NotContains(t, msg.HTML, "