
feat(bulldozer): add verifyDataIntegrity to all Table types #1333

Merged
N2D4 merged 5 commits into payments-bulldozer-txn-rework
from verify-data-int-payments-bulldozer-txn-rework
Apr 15, 2026

Conversation


@mantrakp04 mantrakp04 commented Apr 14, 2026

Summary

  • Adds a verifyDataIntegrity() method to the Bulldozer Table interface that returns a SQL query producing error rows when materialized data diverges from re-derivation from inputs (empty result = healthy).
  • Each of the 13 table operators implements verification appropriate to its semantics: full re-derivation for flat-map/sort/group-by/left-join/compact/limit, delegation for filter/map, structural checks for reduce/l-fold/time-fold, and no-op for stored/concat.
  • Adds verifyAllTablesIntegrity(tables) helper that UNION ALLs all tables' queries with a tableid column.
  • Integrates verification into the test lifecycle via an afterEach hook — all 124 existing tests now automatically verify data integrity after each run.
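The "error rows when materialized data diverges" semantics can be illustrated with a small in-memory sketch (hypothetical names; the real implementation expresses this diff as a SQL query, typically a FULL OUTER JOIN between re-derived and materialized rows):

```typescript
// Sketch of the diff semantics behind verifyDataIntegrity(): re-derive the
// expected rows, compare to the materialized rows, and emit one error row per
// divergence. An empty result means the table is healthy.
type Row = { groupKey: string, rowIdentifier: string, rowData: unknown };
type IntegrityError = {
  errorType: "missing_row" | "extra_row" | "data_mismatch",
  groupKey: string,
  rowIdentifier: string,
};

function diffRows(expected: Row[], actual: Row[]): IntegrityError[] {
  const key = (r: Row) => `${r.groupKey}\u0000${r.rowIdentifier}`;
  const expectedByKey = new Map(expected.map(r => [key(r), r]));
  const actualByKey = new Map(actual.map(r => [key(r), r]));
  const errors: IntegrityError[] = [];
  for (const [k, e] of expectedByKey) {
    const a = actualByKey.get(k);
    if (!a) {
      // Re-derivation produced a row the materialized table doesn't have.
      errors.push({ errorType: "missing_row", groupKey: e.groupKey, rowIdentifier: e.rowIdentifier });
    } else if (JSON.stringify(a.rowData) !== JSON.stringify(e.rowData)) {
      errors.push({ errorType: "data_mismatch", groupKey: e.groupKey, rowIdentifier: e.rowIdentifier });
    }
  }
  for (const [k, a] of actualByKey) {
    if (!expectedByKey.has(k)) {
      // Materialized table has a row that re-derivation does not produce.
      errors.push({ errorType: "extra_row", groupKey: a.groupKey, rowIdentifier: a.rowIdentifier });
    }
  }
  return errors;
}
```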

Test plan

  • All 124 existing Bulldozer DB tests pass with the new afterEach verification hook active
  • Uninitialized tables are silently skipped (gated on isInitialized)
  • Verification covers all 13 table types

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Bulldozer storage engine for persistent hierarchical data management
    • Introduced Bulldozer TimeFold queue for time-aware scheduled processing
    • Implemented complete payments schema pipeline with multi-phase transformations
    • Added dual-write synchronization between payment records and Bulldozer
    • Launched Bulldozer Studio development tool for schema visualization
    • Added payment customer data query functions
  • Bug Fixes

    • Fixed registration verification null-safety check
  • Documentation

    • Updated development tooling guidance
    • Added PostgreSQL pg_cron support for background jobs
  • Tests

    • Added comprehensive fuzz and performance test suites
    • Added multi-phase payments integration tests

Add a `verifyDataIntegrity()` method to the Bulldozer `Table` interface
that returns a SQL query producing error rows when materialized data
diverges from what the input tables imply. Empty result = healthy.

Each table operator implements verification appropriate to its semantics:
- Full re-derivation (flat-map, sort, group-by, left-join, compact, limit)
- Delegation to internal table (filter, map → nested flat-map)
- Structural group-correspondence checks (reduce, l-fold, time-fold)
- No-op for leaf/virtual tables (stored, concat)

All queries are gated on isInitialized so uninitialized tables are
silently skipped. A `verifyAllTablesIntegrity(tables)` helper UNION ALLs
individual queries with a tableid column for easy debugging.
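A minimal sketch of how that UNION ALL composition could look (illustrative names like `TableLike` and `verifyAllTablesIntegritySql`, not the actual API):

```typescript
// Compose per-table verification queries into one statement, tagging every
// error row with a tableid column so failures are easy to attribute.
type TableLike = { tableId: string, verifyDataIntegrity: () => string };

function verifyAllTablesIntegritySql(tables: TableLike[]): string {
  if (tables.length === 0) {
    // Well-typed zero-row result so callers can always run the query.
    return `SELECT NULL::text AS tableid WHERE FALSE`;
  }
  return tables
    .map(t => {
      // Escape single quotes so the tableid literal is safe to inline.
      const escaped = t.tableId.replace(/'/g, "''");
      return `SELECT '${escaped}' AS tableid, "q".* FROM (${t.verifyDataIntegrity()}) AS "q"`;
    })
    .join("\nUNION ALL\n");
}
```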

The test file now runs verification after every test via an afterEach
hook and a trackTable() wrapper around all 68 declare*Table() calls.
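A rough sketch of that lifecycle wiring (hypothetical shapes; in the real suite the verification body runs inside a test framework's `afterEach` hook):

```typescript
// declare*Table calls are wrapped in trackTable() so that an after-each hook
// can run the combined integrity query over every table the test touched.
type TrackedTable = { tableId: string, isInitialized: () => boolean };

const allInitializedTables: TrackedTable[] = [];

function trackTable<T extends TrackedTable>(table: T): T {
  allInitializedTables.push(table);
  return table; // pass-through, so call sites stay unchanged
}

// Uninitialized tables are skipped because their queries are gated on
// isInitialized; the test fails if any error rows come back.
function runAfterEachVerification(runQuery: (tables: TrackedTable[]) => unknown[]): unknown[] {
  const initialized = allInitializedTables.filter(t => t.isInitialized());
  const errorRows = runQuery(initialized);
  allInitializedTables.length = 0; // reset between tests
  return errorRows;
}
```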

Made-with: Cursor
Copilot AI review requested due to automatic review settings April 14, 2026 02:33

vercel Bot commented Apr 14, 2026

The latest updates on your projects.

| Project | Status | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| 3 | Building | Preview, Comment | Apr 15, 2026 4:02am |
| 3-1776188452445-5Jus | Canceled | — | Apr 15, 2026 4:02am |
| stack-auth-hosted-components | Ready | Preview, Comment | Apr 15, 2026 4:02am |
| stack-backend | Error | — | Apr 15, 2026 4:02am |
| stack-dashboard | Ready | Preview, Comment | Apr 15, 2026 4:02am |
| stack-demo | Ready | Preview, Comment | Apr 15, 2026 4:02am |
| stack-docs | Error | — | Apr 15, 2026 4:02am |
| stack-preview-backend | Error | — | Apr 15, 2026 4:02am |
| stack-preview-dashboard | Ready | Preview, Comment | Apr 15, 2026 4:02am |


coderabbitai Bot commented Apr 14, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18dc4e96-1b5e-4cb8-8147-0ac403bac0b2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

A new Bulldozer database engine is introduced, providing a hierarchical JSONB storage system with composable table operators (group-by, map, filter, sort, join, fold, compact, reduce). Dual-write integration syncs Prisma payment records to Bulldozer stored tables. Payment schema built atop Bulldozer derives transaction events, compacted entries, owned products, and item quantities through multi-phase transformations. Extensive test suites validate operators, performance, and end-to-end payment pipeline correctness.

Changes

Cohort / File(s) Summary
Database Migrations & Schema
apps/backend/prisma/migrations/20260323..., apps/backend/prisma/schema.prisma
Added three new Prisma models (BulldozerStorageEngine, BulldozerTimeFoldQueue, BulldozerTimeFoldMetadata) and migrations with hierarchical JSONB storage, queue-driven processing, and PL/pgSQL worker function. Added endedAt to Subscription and revokedAt to OneTimePurchase models.
Bulldozer Core Engine
apps/backend/src/lib/bulldozer/db/index.ts, utilities.ts, bulldozer-sort-helpers-sql.ts
Core table abstraction, SQL generation utilities (toQueryableSqlQuery, toExecutableSqlStatements, toExecutableSqlTransaction), and large PL/pgSQL helper suite for pointer-based tree sorting and materialized set operations.
Bulldozer Table Operators
apps/backend/src/lib/bulldozer/db/tables/*.ts (12 modules)
Implementations of declareStoredTable, declareMapTable, declareFlatMapTable, declareFilterTable, declareLimitTable, declareConcatTable, declareSortTable, declareGroupByTable, declareLFoldTable, declareTimeFoldTable, declareLeftJoinTable, declareCompactTable, declareReduceTable, each materializing relational/temporal operations in JSONB storage.
Bulldozer Testing Infrastructure
apps/backend/src/lib/bulldozer/db/index.fuzz.test.ts, index.perf.test.ts, apps/backend/prisma/migrations/.../tests/*.ts
Comprehensive fuzz tests (~2220 lines) validating operator composition/invariants, performance regression tests (~1299 lines), and migration test helpers for ltree/timefold queue processing.
Payment Schema Type Definitions
apps/backend/src/lib/payments/schema/types.ts
Typed contracts for payment pipeline (subscriptions, invoices, OTPs, item changes, transactions, transaction entries, events, phase-3 outputs).
Payment Schema Phase 1
apps/backend/src/lib/payments/schema/phase-1/*.ts
Stored table seeding, event generation from source tables, subscription and OTP TimeFold reducers with repeat scheduling and item grants.
Payment Schema Phase 2
apps/backend/src/lib/payments/schema/phase-2/compacted-transaction-entries.ts
Multi-stage pipeline flattening transactions into entries, filtering by type, and compacting item-quantity-changes across time boundaries using expiry markers.
Payment Schema Phase 3
apps/backend/src/lib/payments/schema/phase-3/*.ts
Expiry split algorithm, ledger accumulation (grants/debt/removals with expiry tracking), owned-products folding, item quantities reduction producing final customer-facing ledger state.
Payment Schema Composition
apps/backend/src/lib/payments/schema/index.ts
Factory orchestrating phase-specific builders into cohesive schema with dependency ordering and table categorization.
Payment Integration (Dual-Write)
apps/backend/src/lib/payments/bulldozer-dual-write.ts, payment API routes (apps/backend/src/app/api/latest/.../*.ts)
Conversion functions for Prisma row→Bulldozer row with timestamp fields converted to millis. Dual-write calls integrated into Stripe webhooks, payment creation/update/refund routes, and general payment utility functions.
Payment Data Access
apps/backend/src/lib/payments/customer-data.ts, apps/backend/src/lib/stripe.tsx
Helper queries reading latest per-customer owned products and item quantities from Bulldozer tables. Stripe sync/invoice upserts extended with dual-writes.
Payment Test Suites
apps/backend/src/lib/payments/schema/__tests__/*.test.ts, test-helpers.ts
Integration test helpers for database setup/teardown. Tests for dual-write conversion, phase 1/2/3 pipeline correctness, split algorithm, ledger algorithm, compaction behavior, and end-to-end owned-products/item-quantities isolation.
Backend Scripts & Config
apps/backend/scripts/bulldozer-payments-init.ts, apps/backend/scripts/db-migrations.ts, apps/backend/scripts/run-cron-jobs.ts, apps/backend/package.json
New bulldozer-payments-init script to seed schema tables and ingest Prisma data. Updated dev script to run Bulldozer Studio watcher. Added 30s startup delay to cron-job loops. Added elkjs dependency.
Support & Documentation
.vscode/settings.json, AGENTS.md, apps/dev-launchpad/public/index.html, docker/dev-postgres-with-extensions/Dockerfile, claude/CLAUDE-KNOWLEDGE.md
cSpell word list updates, linting guidance, Bulldozer Studio dev-launchpad entry, pg_cron Docker setup, and Bulldozer operational/correctness knowledge base entries.
Minor API Changes
apps/backend/src/app/api/latest/auth/passkey/initiate-passkey-registration/route.tsx, verification-code-handler.tsx, apps/backend/src/lib/tokens.tsx, apps/backend/src/prisma-client.tsx, apps/backend/src/route-handlers/prisma-handler.tsx
Passkey hint handling via Reflect API, registration verification guard, removed signed_up_at JWT field, dynamic Neon import, and Prisma type inference refactor.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application<br/>(Prisma)
    participant DW as Dual-Write<br/>Converter
    participant Engine as Bulldozer<br/>Storage Engine
    participant Operator as Table<br/>Operator
    participant Store as BulldozerStorage<br/>Engine JSONB
    
    Note over App,Store: Payment Creation Flow
    App->>App: Create/Update<br/>Subscription/OTP
    App->>DW: Call bulldozerWrite*<br/>(prismaRow)
    DW->>DW: Convert to<br/>stored row shape
    DW->>Engine: Execute setRow()<br/>with converted data
    Engine->>Store: UPSERT into<br/>keyPath hierarchy
    Store-->>Engine: Row persisted
    Engine-->>DW: Complete
    DW-->>App: Dual-write done
    
    Note over App,Store: Operator Composition Flow
    Operator->>Operator: Register row-change<br/>trigger on input
    App->>Store: Update source<br/>rows via setRow()
    Store-->>Operator: Trigger fires
    Operator->>Operator: Normalize changes,<br/>recompute affected<br/>groups
    Operator->>Store: Upsert output<br/>rows & metadata
    Store-->>Operator: Changes applied
    Operator->>Operator: Emit downstream<br/>triggers
    
    Note over App,Store: Payment Pipeline (Phase 1→3)
    App->>Store: Seed source tables<br/>(subscriptions, OTPs)
    Store-->>Operator: Data available
    Operator->>Operator: Phase 1: Events<br/>(subscriptionStart,<br/>itemGrantRepeat, etc.)
    Operator->>Operator: Phase 1→2: Transactions<br/>(flatten, concat, group)
    Operator->>Operator: Phase 2: CompactedEntries<br/>(merge item-qty-change<br/>across expiry boundaries)
    Operator->>Operator: Phase 3: OwnedProducts<br/>(LFold: accumulate<br/>grants/revocations)
    Operator->>Operator: Phase 3: ItemQuantities<br/>(LFold: ledger with<br/>grants/debt/removals)
    Operator->>Store: Persist all phase<br/>outputs
    Store-->>App: Final state readable<br/>via customer-data queries
sequenceDiagram
    participant TimeFold as TimeFold Table<br/>(e.g., subscription<br/>repeat schedule)
    participant Queue as BulldozerTimeFold<br/>Queue
    participant Worker as bulldozer_timefold<br/>_process_queue()<br/>Worker
    participant Engine as Storage<br/>Engine
    
    Note over TimeFold,Engine: TimeFold Initialization & Scheduling
    TimeFold->>Engine: Register row-change<br/>trigger
    TimeFold->>Engine: Initialize state<br/>for each input row<br/>(oldRowData=null)
    Engine->>TimeFold: Trigger fires
    TimeFold->>TimeFold: Run reducer at<br/>timestamp=null<br/>(compute nextTimestamp)
    TimeFold->>Queue: Enqueue nextTimestamp<br/>if > cutoff
    Queue-->>Engine: Queue row created
    
    Note over TimeFold,Engine: TimeFold Scheduled Execution
    Worker->>Queue: SELECT next due<br/>row (scheduledAt ≤ now)<br/>FOR UPDATE SKIP LOCKED
    Queue-->>Worker: Queue row found
    Worker->>Engine: Fetch current state<br/>from Storage
    Engine-->>Worker: Current stateAfter
    Worker->>Worker: Run reducer at<br/>timestamp (reprocess)
    Worker->>Engine: Upsert state,<br/>emitted rows
    Engine-->>Worker: Persisted
    alt nextTimestamp > cutoff
        Worker->>Queue: Re-enqueue with<br/>updated stateAfter,<br/>scheduledAt=nextTimestamp
    end
    Worker->>Queue: DELETE processed<br/>queue row
    alt More due rows?
        Worker->>Queue: Loop (GOTO SELECT)
    else No more due rows
        Worker->>Engine: Update lastProcessedAt<br/>in metadata
    end
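The worker loop in the diagram above can be sketched in memory (hypothetical types; the real worker is the PL/pgSQL function `bulldozer_timefold_process_queue`, which dequeues with FOR UPDATE SKIP LOCKED rather than array operations):

```typescript
// Drain all due queue rows: run the reducer at each scheduled timestamp,
// delete the processed row, and re-enqueue only when the reducer's next
// timestamp lies beyond the cutoff.
type QueueRow = { scheduledAt: number, state: unknown };
type ReducerResult = { state: unknown, nextTimestamp: number | null };

function processQueue(
  queue: QueueRow[],
  now: number,
  cutoff: number,
  reducer: (state: unknown, timestamp: number) => ReducerResult,
): unknown[] {
  const processedStates: unknown[] = [];
  // Loop while there are due rows (scheduledAt <= now), like the "GOTO SELECT" branch.
  while (true) {
    const idx = queue.findIndex(r => r.scheduledAt <= now);
    if (idx === -1) break; // no more due rows: real worker would update lastProcessedAt
    const row = queue.splice(idx, 1)[0]; // DELETE processed queue row
    const result = reducer(row.state, row.scheduledAt);
    processedStates.push(result.state);
    if (result.nextTimestamp !== null && result.nextTimestamp > cutoff) {
      // Re-enqueue with updated state and scheduledAt = nextTimestamp.
      queue.push({ scheduledAt: result.nextTimestamp, state: result.state });
    }
  }
  return processedStates;
}
```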

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

This introduces a foundational new Bulldozer database system with:

  • Complex hierarchical JSONB storage and pointer-based tree algorithms (815+ lines of PL/pgSQL sort helpers)
  • 13 distinct table operator implementations (~5,500 lines total), each with SQL generation, trigger registration, and integrity checks
  • Dense state machine logic (TimeFold reducer with time-aware state folding and queue-driven scheduling)
  • Full four-phase payment schema pipeline (~2,000+ lines) mapping domain concepts to nested SQL transformations
  • Comprehensive dual-write integration across payment routes (~30 files modified)
  • 3,000+ lines of heterogeneous test suites (fuzz, performance, integration) with complex assertions
  • Novel type system and SQL DSL for table algebra

Requires deep understanding of Bulldozer architecture, operator semantics, state folding algorithms, expiry/ledger logic, and integration points.

Suggested reviewers

  • BilalG1

Poem

🐰 A rabbit's ode to Bulldozer's might:

From JSONB depths and pointer-trees,
We fold the rows with graceful ease—
Each phase transforms and learns anew,
Till ledgers bloom with grants so true!
Dual-writes sweep through Prisma's hall,
While TimeFold schedulers heed the call. 🏗️✨



Copilot AI left a comment


Pull request overview

This PR expands Bulldozer’s table interface to support integrity verification queries and adds a new payments “Bulldozer schema” pipeline (plus dual-write + ingestion wiring) that materializes customer payment state from Prisma data.

Changes:

  • Add Table.verifyDataIntegrity() and a verifyAllTablesIntegrity(tables) helper to aggregate per-table integrity checks.
  • Introduce a payments Bulldozer schema (stored tables → events → transactions → compacted entries → owned-products / item-quantities) plus real-Postgres tests for the pipeline.
  • Add payments dual-write + a migrations-time init/ingress script; add local dev pg_cron support and dev UX updates (launchpad + backend dev script).

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pnpm-lock.yaml Adds elkjs dependency to the lockfile.
docker/dev-postgres-with-extensions/Dockerfile Installs/configures pg_cron in the dev Postgres image.
claude/CLAUDE-KNOWLEDGE.md Adds Bulldozer/pg_cron and payments pipeline knowledge entries.
apps/dev-launchpad/public/index.html Adds “Bulldozer Studio” to the launchpad services list.
apps/backend/src/route-handlers/prisma-handler.tsx Makes Prisma create invocation safer via runtime Reflect checks.
apps/backend/src/prisma-client.tsx Converts a static import to a dynamic import to avoid eager dependency loading.
apps/backend/src/lib/tokens.tsx Removes signed_up_at from the access token payload.
apps/backend/src/lib/stripe.tsx Adds payments dual-write calls after Stripe upserts.
apps/backend/src/lib/payments/schema/types.ts Introduces payments pipeline TypeScript row/type definitions.
apps/backend/src/lib/payments/schema/phase-3/split-algo.ts Extracts Phase 3 split algorithm SQL builder for reuse/testing.
apps/backend/src/lib/payments/schema/phase-3/owned-products.ts Adds Phase 3 OwnedProducts fold table definition.
apps/backend/src/lib/payments/schema/phase-3/ledger-algo.ts Extracts Phase 3 ledger reducer SQL builder for item quantities.
apps/backend/src/lib/payments/schema/phase-3/item-quantities.ts Adds Phase 3 ItemQuantities fold table definition.
apps/backend/src/lib/payments/schema/phase-3/item-changes-with-expiries.ts Adds Phase 3 join/split pipeline to enrich changes with expiries.
apps/backend/src/lib/payments/schema/phase-2/compacted-transaction-entries.ts Adds Phase 2 entry flattening + compaction pipeline.
apps/backend/src/lib/payments/schema/phase-1/transactions.ts Adds Phase 1 transaction assembly from events/manual inputs.
apps/backend/src/lib/payments/schema/phase-1/stored-tables.ts Defines payments “seed” stored tables.
apps/backend/src/lib/payments/schema/phase-1/otp-timefold-algo.ts Adds OTP TimeFold reducer SQL builder.
apps/backend/src/lib/payments/schema/phase-1/events.ts Adds Phase 1 event derivation (TimeFold + maps + joins).
apps/backend/src/lib/payments/schema/index.ts Composes the payments schema and exposes init-order table lists/categories.
apps/backend/src/lib/payments/schema/__tests__/test-helpers.ts Adds real-Postgres test DB helper for payments pipeline tests.
apps/backend/src/lib/payments/schema/__tests__/phase-2.test.ts Adds Phase 2 correctness tests against real Postgres.
apps/backend/src/lib/payments/schema/__tests__/integration-2-3.test.ts Adds Phase 2→3 integration tests against real Postgres.
apps/backend/src/lib/payments/schema/__tests__/dual-write.test.ts Adds dual-write conversion + setRow behavior tests.
apps/backend/src/lib/payments/customer-data.ts Adds customer-facing reads from the Phase 3 output tables.
apps/backend/src/lib/payments/bulldozer-dual-write.ts Adds Prisma→Bulldozer conversions and dual-write executors.
apps/backend/src/lib/payments.tsx Adds dual-write calls after payment mutations in core payments logic.
apps/backend/src/lib/bulldozer/db/utilities.ts Adds/updates Bulldozer SQL helper types/builders and path utilities.
apps/backend/src/lib/bulldozer/db/tables/stored-table.ts Implements verifyDataIntegrity() for stored tables (no-op empty result).
apps/backend/src/lib/bulldozer/db/tables/map-table.ts Delegates verifyDataIntegrity() via nested FlatMap implementation.
apps/backend/src/lib/bulldozer/db/tables/filter-table.ts Delegates verifyDataIntegrity() via nested FlatMap implementation.
apps/backend/src/lib/bulldozer/db/tables/concat-table.ts Implements verifyDataIntegrity() as a no-op for concat.
apps/backend/src/lib/bulldozer/db/index.ts Adds verifyDataIntegrity() to Table and verifyAllTablesIntegrity().
apps/backend/src/lib/bulldozer/db/example-schema.ts Adds an example schema file demonstrating Bulldozer operators.
apps/backend/src/app/api/latest/payments/purchases/purchase-session/route.tsx Dual-writes subscription updates for purchase-session flows.
apps/backend/src/app/api/latest/payments/products/[customer_type]/[customer_id]/switch/route.ts Dual-writes subscription updates for product switch flows.
apps/backend/src/app/api/latest/payments/products/[customer_type]/[customer_id]/[product_id]/route.ts Dual-writes subscription updates for product cancellation.
apps/backend/src/app/api/latest/payments/items/[customer_type]/[customer_id]/[item_id]/update-quantity/route.ts Dual-writes item quantity changes after Prisma writes.
apps/backend/src/app/api/latest/internal/payments/transactions/refund/route.tsx Dual-writes refunded purchases/subscriptions after refund updates.
apps/backend/src/app/api/latest/integrations/stripe/webhooks/route.tsx Dual-writes OneTimePurchase upserts from Stripe webhooks.
apps/backend/src/app/api/latest/auth/passkey/register/verification-code-handler.tsx Adds a missing null check for registrationInfo.
apps/backend/src/app/api/latest/auth/passkey/initiate-passkey-registration/route.tsx Avoids relying on hints typing by using Reflect and deletion.
apps/backend/scripts/run-cron-jobs.ts Adds an initial wait before starting cron loop.
apps/backend/scripts/db-migrations.ts Runs payments Bulldozer init/ingress after seed/init/migrate.
apps/backend/scripts/bulldozer-payments-init.ts Adds payments schema initialization + Prisma→Bulldozer ingress script.
apps/backend/prisma/schema.prisma Adds Bulldozer tables to Prisma schema; adds endedAt/revokedAt.
apps/backend/prisma/migrations/20260413043028_add_revoked_at_to_otp/migration.sql Adds revokedAt to OneTimePurchase.
apps/backend/prisma/migrations/20260413040008_add_subscription_ended_at/migration.sql Adds endedAt to Subscription.
apps/backend/prisma/migrations/20260323150000_add_bulldozer_timefold_queue/tests/process-queue.ts Adds migration test for timefold queue processing.
apps/backend/prisma/migrations/20260323150000_add_bulldozer_timefold_queue/migration.sql Adds timefold queue + worker function + pg_cron best-effort setup.
apps/backend/prisma/migrations/20260323120000_add_bulldozer_data/tests/ltree-queries.ts Adds migration test coverage for BulldozerStorageEngine semantics.
apps/backend/prisma/migrations/20260323120000_add_bulldozer_data/migration.sql Adds BulldozerStorageEngine table + indexes + seeded roots.
apps/backend/package.json Adds elkjs and runs run-bulldozer-studio in pnpm dev.
AGENTS.md Updates lint instructions and adds an agent guideline.
.vscode/settings.json Reorders/adds cSpell words.



greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR adds a verifyDataIntegrity() method to all 13 Bulldozer Table implementations and wires it into the test lifecycle via an afterEach hook, giving the existing 124 tests an automatic data-integrity check after each run. The approach is sound — full re-derivation for flat-map, group-by, left-join, compact, and (partially) limit/sort; structural guards for reduce, l-fold, and time-fold; no-ops for stored and concat.

Two implementations diverge from the PR's stated "full re-derivation" goal in ways that leave real gaps:

  • sort-table: only rowData is compared; the computed rowSortKey — the sole purpose of the operator and a direct input to downstream LFold tables — is never verified.
  • limit-table: only extra rows and over-limit counts are checked; under-selection (too few rows materialized, or wrong rows chosen) is completely undetected.

Confidence Score: 4/5

  • Safe to merge once the two gaps in sort-table and limit-table verifyDataIntegrity are addressed — those operators silently pass integrity checks that should fail on corrupt materialized state.
  • Two P1 findings: sort-table does not verify computed sort keys (its primary output), and limit-table does not detect under-selection. Both contradict the PR's own "full re-derivation" claim and could leave real bugs undetected by the test lifecycle hook. The remaining 11 table implementations are correct or appropriately limited by design.
  • apps/backend/src/lib/bulldozer/db/tables/sort-table.ts and apps/backend/src/lib/bulldozer/db/tables/limit-table.ts need updated verifyDataIntegrity implementations.

Important Files Changed

Filename Overview
apps/backend/src/lib/bulldozer/db/index.ts Adds verifyDataIntegrity() to the Table interface and a verifyAllTablesIntegrity() helper that UNION ALLs all tables' checks. SQL escaping (single-quote replacement) is correct for PostgreSQL. Empty-table guard returns a well-typed zero-row query.
apps/backend/src/lib/bulldozer/db/tables/sort-table.ts verifyDataIntegrity only compares rowData, never the computed rowSortKey — a critical gap since the sort key is the primary output of this operator and is relied on by downstream LFold tables.
apps/backend/src/lib/bulldozer/db/tables/limit-table.ts verifyDataIntegrity only detects extra rows and over-limit counts; under-selection (too few materialized rows or wrong rows selected) goes entirely undetected, contrary to the PR's "full re-derivation" claim.
apps/backend/src/lib/bulldozer/db/tables/l-fold-table.ts Structural-only check: verifies group membership (missing/extra groups) but not actual folded row data. Incorrect accumulated values would pass. Coverage gap is notable but likely constrained by the recursive WITH design.
apps/backend/src/lib/bulldozer/db/tables/reduce-table.ts Structural checks: verifies group existence and that each group has exactly one row, but not the data values. Custom PostgreSQL aggregate cannot be embedded in a plain SqlQuery, constraining the check.
apps/backend/src/lib/bulldozer/db/tables/time-fold-table.ts Only checks for extra groups (orphaned output groups with no corresponding input); missing groups and data correctness are not verified. Non-deterministic timestamps make full re-derivation impractical.
apps/backend/src/lib/bulldozer/db/index.test.ts afterEach hook cleanly wires verifyAllTablesIntegrity into the test lifecycle. Opt-in trackTable pattern could silently miss tables if future tests forget to call it.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[verifyAllTablesIntegrity] -->|UNION ALL| B[stored-table\nno-op ✓]
    A -->|UNION ALL| C[concat-table\nno-op ✓]
    A -->|UNION ALL| D[flat-map-table\nfull re-derive ✓]
    A -->|UNION ALL| E[filter-table\ndelegates to flat-map ✓]
    A -->|UNION ALL| F[map-table\ndelegates to flat-map ✓]
    A -->|UNION ALL| G[group-by-table\nfull re-derive ✓]
    A -->|UNION ALL| H[left-join-table\nfull re-derive ✓]
    A -->|UNION ALL| I[compact-table\nfull re-derive ✓]
    A -->|UNION ALL| J[sort-table\nrowData only ⚠️]
    A -->|UNION ALL| K[limit-table\nextra rows + count only ⚠️]
    A -->|UNION ALL| L[reduce-table\nstructural only 📋]
    A -->|UNION ALL| M[l-fold-table\ngroup membership only 📋]
    A -->|UNION ALL| N[time-fold-table\nextra groups only 📋]

    style J fill:#ff9999
    style K fill:#ff9999
    style L fill:#ffffcc
    style M fill:#ffffcc
    style N fill:#ffffcc


Prompt To Fix All With AI
This is a comment left during a code review.
Path: apps/backend/src/lib/bulldozer/db/tables/sort-table.ts
Line: 417-452

Comment:
**Sort key never verified**

Both `"expected"` and `"actual"` CTEs select only `rowdata`, not `rowsortkey`. A stored sort key that is corrupted (wrong value, wrong row, stale after an incremental update) passes silently. The sort key is the primary product of this operator; downstream `LFold` tables fold in sort-key order, so a silent sort-key mismatch here propagates undetected to those tables. The PR description claims "full re-derivation" for sort, but the check only verifies row-data propagation.

The fix is to also compute the expected sort key via `getSortKey` and compare:

```
"expected" AS (
  SELECT
    "r"."groupkey" AS "groupKey",
    "r"."rowidentifier" AS "rowIdentifier",
    "r"."rowdata" AS "rowData",
    to_jsonb("sk"."newSortKey") AS "computedSortKey"
  FROM (${allInputRows}) AS "r"
  CROSS JOIN LATERAL (
    SELECT ${options.getSortKey}
    FROM (
      SELECT "r"."rowidentifier" AS "rowIdentifier",
             "r"."rowsortkey"   AS "oldSortKey",
             "r"."rowdata"      AS "rowData"
    ) AS "sortKeyInput"
  ) AS "sk"
),
"actual" AS (
  SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier",
         "r"."rowdata"    AS "rowData",
         "r"."rowsortkey" AS "computedSortKey"
  FROM (${allActualRows}) AS "r"
)
-- then JOIN on groupKey + rowIdentifier and flag differences in rowData OR computedSortKey
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/backend/src/lib/bulldozer/db/tables/limit-table.ts
Line: 496-535

Comment:
**Under-selection goes undetected**

The check catches (1) rows in `actual` that don't exist in the input (`extraRows`) and (2) groups whose count exceeds the limit (`overLimit`), but it never detects the case where the materialized table stores *fewer* rows than it should. For example, if the input has 5 rows and the limit is 3, a bug that only materializes 2 rows produces no errors — count 2 ≤ 3 and no ghost rows. Similarly, if the wrong 3 rows are selected (all valid inputs, just not the "first 3 by sort key"), neither check fires.

The PR description claims "full re-derivation" for limit, which requires re-computing the expected selected rows per group (re-applying the same sort-key ordering + LIMIT N logic) and diffing that against `actual`. The current check is a one-sided "sanity guard" rather than a full re-derivation.

How can I resolve this? If you propose a fix, please make it concise.
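An in-memory sketch of the full re-derivation this comment asks for (hypothetical names): recompute the expected first-N-by-sort-key selection per group and diff it against the materialized rows. Unlike the current one-sided guard, this catches under-selection and wrong-row selection.

```typescript
// Re-derive the expected LIMIT-N selection and diff it against the
// materialized rows; any asymmetry is an integrity error.
type LimitedRow = { rowIdentifier: string, sortKey: number };

function limitIntegrityErrors(
  inputRows: LimitedRow[],
  actualRows: LimitedRow[],
  limit: number,
): string[] {
  // Re-apply the same ordering + LIMIT N logic (tie-break by identifier).
  const expected = [...inputRows]
    .sort((a, b) => a.sortKey - b.sortKey || a.rowIdentifier.localeCompare(b.rowIdentifier))
    .slice(0, limit);
  const expectedIds = new Set(expected.map(r => r.rowIdentifier));
  const actualIds = new Set(actualRows.map(r => r.rowIdentifier));
  const errors: string[] = [];
  for (const id of expectedIds) if (!actualIds.has(id)) errors.push(`missing_row:${id}`);
  for (const id of actualIds) if (!expectedIds.has(id)) errors.push(`extra_row:${id}`);
  return errors;
}
```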

---

This is a comment left during a code review.
Path: apps/backend/src/lib/bulldozer/db/tables/l-fold-table.ts
Line: 802-835

Comment:
**Structural-only check doesn't verify folded row data**

The verification only confirms that the set of output groups matches the set of input groups (`missing_group` / `extra_group`). Wrong data values in any output row — for example, a stale accumulated state that was never recomputed after an upstream change — would pass this check entirely. While `reduce` and `time-fold` face similar constraints (the custom aggregate / recursive WITH can't easily be embedded in a bare `SqlQuery`), it's worth documenting this coverage gap explicitly in the method body or a code comment, so future maintainers know these table types only get a structural guard rather than a full re-derivation.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: apps/backend/src/lib/bulldozer/db/index.test.ts
Line: 239-251

Comment:
**Opt-in tracking could silently miss tables**

`allInitializedTables` is populated only when a test (or its helper) explicitly calls `trackTable`. Any test that declares and initializes a table directly — without going through one of the helper functions — won't have that table integrity-checked in `afterEach`. A future author adding a test might not notice this requirement. Consider adding a comment here warning that every initialized table must be wrapped with `trackTable`, or explore hooking into table `init()` so tracking is automatic.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(bulldozer): add verifyDataIntegrity..."

Comment on lines +417 to +452
verifyDataIntegrity: () => {
  const allInputRows = options.fromTable.listRowsInGroup({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  const allActualRows = table.listRowsInGroup({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  return sqlQuery`
    WITH "expected" AS (
      SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier", "r"."rowdata" AS "rowData"
      FROM (${allInputRows}) AS "r"
    ),
    "actual" AS (
      SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier", "r"."rowdata" AS "rowData"
      FROM (${allActualRows}) AS "r"
    )
    SELECT
      CASE
        WHEN "expected"."rowIdentifier" IS NULL THEN 'extra_row'
        WHEN "actual"."rowIdentifier" IS NULL THEN 'missing_row'
        ELSE 'data_mismatch'
      END AS errortype,
      COALESCE("expected"."groupKey", "actual"."groupKey") AS groupkey,
      COALESCE("expected"."rowIdentifier", "actual"."rowIdentifier") AS rowidentifier,
      "expected"."rowData" AS expected,
      "actual"."rowData" AS actual
    FROM "expected"
    FULL OUTER JOIN "actual"
      ON "expected"."groupKey" IS NOT DISTINCT FROM "actual"."groupKey"
      AND "expected"."rowIdentifier" = "actual"."rowIdentifier"
    WHERE ("expected"."rowIdentifier" IS NULL
      OR "actual"."rowIdentifier" IS NULL
      OR "expected"."rowData" IS DISTINCT FROM "actual"."rowData")
      AND ${isInitializedExpression}
  `;
},
Contributor
**P1: Sort key never verified**

Both `"expected"` and `"actual"` CTEs select only `rowdata`, not `rowsortkey`. A stored sort key that is corrupted (wrong value, wrong row, stale after an incremental update) passes silently. The sort key is the primary product of this operator; downstream `LFold` tables fold in sort-key order, so a silent sort-key mismatch here propagates undetected to those tables. The PR description claims "full re-derivation" for sort, but the check only verifies row-data propagation.

The fix is to also compute the expected sort key via `getSortKey` and compare:

```
"expected" AS (
  SELECT
    "r"."groupkey" AS "groupKey",
    "r"."rowidentifier" AS "rowIdentifier",
    "r"."rowdata" AS "rowData",
    to_jsonb("sk"."newSortKey") AS "computedSortKey"
  FROM (${allInputRows}) AS "r"
  CROSS JOIN LATERAL (
    SELECT ${options.getSortKey}
    FROM (
      SELECT "r"."rowidentifier" AS "rowIdentifier",
             "r"."rowsortkey"   AS "oldSortKey",
             "r"."rowdata"      AS "rowData"
    ) AS "sortKeyInput"
  ) AS "sk"
),
"actual" AS (
  SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier",
         "r"."rowdata"    AS "rowData",
         "r"."rowsortkey" AS "computedSortKey"
  FROM (${allActualRows}) AS "r"
)
-- then JOIN on groupKey + rowIdentifier and flag differences in rowData OR computedSortKey
```
Path: apps/backend/src/lib/bulldozer/db/tables/sort-table.ts

Comment on lines +496 to +535
```
verifyDataIntegrity: () => {
  const allInputRows = options.fromTable.listRowsInGroup({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  const allActualRows = table.listRowsInGroup({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  return sqlQuery`
    WITH "inputRows" AS (
      SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier", "r"."rowdata" AS "rowData"
      FROM (${allInputRows}) AS "r"
    ),
    "actual" AS (
      SELECT "r"."groupkey" AS "groupKey", "r"."rowidentifier" AS "rowIdentifier", "r"."rowdata" AS "rowData"
      FROM (${allActualRows}) AS "r"
    ),
    "extraRows" AS (
      SELECT 'extra_row' AS errortype,
        "actual"."groupKey" AS groupkey, "actual"."rowIdentifier" AS rowidentifier,
        NULL::jsonb AS expected, "actual"."rowData" AS actual
      FROM "actual"
      LEFT JOIN "inputRows"
        ON "inputRows"."groupKey" IS NOT DISTINCT FROM "actual"."groupKey"
        AND "inputRows"."rowIdentifier" = "actual"."rowIdentifier"
      WHERE "inputRows"."rowIdentifier" IS NULL
    ),
    "overLimit" AS (
      SELECT 'over_limit' AS errortype,
        "counts"."groupKey" AS groupkey, NULL::text AS rowidentifier,
        to_jsonb("counts"."cnt") AS expected, to_jsonb(${normalizedLimit}) AS actual
      FROM (
        SELECT "groupKey", COUNT(*)::int AS "cnt" FROM "actual" GROUP BY "groupKey"
      ) AS "counts"
      WHERE "counts"."cnt" > ${normalizedLimit}
    )
    SELECT * FROM "extraRows" WHERE ${isInitializedExpression}
    UNION ALL
    SELECT * FROM "overLimit" WHERE ${isInitializedExpression}
  `;
},
```
Contributor
**P1: Under-selection goes undetected**

The check catches (1) rows in `actual` that don't exist in the input (`extraRows`) and (2) groups whose count exceeds the limit (`overLimit`), but it never detects the case where the materialized table stores *fewer* rows than it should. For example, if the input has 5 rows and the limit is 3, a bug that only materializes 2 rows produces no errors — count 2 ≤ 3 and no ghost rows. Similarly, if the wrong 3 rows are selected (all valid inputs, just not the "first 3 by sort key"), neither check fires.

The PR description claims "full re-derivation" for limit, which requires re-computing the expected selected rows per group (re-applying the same sort-key ordering + LIMIT N logic) and diffing that against `actual`. The current check is a one-sided "sanity guard" rather than a full re-derivation.

Path: apps/backend/src/lib/bulldozer/db/tables/limit-table.ts
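A full re-derivation along these lines would recompute the expected top-N rows per group and diff them against the materialized rows in both directions. The following in-memory sketch (hypothetical row shape with numeric sort keys; the real check would be expressed as SQL inside `sqlQuery`) shows the diff logic that catches both under-selection and wrong-row selection:

```typescript
// Hypothetical row shape; the real tables store these columns in Postgres
// and the check would run as a SQL query, not in application memory.
type Row = { groupKey: string; rowIdentifier: string; sortKey: number };

// Re-derive the expected output of a limit operator (the first `limit`
// rows of each group by sort key) and diff it against the materialized
// rows in BOTH directions. An empty result means healthy.
function verifyLimit(input: Row[], actual: Row[], limit: number) {
  const byGroup = new Map<string, Row[]>();
  for (const r of input) {
    const rows = byGroup.get(r.groupKey) ?? [];
    rows.push(r);
    byGroup.set(r.groupKey, rows);
  }
  const expected = new Set<string>();
  for (const rows of byGroup.values()) {
    rows.sort((a, b) => a.sortKey - b.sortKey);
    for (const r of rows.slice(0, limit)) {
      expected.add(`${r.groupKey}\u0000${r.rowIdentifier}`);
    }
  }
  const actualKeys = new Set(actual.map(r => `${r.groupKey}\u0000${r.rowIdentifier}`));
  const errors: { errorType: string; key: string }[] = [];
  for (const k of expected) {
    if (!actualKeys.has(k)) errors.push({ errorType: "missing_row", key: k });
  }
  for (const k of actualKeys) {
    if (!expected.has(k)) errors.push({ errorType: "extra_row", key: k });
  }
  return errors;
}
```

With this shape, materializing only 2 of the expected 3 rows surfaces as a `missing_row`, and selecting a valid-but-wrong row surfaces as one `missing_row` plus one `extra_row`.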

Comment on lines +802 to +835
```
verifyDataIntegrity: () => {
  const allInputGroups = options.fromTable.listGroups({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  const allActualGroups = table.listGroups({
    start: "start", end: "end", startInclusive: true, endInclusive: true,
  });
  return sqlQuery`
    WITH "inputGroups" AS (
      SELECT "g"."groupkey" AS "groupKey" FROM (${allInputGroups}) AS "g"
    ),
    "actualGroups" AS (
      SELECT "g"."groupkey" AS "groupKey" FROM (${allActualGroups}) AS "g"
    ),
    "missingGroups" AS (
      SELECT 'missing_group' AS errortype,
        "inputGroups"."groupKey" AS groupkey, NULL::text AS rowidentifier,
        NULL::jsonb AS expected, NULL::jsonb AS actual
      FROM "inputGroups"
      LEFT JOIN "actualGroups" ON "actualGroups"."groupKey" IS NOT DISTINCT FROM "inputGroups"."groupKey"
      WHERE "actualGroups"."groupKey" IS NULL
    ),
    "extraGroups" AS (
      SELECT 'extra_group' AS errortype,
        "actualGroups"."groupKey" AS groupkey, NULL::text AS rowidentifier,
        NULL::jsonb AS expected, NULL::jsonb AS actual
      FROM "actualGroups"
      LEFT JOIN "inputGroups" ON "inputGroups"."groupKey" IS NOT DISTINCT FROM "actualGroups"."groupKey"
      WHERE "inputGroups"."groupKey" IS NULL
    )
    SELECT * FROM "missingGroups" WHERE ${isInitializedExpression}
    UNION ALL SELECT * FROM "extraGroups" WHERE ${isInitializedExpression}
  `;
},
```
Contributor
**P2: Structural-only check doesn't verify folded row data**

The verification only confirms that the set of output groups matches the set of input groups (`missing_group` / `extra_group`). Wrong data values in any output row — for example, a stale accumulated state that was never recomputed after an upstream change — would pass this check entirely. While `reduce` and `time-fold` face similar constraints (the custom aggregate / recursive WITH can't easily be embedded in a bare `SqlQuery`), it's worth documenting this coverage gap explicitly in the method body or a code comment, so future maintainers know these table types only get a structural guard rather than a full re-derivation.

Path: apps/backend/src/lib/bulldozer/db/tables/l-fold-table.ts
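For contrast with the structural guard, here is a rough in-memory sketch of what a full fold re-derivation would look like, using a plain sum as a stand-in for the real fold function (the actual `LFold` semantics fold in sort-key order, which a commutative sum glosses over — this is an illustration, not the production check):

```typescript
// Hypothetical shapes for illustration; the real rows live in Postgres.
type InputRow = { groupKey: string; value: number };
type FoldedRow = { groupKey: string; state: number };

// Recompute each group's folded state from scratch and diff it against
// the materialized rows. Unlike the structural check, this catches a
// stale accumulated state whose group still exists.
function verifyFold(
  inputs: InputRow[],
  actual: FoldedRow[],
  fold: (acc: number, v: number) => number,
  init: number,
) {
  const expected = new Map<string, number>();
  for (const r of inputs) {
    expected.set(r.groupKey, fold(expected.get(r.groupKey) ?? init, r.value));
  }
  const errors: { errorType: string; groupKey: string }[] = [];
  const seen = new Set<string>();
  for (const row of actual) {
    seen.add(row.groupKey);
    const exp = expected.get(row.groupKey);
    if (exp === undefined) errors.push({ errorType: "extra_group", groupKey: row.groupKey });
    else if (exp !== row.state) errors.push({ errorType: "data_mismatch", groupKey: row.groupKey });
  }
  for (const g of expected.keys()) {
    if (!seen.has(g)) errors.push({ errorType: "missing_group", groupKey: g });
  }
  return errors;
}
```

Doing the equivalent in SQL would require embedding the fold itself (a custom aggregate or recursive WITH) in the query, which is exactly the constraint the comment notes.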

Comment on lines +239 to +251
```
const allInitializedTables: Table<any, any, any>[] = [];
function trackTable<T extends Table<any, any, any>>(t: T): T {
  allInitializedTables.push(t);
  return t;
}

afterEach(async () => {
  if (allInitializedTables.length > 0) {
    const errors = await readRows(verifyAllTablesIntegrity(allInitializedTables));
    expect(errors).toEqual([]);
  }
  allInitializedTables.length = 0;
});
```
Contributor
**P2: Opt-in tracking could silently miss tables**

`allInitializedTables` is populated only when a test (or its helper) explicitly calls `trackTable`. Any test that declares and initializes a table directly — without going through one of the helper functions — won't have that table integrity-checked in `afterEach`. A future author adding a test might not notice this requirement. Consider adding a comment here warning that every initialized table must be wrapped with `trackTable`, or explore hooking into table `init()` so tracking is automatic.

Path: apps/backend/src/lib/bulldozer/db/index.test.ts
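One way to make tracking automatic, as the comment suggests, is to wrap each table's `init()` so initialization itself registers the table. A minimal sketch, assuming a hypothetical `init(): Promise<void>` member on the table interface (the real interface may differ):

```typescript
// Minimal stand-in for the Table interface; only the member this sketch
// needs. The real interface lives in the Bulldozer db module.
interface TrackableTable {
  init(): Promise<void>;
}

const allInitializedTables: TrackableTable[] = [];

// Wrap init() so any table initialized through it is tracked
// automatically — tests can no longer forget to call trackTable().
// Registration happens synchronously, before init() resolves, and is
// deduplicated so repeated init() calls track the table only once.
function withAutoTracking<T extends TrackableTable>(table: T): T {
  const originalInit = table.init.bind(table);
  table.init = () => {
    if (!allInitializedTables.includes(table)) {
      allInitializedTables.push(table);
    }
    return originalInit();
  };
  return table;
}
```

If table construction goes through a shared factory, applying the wrapper there would cover every test without per-test discipline.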

@mantrakp04 mantrakp04 self-assigned this Apr 14, 2026

@vercel vercel Bot left a comment


Additional Suggestion:

Type declaration for SqlQuery.toStatement is missing the outputColumns parameter, causing TypeScript build failure when called with 2 arguments.


@N2D4 N2D4 requested review from N2D4 and aadesh18 April 14, 2026 03:35
Collaborator

@aadesh18 aadesh18 left a comment


lgtm

Contributor

there is a verify-data-integrity.ts script — can you update that one so it runs all of these integrity checks?

Comment on lines +245 to +251
```
afterEach(async () => {
  if (allInitializedTables.length > 0) {
    const errors = await readRows(verifyAllTablesIntegrity(allInitializedTables));
    expect(errors).toEqual([]);
  }
  allInitializedTables.length = 0;
});
```
Contributor
can you also do this in index.fuzz.test.ts and index.perf.test.ts?

Comment on lines +145 to +157
```
// any is used here because the verifier works with heterogeneous table types
export function verifyAllTablesIntegrity(tables: Table<any, any, any>[]): SqlQuery<Iterable<{ tableId: string, errorType: string, groupKey: Json | null, rowIdentifier: RowIdentifier | null, expected: Json | null, actual: Json | null }>> {
  if (tables.length === 0) {
    return sqlQuery`SELECT NULL::text AS tableid, NULL::text AS errortype, NULL::jsonb AS groupkey, NULL::text AS rowidentifier, NULL::jsonb AS expected, NULL::jsonb AS actual WHERE false`;
  }
  const combined: { sql: string } = {
    sql: tables.map(t => {
      const label = tableIdToDebugString(t.tableId).replaceAll("'", "''");
      return `SELECT '${label}' AS tableid, "v".* FROM (${t.verifyDataIntegrity().sql}) AS "v"`;
    }).join("\nUNION ALL\n"),
  };
  return sqlQuery`${combined}`;
}
```
Contributor

is this necessary? feels like it could just be a normal for-loop inside the verify-data-integrity script
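A plain for-loop version, as suggested, might look roughly like this in the script. `readRows` is injected here as a parameter only because the real helper and `SqlQuery` type live in the Bulldozer db module; the shapes below are stand-ins, not the actual signatures:

```typescript
// Hypothetical shapes for the script's dependencies.
type IntegrityError = { errorType: string };
interface VerifiableTable {
  tableId: string;
  verifyDataIntegrity(): unknown; // returns a SqlQuery in the real code
}

// For-loop alternative to verifyAllTablesIntegrity: run each table's
// verification query separately and collect failures per table, instead
// of stitching one big UNION ALL string together.
async function verifyAllTables(
  tables: VerifiableTable[],
  readRows: (query: unknown) => Promise<IntegrityError[]>,
): Promise<Map<string, IntegrityError[]>> {
  const failures = new Map<string, IntegrityError[]>();
  for (const table of tables) {
    const errors = await readRows(table.verifyDataIntegrity());
    if (errors.length > 0) failures.set(table.tableId, errors);
  }
  return failures;
}
```

One query per table is slower than a single UNION ALL but avoids manual SQL string concatenation and reports failures grouped by table.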

…tables

Added a recursive verification process for all payments tables in the Bulldozer system. This includes the implementation of a `verifyDataIntegrity()` method for each table, which checks for data consistency and throws errors if discrepancies are found. The verification is integrated into the main execution flow, ensuring that data integrity is maintained across all operations. Additionally, updated tests to validate the integrity of initialized tables after each test execution.
Comment thread apps/backend/src/lib/tokens.tsx Outdated
Co-authored-by: vercel[bot] <35613825+vercel[bot]@users.noreply.github.com>
@N2D4 N2D4 merged commit de8ab95 into payments-bulldozer-txn-rework Apr 15, 2026
6 of 12 checks passed
@N2D4 N2D4 deleted the verify-data-int-payments-bulldozer-txn-rework branch April 15, 2026 04:38

4 participants