[Refactor] Improve CI Run Times by Reducing Test Flakiness and Speeding up Test Suite#1166
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRemoved STACK_FORCE_EXTERNAL_DB_SYNC from multiple GitHub Actions; removed stopWhenIdle idle-stop behavior from external-db-sync poller and sequencer routes; consolidated outbox email helpers into shared e2e backend-helpers and updated tests to use status-based waiting; removed force-sync utilities from external-db-sync test helpers. Changes
Sequence Diagram(s)sequenceDiagram
participant CI as CI / Workflow
participant Test as E2E Tests
participant API as Backend API (poller/sequencer)
participant DB as External DB
CI->>Test: start e2e job (no STACK_FORCE_EXTERNAL_DB_SYNC)
Test->>API: GET /internal/external-db-sync/(poller|sequencer) (start iteration)
API->>DB: query pending requests / sequences
alt pending requests exist
API->>API: process/enqueue pending work
API-->>Test: iteration result (processed>0, stopReason=null)
else no pending requests
API->>API: continue until max-duration (no idle stop)
API-->>Test: iteration result (processed=0, stopReason=null)
end
Test->>DB: poll for marker/metadata (waitForCondition)
DB-->>Test: marker appears -> Test asserts external state
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile OverviewGreptile Summary
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Cron as Vercel Cron
participant Seq as Sequencer endpoint
participant DB as Primary DB
participant Queue as Sync queue
participant Poll as Poller endpoint
participant QStash as Upstash QStash
Cron->>Seq: "GET /sequencer?maxDurationMs"
loop "until maxDurationMs"
Seq->>DB: "backfill sequence ids"
DB-->>Seq: "didUpdate + tenantIds"
opt "didUpdate"
Seq->>Queue: "enqueueExternalDbSyncBatch(tenantIds)"
end
end
Seq-->>Cron: "200 OK"
Cron->>Poll: "GET /poller?maxDurationMs"
loop "until maxDurationMs"
Poll->>DB: "claimPendingRequests()"
DB-->>Poll: "OutgoingRequest[]"
opt "requests.length>0"
Poll->>QStash: "batchJSON/publishJSON"
Poll->>DB: "deleteOutgoingRequests(ids)"
end
end
Poll-->>Cron: "200 OK"
|
Additional Comments (2)
|
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce E2E test flakiness by removing email/cron timing assumptions and consolidating polling helpers used across the backend API test suite.
Changes:
- Replace OTP-based sign-in and fixed sleeps with faster signup + polling helpers for email outbox status.
- Remove “force external DB sync” test utilities and adjust external DB sync tests to rely on cron-driven syncing.
- Update internal external-db-sync cron endpoints and CI workflows (env + test invocation behavior).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| apps/e2e/tests/backend/endpoints/api/v1/users.test.ts | Switches to Auth.fastSignUp() to avoid OTP email timing. |
| apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-utils.ts | Removes forced-sync feature and related env/config logic. |
| apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-race.test.ts | Replaces forced sync / long sleep with a short waitForCondition-based check. |
| apps/e2e/tests/backend/endpoints/api/v1/emails/outbox-api.test.ts | Uses centralized waitForOutboxEmailWithStatus polling instead of fixed sleeps. |
| apps/e2e/tests/backend/endpoints/api/v1/emails/email-queue.test.ts | Removes duplicated outbox polling helpers in favor of backend-helpers exports. |
| apps/e2e/tests/backend/endpoints/api/v1/emails/email-helpers.ts | Re-exports outbox helpers from backend-helpers. |
| apps/e2e/tests/backend/backend-helpers.ts | Adds shared outbox helpers + includes outbox dump in OTP failure diagnostics. |
| apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts | Removes stopWhenIdle and adds backfill progress logging. |
| apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts | Removes stopWhenIdle behavior and adds “pending requests” logging. |
| .github/workflows/setup-tests.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/setup-tests-with-custom-base-port.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/restart-dev-and-test.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/restart-dev-and-test-with-custom-base-port.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/e2e-source-of-truth-api-tests.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/e2e-custom-base-port-api-tests.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC. |
| .github/workflows/e2e-api-tests.yaml | Removes STACK_FORCE_EXTERNAL_DB_SYNC and changes prod-mode PR test command to append a mail filter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@apps/e2e/tests/backend/backend-helpers.ts`:
- Around line 203-210: In waitForOutboxEmailWithStatus, don't assume the first
returned email matches the desired status; instead inspect all emails returned
by getOutboxEmails({ subject }) (e.g., use Array.prototype.find or filter) and
return when any email's status === status (or return the list of matching
emails) so the function succeeds if any email with the subject has the requested
status rather than only checking emails[0]; update the return value and loop
condition accordingly.
- Around line 189-210: The helpers getOutboxEmails and
waitForOutboxEmailWithStatus use untyped "any" (and an eslint-disable) which
violates the guideline—replace the implicit any with a concrete type (e.g.,
define/use OutboxEmail or reuse EmailOutboxCrud/EmailOutboxItem from
stack-shared) and update function signatures/variables to use that type; remove
the eslint-disable and, if you truly cannot avoid any, add a one-line comment
explaining why and where (but prefer creating/importing the OutboxEmail type and
using it for listResponse.body.items and the emails variable in
waitForOutboxEmailWithStatus).
In `@apps/e2e/tests/backend/endpoints/api/v1/external-db-sync-race.test.ts`:
- Around line 376-393: The current waitForCondition is vacuous because the
baseline already has one row, so change the wait predicate to detect a new sync
completion (e.g., wait for a changing sentinel field, a sync timestamp/sequence
column, or force a sync cycle and wait for that side-effect) rather than just
res.rows.length === 1; update the block using externalClient.query(...) inside
waitForCondition to check that sentinel/timestamp/version has advanced compared
to a captured pre-sync value (or trigger a sync and wait for its known marker),
and replace the loose row: any declaration with a proper type matching the query
result ({ display_name: string | null }) or a small explanatory comment if any
is unavoidable; refer to waitForCondition, externalClient.query, row, and dbName
to locate and implement these changes.
🧹 Nitpick comments (2)
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts (1)
231-234: Stale"idle"variant inPollerIterationResult.With the removal of idle-based termination, no code path returns
{ stopReason: "idle" }anymore. Consider removing"idle"from thestopReasonunion to keep the type accurate.type PollerIterationResult = { - stopReason: "disabled" | "idle" | null, + stopReason: "disabled" | null, processed: number, };apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts (1)
197-199: Stale"idle"variant inSequencerIterationResult.Same as the poller: no code path returns
"idle"anymore. Clean up for consistency.type SequencerIterationResult = { - stopReason: "disabled" | "idle" | null, + stopReason: "disabled" | null, };
Before the waitFor condition would just terminate early. Now, we use a marker txn and verify that a sync has run by polling until the committed marker txn appears in the external db. We do a secondary check to see if seq number has grown to ensure that a sync has happened. Note external dbs are unique for each test, so no chance of conflict or overlap. The cron scheduler is running the seq + poller in the background.
7bed7c5 to
1a2f3a9
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
apps/backend/src/app/api/latest/internal/external-db-sync/poller/route.ts (1)
228-257:⚠️ Potential issue | 🟠 Major
stopReasonis computed but never acted upon — disabled poller spins for the full duration.
iterationResult.stopReasonis set to"disabled"when the fusebox disables the poller (line 242), but the while loop never checks it. This means a disabled poller will keep looping every 50ms for the entiremaxDurationMs(default 3 minutes), wasting resources on no-op iterations.If removing the
"idle"break was intentional, the"disabled"case likely still warrants an early exit.Proposed fix
iterationCount++; totalRequestsProcessed += iterationResult.processed; + if (iterationResult.stopReason != null) { + break; + } + await wait(pollIntervalMs); }apps/backend/src/app/api/latest/internal/external-db-sync/sequencer/route.ts (1)
194-227:⚠️ Potential issue | 🟠 MajorSame issue:
stopReason"disabled" is never checked — disabled sequencer spins for the full duration.Identical to the poller route: when
fusebox.sequencerEnabledis false (line 207), the iteration returns{ stopReason: "disabled" }, but the while loop never breaks on it. Consider adding a break condition here as well.Proposed fix
iterations++; + + if (iterationResult.stopReason != null) { + break; + } + await wait(pollIntervalMs); }
0ab2ba0 to
93fda47
Compare
7142997 to
aa101fc
Compare
Cron waits for up to 120 seconds between runs. The test has a 120 second timeout. So if cron is at the end of its random wait when the tests start, there can be a timeout. This is an edge case that causes flakiness. We bump the timeout to 150s to provide a buffer. These tests need to wait for a sync to happen, so they need to wait for the cron job to run.
Specifically the cleanup and init hooks. The default timeout is 10s. We create so many dbs in the tests that timeout can take a while. This makes the tests less flakey.
4501fba to
a8c94c7
Compare
getExternalDbFusebox used to lazy init the settings singleton. However, this caused issues. Prisma's upsert isn't safe from race conditions. So this could cause collisions/ unique key issues. Get doesn't need to update anything. Let's just return default values and have the update handle the init. Tested on local that fusebox updates work fine.
a8c94c7 to
7f97015
Compare
This is unrelated to the previous changes. This test was flaky before.
We order by createdAt, not status
waiting can often lead to timeout issues/ race conditions. Polling with early exit is a better pattern. We refactor to bring things in line with the other tests
Context
Our CI has evinced a few flakey tests recently. Additionally, the
freestyle-prodaction takes a long time to run. These end up delaying the merging/iterative development processes. There was also a bug with the existing implementation ofgetExternalDbSyncFuseboxwhere it was performing an upsert, which according to prisma, could cause conflicts.getExternalDbSyncFuseboxwas trying to upsert the settings singleton for the fusebox lazily if it wasn't updated yet, and if multiple upserts are occurring at once it can cause a rare unique key constraint violation on the singleton settings key. The fix is to not have the get function attempt an update.Note
Medium Risk
Touches internal external-db-sync control flow and test/CI orchestration; behavior changes could affect how quickly sync-driven tests converge, though changes are scoped to internal endpoints and test harnesses.
Overview
Reduces E2E/CI flakiness by removing the test-only “force external DB sync” mechanism and the
stopWhenIdlequery behavior from the internal external-db-syncsequencer/poller, so sync runs consistently for the fullmaxDurationMs(with added lightweight logging when work is found).Stabilizes email-related tests by centralizing outbox helpers in
backend-helpers, switching tests from fixed sleeps to polling for outbox status, and improving failure diagnostics by including outbox state in assertion errors. CI workflows are updated accordingly (dropSTACK_FORCE_EXTERNAL_DB_SYNC, and adjust prod PR runs to execute themailtest subset).Written by Cursor Bugbot for commit 7bed7c5. This will update automatically on new commits. Configure here.
Additional Notes
Note that we also update the Action for
freestyle-prodto only run the mail tests on non-dev and non-main branches. This is because freestyle is used for email-rendering, and so, those tests are the ones we're chiefly concerned about. Runningfreestyle-prodtests for non-mail tests broadly doesn't reveal any information that we wouldn't get from thefreestyle-mocktests and the mail tests. However, to account for the edge cases where it would, we let it run to completion on dev and main.Summary by CodeRabbit