Skip to content

feat(alerting): materialize Sentry alerts as real PM work items (spec 019)#1264

Merged
zbigniewsobiecki merged 16 commits into
mainfrom
dev
May 7, 2026
Merged

feat(alerting): materialize Sentry alerts as real PM work items (spec 019)#1264
zbigniewsobiecki merged 16 commits into
mainfrom
dev

Conversation

@zbigniewsobiecki
Copy link
Copy Markdown
Member

Summary

  • Root cause fixed: SentryIssueAlertTrigger was minting a synthetic sentry:issue:<id> workItemId that the Trello budget gate tried to use as a card ID → Trello returned 400 → worker died before persisting any agent_runs row. Zero alerting runs had ever been recorded for the cascade project despite 4+ successful webhook deliveries.
  • Real PM cards now created: A generic materializeAlertWorkItem(source, externalId, project, hints) materializer creates a Trello card / JIRA issue / Linear issue in the project's configured alerts slot. The PM-native card ID is returned as workItemId, so budget tracking, lifecycle transitions, and label writes all work correctly.
  • Idempotent via DB index: A partial UNIQUE index on (project_id, external_source, external_id) in work_items ensures the same Sentry issue always produces the same card on re-fire (lazy-heal on PM 404).
  • Validation pre-flight: validateIntegrations now checks that the alerts slot is configured when an alerting trigger is enabled; misconfigurations produce a pm-category error before the agent runs.
  • Wizard surfaces the slot: All three PM wizards (Trello, JIRA, Linear) now show an Alerts row in the Status Mapping step and a Cascade Alert label slot.

Plans

  • 019/1 schema-and-pm-configwork_items.external_source / external_id columns + migration + alerts slot in PM config schemas + getAlertsContainerId / getAlertsStatusKey helpers
  • 019/2 materializer-corematerializeAlertWorkItem, per-PM adapters (createAlertWorkItem), formatSentryCardBody format helper, AlertSlotMissingError
  • 019/3 wire-sentry-trigger-and-validationSentryIssueAlertTrigger calls materializer; validateIntegrations gains the alerts-slot check; agent-execution + manual-runner pass project to validation
  • 019/4 wizard-validation-ui-and-docs — wizard alerts slots, README materializer section, spec 018 forward-link, CHANGELOG, repo-wide sentry:issue: regression pin

Test plan

  • npm test passes (490 test files, 8895 tests)
  • npm run lint clean
  • npm run typecheck clean
  • tests/unit/triggers/sentry/alerting-issue-materializer.test.ts (8 tests) — trigger calls materializer, AlertSlotMissingError handled correctly
  • tests/unit/triggers/shared/integration-validation-alerts-slot.test.ts (7 tests) — alerts-slot validation rule
  • tests/unit/triggers/sentry-alerting.test.ts — workItemId is PM-native, not synthetic prefix
  • tests/unit/no-synthetic-sentry-issue-id.test.ts — repo-wide grep: no sentry:issue: in src/
  • tests/unit/web/wizard-alerts-slot.test.ts (15 tests) — wizard slot arrays + round-trips

🤖 Generated with Claude Code

zbigniewsobiecki and others added 11 commits May 6, 2026 19:54
… + validation

- SentryIssueAlertTrigger calls materializeAlertWorkItem instead of minting
  the synthetic sentry:issue: prefix; AlertSlotMissingError → null+WARN,
  transient errors re-throw for BullMQ retry
- Sets coalesceKey on TriggerResult so PM_COALESCE_WINDOW_MS applies
- validateIntegrations accepts optional project param; checks alerts slot
  is configured when any alerting trigger is enabled
- Both call sites (agent-execution, manual-runner) pass project to validator
- Static grep test: no sentry:issue: literal in src/triggers/sentry/
- 50 new tests across alerting-issue-materializer, integration-validation-alerts-slot,
  no-synthetic-sentry-id, and updated sentry-alerting

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on source changes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssion pin

- Add `alerts` status-mapping slot to Trello/JIRA/Linear PM wizard definitions
  (TRELLO_LIST_SLOTS, JIRA_STATUS_SLOTS, LINEAR_STATUS_SLOTS)
- Add `cascade-alert` / `cascadeAlert` label-mapping slot to all three providers
  with canonical defaults (orange for Trello, #F97316 for Linear)
- Export slot arrays for testability; 15 unit tests in wizard-alerts-slot.test.ts
- Repo-wide regression pin: no-synthetic-sentry-issue-id.test.ts walks src/ and
  asserts no file contains the `sentry:issue:` literal
- src/integrations/README.md: new "Alerting work-item materializer" section
- docs/specs/018 forward-link to spec 019 (synthetic-id supersession)
- CHANGELOG.md entry for spec 019
- Spec 019 marked .done; all 4 plans complete

AC #3 (dynamic validation banner) deferred: requires a new tRPC validation
endpoint + jsdom test infrastructure neither of which exists today. Backend
enforcement lives in validateAlertsSlot (plan 3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki requested a review from aaight May 6, 2026 20:46
@nhopeatall
Copy link
Copy Markdown
Collaborator

nhopeatall commented May 6, 2026

⚠️ review agent failed

Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again.

Manual intervention may be required.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Requesting changes because Sentry alert materialization runs in the router without PM credential/retry guarantees, and the DB idempotency row can be permanently wedged on PM failures.

Code Issues

Blocking

  • src/triggers/sentry/alerting-issue.ts:81 — The new materializer performs PM writes during router-side trigger dispatch, but the Sentry adapter does not establish PM credential scope and router dispatch exceptions are swallowed before enqueue, so production alerts can be acknowledged as processed without any durable worker job or retry.
  • src/integrations/alerting/_shared/materialize.ts:91 — The external mapping claim is inserted before PM side effects complete; any error before attachWorkItemId leaves a null work_item_id row that every later attempt treats as an in-progress winner forever.
  • src/integrations/alerting/_shared/materialize.ts:149 — Concurrent lazy-heal can return a replacement card that lost the DB compare-and-swap, so the run proceeds against an ID that is not the persisted mapping.
  • src/pm/config.ts:117 — JIRA/Linear missing statuses.alerts is not enforced before creation, so misconfigured projects can still get alert issues in the default/backlog state before validation aborts.

🕵️ codex · gpt-5.5 · run details

Comment thread src/triggers/sentry/alerting-issue.ts Outdated

let workItemId: string;
try {
workItemId = await materializeAlertWorkItem('sentry', issueId, ctx.project, hints);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PM write now runs during router-side trigger dispatch. The Sentry router adapter currently calls triggerRegistry.dispatch(ctx) without withPMCredentials / withPMProvider scope, while the PM clients require those AsyncLocalStorage credentials. A Trello/JIRA/Linear create therefore throws before any job is enqueued; processRouterWebhook catches dispatch failures as non-fatal and returns "No trigger matched", so there is no worker run and no BullMQ retry. Please either move materialization into a durable worker/credential-scoped phase or make the Sentry dispatch path establish PM credentials and propagate failures into a retryable path.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. SentryRouterAdapter.dispatchWithCredentials now resolves PM credentials based on fullProject.pm?.type and wraps dispatch with the appropriate credential scope (withTrelloCredentials / withJiraCredentials / withLinearCredentials) plus withPMScopeForDispatch — mirroring the pattern from TrelloRouterAdapter, JiraRouterAdapter, and LinearRouterAdapter.

Without this, PM clients throw "No Xxx credentials in scope" inside materializeAlertWorkItem, the exception propagates to processRouterWebhook's try/catch where it is logged as non-fatal, and the alert is silently dropped with no BullMQ job or retry path.

For projects with no PM integration configured, dispatch falls through without credential scope (the trigger handler catches AlertSlotMissingError before any PM write).

// Lost to a concurrent winner — poll for its work_item_id
if (claim.existing.workItemId) return claim.existing.workItemId;

for (let attempt = 0; attempt < POLL_MAX_ATTEMPTS; attempt++) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this PR inserts the external mapping, any error before attachWorkItemId leaves a permanent row with work_item_id = NULL. On the next Sentry delivery/retry, claimExternalMapping loses to that row, this polling branch waits for a winner that no longer exists, and every later attempt exhausts the same way. This happens if createWorkItem, addLabel, or moveWorkItem throws. The claim needs cleanup/reclaim semantics, or the native PM id needs to be attached before optional post-create operations can fail.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with two complementary changes:

1. Attach PM id before optional operationsattachWorkItemId (and replaceWorkItemId for the lazy-heal path) is now called immediately after createWorkItem returns, before addLabel and moveWorkItem. A failure in either optional operation no longer leaves the row with work_item_id = NULL; future retries find the row via findByExternal → existing.workItemId → getWorkItem (alive) → return without polling.

2. Cleanup on createWorkItem failure — A new deleteExternalMappingClaim(rowId) repository function (guarded by isNull(work_item_id)) is called in a catch wrapper around createAndAttach for freshly-claimed rows. If the PM card creation itself fails, the NULL claim row is deleted so the next Sentry delivery can reclaim it via claimExternalMapping rather than polling until MaterializationRetryExhausted. The isNull guard makes this a no-op if another process has already filled in the PM id.

await attachWorkItemId(opts.rowId, newCard.id);
}

return newCard.id;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaceWorkItemId can return false when another concurrent webhook has already lazy-healed the stale row, but this still returns the just-created newCard.id. That creates a duplicate PM card and runs this dispatch against an ID that is not stored in the external mapping; subsequent alerts resolve to the other card. If the compare-and-swap loses, re-read the mapping and return the stored work item instead of the unpersisted replacement.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. When replaceWorkItemId returns false (another concurrent webhook already healed the stale row), createAndAttach now calls findByExternal to re-read the canonical mapping and returns current.workItemId instead of newCard.id. The just-created card becomes an orphan (logged as WARN [alert-materializer] lazy-heal CAS lost, returning canonical id), but this dispatch proceeds against the ID that is actually stored in the external mapping — preventing the duplicate-card + ghost-dispatch scenario.

Comment thread src/pm/config.ts Outdated
return getTrelloConfig(project)?.lists?.alerts;
}
if (pmType === 'jira') {
return getJiraConfig(project)?.projectKey;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For JIRA, this returns projectKey even when statuses.alerts is unset; Linear does the same with teamId below. Since materializeAlertWorkItem only treats a missing container as AlertSlotMissingError, misconfigured JIRA/Linear projects still create an issue in the project/team and only later fail validation, leaving an alert work item outside the required alerts slot. The materializer should require the alerts destination before creating the PM item.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. getAlertsContainerId for JIRA and Linear now checks statuses.alerts before returning the container id:

  • JIRA: returns projectKey only when jiraConfig.statuses.alerts is set; otherwise returns undefinedAlertSlotMissingError before any PM write.
  • Linear: same guard — returns teamId only when linearConfig.statuses.alerts is set.

Trello behaviour is unchanged (the container IS the alerts list, already gated on lists.alerts). Tests for both new cases added to tests/unit/pm/config-alert-accessors.test.ts.

…n reliability

- Establish PM credential scope in SentryRouterAdapter.dispatchWithCredentials
  so that materializeAlertWorkItem can call PM APIs (createWorkItem, addLabel,
  moveWorkItem) without "No Xxx credentials in scope" errors being swallowed
  as non-fatal by processRouterWebhook. Mirrors TrelloRouterAdapter /
  JiraRouterAdapter / LinearRouterAdapter patterns.

- Prevent permanently-wedged NULL work_item_id rows: add deleteExternalMappingClaim
  (with isNull guard) to prWorkItemsRepository. Call it in materializeAlertWorkItem
  when createAndAttach fails for a fresh claim, so the next Sentry delivery can
  reclaim the row instead of polling to MaterializationRetryExhausted.

- Move attachWorkItemId / replaceWorkItemId before optional post-create operations
  (addLabel, moveWorkItem) so a label/move failure never leaves a NULL-work_item_id
  row. Future retries find the row via findByExternal → existing.workItemId →
  getWorkItem (alive) → happy path.

- Fix lazy-heal CAS lost: when replaceWorkItemId returns false (concurrent webhook
  already healed), re-read the canonical mapping and return the persisted work item
  id instead of the orphan card that lost the compare-and-swap.

- Require statuses.alerts in getAlertsContainerId for JIRA and Linear: previously
  projectKey / teamId were returned even without statuses.alerts, allowing
  materializeAlertWorkItem to create an issue in the default/backlog state and
  only fail validation after the fact. Now AlertSlotMissingError is thrown before
  any PM write.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zbigniewsobiecki
Copy link
Copy Markdown
Member Author

@aaight address code review concerns

Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Requesting changes: the new alert materialization path can still silently drop Sentry alerts on PM failures, and the alerts-slot validation can block unrelated agents.

Code Issues

Blocking

  • src/triggers/sentry/alerting-issue.ts:81materializeAlertWorkItem runs during router-side trigger dispatch, before a BullMQ job exists. Non-AlertSlotMissingError PM failures are re-thrown, but processRouterWebhook catches dispatch errors as non-fatal and returns "No trigger matched", so Sentry receives 200 and there is no BullMQ retry.
  • src/integrations/alerting/_shared/materialize.ts:65 — once the mapping has a work_item_id, later deliveries only verify the PM item exists. If the required post-create placement (moveWorkItem for JIRA/Linear) fails after attachWorkItemId, retries permanently return the misplaced item and never repair the alerts slot.
  • src/triggers/shared/integration-validation.ts:207 — passing project makes the alerts-slot check run for every agent type. A project with an enabled alerting trigger but missing alerts slot will fail implementation/review/manual runs, while the Sentry webhook path still skips before this worker validation runs.

🕵️ codex · gpt-5.5 · run details

Comment thread src/triggers/sentry/alerting-issue.ts Outdated

let workItemId: string;
try {
workItemId = await materializeAlertWorkItem('sentry', issueId, ctx.project, hints);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PM write still happens during router-side dispatch, before any BullMQ job has been enqueued. If materializeAlertWorkItem throws for a transient PM failure (for example Trello/JIRA/Linear 503, the polling retry exhaustion, or a DB error), this re-throw goes back through SentryRouterAdapter.dispatchWithCredentials into processRouterWebhook, whose dispatch catch logs it as non-fatal and then returns No trigger matched for event. The webhook handler then returns 200, so Sentry will not retry and BullMQ never sees a job. The unit test says re-throwing engages the BullMQ retry budget, but that only applies after enqueue; this path is still pre-enqueue.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. PM card materialisation is now deferred from the router-side trigger handler to the worker-side processSentryWebhook, inside the withPMScope block.

Why this was the problem: processRouterWebhook wraps the entire dispatch in a try/catch that logs exceptions as non-fatal and returns "No trigger matched". A PM failure inside SentryRouterAdapter.dispatchWithCredentials would therefore produce an HTTP 200 to Sentry with no BullMQ job ever enqueued — the alert is silently dropped.

The fix: SentryIssueAlertTrigger.handle now only:

  1. Checks the trigger is enabled via checkTriggerEnabledWithParams
  2. Pre-flight checks getAlertsContainerId(ctx.project) (returns null if alerts slot is unconfigured → returns null early, no dispatch)
  3. Returns a TriggerResult with alertIssueId in agentInput but no workItemId — no PM write happens here

processSentryWebhook then materialises the PM card inside the withPMScope(pc.project, async () => { ... }) block. Any transient PM failure (5xx, DB error, polling exhaustion) propagates out of the withPMScope callback, through withAgentTypeConcurrency, and into BullMQ's retry budget (4 attempts, exponential backoff). AlertSlotMissingError is caught separately and causes a graceful skip with a WARN log — not a retry.

The coalesceKey now uses ${projectId}:sentry:${issueId} so BullMQ dedup works without needing a PM card ID at dispatch time.


// Step 1: check for an existing mapping
const existing = await findByExternal(project.id, source, externalId);
if (existing?.workItemId) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This existing-mapping fast path only verifies that the PM item still exists. Because createAndAttach now persists work_item_id before addLabel / moveWorkItem, a failure in moveWorkItem after attach leaves a permanent mapping to a card/issue that may not be in the configured alerts slot. The next Sentry delivery hits this branch, returns the ID, and never retries the required placement. For JIRA/Linear the follow-up move is what puts the issue in statuses.alerts, so this can violate the core materialization contract after any transient move failure.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The fast path now re-applies moveWorkItem after verifying the PM card is alive.

The problem: After createAndAttach calls attachWorkItemId (PM id persisted) then moveWorkItem (optional), a transient failure in moveWorkItem leaves a permanent mapping to a correctly-created but misplaced card. Every subsequent Sentry delivery for the same issue hits the fast path, finds existing.workItemId, and returns it — never retrying the placement.

The fix: The fast path now does:

// Verify card exists
try {
    await provider.getWorkItem(existing.workItemId);
} catch (err) {
    if (!is404Error(err)) throw err;
    return await createAndAttach(..., { lazyHeal: { rowId: existing.id, ... } });
}
// Re-apply placement to repair any prior moveWorkItem failure
const destination = getAlertsStatusDestination(project);
if (destination) {
    await provider.moveWorkItem(existing.workItemId, destination);
}
return existing.workItemId;

moveWorkItem is idempotent for all three PM providers (Trello: card is already in the alerts list; JIRA/Linear: status transitions are idempotent). Any transient error in moveWorkItem propagates out of the worker handler and into BullMQ's retry budget, so the placement is eventually repaired.

const errors = results.filter((e): e is ValidationError => e !== null);

// Additional PM-config check: alerts slot required when alerting trigger is enabled
if (project) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra validation now runs for every validateIntegrations(project.id, agentType, project) call, not just alerting runs. That means enabling an alerting trigger without configuring alerts will fail unrelated implementation/review/manual agents at worker preflight. It also does not protect the Sentry webhook path, because AlertSlotMissingError is caught in the router-side trigger and returns null before a worker ever calls this validation. This should be scoped to the alerting agent/path or moved to trigger configuration validation instead of project-wide agent validation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. The validateAlertsSlot check is now scoped to agentType === 'alerting' only.

The problem: The original code ran validateAlertsSlot for every validateIntegrations(..., agentType, project) call. If a project had an alerting trigger enabled but no alerts slot configured, every unrelated implementation/review/manual agent run would fail pre-flight with a pm-category error — even though those agents have no dependency on the alerts slot.

The fix:

// Only run for the alerting agent — irrelevant to implementation/review/manual agents
if (project && agentType === 'alerting') {
    const alertsSlotError = await validateAlertsSlot(projectId, project);
    if (alertsSlotError) errors.push(alertsSlotError);
}

A test was added to integration-validation-alerts-slot.test.ts verifying that mockGetTriggerConfigsByProjectAndAgent is not called when agentType === 'implementation', even when the project has a configured alerting trigger and no alerts slot.

@aaight
Copy link
Copy Markdown
Collaborator

aaight commented May 7, 2026

Addressed the code review concerns and pushed 8877c734 to dev.

The review feedback fixes already in the branch cover:

  • Sentry dispatch PM credential/provider scope
  • cleanup/reclaim for failed alert materialization claims
  • canonical mapping reread on lazy-heal CAS loss
  • requiring JIRA/Linear statuses.alerts before PM issue creation

I also fixed the session-finish CLI unit test isolation so the full unit suite passes with the PR-session CASCADE_PR_BRANCH env present.

Verification:

  • npm run lint
  • npm run typecheck
  • npm test (490 files, 8900 passed, 23 skipped)
  • push verified against origin/dev

🕵️ codex · gpt-5.5 · run details

…n reliability

Three blocking issues from PR review:

1. Deferred PM card materialization from router-side trigger dispatch to
   worker-side processSentryWebhook. processRouterWebhook catches all dispatch
   exceptions as non-fatal (returns HTTP 200 to Sentry with no BullMQ job).
   Moving materializeAlertWorkItem into the withPMScope block in the worker
   means transient PM failures (5xx, DB errors, polling exhaustion) surface
   as BullMQ retries instead of being silently swallowed. The trigger now
   only pre-flight checks getAlertsContainerId and passes alertIssueId in
   agentInput; coalesceKey uses projectId:sentry:issueId for BullMQ dedup
   without needing a PM card ID.

2. Fixed fast path in materializeAlertWorkItem to re-apply moveWorkItem after
   verifying the PM card exists. Previously, if moveWorkItem failed during the
   initial createAndAttach call, subsequent deliveries for the same Sentry
   issue would find the existing mapping and return the misplaced work item
   without repairing its placement. Now the fast path re-runs moveWorkItem
   on every hit, which is idempotent for all three PM providers.

3. Scoped validateAlertsSlot check in integration-validation.ts to only run
   when agentType === 'alerting'. Previously it ran for every agent type,
   blocking implementation/review/manual agents on projects where the alerts
   slot was unconfigured but those agents had nothing to do with alerting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… webhook-handler branches

Adds 5 tests to sentry-webhook-handler.test.ts covering the new
worker-side PM card materialisation block (happy path, workItemId
already set, no alertIssueId, AlertSlotMissingError skip, transient
error rethrow).

Adds 8 tests to prWorkItemsRepository.test.ts for the four new
alert-mapping repo methods (findByExternal, claimExternalMapping,
attachWorkItemId, replaceWorkItemId). Fixes four tests that
reassigned chain properties after mock capture (using
mockResolvedValueOnce instead to mutate the already-wired references).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zbigniewsobiecki zbigniewsobiecki requested review from aaight and removed request for aaight May 7, 2026 07:16
Copy link
Copy Markdown
Collaborator

@nhopeatall nhopeatall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Requesting changes because metric alerts still bypass the new PM materializer, so the original synthetic-work-item failure remains for alerting:metric-alert.

Code Issues

Blocking

  • src/triggers/sentry/webhook-handler.ts:90 / src/triggers/sentry/alerting-metric.ts:78processSentryWebhook only materializes when alertIssueId exists and workItemId is absent. SentryMetricAlertTrigger still returns workItemId = sentry:metric:${org}:${alertTitle}. For a project with a PM cost custom field and the metric-alert trigger enabled, runAgentExecutionPipeline receives that synthetic ID, checkPreRunBudget calls the PM provider with it, and the provider tries to read/update a nonexistent PM card or issue. That reproduces the failure mode this PR is meant to remove instead of creating a PM-native alert work item.

Should Fix

  • src/triggers/sentry/alerting-issue.ts:98 — Issue alerts now omit workItemId until the worker materializes the PM card, so the router never runs the work-item concurrency lock for them. After the coalesce window, a repeated Sentry issue alert can enqueue another alerting job while the first worker is still active; with the default no maxConcurrency setting, both workers materialize to the same PM card and run the same agent concurrently. That breaks the existing same-type work-item duplicate protection for this alert path.

🕵️ codex · gpt-5.5 · run details

Comment thread src/triggers/sentry/webhook-handler.ts Outdated
// alerts slot is configured; actual PM card creation happens here where the
// error propagates to BullMQ's retry budget.
let resolvedResult = result;
if (!result.workItemId && typeof result.agentInput.alertIssueId === 'string') {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gate only materializes issue alerts. Metric alerts still arrive with workItemId already set to sentry:metric:... from SentryMetricAlertTrigger, so they skip this block and run the budget/lifecycle pipeline against a non-PM ID. A Trello project with customFields.cost and alerting:metric-alert enabled will still hit the original invalid-card failure instead of creating an alerts-slot card.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. SentryMetricAlertTrigger no longer sets a synthetic workItemId. Instead it now follows the same deferred-materialisation pattern as issue alerts:

  • Sets alertMetricKey = \${org}:${alertTitle}`inagentInput` (stable key for idempotent PM lookup)
  • Sets lockKey = \sentry-metric:${alertMetricKey}`` for router-level concurrency locking
  • Sets coalesceKey = \${projectId}:sentry-metric:${alertMetricKey}`` for BullMQ dedup
  • Returns no workItemId — no PM write on the router side

processSentryWebhook now handles both paths:

const alertIssueId = typeof result.agentInput.alertIssueId === 'string' ? result.agentInput.alertIssueId : null;
const alertMetricKey = typeof result.agentInput.alertMetricKey === 'string' ? result.agentInput.alertMetricKey : null;

if (!result.workItemId && (alertIssueId || alertMetricKey)) {
  if (alertIssueId) {
    hints = formatSentryCardBody(payload);
    workItemId = await materializeAlertWorkItem('sentry', alertIssueId, pc.project, hints);
  } else {
    hints = formatSentryMetricCardBody(payload);
    workItemId = await materializeAlertWorkItem('sentry-metric', alertMetricKey!, pc.project, hints);
  }
}

AlertSource is extended with 'sentry-metric' and a formatSentryMetricCardBody() helper is added alongside formatSentryCardBody.

// a queryable handle and groups multiple investigations of the same
// Sentry issue. Spec 018, AC #12.
workItemId,
// workItemId is intentionally absent here — the worker (processSentryWebhook)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring every issue alert ID until worker execution also means the router cannot apply the existing (projectId, workItemId, agentType) lock. A second delivery for the same Sentry issue after the coalesce window can enqueue while the first alerting worker is active, then both workers materialize to the same PM card and run concurrently unless the project has an agent-type maxConcurrency override.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. SentryIssueAlertTrigger now sets lockKey = \sentry:${issueId}`inTriggerResult`, giving the router a stable key to lock on even before the PM card is materialised on the worker.

webhook-processor.ts uses effectiveLockKey = result.lockKey ?? result.workItemId throughout:

  • Lock check: isWorkItemLocked(project.id, effectiveLockKey, result.agentType)
  • After enqueue: markWorkItemEnqueued(project.id, effectiveLockKey, result.agentType)
  • Supersede/coalesce paths: same key

extractWorkItemId() in worker-env.ts returns triggerResult.lockKey ?? triggerResult.workItemId for Sentry jobs, so releaseLocksForFailedJob (the dispatch compensator) clears the correct key if the job eventually fails.

This means a second delivery for the same Sentry issue after the coalesce window will hit isWorkItemLocked and be rejected with Work item locked — same behaviour as Trello/JIRA/Linear work-item locks — rather than allowing concurrent workers to both materialise to the same PM card.

…er-level concurrency lock

- SentryMetricAlertTrigger no longer sets a synthetic workItemId; instead
  it stores alertMetricKey in agentInput and a lockKey for router locking,
  mirroring the deferred-materialisation pattern from issue alerts.
- processSentryWebhook now handles both alertIssueId (event alerts) and
  alertMetricKey (metric alerts), calling formatSentryMetricCardBody and
  materializeAlertWorkItem('sentry-metric', ...) for the metric path.
- SentryIssueAlertTrigger now sets lockKey = sentry:<issueId> so the
  router's work-item concurrency lock fires even before the PM card is
  materialised on the worker, preventing concurrent duplicate runs.
- webhook-processor uses effectiveLockKey = result.lockKey ?? result.workItemId
  throughout; worker-env extractWorkItemId returns lockKey ?? workItemId for
  Sentry jobs so releaseLocksForFailedJob clears the correct key.

Addresses review feedback on PR #1264 (@nhopeatall):
  - Blocking: metric alerts bypassed materializer with synthetic workItemId
  - Should Fix: issue alerts had no router-level concurrency lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@aaight
Copy link
Copy Markdown
Collaborator

aaight commented May 7, 2026

Review Feedback Addressed

Changes Made

Blocking — Metric alerts bypassed PM materializer with synthetic workItemId

  • SentryMetricAlertTrigger no longer sets workItemId = sentry:metric:${org}:${alertTitle}. Instead it sets alertMetricKey in agentInput (a stable org:alertTitle string) and lockKey = sentry-metric:${alertMetricKey}.
  • Extended AlertSource type with 'sentry-metric' and added formatSentryMetricCardBody() format helper.
  • processSentryWebhook now handles both alertIssueId (issue alerts) and alertMetricKey (metric alerts): the materialization block branches on whichever key is present and calls materializeAlertWorkItem('sentry-metric', alertMetricKey, ...) for metric alerts.

Should Fix — Issue alerts omitted workItemId so the router couldn't lock on them

  • SentryIssueAlertTrigger now sets lockKey = sentry:${issueId} in TriggerResult.
  • webhook-processor.ts uses effectiveLockKey = result.lockKey ?? result.workItemId for all work-item concurrency lock operations (check, mark, clear). This fires the (projectId, lockKey, agentType) lock even before the PM card is materialised on the worker.
  • extractWorkItemId() in worker-env.ts returns lockKey ?? workItemId for Sentry jobs, so releaseLocksForFailedJob clears the correct key if dispatch fails.

Verification

  • All 490 test files pass (8927 tests passing, 23 skipped)
  • Lint clean (2 pre-existing complexity warnings, no errors)
  • TypeScript type check clean

🕵️ claude-code · claude-sonnet-4-6 · run details

@zbigniewsobiecki zbigniewsobiecki merged commit 6562bc7 into main May 7, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants