chore: release - merge dev into main by zbigniewsobiecki · Pull Request #1293 · mongrel-intelligence/cascade

zbigniewsobiecki · 2026-05-09T17:07:11Z

Automated release PR created by the release workflow.

Commits (17):

ff5048ff Merge pull request #1292 from mongrel-intelligence/fix/agent-contract-robustness
56b56391 fix(gadgets): truncate hook output in error messages on commit/push failure
63d821fd fix(backends): distinguish empty/malformed/missing sidecar in WARN log
7de0b821 fix(implementation): harden agent contract against tool-output bloat + shell-state corruption
ce051a45 Merge pull request #1291 from mongrel-intelligence/feat/sentry-issue-lifecycle-trigger
63ac36f5 fix(alerting): declare alerting:issue-lifecycle in alerting agent definition
7345285c Merge pull request #1290 from mongrel-intelligence/refactor/cli-command-factory
26acea97 Merge pull request #1289 from mongrel-intelligence/docs/trigger-architecture-contracts
cd7e8e5f feat(alerting): handle Sentry resource:issue webhooks (Internal Integration surface)
b8ca17c1 refactor(gadgets): split CLI command factory helpers
2949b246 docs(triggers): fix resolveTriggerResult usage and resilience doc boundary
c03c2c3a docs(resilience): qualify dispatch-failure compensation boundary accurately
cfbceef5 docs(triggers): qualify buildDeferredRecheckResult as GitHub-only in all active contributor docs
03035f8e docs(trigger): qualify deferred re-check as GitHub-only in architecture docs
5e0a2c9d docs(triggers): qualify deferred re-check as GitHub-only and fix router failure table
0c10ed6d docs(trigger): update trigger architecture contracts
746e450b test(triggers): expand trigger contract conformance coverage (#1287)

* test(triggers): expand trigger contract conformance coverage * fix(test): address review feedback on trigger conformance fixtures and guard - Mirror production `allPassing` predicate in `createCheckSuiteStatus`: require at least one check and treat success/skipped/neutral as passing, matching `githubClient.getCheckSuiteStatus()` exactly. The neutral conformance case now correctly yields `allPassing: true`. - Extend the raw-trigger-literal guard regex to catch backtick template literals in addition to single- and double-quoted strings, closing the syntax-level bypass identified in the review. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cascade Bot <bot@cascade.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…er failure table - src/triggers/README.md: buildDeferredRecheckResult authoring guidance now explains that bare re-dispatch is a GitHub-only contract today. GitHub's buildJob() strips triggerResult and sets mergeabilityRecheckAttempt; non-GitHub adapters (Trello, JIRA, Linear, Sentry) always embed triggerResult, so their workers use resolveTriggerResult() with the pre-resolved result and skip registry dispatch. A non-GitHub handler returning buildDeferredRecheckResult would schedule a job that reuses the same agentType: null result. - docs/architecture/02-webhook-pipeline.md: split the combined Redis failure row into two distinct rows. PM immediate-dispatch and coalesced-dispatch failures call onBlocked and return "Failed to enqueue/schedule job to Redis". Deferred re-check schedule failures capture Sentry under deferred_recheck_schedule_failure, skip onBlocked, and still return "Deferred re-check scheduled" (same as success). Updated the Coalescing and deferred re-check prose to match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…re docs The "Deferred re-checks" section in 03-trigger-system.md and the "Deferred re-check exhaustion" section in 10-resilience.md both described worker re-dispatch as a generic contract, but only GitHubRouterAdapter.buildJob() strips triggerResult and sets mergeabilityRecheckAttempt. Non-GitHub adapters embed triggerResult so resolveTriggerResult() short-circuits, and the registry is never re-invoked. Updated both paragraphs to mirror the GitHub-only qualification already present in src/triggers/README.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…all active contributor docs The builder summary table, TriggerResult JSDoc, and CLAUDE.md all described deferred re-check as a generic re-dispatch contract. In practice, bare re-dispatch only works for GitHub: GitHubRouterAdapter.buildJob() strips triggerResult and sets mergeabilityRecheckAttempt, while non-GitHub adapters embed the pre-resolved result so resolveTriggerResult() short-circuits without registry dispatch. Update all three surfaces to state this constraint clearly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rately The paragraph added in this PR overstated the compensation invariant by implying all pre-worker failures flow through BullMQ `failed` → `releaseLocksForFailedJob`. That path only applies post-enqueue. Redis enqueue/schedule failures (`addJob` / `scheduleCoalescedJob` throwing before `markImmediateDispatchEnqueued` / `markCoalescedDispatchEnqueued`) have no BullMQ job and therefore no `failed` event. Lock protection on that path comes from the success-first ordering contract: marks happen only after a successful enqueue, so a Redis failure leaves no lock to release. `onBlocked()` is the inline compensation. Split the single paragraph into two distinct boundaries — enqueue/schedule failures and post-enqueue dispatch failures — consistent with the router outcomes table in docs/architecture/02-webhook-pipeline.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ndary - Update trigger-resolution.ts "Used By" column from "Sentry (GitHub and PM use inline logic)" to "GitHub, PM, Sentry" — all three worker handlers call resolveTriggerResult() (PM: webhook-handler.ts:69, GitHub: webhook-handler.ts:230, Sentry: webhook-handler.ts:53). - Replace "[inline] if triggerResult → use it, else dispatchTrigger(...)" in the GitHub flow diagram with the actual call: resolveTriggerResult(registry, ctx, triggerResult) → result. - Amend "Every failed dispatch path flows through the BullMQ failed event" to "Post-enqueue dispatch failures … flow through the BullMQ failed event" with a forward reference to the split below, removing the internal contradiction with the two-boundary section added earlier. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ration surface) Closes the silent-skip path for Sentry's "Internal Integration" / Custom Webhook surface. Prod 2026-05-09: a wedged-lock-canary alert fired in the cascade Sentry project, the alerting agent was enabled, but no agent ran. Webhook log id `fbdc6d87-b962-444c-8a2a-a9452a74ff71` shows `processed=false, decisionReason="Event unparseable or not processable"` — the trigger was never invoked. Root cause: `src/router/adapters/sentry.ts:31` whitelisted only `['event_alert', 'metric_alert']` (Sentry Alert Rule surfaces). The webhook arrived with `Sentry-Hook-Resource: issue` (Internal Integration default surface — the natural way users wire Sentry → cascade). Spec 019 was scoped to event_alert; the issue-lifecycle path was deferred and never landed. Users who configured Sentry the natural way got silent skips for every issue. This adds end-to-end support, mirroring the event_alert pattern: - Router adapter accepts `'issue'` resource + new test asserting it parses. - New `SentryIssueLifecycleTrigger` (matches `resource: 'issue'` + `action: 'created'`) fires the alerting agent. Resolved/archived/etc. actions are deferred (would auto-close the cascade card; out of scope). - Distinct AlertSource literal `'sentry-issue'` so the `(project_id, external_source, external_id)` partial-unique index on `pr_work_items` doesn't collide if the same Sentry issue arrives via both surfaces (event_alert and issue) — each surface materializes its own card. - New `formatSentryIssueLifecycleCardBody` builds AlertHints from `data.issue.{title, web_url, level, shortId, culprit, metadata.{filename, function}}`. Mirror of `formatSentryCardBody` adapted to the issue-lifecycle payload shape. - Worker-side `processSentryWebhook` extends the materializer dispatch with a third branch keyed on `agentInput.triggerEvent === 'alerting:issue-lifecycle'`. Same AlertSlotMissingError graceful-skip + transient-PM-error retry semantics as the existing two branches. - `SentryIssuePayload` type updated to match the actual Sentry webhook shape (nested `data.issue.{...}` instead of flattened `data.{...}` — the captured prod fixture confirmed the existing type was wrong). Drive-by lint cleanup (per request): refactored `materializeAlertWorkItem` to extract `reuseOrLazyHealMapping` and `pollForConcurrentWinner` helpers, bringing the parent function under the cognitive-complexity ceiling. Verification: - 9112 unit tests passing (3 new test files: issue-lifecycle-format, issue-lifecycle handler, plus extensions to sentry-webhook-handler and the router/adapters/sentry tests). Captured live prod fixture used as the regression baseline. - Lint clean (0 errors, 0 warnings). - Typecheck clean. Operator notes: - Cascade project's PM `lists.alerts` (Trello) / `statuses.alerts` (JIRA, Linear) must be configured for materialisation to actually create a card. The pre-flight validation rule at `src/triggers/shared/integration-validation.ts` already emits a `pm`-category error when alerting is enabled but the slot is unset — unchanged; same message will fire for the new `'sentry-issue'` source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tecture-contracts

…nd-factory

…inition Adds `alerting:issue-lifecycle` to `src/agents/definitions/alerting.yaml` so `getResolvedTriggerConfig` can find the event. Without this entry the lookup returned `null`, causing `checkTriggerEnabledWithParams` to always see `enabled=false` — leaving the prod incident path effectively unfixed even though the router now accepts `resource: issue` webhooks. Also adds `TRIGGER_EVENTS.ALERTING.ISSUE_LIFECYCLE` to the events catalog and updates the handler + webhook-handler to use the constant (required by the trigger-event-string consistency static guard for new handlers). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lifecycle-trigger

…+ shell-state corruption Closes the silent-failure path identified by prod run d8e31665 (cascade implementation, 2026-05-09): the agent created PR #1290 successfully but the run was marked failed with "Agent completed but no authoritative PR creation was recorded" — and operators had zero signal of why. Three independent buggy layers compounded; fix all three so any one being robust catches the failure mode. ## Layer 1 — gadget output bloat (the trigger; cascade-specific) Cascade's lefthook pre-push runs `npm run test:fast` (159 files / 2981 tests / ~34 s / ~97 KB of output). That output was captured into the gadget result's `pushOutput` field by `cascade-tools scm create-pr`. Codex's tool-result parser couldn't reliably extract the JSON envelope buried under that volume, retried the call, and the resulting concurrent invocations raced against the same sidecar path leaving prUrl missing. Fix: truncate `pushOutput` and `commitOutput` at 4 KB at the gadget result boundary. Full hook output already streams through `runCommand`'s heartbeat into `LLMIST_LOG_FILE` for operator visibility — only the agent-visible result-stream copy is capped. New head + truncation-marker + tail format preserves both ends of the output for context. ## Layer 2 — sidecar contract gap (the fault; provider-agnostic) `writePRSidecar` accepted `prUrl: string` but never validated. If the caller passed undefined for any reason, `JSON.stringify({prUrl: undefined, ...})` silently omitted the key and produced a half-baked sidecar. Downstream hydration reported "missing required fields { hasPrUrl: false }" with no signal of what the writer actually persisted. Fix: validate at the write boundary. Refuse to write + log ERROR with full input context when prUrl is missing/non-string or prNumber is non-finite. Same hardening for writeReviewSidecar's reviewUrl. Caller bugs now produce a precise log entry instead of a silent contract violation. ## Layer 3 — diagnostic gap (operator visibility) `hydratePrSidecar` logged "PR sidecar missing required fields { hasPrUrl: false }" with no further context — no path, no parsed keys, no actual values. Operators hit Loki archaeology to determine whether the file was empty, malformed, or a valid JSON without prUrl. Fix: hydratePrSidecar now re-reads the raw payload and dumps `sidecarPath`, `rawSidecarKeys`, `rawSidecarPrUrl`, `rawSidecarPrNumber` so a single log line distinguishes empty-file / valid-JSON-without-prUrl / malformed-JSON. postProcessResult emits a Sentry capture under tag `pr_sidecar_invalid` (with engine + agentType + identifier extras) when authoritative PR evidence is missing — gives ops a single dashboard for prod frequency. ## Layer 4 — codex shell-state corruption (the propagation; codex-specific) Codex's persistent bash session breaks with `ERROR codex_core::tools::router: error=write_stdin failed: stdin is closed for this session`. Once that signal fires, every subsequent command in the run inherits a stale stdout buffer (we observed lint output from one command bleeding into a later post-comment call, plus sidecar writes racing). Codex itself exits cleanly (exit=0) — the stderr signal is the only evidence the session corrupted. Fix: detect the signal in stderr at run completion. When present + exit=0, return `success: false` with explicit error "codex shell-state corrupted: write_stdin failed (stdin closed for session)" + log ERROR. Ops retry against a fresh codex session instead of trusting potentially-corrupted state. Conservative: false-positive case (benign stderr) pinned by a "passes through cleanly when stderr is benign" regression test. ## Tests - `tests/unit/gadgets/github/core/createPR.test.ts` (+3 cases) — 97 KB pushOutput → ~4 KB with marker; 50 KB commitOutput same; small output unchanged. - `tests/unit/gadgets/session/core/sidecar.test.ts` (+4 cases) — refuses empty prUrl; refuses non-string prUrl; refuses non-finite prNumber; refuses empty reviewUrl. Each pairs an ERROR log assertion with a no-write filesystem assertion. - `tests/unit/backends/sidecarManager.test.ts` (+2 cases) — diagnostic WARN includes rawSidecarKeys + rawSidecarPrUrl for valid-JSON-without-prUrl case AND for empty-file case. - `tests/unit/backends/postProcess.test.ts` (+2 cases) — Sentry capture fires under tag `pr_sidecar_invalid` when authoritative PR evidence is missing; does NOT fire on the happy path. - `tests/unit/backends/codex.test.ts` (+2 cases) — codex shell-state signal in stderr → success=false + ERROR log; benign stderr does not regress. ## Verification - `npm test` — 9141 / 9141 passing - `npm run lint` — clean - `npm run typecheck` — clean - All 4 fixes are additive; no behavioral change on healthy paths Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`readRawPRSidecar` returned `null` for all three failure modes (missing file, empty/whitespace file, malformed JSON), so every 'PR sidecar missing required fields' WARN emitted the same `rawSidecarKeys: null` shape — operators could not tell the cases apart without Loki archaeology. Replace the nullable return with a discriminated union (`RawSidecarDiagnostic`) with four statuses: 'missing', 'empty', 'malformed', 'parsed'. Each case carries its own supplementary fields: - 'empty': rawByteLength (confirms zero-byte truncation vs. missing) - 'malformed': rawByteLength + parseError + rawPreview (shows how far the write got and what the JSON syntax error was) - 'parsed': full key list + prUrl/prNumber values (the original Layer-3 signal) - 'missing': status alone (file was never written) Update the two existing regression tests to assert the new `rawSidecarStatus` field. Add a third test for the truncated-mid-write (malformed) case to pin that it is distinguishable from the empty-file case. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ailure Extend the 4 KB truncation to the error paths of `stageAndCommit` and `pushBranch`. Previously, only the success-path `commitOutput`/`pushOutput` fields were capped; a pre-commit or pre-push hook failure that emits 97+ KB of test output still embedded the full output in `err.message`, which `createCLICommand` serialises into the JSON error envelope — leaving the same Codex parser/retry bloat path open. Full hook output stays in the worker engine log (LLMIST_LOG_FILE) for operator visibility. Adds two regression tests pinning the failure-path truncation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…-robustness

codecov · 2026-05-09T17:11:46Z

Codecov Report

❌ Patch coverage is 91.64786% with 74 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/gadgets/shared/cli/params.ts	87.86%	18 Missing and 3 partials ⚠️
src/gadgets/shared/cli/booleanArgv.ts	83.33%	14 Missing ⚠️
src/gadgets/shared/cli/parseErrors.ts	82.05%	13 Missing and 1 partial ⚠️
src/backends/sidecarManager.ts	78.72%	9 Missing and 1 partial ⚠️
src/gadgets/shared/cli/flags.ts	93.13%	7 Missing ⚠️
src/gadgets/github/core/createPR.ts	89.47%	2 Missing ⚠️
src/integrations/alerting/_shared/format.ts	90.47%	0 Missing and 2 partials ⚠️
src/gadgets/shared/cli/errorSink.ts	93.75%	1 Missing ⚠️
src/gadgets/shared/cli/examples.ts	97.91%	0 Missing and 1 partial ⚠️
src/integrations/alerting/_shared/materialize.ts	97.56%	1 Missing ⚠️
... and 1 more

📢 Thoughts on this report? Let us know!

aaight and others added 17 commits May 9, 2026 14:00

docs(trigger): update trigger architecture contracts

0c10ed6

refactor(gadgets): split CLI command factory helpers

b8ca17c

Merge pull request #1289 from mongrel-intelligence/docs/trigger-archi…

26acea9

…tecture-contracts

Merge pull request #1290 from mongrel-intelligence/refactor/cli-comma…

7345285

…nd-factory

Merge pull request #1291 from mongrel-intelligence/feat/sentry-issue-…

ce051a4

…lifecycle-trigger

Merge pull request #1292 from mongrel-intelligence/fix/agent-contract…

ff5048f

…-robustness

zbigniewsobiecki merged commit 7369cf9 into main May 9, 2026
15 of 16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: release - merge dev into main#1293

chore: release - merge dev into main#1293
zbigniewsobiecki merged 17 commits into
mainfrom
dev

zbigniewsobiecki commented May 9, 2026

Uh oh!

codecov Bot commented May 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zbigniewsobiecki commented May 9, 2026

Commits (17):

Uh oh!

codecov Bot commented May 9, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants