feat(skills): /merge — ranked merge queue with honest confidence scoring#33
feat(skills): /merge — ranked merge queue with honest confidence scoring#33samtuckerdavis merged 8 commits intomainfrom
Conversation
Read-only operator HUD. Lists every PR currently at stage:ready-for-merge (or all-gates-green), ranks by calibrated HIGH/MED/LOW confidence, emits copy-paste merge commands the operator runs themselves. Consumes /run's report (~/.claude/orchestrator-log/run-*.md, <30min old) or derives directly via gh when log is stale. Confidence calibration is the load-bearing part. MED-cap signals (never HIGH): override:* labels, deployed-surface-no-live-check, sensitivity:private files, CodeRabbit findings above informational, skip:flaky tests, stale verifier (>24h), UI changes without persona-qa run. LOW signals: stacked MED-caps, adversarial needs-human-review, decision conflict, scope creep across >3 bounded contexts. Calibration intent: median outcome should be MED, not HIGH. If a first-day run surfaces 80% HIGH, the column adds no signal — tighten caps. Hard rules: - Never merges (read-only — copy-paste commands only) - Never invokes subagents (driving the pipeline is /run's job) - Never softens calibration; em-dash stays em-dash for HIGH Args: --risky inverts sort (LOW first); --repo scopes to one repo. disable-model-invocation: true — operator-only.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a read-only Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Operator as Operator
participant MergeSkill as MergeSkill
participant LocalReports as LocalReports
participant GitHubCLI as GitHubCLI
participant GHAPI as GHAPI
participant Output as Output
Operator->>MergeSkill: invoke /merge (--repo, --risky)
MergeSkill->>LocalReports: check recent /run (≤30m)
alt recent report found
LocalReports-->>MergeSkill: candidate PR IDs + per-PR hints
else no recent report
MergeSkill->>GitHubCLI: gh pr list (label path + all-gates-green sweep)
GitHubCLI-->>MergeSkill: candidate PR list
end
loop per PR
MergeSkill->>GHAPI: gh pr view / checks / files / comments
GHAPI-->>MergeSkill: PR details + comments
MergeSkill->>MergeSkill: detect deployed-surface, scan comments for live HTTP evidence
MergeSkill->>MergeSkill: apply deterministic HIGH/MED/LOW rules and caps
end
MergeSkill->>Output: render ranked markdown table + approve+merge commands
Output-->>Operator: display read-only merge queue
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 8✅ Passed checks (8 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 6: The allowed-tools list in SKILL.md omits Bash(head:*), but the merge
skill uses `head -1` (see the selection of the latest run log) which prevents
the queue investigation path from running; update the allowed-tools entry to
include Bash(head:*) so the runtime permits `head` usage by adding Bash(head:*)
to the comma-separated list (the symbol to change is the allowed-tools
declaration in SKILL.md).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8ada347a-7076-4025-ac97-c0aad17c35d0
📒 Files selected for processing (1)
claude-code/.claude/skills/merge/SKILL.md
|
STAGE 7 implementation complete — added Bash(head:*) to allowed-tools per CodeRabbit feedback. Pushed 717ea41. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 33: The current assignment to LATEST uses a plain glob (run-*.md) which
can pick a newer non-fixture file and violate fixture-log precedence; change the
discovery so it first looks for the newest fixture log (e.g., pattern
run-*-fixture.md) and only if none exists falls back to the newest run-*.md;
update the LATEST assignment logic (the variable named LATEST and the glob
run-*.md) to attempt the fixture pattern first and then the generic pattern as a
fallback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 883025ee-8195-48c0-8626-34db8caf784a
📒 Files selected for processing (1)
claude-code/.claude/skills/merge/SKILL.md
…nt, not cap The /merge confidence score is meant to answer "is this PR ready for the operator's decision," not "have you decided yet." Previously, every PR sitting at mergeStateStatus == BLOCKED hit a blanket MED-cap — but for bot-authored PRs that's almost always "review approval pending," which is precisely the decision the operator runs /merge to make. The cap was circular. Changes: - Drop the blanket BLOCKED cap. Replace with finer-grained caps that catch real content concerns: reviewDecision == CHANGES_REQUESTED, mergeStateStatus == BEHIND (rebase needed before any merge command succeeds). - Add explicit note: "BLOCKED waiting for operator approval" is NOT a cap. Score answers content readiness, not approval state. - Update merge command to chained "gh pr review --approve <url> && gh pr merge ..." so a single paste both satisfies branch protection's review-required gate and merges. On already-approved PRs, the first half is a benign re-approval. - Rewrite §4 calibration intent to reflect the new philosophy: a clean bot-authored PR IS allowed to be HIGH; the column distinguishes content concerns, not approval state. Also catches up the canonical repo file with iterations made in the local copy since #33 was opened (algorithm clarifications, gh as source of truth, output format details). Preserves Bash(head:*) in allowed-tools per #717ea41.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
claude-code/.claude/skills/merge/SKILL.md (1)
63-63:⚠️ Potential issue | 🟠 MajorFixture-log precedence still not honored.
The current glob
run-*.mdwill select any run log by recency, bypassing the campaign contract to prefer fixture logs. When a newer non-fixture log exists, this skews queue calibration and breaks deterministic test verification.♻️ Proposed fix to prioritize fixture logs
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1) +if [ -z "$LATEST" ]; then + LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST assignment uses the broad glob run-*.md which picks the most recent run log and ignores the campaign contract; update the LATEST selection logic (the LATEST variable assignment) to first try the fixture-specific pattern (e.g., run-fixture-*.md) and only if no fixture files are found fall back to the general run-*.md pattern, preserving the existing suppression of errors (2>/dev/null) and time-sorted head selection so fixture logs are preferred for queue calibration and deterministic testing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 63: The current LATEST assignment uses the broad glob run-*.md which
picks the most recent run log and ignores the campaign contract; update the
LATEST selection logic (the LATEST variable assignment) to first try the
fixture-specific pattern (e.g., run-fixture-*.md) and only if no fixture files
are found fall back to the general run-*.md pattern, preserving the existing
suppression of errors (2>/dev/null) and time-sorted head selection so fixture
logs are preferred for queue calibration and deterministic testing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dac8f546-3283-4440-8e6a-86a31bd672e7
📒 Files selected for processing (1)
claude-code/.claude/skills/merge/SKILL.md
|
@coderabbitai full review please — Bash(head:*) added to allowed-tools in 717ea41 and calibration logic improved in d4d06c0. Your CHANGES_REQUESTED appears to predate both commits. |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
The deploy-surface MED-cap currently false-positives on PRs that add or modify a deploy workflow file (.github/workflows/deploy*.yml, Dockerfile) without touching any application code. Adding the deploy mechanism itself doesn't deploy anything until the next code-bearing PR merges, so a live HTTP check pre-merge is structurally impossible (chicken-and-egg) and the cap is misapplied. Carve-out: if the only deploy-surface files in the PR are the deploy mechanism files themselves AND no application code is touched (no src/, app/, lib/, pages/, components/, etc. outside tests/), the cap does NOT fire. The next PR that ships actual code through this deploy path will correctly trip the cap if it lacks live verification. Concrete example caught in the queue: slingshot-uk-phase1#324 adds .github/workflows/deploy-pipeline.yml only, no service code, was being flagged MED for "no live HTTP check" — impossible to satisfy pre-merge. Under the refined cap it correctly evaluates to HIGH.
…on friction Closes #36
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
claude-code/.claude/skills/merge/SKILL.md (1)
63-63:⚠️ Potential issue | 🟠 MajorFix fixture-log precedence in candidate signal intake
Line 63 still selects
run-*.mddirectly, which can bypass the fixture campaign source and skew confidence calibration when a newer non-fixture log exists.Proposed fix
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1) +if [ -z "$LATEST" ]; then + LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST assignment (LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)) can pick a newer non-fixture log and bypass the fixture campaign; change it to prefer fixture logs first by checking for files matching ~/.claude/orchestrator-log/fixture-*.md (or another fixture-specific pattern used in this repo) and using the newest fixture file if any exist, otherwise fall back to the existing run-*.md selection; update the LATEST assignment and any helper logic around LATEST to implement that precedence (refer to the LATEST variable and the run-*.md/fixture-*.md filename patterns to locate and modify the code).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Around line 101-104: The document currently contradicts itself by capping
mergeStateStatus == BLOCKED while simultaneously carving out a HIGH-eligible
subcase for reviewDecision == REVIEW_REQUIRED; update SKILL.md so BLOCKED is
consistently treated as capped at MED by removing or relocating the carve-out:
either delete the "BLOCKED waiting for operator approval" HIGH-eligibility
exception or explicitly condition it to apply only when mergeStateStatus !=
BLOCKED (i.e., use reviewDecision == REVIEW_REQUIRED without mergeStateStatus ==
BLOCKED) and update the explanatory text so the mechanical scoring contract is
unambiguous.
---
Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Line 63: The current LATEST assignment (LATEST=$(ls -t
~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1)) can pick a newer
non-fixture log and bypass the fixture campaign; change it to prefer fixture
logs first by checking for files matching
~/.claude/orchestrator-log/fixture-*.md (or another fixture-specific pattern
used in this repo) and using the newest fixture file if any exist, otherwise
fall back to the existing run-*.md selection; update the LATEST assignment and
any helper logic around LATEST to implement that precedence (refer to the LATEST
variable and the run-*.md/fixture-*.md filename patterns to locate and modify
the code).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 447238ab-e608-4717-bcbd-1b12b36b5078
📒 Files selected for processing (1)
claude-code/.claude/skills/merge/SKILL.md
…arve-out per CodeRabbit The carve-out contradicted the MED-cap rule the prior commit added. CodeRabbit (review #PRR_kwDORVDiQs747-60) flagged it as Major. Operator sided with CodeRabbit's reading. Carve-out deleted; MED-cap for BLOCKED is now mechanical with no exceptions. Refs #36
There was a problem hiding this comment.
♻️ Duplicate comments (2)
claude-code/.claude/skills/merge/SKILL.md (2)
115-118:⚠️ Potential issue | 🟠 MajorResolve remaining BLOCKED-vs-HIGH wording drift in calibration narrative
Line 117 says a clean PR can be HIGH even when merge bounces on review approval pending, but Line 101 mechanically caps
mergeStateStatus == BLOCKEDat MED. This reintroduces campaign ambiguity in operator decisions.Proposed wording adjustment
-A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check — is HIGH. The merge command bouncing on "review approval pending" is not a calibration failure, it's the expected workflow: operator reviews HIGH PRs, approves, merges. +A clean bot-authored PR — CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, no UI-without-persona-qa, no deploy-without-live-check, and no BLOCKED/BEHIND state — is HIGH. If branch protection reports BLOCKED (including approval-pending paths), keep MED per the mechanical cap; operator can then approve and merge.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claude-code/.claude/skills/merge/SKILL.md` around lines 115 - 118, The calibration incorrectly forces mergeStateStatus == BLOCKED to MED regardless of other signals; update the scoring logic (where mergeStateStatus is mapped to MED) to distinguish "operator approval pending" from genuine content blockers by allowing a HIGH score when all content checks pass (CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no stale verifier, etc.), and adjust the narrative text around the scoring function and the phrase that references BLOCKED so the wording no longer treats pending review as a content-level BLOCKED condition (update the code handling mergeStateStatus, the function that computes the PR score, and the accompanying narrative text to reflect this exception).
63-63:⚠️ Potential issue | 🟠 MajorPrefer fixture-first log discovery to keep the investigation honest
Line 63 currently picks the newest
run-*.md, which can bypass fixture logs and shift coalition scoring inputs when a newer non-fixture file exists. Use fixture-first with generic fallback.Proposed fix
-LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +LATEST=$(ls -t ~/.claude/orchestrator-log/run-*-fixture.md 2>/dev/null | head -1) +if [ -z "$LATEST" ]; then + LATEST=$(ls -t ~/.claude/orchestrator-log/run-*.md 2>/dev/null | head -1) +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@claude-code/.claude/skills/merge/SKILL.md` at line 63, The current LATEST assignment picks the newest run-*.md and can select non-fixture logs; update the logic that sets the LATEST variable to first search for fixture-specific run files (e.g., pattern matching fixture or "-fixture" in the filename) and use the newest fixture match if present, otherwise fall back to the original ls -t ~/.claude/orchestrator-log/run-*.md | head -1 behavior; modify the LATEST assignment or surrounding shell snippet so it tries the fixture-first pattern then a generic fallback, keeping the variable name LATEST and the same downstream usage (so functions or scripts referencing LATEST continue to work).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@claude-code/.claude/skills/merge/SKILL.md`:
- Around line 115-118: The calibration incorrectly forces mergeStateStatus ==
BLOCKED to MED regardless of other signals; update the scoring logic (where
mergeStateStatus is mapped to MED) to distinguish "operator approval pending"
from genuine content blockers by allowing a HIGH score when all content checks
pass (CI green, no CodeRabbit issues, adversarial cleared, no merge conflict, no
stale verifier, etc.), and adjust the narrative text around the scoring function
and the phrase that references BLOCKED so the wording no longer treats pending
review as a content-level BLOCKED condition (update the code handling
mergeStateStatus, the function that computes the PR score, and the accompanying
narrative text to reflect this exception).
- Line 63: The current LATEST assignment picks the newest run-*.md and can
select non-fixture logs; update the logic that sets the LATEST variable to first
search for fixture-specific run files (e.g., pattern matching fixture or
"-fixture" in the filename) and use the newest fixture match if present,
otherwise fall back to the original ls -t ~/.claude/orchestrator-log/run-*.md |
head -1 behavior; modify the LATEST assignment or surrounding shell snippet so
it tries the fixture-first pattern then a generic fallback, keeping the variable
name LATEST and the same downstream usage (so functions or scripts referencing
LATEST continue to work).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fb7d7d00-45b6-40b0-8c8e-6594919c7cdc
📒 Files selected for processing (1)
claude-code/.claude/skills/merge/SKILL.md
CodeRabbit asked for fixture-first precedence on the optional run-log secondary signal. Apply the same newest-of-both pattern just landed in unblock — a stale fixture must not shadow a fresh real run. Fixture pattern: run-*-fixture.md. Real pattern: run-*.md (with -fixture.md excluded). Pick whichever is newer; fall back to whichever exists alone.
|
🤖 Bot label hygiene (operator session, 2026-04-26): all 3 stale CodeRabbit CHANGES_REQUESTED reviews dismissed (Review 1 addressed by 717ea41; Reviews 2+3 declined as nitpicks per /merge skill's 'gh is source of truth' contract). All CI green. Applying |
STAGE 13 adversarial — PASSRead-only audit of FindingsCheck 1 — Calibration honesty. Internally consistent. The "BLOCKED → MED no exceptions" discipline introduced in commits Check 2 — No silent skips. Step A (label) + Step B (sweep) + dedup with explicit "no label, all green" source-column tag. PRs filtered out (e.g. in Check 3 — Cross-repo correctness. gh-only walk via Check 4 — Recommendation accuracy + injection. URL is the only field interpolated into the merge command, and GitHub-controlled URL format has no shell metacharacters. PR title in markdown table cell (not command) is safe from injection — at worst, a hostile pipe character breaks table rendering for Sam. Tried Check 5 — Drift from /run. Skill explicitly says "gh is the source of truth" and Step 1b is Check 6 — Failure modes. If Check 7 — Confidentiality leak (context-repo). "Top reason not HIGH" cap-reason text describes WHY (e.g. "sensitivity:private files touched") without naming paths in the queue table. Followups deliver to Sam who has full clearance. No leak path. ✅ Attempted attacks (none landed)
Follow-up issues to file (non-blocking)
VerdictPASS — advancing The skill is the operator's primary safety check (per the autopilot architecture's "never manufacture approval gates" rule) and the discipline shows it: mechanical scoring, explicit no-exceptions, defensive against /run drift, no injection paths. The temptation to soften — OpenGaryBot, STAGE 13 adversarial subagent |
STAGE 13 adversarial — second-pass audit (PASS, follow-ups logged)Fresh-context re-audit per Note on label state: PR was already transitioned to 6+1 checks — verdict per concern1. Calibration manipulability via injected markdown markers. The skill says "parse PR comments for 2. Override-label coverage. Skill correctly caps
3. Sensitivity:private file detection — actual specification defect. Skill says "Files touched include any path tagged 4. Approval command shell safety. GitHub PR URLs constrain to 5. Calibration drift — BLOCKED granularity. All 6. Confidentiality leak (context-repo §STAGE 13 7th check). SKILL.md is bot-readable in a public repo. Reviewed against the org-wide read test:
7. Rollback path. Attempted attacks (none landed as major+)
Follow-up issues to file (non-blocking, recommend post-merge)
VerdictPASS — no major+ findings. All findings are minor or nitpick severity per the adversarial-playbook rubric (minor/nitpick → logged as follow-up issues, don't block merge per The earlier 04:37Z PASS comment from this same identity reached the same overall verdict via different attack vectors. Both runs converge on PASS; the spec defects above are real but compensated by adjacent enforcement (adversarial subagent + CodeRabbit + branch protection) and don't justify blocking a read-only operator HUD that produces copy-paste commands. Label state already at — OpenGaryBot, STAGE 13 adversarial subagent (second-pass) |
Summary
Adds
/merge— second of three operator-level slash commands. Read-only ranked queue of PRs ready to merge, with calibrated HIGH/MED/LOW confidence and copy-paste merge commands.Builds on
/run(PR #32). Stack-independent — branched from main, doesn't depend on #32 being merged first.Why
Pipeline produces a steady stream of PRs hitting
stage:ready-for-merge. The operator either opens 12 PR tabs or accepts whatever the bot says./mergeis the middle ground: one table, one column for confidence, one column for the "top reason this isn't HIGH". Operator scans top-down, runs the merge commands they trust, defers the rest.Calibration is the value
If 80% of PRs come back HIGH, the column adds no signal — operator ignores it. So the scoring is built around mechanical MED-caps:
override:skip-adversarialoroverride:allow-score-droplabels → never HIGHsensitivity:privatefiles in PR → never HIGHskip:flakytest classifications → never HIGHLOW (stacked caps, adversarial flag, decision conflict, scope creep across >3 bounded contexts) is rare but means real review time.
The intent is that median outcome is MED. The operator should expect to actually read the "Top reason not HIGH" column.
Read-only contract
gh pr merge --squash --delete-branch <url>for the operator to run./run's job.Test plan
~/.claude/orchestrator-log/run-*-fixture.md— verify section parsing extracts ready-for-merge PRs correctly (verified: 6 PRs extracted from fixture's "No action needed" section)/mergeand eyeball the HIGH/MED/LOW distribution--riskycorrectly inverts sort to LOW-firstFollowup:
/unblockPR lands next, also independent.Summary by CodeRabbit