Skip to content

fix(ci): handle clean internal-docs sync PRs#901

Open
Nikhil (shadowfax92) wants to merge 2 commits intodevfrom
fix/itnernal-doc-sync
Open

fix(ci): handle clean internal-docs sync PRs#901
Nikhil (shadowfax92) wants to merge 2 commits intodevfrom
fix/itnernal-doc-sync

Conversation

@shadowfax92
Copy link
Copy Markdown
Contributor

Summary

  • Fixes the internal-docs sync workflow failure where gh pr merge --auto exits when GitHub reports a newly created sync PR as CLEAN.
  • Reuses an existing open sync PR/branch so retries do not strand duplicate bot PRs.
  • Adds workflow concurrency and head-SHA matching around merge operations.

Design

The workflow now creates or updates one bot/sync-internal-docs PR, then inspects mergeStateStatus before merging. CLEAN PRs are merged directly with squash; non-clean PRs use auto-merge so branch requirements can finish first. Existing sync branches are updated with --force-with-lease and --match-head-commit prevents merging a PR head that changed unexpectedly.

Test plan

  • ruby -ryaml -e 'YAML.load_file(".github/workflows/sync-internal-docs.yml"); puts "yaml ok"'
  • ruby -ryaml -e 'workflow = YAML.load_file(".github/workflows/sync-internal-docs.yml"); puts workflow.fetch("jobs").fetch("sync").fetch("steps").last.fetch("run")' | bash -n
  • git diff --check
  • Verified the live stale sync PR query returns PR chore: sync internal-docs submodule #899 and reports mergeStateStatus: CLEAN.

@shadowfax92
Copy link
Copy Markdown
Contributor Author

Greptile (@greptileai) review

@shadowfax92
Copy link
Copy Markdown
Contributor Author

Claude (@claude) review

@github-actions github-actions Bot added the fix label May 1, 2026
@claude
Copy link
Copy Markdown

claude Bot commented May 1, 2026

Claude finished Nikhil (@shadowfax92)'s task in 1m 38s —— View job


Code Review

  • Gather context
  • Read changed file
  • Analyze diff
  • Post review feedback

Overall this is a solid, well-scoped fix. The design correctly addresses the root cause, and the helper function is clean. A few things worth considering:

The good

  • The merge_or_enable_auto_merge helper cleanly separates the CLEAN/non-CLEAN path and is easy to reason about.
  • --match-head-commit is a great defensive touch — prevents merging a PR whose head changed between the point you pushed and the point the merge runs.
  • Switching from a timestamped branch (bot/sync-internal-docs-YYYYMMDD-HHmmss) to a static name (bot/sync-internal-docs) is the right call for idempotent bot PRs.
  • --force-with-lease with an explicitly fetched remote SHA (lines 99–105) is more robust than a plain --force-with-lease and avoids clobbering unexpected concurrent pushes.
  • cancel-in-progress: false is correct here — you want queued runs to execute rather than be discarded, so no sync window is silently skipped.

Issues / things to consider

1. --match-head-commit + auto-merge + force-push interaction (subtle race)

If run 1 sets auto-merge on an open PR (non-CLEAN state) and run 2 (queued by the concurrency group) wakes up, finds the same open PR via EXISTING_SYNC_PR, and force-pushes a new commit to it, the auto-merge from run 1 is now attached to a stale SHA and will fail silently. The PR stays open, waiting for a merge that will never complete. The next scheduled run (4 hours later) would recover, but that's a silent failure mode.

Since auto-merge is inherently async, the safest approach is to cancel any pending auto-merge before force-pushing, or to simply not reuse a PR that already has auto-merge enabled. You could check:

AUTO_MERGE=$(gh pr view "$PR_URL" --json autoMergeRequest --jq '.autoMergeRequest // empty')
if [ -n "$AUTO_MERGE" ]; then
  gh pr merge --disable-auto "$PR_URL"
fi

before the push, then re-enable at the end.

2. UNKNOWN exhaustion falls through silently

After 5 polls (15s total), if mergeStateStatus is still UNKNOWN, the function falls into the else branch and calls --auto. That's safe as a fallback, but it's worth knowing this path exists. A log line would help:

if [ "$attempt" = "5" ] && [ "$merge_state" = "UNKNOWN" ]; then
  echo "Merge state still UNKNOWN after 5 attempts; falling back to auto-merge."
fi

3. Timing window between PR check and gh pr create

The check for an existing open PR (line 78) and the gh pr create call (line 108) are not atomic. If two workflow runs both see no open PR and pass the if [ -z "$PR_URL" ] guard simultaneously, both will attempt gh pr create. With cancel-in-progress: false this should be rare (GitHub queues jobs, not just the steps), but it's worth noting gh pr create will fail with a 422 if a PR for that branch already exists. Since the outer script has set -e, this would cause the whole run to fail. A || true or explicit error handling on gh pr create would make this more resilient.

4. Static branch + closed PR scenario

If bot/sync-internal-docs was merged/closed previously but the branch was not deleted (e.g. the --delete-branch flag failed), git ls-remote on line 99 would find a stale remote branch and force-push over it. This is probably fine in practice, but worth being aware of.

5. Minor: git rev-parse HEAD vs the pushed SHA

Line 115 passes $(git rev-parse HEAD) as the head_sha to match. HEAD after a git commit but before any branch manipulation is the just-committed SHA — this is correct. Just confirming it's intentional: if git push --force-with-lease failed above, set -e would have aborted, so we'll never reach line 115 with an un-pushed commit.


Verdict

The core fix is correct and the code is clearly written. Item #1 above is the most interesting failure mode worth addressing before merge; items #2#5 are lower priority but would make the workflow more robust. Happy to look at a follow-up or inline fix.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 1, 2026

Greptile Summary

This PR fixes the sync-internal-docs CI workflow where gh pr merge --auto would exit non-zero on GitHub-CLEAN PRs, by adding a mergeStateStatus polling helper that squash-merges directly when clean and falls back to auto-merge otherwise. It also reuses the existing bot/sync-internal-docs branch/PR across retries, adds a --force-with-lease guard on force-pushes, uses --match-head-commit for safety, and adds a concurrency group to prevent parallel runs.

  • P1: gh pr list has a default limit of 30 results; if more than 30 PRs are open, the existing sync PR can be missed, causing gh pr create to fail on the already-open branch.

Confidence Score: 3/5

Safe to merge after addressing the missing --limit on gh pr list, which can cause the workflow to fail in repos with many open PRs.

One P1 finding (missing --limit 100 on gh pr list breaks the reuse-existing-PR guarantee when >30 PRs are open) lowers confidence from the P1 ceiling of 4; the overall fix logic is sound and well-structured.

.github/workflows/sync-internal-docs.yml — specifically the gh pr list call at line 78.

Important Files Changed

Filename Overview
.github/workflows/sync-internal-docs.yml Adds concurrency guard, CLEAN-state direct-merge path, and existing-PR reuse logic; one P1 (missing --limit on gh pr list) and one P2 (spurious sleep on final UNKNOWN-state retry).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Workflow triggered] --> B{Submodule configured?}
    B -- No --> C[Exit 0]
    B -- Yes --> D[git submodule update --remote]
    D --> E{Any changes?}
    E -- No --> C
    E -- Yes --> F[gh pr list open sync PRs]
    F --> G{Existing PR found?}
    G -- Yes --> H[Reuse branch + PR URL]
    G -- No --> I[Use new branch 'bot/sync-internal-docs']
    H --> J[git checkout -B branch]
    I --> J
    J --> K[git add + commit]
    K --> L{Branch exists on origin?}
    L -- Yes --> M[fetch remote SHA\ngit push --force-with-lease]
    L -- No --> N[git push]
    M --> O{PR URL set?}
    N --> O
    O -- No --> P[gh pr create]
    O -- Yes --> Q[call merge_or_enable_auto_merge]
    P --> Q
    Q --> R[Poll mergeStateStatus\nup to 5x3s]
    R --> S{State?}
    S -- CLEAN --> T[gh pr merge --squash\n--match-head-commit]
    S -- other --> U[gh pr merge --auto --squash\n--match-head-commit]
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
.github/workflows/sync-internal-docs.yml:78-82
**`gh pr list` default limit may miss the existing sync PR**

`gh pr list` defaults to returning at most 30 results. If the repo has more than 30 open PRs and the sync PR falls outside that window, `EXISTING_SYNC_PR` will be empty and the workflow will attempt `gh pr create` with a head branch that already has an open PR — causing the step to fail. Adding `--limit 100` (or a similar safe upper bound) ensures the existing bot PR is always found.

```suggestion
          EXISTING_SYNC_PR=$(gh pr list \
            --base dev \
            --state open \
            --limit 100 \
            --json headRefName,url \
            --jq 'map(select(.headRefName | startswith("bot/sync-internal-docs"))) | first // empty | [.url, .headRefName] | @tsv')
```

### Issue 2 of 2
.github/workflows/sync-internal-docs.yml:45-53
**Unnecessary `echo`+`sleep` executed on the final loop iteration**

When `mergeStateStatus` remains `UNKNOWN` after all 5 attempts, the `echo` and `sleep 3` at the bottom of the loop body still run on iteration 5 before the `for` exits — adding a wasted 3-second delay and a misleading "attempt 5/5" message. Guarding with `[ "$attempt" -lt 5 ]` avoids this.

Reviews (1): Last reviewed commit: "fix(ci): handle clean internal-docs sync..." | Re-trigger Greptile

Comment thread .github/workflows/sync-internal-docs.yml
@shadowfax92
Copy link
Copy Markdown
Contributor Author

Review follow-up pushed in acde1ed: added --limit 100 to the sync PR lookup, removed the final unnecessary UNKNOWN-state sleep, and disabled existing auto-merge before force-pushing a reused sync branch. I left the PR-create race path unchanged because the workflow-level concurrency group serializes these sync runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant