ci: shard + cache for faster runs #1199
Open
Anmol1696 wants to merge 3 commits into
The pnpm/action-setup@v2 action runs on Node.js 20, which GitHub is deprecating on June 2, 2026. Updating to v4 ensures continued compatibility and silences the deprecation warnings that appear in every CI job today.

Affects: build, unit-tests, pg-tests, integration-tests in run-tests.yaml and the examples-integration.yaml workflow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
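A minimal sketch of the step bump this commit describes. Only the major version changes. The surrounding job is not shown in the PR, and the assumption that the pnpm version comes from a `packageManager` field in `package.json` (rather than a `version` input) is ours.

```yaml
steps:
  # Previously: - uses: pnpm/action-setup@v2
  - uses: pnpm/action-setup@v4
    # Inputs are unchanged from the v2 step (per the PR). Assumed here: with no
    # `version` input, the action reads the pnpm version from the
    # `packageManager` field in package.json.
```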
Root cause from job timings (run 26133536845):
- pg-tests/graphile-bulk-mutations: Setup Node.js=254s + Post Setup=264s
- integration-tests/graphql-server-test: Setup Node.js=161s
Having 27 parallel jobs download the pnpm store simultaneously
saturates the Actions cache service; one job also raced to save it back.
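For reference, the per-job setup this root cause points at looks roughly like the sketch below. The step order and Node version are assumptions; the key point is that `cache: 'pnpm'` makes `actions/setup-node` restore the store during setup and attempt to save it again in its post step, which is the `Post Setup Node.js` step in the timings above.

```yaml
# Before (sketch): every test job let setup-node manage the pnpm store cache.
steps:
  - uses: pnpm/action-setup@v2
  - uses: actions/setup-node@v4
    with:
      node-version: 22   # assumed; the project's Node version is not stated in the PR
      cache: 'pnpm'      # restore at job start; save in the post step when the key was not an exact hit
```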
Changes:
- Build job uses actions/cache@v4 with save-always: true on an
explicit key (runner.os + pnpm-lock.yaml hash). Guaranteed save
after every successful build, even if a prior run's upload failed.
- unit-tests, pg-tests, integration-tests use actions/cache/restore@v4
(restore-only, no post-job save). Eliminates the 264s save step and
the concurrent-save race entirely.
- setup-node cache: 'pnpm' removed from all jobs; replaced by the
explicit cache actions above.
Expected improvement: p99 job time drops from ~575s to ~310s for the
worst-case cache-contention jobs; saves ~264s on every fan-out job
that previously raced to write back the pnpm store.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
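For illustration, the build-job side of this change might look roughly like the sketch below. It is adapted from the pnpm documentation's `pnpm store path` pattern for locating the store; the runner, Node version, and `pnpm build` script are assumptions, not taken from the PR. Two documented `actions/cache` behaviours are worth keeping in mind: `save-always: true` is described as running the post-job save even if an earlier step fails, and the save is skipped when the primary key was already an exact hit, so with a lockfile-hash key the store is only uploaded when `pnpm-lock.yaml` changes.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest                      # assumed runner
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22                      # assumed; note there is no `cache: 'pnpm'`
      # Resolve the pnpm store directory so the cache step can point at it.
      - name: Get pnpm store directory
        id: pnpm-store
        shell: bash
        run: echo "STORE_PATH=$(pnpm store path --silent)" >> "$GITHUB_OUTPUT"
      # Restores the store on an exact key hit; otherwise saves it in the post-job step.
      - name: Set up pnpm store cache
        uses: actions/cache@v4
        with:
          path: ${{ steps.pnpm-store.outputs.STORE_PATH }}
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
          save-always: true
      - run: pnpm install --frozen-lockfile
      - run: pnpm build                         # hypothetical build script name
```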
Add paths-ignore to the push and pull_request triggers so that commits touching only Markdown files, docs/, or GitHub template directories don't trigger the full 30-job test matrix. Per GitHub's documentation, checks from a workflow skipped by path filtering stay in a "Pending" state, so any of these jobs configured as required status checks will block such PRs from merging. workflow_dispatch and workflow_call triggers are untouched: manual runs and cross-workflow calls always execute unconditionally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
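A sketch of the trigger block this commit describes. The PR does not list the exact template directories, so `.github/ISSUE_TEMPLATE/**` is an assumed example, and any existing branch filters are omitted. Under GitHub's filter rules, the workflow is skipped only when every changed path matches a `paths-ignore` pattern; a commit touching one `.md` file and one source file still runs the full matrix.

```yaml
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'
      - '.github/ISSUE_TEMPLATE/**'   # assumed template path; not listed in the PR
  pull_request:
    paths-ignore:
      - '**.md'
      - 'docs/**'
      - '.github/ISSUE_TEMPLATE/**'
  # Manual and reusable-workflow triggers carry no path filters, so they always run.
  workflow_dispatch:
  workflow_call:
```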
Summary
Three independent improvements to `run-tests.yaml` (and `examples-integration.yaml`) aimed at reducing wall-clock CI time and eliminating the 9-23 minute outlier runs.

Investigation findings
Real step timings extracted from the three slowest recent runs:

| Job | Duration | Slow steps |
|---|---|---|
| pg-tests/graphile-bulk-mutations | 9m34s | Setup Node.js = 254 s + Post Setup Node.js = 264 s |
| build | 6m28s | |

Root cause:
`cache: 'pnpm'` on `actions/setup-node@v4` in every test job causes all 27 parallel fan-out jobs to simultaneously (a) download the large pnpm store from the cache service and (b) attempt to save it back after the job. One job in run 26133536845 spent 518 s (8.6 min) purely on cache I/O.

Change 1: `pnpm/action-setup` v2 → v4 (both workflows)

Why: `pnpm/action-setup@v2` runs on Node.js 20, which GitHub is deprecating on June 2, 2026 (< 2 weeks away). All 30 CI jobs show the deprecation warning today.

Expected improvement: No breakage on June 2; the deprecation warnings disappear immediately.

Risk: None. Drop-in replacement with the same inputs.
Change 2: Build saves the pnpm store once; fan-out jobs restore only
What changed:
- Build job: `actions/cache@v4` with the explicit key `${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}` and `save-always: true`. Guarantees a write even if a prior run's upload was interrupted (the case that caused the 6m28s build).
- Fan-out jobs (unit-tests, pg-tests, integration-tests): `actions/cache/restore@v4`, a restore-only action with no post-job save step. Eliminates the `Post Setup Node.js` save race across all 27 jobs.

Measured savings:
- pg-tests/graphile-bulk-mutations: `Post Setup Node.js` 264 s → 0 s (-4m24s per occurrence)
- integration-tests/graphql-server-test: `Post Setup Node.js` eliminated

Expected p99 improvement: ~14m → ~7m (eliminates the cache-save race outlier entirely).
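The fan-out jobs could then restore the entry the build job saves, read-only. This sketch shows a single test job; the job name, `needs: build`, the Node version, and the `pnpm test` command are illustrative assumptions. The `path` and `key` must match the build job's exactly, because `actions/cache` also derives the cache version from the configured path.

```yaml
jobs:
  unit-tests:
    needs: build                                # assumed: fan-out jobs start after build has saved the store
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22                      # assumed; no `cache: 'pnpm'`
      - name: Get pnpm store directory
        id: pnpm-store
        shell: bash
        run: echo "STORE_PATH=$(pnpm store path --silent)" >> "$GITHUB_OUTPUT"
      # Restore-only: this action has no post-job save step.
      - name: Restore pnpm store cache
        uses: actions/cache/restore@v4
        with:
          path: ${{ steps.pnpm-store.outputs.STORE_PATH }}   # must match the build job
          key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
      - run: pnpm install --frozen-lockfile
      - run: pnpm test                          # hypothetical test command
```

On a miss this sketch falls back to a cold `pnpm install`; if a miss should fail the job instead, `actions/cache/restore` also accepts `fail-on-cache-miss: true`.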
Change 3: `paths-ignore` for documentation-only commits

What changed: Added `paths-ignore` to both the `push` and `pull_request` triggers to skip the 30-job matrix when only `**.md`, `docs/**`, or GitHub template files change.

Why: Docs-only PRs (e.g., schema update PRs) currently spin up the full test matrix for no reason.
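Note: GitHub's documentation states that when a workflow is skipped by path filtering, its checks remain in a "Pending" state, so any of its jobs configured as required status checks will block docs-only PRs from merging. `workflow_dispatch` and `workflow_call` are unaffected and always run.

Test plan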
- No `Post Setup Node.js` save step in any test job
- `Set up pnpm store cache` step appears only in the `build` job
- No `pnpm/action-setup@v2` deprecation warnings in annotations
- Docs-only change: verify how the skipped `CI tests` workflow is reported to branch protection (expected: ✓ Skipped, not ❌ missing)

🤖 Generated with Claude Code