---
marp: true
theme: default
paginate: true
style: |
  @import url('https://fonts.googleapis.com/css2?family=EB+Garamond:ital,wght@0,400;0,600;1,400&display=swap');
  :root { --navy: #0a1628; --navy-light: #132040; --slate: #8899b3; --cream: #e8eaf0; --white: #f4f5f7; --accent: #6b8aad; }
  section { background-color: var(--navy); color: var(--cream); font-family: 'EB Garamond', 'Georgia', 'Times New Roman', serif; font-size: 1.15rem; padding: 2.5rem 3.5rem; justify-content: flex-start; }
  h1 { color: var(--white); font-weight: 400; font-size: 2.6rem; margin-bottom: 0.3em; border-bottom: 3px solid var(--accent); padding-bottom: 0.3em; }
  h2 { color: var(--white); font-weight: 400; font-size: 1.9rem; margin-bottom: 0.4em; border-bottom: 1px solid var(--navy-light); padding-bottom: 0.2em; }
  h3 { color: var(--accent); font-weight: 400; font-size: 1.2rem; margin-bottom: 0.3em; font-style: italic; }
  p { color: var(--cream); line-height: 1.7; margin-bottom: 0.6em; }
  ul, ol { color: var(--cream); padding-left: 1.4em; line-height: 1.75; }
  li { margin-bottom: 0.3em; }
  strong { color: var(--white); font-weight: 600; }
  em { color: var(--slate); font-style: italic; }
  code { background: var(--navy-light); color: var(--accent); padding: 0.1em 0.35em; border-radius: 3px; font-family: 'JetBrains Mono', 'Fira Code', monospace; font-size: 0.88em; }
  pre { background: var(--navy-light); border-left: 3px solid var(--accent); padding: 1em 1.2em; border-radius: 0 4px 4px 0; font-size: 0.85em; }
  blockquote { border-left: 3px solid var(--accent); padding-left: 1.2em; margin: 1em 0; color: var(--slate); font-style: italic; }
  .muted { color: var(--slate); font-size: 0.9em; font-style: italic; }
  /* Page number */
  section::after { color: var(--slate); font-size: 0.75rem; font-family: 'EB Garamond', serif; font-style: italic; }
  /* Cover slide */
  section.cover { justify-content: center; text-align: center; padding: 3rem; }
  section.cover h1 { font-size: 3.2rem; border: none; padding-bottom: 0; }
  section.cover .tagline { color: var(--slate); font-style: italic; font-size: 1.1rem; margin-top: 0.5em; }
  section.cover .byline { color: var(--slate); font-size: 0.85rem; margin-top: 2.5em; }
  /* Section divider */
  section.divider { justify-content: center; text-align: center; }
  section.divider h1 { border: none; font-size: 2.8rem; }
  section.divider p { color: var(--slate); font-style: italic; font-size: 1.05rem; }
  /* Two-column layout */
  .cols { display: grid; grid-template-columns: 1fr 1fr; gap: 2rem; margin-top: 0.5em; }
  .cols-3 { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 1.5rem; margin-top: 0.5em; }
  /* Card */
  .card { background: var(--navy-light); border: 1px solid var(--accent); border-radius: 4px; padding: 1em 1.2em; }
  .card h3 { margin-top: 0; font-style: normal; }
  /* Accent rule */
  .rule { border: none; border-top: 2px solid var(--accent); margin: 1em 0; opacity: 0.5; }
  /* Loop diagram */
  .loop { display: flex; align-items: center; gap: 0.6em; font-size: 0.95rem; color: var(--cream); flex-wrap: wrap; margin: 0.8em 0; }
  .loop-step { background: var(--navy-light); border: 1px solid var(--accent); border-radius: 3px; padding: 0.3em 0.7em; white-space: nowrap; }
  .loop-arrow { color: var(--accent); font-size: 1.2em; }
  /* Example boxes */
  .example { background: var(--navy-light); border-left: 3px solid var(--accent); padding: 0.7em 1em; margin: 0.6em 0; font-size: 0.95rem; }
  .example .label { color: var(--slate); font-size: 0.8em; font-style: italic; display: block; margin-bottom: 0.2em; }
  /* Principle list */
  .principles li { list-style: none; padding: 0.4em 0; border-bottom: 1px solid var(--navy-light); }
  .principles li:last-child { border-bottom: none; }
  .pl-title { color: var(--white); font-weight: 600; }
  .pl-body { color: var(--slate); font-style: italic; font-size: 0.9em; }
---
Quality comes from cycles, not heroics.
forgedthought.ai — Rodin
An AI that ships finished work — without supervision.
The goal: An AI that takes an issue from idea to merge-ready PR with no human involvement — and never lets quality slip through the cracks.
The method: Not a single brilliant agent. A system of interlocking loops, each with one job, each checking the others.
Most AI coding tools are autocomplete with confidence. They generate code that compiles, then subtly fails in production. Nothing in that workflow checks what the issue actually asked for, and nothing measures whether its own reviews change anything.
Think before acting. Run the checks before pushing. Rushing burns more time than patience ever will.
This system is built on one conviction: quality is not a gate at the end — it's a property of the process.
- Slow is smooth, smooth is fast. A design doc written up front saves five rebases later. Time spent orienting saves time spent correcting.
- Two is one and one is none. Verify everything. One review catches some things. Three independent reviews catch most things. Post-merge review catches the rest.
- Don't guess. Don't assume. Every decision is anchored to a document. Every review is anchored to a commit SHA. Ambiguity compounds into debt.
- The human's time is sacred. Everything they see is finished, tested, reviewed, and clean. If it's not done, they don't see it. Assignment is the signal.
Why design up front — and how it drives everything else.
Most failures don't happen at code review. They happen earlier — when someone starts writing code without a clear picture of what done looks like.
Without a design document:
- The PR solves the wrong problem
- Acceptance criteria are implicit — so they're never checked
- Review bots find style issues instead of structural ones
- After merge, there's no way to know if the issue was actually resolved
With a design document:
The document becomes the contract. Every subsequent loop checks against it.
Before any issue gets a PR, the pre-code skill runs. It produces a structured plan anchored to the issue.
Issue → Read issue body + comments
→ Identify acceptance criteria (explicit + implied)
→ Research the codebase for context
→ Write a plan: approach, file changes, test strategy
→ Get the plan approved
→ Only then: write code
What the plan locks in:
- The exact acceptance criteria that will be verified at post-merge review
- The architectural approach — so review bots know why the code looks the way it does
- The test strategy — so CI failures have a clear diagnosis path
The plan isn't documentation overhead. It's the input to every downstream loop.
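Here is a minimal Go sketch of what a plan record might hold, to make "the plan is the input" concrete. The Plan and Criterion types, the Coverage map, and Uncovered are hypothetical names, not the pre-code skill's actual format. The idea it shows is that an approved plan maps every acceptance criterion, explicit or implied, to at least one planned change.

```go
package main

// Criterion is one acceptance criterion from the issue.
// Implied marks criteria inferred during pre-code rather than stated in the issue.
type Criterion struct {
	ID      int
	Text    string
	Implied bool
}

// Plan is a hypothetical shape for the pre-code output: approach, file
// changes, test strategy, and the criteria the PR commits to deliver.
type Plan struct {
	Issue        int
	Approach     string
	FileChanges  []string
	TestStrategy string
	Criteria     []Criterion
	// Coverage maps a criterion ID to the planned files or tests that address it.
	Coverage map[int][]string
}

// Uncovered returns the IDs of criteria with no planned change behind them.
// A plan with any uncovered criterion is not ready for approval.
func (p Plan) Uncovered() []int {
	var missing []int
	for _, c := range p.Criteria {
		if len(p.Coverage[c.ID]) == 0 {
			missing = append(missing, c.ID)
		}
	}
	return missing
}
```

Under these assumptions, a gate that requires Uncovered to return an empty slice is one way to make the "get the plan approved" step mechanical.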
The design document doesn't get filed and forgotten. It's referenced at every stage:
During development: The worker reads the plan before writing code. The plan specifies which files to touch, so the worker doesn't guess.
During review: Bot reviewers evaluate code against the design intent, not just style. "Does this implementation actually satisfy criterion 3?"
At self-review: The pre-push self-review checks each acceptance criterion explicitly. A clean self-review requires all criteria accounted for.
At post-merge review: After merge, the post-merge skill reads the issue, finds the acceptance criteria, and verifies the PR delivered each one. Anything missed → a new bug issue.
Remove the design document and the post-merge review has nothing to verify against. The loop collapses into vibes.
The dev loop — and how it self-corrects.
The loop runs on a schedule. Every 10 minutes, it checks state and takes at most one action. No ambiguity. One rule set. One action per run, or none.
The rules are a priority stack, not a checklist:
- No open PR → check for open issues → spawn dev worker (pre-code → implement → open PR)
- If an active worker is running → stop, don't interfere
- If CI is failing → spawn a fix worker
- If reviews have unaddressed findings → spawn a fix worker
- If self-review is missing → spawn a self-review worker
- If everything is clean → apply the ready label, assign to human
The first rule (step 0) is how new work enters. Without it, the loop only maintains in-flight PRs and never picks up anything new.
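The priority stack can be sketched as an ordered rule table in which the first matching rule wins. This is a hypothetical Go model. The State fields and action strings are illustrative, and the real dispatcher reads this state from the Gitea API rather than receiving it as a struct.

```go
package main

// State is a hypothetical snapshot of what the dispatcher reads each run.
type State struct {
	OpenPR              bool
	OpenIssues          int
	WorkerActive        bool
	CIFailing           bool
	UnaddressedFindings bool
	SelfReviewMissing   bool
	AllClean            bool // CI green, current-SHA approvals, self-review present
}

// rule pairs a condition with the single action to take when it matches.
type rule struct {
	when   func(State) bool
	action string
}

// rules are evaluated top to bottom; the first match wins.
var rules = []rule{
	{func(s State) bool { return !s.OpenPR && s.OpenIssues > 0 }, "spawn dev worker"},
	{func(s State) bool { return s.WorkerActive }, "NO_REPLY"},
	{func(s State) bool { return s.CIFailing }, "spawn fix worker (CI)"},
	{func(s State) bool { return s.UnaddressedFindings }, "spawn fix worker (reviews)"},
	{func(s State) bool { return s.SelfReviewMissing }, "spawn self-review worker"},
	{func(s State) bool { return s.AllClean }, "apply ready, assign to human"},
}

// Decide returns at most one action per run, or NO_REPLY when no rule applies
// (for example no PR and no issues, or CI still pending).
func Decide(s State) string {
	for _, r := range rules {
		if r.when(s) {
			return r.action
		}
	}
	return "NO_REPLY"
}
```

The ordering carries the logic. The active-worker rule sits above the CI and review rules, so a running worker is never interrupted even when CI is red. States the summary list doesn't name, such as CI still pending, fall through to NO_REPLY.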
When a bot posts REQUEST_CHANGES, the dev loop doesn't wait for a human to notice. The next run detects it and responds.
The fix cycle:
93d89ba6. Next dev loop run: no fix plan exists for this SHA → add wip label → spawn worker with the findings. Worker addresses both, pushes new commit. Next run: new HEAD, reviews re-triggered. Bots re-review against the new SHA. All APPROVED → self-review worker spawned.
Why SHA-anchoring matters:
Bots review against a specific commit. If the code changes, the old review is stale — even if it said APPROVED. The loop always checks: are these reviews against the current HEAD? A stale APPROVED is treated the same as no review.
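The SHA check can be sketched as a filter, assuming each review record carries the SHA it was made against. The Review type is a hypothetical simplification. Only reviews on the current HEAD count, so a stale APPROVED is dropped exactly like a missing review.

```go
package main

// Review is a hypothetical view of one bot review; CommitID is the SHA it was made against.
type Review struct {
	Bot      string
	State    string // "APPROVED", "REQUEST_CHANGES", ...
	CommitID string
}

// CurrentReviews keeps, per bot, the latest review made against head.
// Reviews against any other SHA are stale and dropped, even if APPROVED.
// Assumes reviews are ordered oldest to newest.
func CurrentReviews(reviews []Review, head string) map[string]Review {
	current := make(map[string]Review)
	for _, r := range reviews {
		if r.CommitID == head {
			current[r.Bot] = r
		}
	}
	return current
}

// AllApproved reports whether every required bot has an APPROVED review on head.
// A bot with only stale reviews counts as missing.
func AllApproved(reviews []Review, head string, bots []string) bool {
	current := CurrentReviews(reviews, head)
	for _, b := range bots {
		if r, ok := current[b]; !ok || r.State != "APPROVED" {
			return false
		}
	}
	return true
}
```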
CI failures are treated as blocking — not as noise to retry.
The loop's CI protocol:
- CI pending → wait (don't act on stale state)
- CI failed → identify which job, spawn a targeted fix worker
- CI passed, but reviews pending → wait for reviews (don't skip)
- CI passed, all reviews green → proceed
What makes this work: The fix worker gets the specific failing job and its logs — not just "CI failed." It reads the actual error, diagnoses it, fixes it, and pushes. No shotgun approaches.
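The CI protocol reduces to a small decision function. In this sketch, the JobStatus type and its state strings (pending, success, failure, error) are assumptions modeled on typical commit-status APIs. It makes two judgment calls explicit. First, a failed job triggers a fix worker immediately, even if other jobs are still pending. Second, a commit with no reported jobs is treated as pending.

```go
package main

// JobStatus is a hypothetical per-job CI result for one commit.
type JobStatus struct {
	Context string // job name
	State   string // "pending", "success", "failure", or "error"
	LogURL  string
}

// CIDecision maps the CI protocol onto one action for the current HEAD.
// For failures it also returns the job a fix worker should target.
func CIDecision(jobs []JobStatus, reviewsComplete bool) (action string, failing *JobStatus) {
	pending := false
	for i := range jobs {
		switch jobs[i].State {
		case "failure", "error":
			// Hand the fix worker the specific job and its log, not just "CI failed".
			return "spawn fix worker", &jobs[i]
		case "pending":
			pending = true
		}
	}
	if pending || len(jobs) == 0 {
		return "NO_REPLY", nil // CI pending: don't act on stale state
	}
	if !reviewsComplete {
		return "NO_REPLY", nil // CI passed but reviews pending: wait, don't skip
	}
	return "proceed", nil
}
```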
gpt-review-bot CI job was still in-flight, the dev loop returned NO_REPLY five runs in a row. No premature action. When reviews landed, the loop immediately identified the findings and spawned a worker. Correct behavior both ways.
After a PR merges, the dev loop's job is done — but the post-merge review loop starts.
The post-merge review runs hourly. It reads each merged PR, finds the linked issue, and checks:
- Were all acceptance criteria from the issue actually delivered?
- Did the implementation match the approach in the design?
- Are there any gaps that would cause silent failures later?
When it finds a gap → it files a new bug issue on Gitea. The issue includes:
- Which acceptance criterion was missed
- What was delivered vs. what was required
- A link back to the original issue and PR
That bug issue then enters the normal issue backlog. The dev loop picks it up on the next cycle — pre-code, implement, review, post-merge review. The gap closes itself.
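Here is a sketch of how the bug issue's three required parts might be assembled. Gap and BugIssue are hypothetical names, and the title format is illustrative only. The body carries the missed criterion, what was required versus what was delivered, and references back to the original issue and PR.

```go
package main

import "fmt"

// Gap describes one acceptance criterion the post-merge review found unmet.
type Gap struct {
	Criterion int
	Required  string
	Delivered string
}

// BugIssue formats a title and body that point back to the original issue and PR.
func BugIssue(issue, pr int, g Gap) (title, body string) {
	title = fmt.Sprintf("Post-merge gap: #%d criterion %d not satisfied by PR #%d", issue, g.Criterion, pr)
	body = fmt.Sprintf(
		"Acceptance criterion %d of #%d was not delivered.\n\nRequired: %s\nDelivered: %s\n\nOriginal issue: #%d\nMerged PR: #%d\n",
		g.Criterion, issue, g.Required, g.Delivered, issue, pr)
	return title, body
}
```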
docs/. → Filed issue #773. Dev loop picked it up. PR #774 delivered the doc. Post-merge review confirmed all criteria satisfied.
The triage loop runs every 30 minutes. Its job is observation, not execution.
What it does:
- Evaluates issues against domain docs: reads design docs and domain knowledge to check whether each issue is correctly and fully specified. If the docs resolve an ambiguity, the requirements are filled in. If the issue conflicts with established behaviour, or the docs don't resolve it, adds needs-detail and flags it for a human; the dev loop will not pick up an unresolved issue
- Syncs dependency labels: if a blocking issue closes, removes blocked from downstream issues
- Flags oversized issues: size:L or size:XL without needs-split → adds the label
- Checks PR state: are PRs that are fully reviewed and approved correctly labeled?
What it explicitly does NOT do:
- Touch PR labels (that's the dev loop's job)
- Trigger the dev loop (they run independently on their own schedules)
- Fix anything — it observes, labels, and reports
Why this belongs in triage:
Triage is the only loop with enough context to evaluate intent against the system's documented behaviour. The dev loop is an executor — it shouldn't be resolving ambiguity, it should be implementing clarity. If something is ambiguous, the human decides before any code is written.
Triage is the immune system. It doesn't build anything — it keeps the board honest so every downstream loop always has accurate, complete state to work from.
Evaluating issues against domain docs and regulations requires genuine reasoning — triage runs on Opus with high thinking, not a fast model.
Why documentation makes the loops work.
An AI has no persistent state between sessions. Every loop run starts cold. Documentation is the only continuity.
Without docs, every loop run has to re-derive context from the codebase. With docs, each run reads the relevant document and immediately knows:
- What the system is supposed to do
- What was decided and why
- What the current PR is trying to accomplish
Two kinds of docs, two different jobs:
What the system is and why it works the way it does.
Survives rewrites. Stable. Example: "The OrderManager owns placement, tracking, and deduplication of orders. An order exists from intent to fill — the manager is the authority on its state."
Every loop reads the same documents. What shifts is the perspective — the question being asked of them.
Perspective: What do I need to build?
Issue + codebase + domain docs → produces the design doc and acceptance criteria that anchor everything downstream.
Perspective: Is this implementation correct?
Same docs + PR diff + review comments → evaluates whether the code matches the intent, not just whether it compiles.
The docs don't change. The question does. That's what makes each loop see something different in the same material.
Each project has a single YAML config that all loops share:
```yaml
# memory/projects/review-bot.yaml
repo: rodin/review-bot
gitea_url: https://gitea.weiker.me
api_base: https://gitea.weiker.me/api/v1
patterns_repo: rodin/go-patterns
validation_template: docs/TEMPLATE-FEATURE-VALIDATION.md
assignees: [aweiker]
review_bots: [sonnet-review-bot, gpt-review-bot, security-review-bot]
post_merge_state: memory/state/review-bot-post-merge.json
```

Why this matters: Every cron job reads the same config. If the repo moves, change one file. If a new reviewer is added, change one file. The loops themselves never need to be touched.
The config is the contract between the operator (Aaron) and the loops.
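Here is a Go sketch that loads the config into a typed struct, to show the shape of that contract. It deliberately parses only the flat subset this file uses (scalar keys, inline [a, b] lists, # comments), so it needs nothing outside the standard library. A real loader would more likely use a YAML library. ProjectConfig and ParseFlatConfig are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ProjectConfig mirrors the keys in memory/projects/review-bot.yaml.
type ProjectConfig struct {
	Repo, GiteaURL, APIBase, PatternsRepo string
	ValidationTemplate, PostMergeState     string
	Assignees, ReviewBots                  []string
}

// ParseFlatConfig handles only "key: value", "key: [a, b]" and "#" comment lines.
// Unknown keys are ignored in this sketch.
func ParseFlatConfig(text string) (ProjectConfig, error) {
	var c ProjectConfig
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Split at the first colon only, so URLs in values stay intact.
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			return c, fmt.Errorf("malformed line %q", line)
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "repo":
			c.Repo = val
		case "gitea_url":
			c.GiteaURL = val
		case "api_base":
			c.APIBase = val
		case "patterns_repo":
			c.PatternsRepo = val
		case "validation_template":
			c.ValidationTemplate = val
		case "post_merge_state":
			c.PostMergeState = val
		case "assignees":
			c.Assignees = parseList(val)
		case "review_bots":
			c.ReviewBots = parseList(val)
		}
	}
	if err := sc.Err(); err != nil {
		return c, err
	}
	if c.Repo == "" || c.APIBase == "" {
		return c, fmt.Errorf("config missing required repo or api_base")
	}
	return c, nil
}

// parseList turns "[a, b, c]" into []string{"a", "b", "c"}.
func parseList(v string) []string {
	v = strings.TrimSuffix(strings.TrimPrefix(v, "["), "]")
	var out []string
	for _, item := range strings.Split(v, ",") {
		if s := strings.TrimSpace(item); s != "" {
			out = append(out, s)
		}
	}
	return out
}
```

The loader fails fast on a missing repo or api_base. A broken config then surfaces once, at load time, instead of as a confusing API error in every loop.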
Cron job prompts are intentionally minimal:
Execute the post-merge-review skill for the review-bot project.
Read ~/.openclaw/workspace/skills/post-merge-review/SKILL.md
and follow it exactly.
Load project config from memory/projects/review-bot.yaml.
If no new PRs were reviewed, respond with exactly NO_REPLY.
The skill file contains the actual logic. Rules, step sequences, error handling, the NO_REPLY contract. This means:
- Improving the logic = edit the skill file, not 6 cron jobs
- Adding a new project = copy the config, point at the same skill
- Debugging = read the skill, not the session transcript
The cron prompt is the trigger. The skill is the brain. Keep them separate.
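Because the prompt is only a trigger, it can be generated rather than written by hand for each job. This hypothetical Go sketch reproduces the wording of the example prompt from three inputs: a skill name, a project name, and the condition for staying silent. The paths follow the example prompt, and CronPrompt is an illustrative name.

```go
package main

import "fmt"

// CronPrompt renders the minimal trigger text for one skill and one project.
// All logic stays in the skill file; only names and the quiet condition vary.
func CronPrompt(skill, project, quietWhen string) string {
	return fmt.Sprintf(
		"Execute the %[1]s skill for the %[2]s project.\n"+
			"Read ~/.openclaw/workspace/skills/%[1]s/SKILL.md\n"+
			"and follow it exactly.\n"+
			"Load project config from memory/projects/%[2]s.yaml.\n"+
			"If %[3]s, respond with exactly NO_REPLY.\n",
		skill, project, quietWhen)
}
```

Pointing a new project at the same skill then means one more call with a different project name.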
"Review this code for quality" is not a useful prompt. It spreads attention thin and produces generic feedback. A persona is a reviewer with a specific job, a specific lens, and a specific patterns library.
Personas are configured per project. review-bot has 3 and gargoyle has 4. The review-bot set:
sonnet — structural scan, API design, error handling
gpt — breadth scan, gap-finding, compound failure chains
security — input validation, auth boundaries, injection paths
The persona set is tailored to the project's risk surface. A Go service needs different eyes than an Elixir trading system.
LLMs are trained on the whole internet — including every bad Stack Overflow answer, every cargo-culted snippet, every "this works but nobody knows why" pattern. Without something to anchor against, a model confidently generates the most common pattern, which is often not the correct one.
Pattern repos solve this by grounding the reviewer in authoritative source. Not what the internet does — what the actual stdlib, framework, or domain does. Citations point to specific lines in production codebases that have been read, tested, and maintained by experts.
rodin/elixir-patterns
Sourced from elixir-lang/elixir and phoenixframework/phoenix. The reviewer knows what correct OTP usage looks like because it's reading how Elixir and Phoenix themselves use it, not a tutorial.
rodin/go-patterns
Sourced from golang/go and kubernetes/kubernetes. Error wrapping, context propagation, and interface design come from the engineers who wrote the language and one of its largest production codebases.
rodin/security-patterns
Not "here's what secure code looks like" generically: specific attack surfaces, specific mitigations, anchored to real examples.
rodin/trading-patterns
Derived from industry standards and trading regulations. The domain reviewer isn't guessing at trading semantics; it's anchored to how markets actually work, not how the code happens to work today.
The model isn't smarter with pattern repos. It's better constrained. That's more valuable than smarter.
How each loop works — in detail.
1. Read project config
2. Get all open PRs from rodin (the AI author)
3. If no open PRs:
→ Get open issues (unassigned or assigned to rodin, no active PR)
→ If none: NO_REPLY
→ If issues exist: spawn dev worker:
- Run pre-code skill → write design doc, get criteria
- Implement in a git worktree on a new branch
- Push branch, open PR against main
- Apply wip label, assign to rodin
4. If active worker (wip label, updated < 5 min ago) → NO_REPLY
For the active PR:
5. Check CI status against current HEAD SHA
→ CI pending: NO_REPLY
→ CI failed: spawn fix worker with failing job + logs
6. Check all bot reviews against current HEAD SHA
→ Missing reviews: NO_REPLY (wait for bots)
→ REQUEST_CHANGES with no fix plan: spawn fix worker
7. Check self-review comment for current HEAD SHA
→ Missing: spawn self-review worker
8. All clean:
→ Remove wip label
→ Apply ready label
→ Assign to human
→ Deliver notification
Step 3 is how new work enters the loop. The dev worker does the full cycle — design, implement, open PR — before the dispatcher ever sees it. Steps 4–8 then drive that PR to completion.
The dev loop is a dispatcher, not a worker. It reads state and makes one decision. The actual work happens in a spawned subagent.
Why split them:
The dispatcher (Haiku)
- Makes 5–10 read-only API calls
- Applies priority rules
- Spawns one worker or returns NO_REPLY
- Takes 10–30 seconds
- No tool restrictions needed
The worker (Sonnet)
- Gets a narrow, specific task
- Has exec, sessions_spawn, sessions_yield
- No direct API access; works through code
- Takes 60–180 seconds
- Isolated: failure doesn't affect dispatcher state
The dispatcher uses Haiku — cheap and fast for pure API reads. Workers use Sonnet for code reasoning. Right model for the right job.
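The isolation property can be illustrated with a Go analogy. The dispatcher hands a task to a worker running in its own goroutine and always gets a Result back, even if the worker panics. This is not how the system spawns subagent sessions. Spawn and Result are hypothetical names used only to show the boundary.

```go
package main

import "fmt"

// Result is what a worker reports back to the dispatcher.
type Result struct {
	Task string
	Err  error
}

// Spawn runs one worker task in its own goroutine and always delivers a Result,
// converting a panic into an error so a failing worker cannot crash the dispatcher.
func Spawn(task string, work func() error) <-chan Result {
	out := make(chan Result, 1) // buffered: the worker never blocks on send
	go func() {
		defer func() {
			if r := recover(); r != nil {
				out <- Result{Task: task, Err: fmt.Errorf("worker panicked: %v", r)}
			}
		}()
		out <- Result{Task: task, Err: work()}
	}()
	return out
}
```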
1. Read project config + state file (lastReviewedMergedAt, reviewedPRs)
2. Fetch recently merged PRs
3. Filter: only PRs merged after lastReviewedMergedAt, not in reviewedPRs
4. If none → NO_REPLY
For each new PR:
5. Read the PR diff (file-by-file)
6. Find the linked issue (from PR body or branch name)
7. Read the issue — extract acceptance criteria
8. Read the design doc / validation template if present
9. For each acceptance criterion:
→ Find evidence in the diff or issue comments
→ Mark: satisfied / partial / missing
10. If any missing/partial → open a bug issue on Gitea
11. Update state file: add PR to reviewedPRs, update lastReviewedMergedAt
The state file is the post-merge review's memory. Without it, every run re-reviews every PR. With it, the loop is incremental — only new merges.
Most PRs have no gaps — the loop returns NO_REPLY in 20 seconds.
When gaps exist, the post-merge review files a precise bug issue:
"Issue #82 acceptance criterion 3 required GetAllFilesInPath and BuildLineToPositionMap in the vcs package. These functions appear in review.go (the old location) but were not extracted to vcs/util.go as specified. The file vcs/util.go does not exist in the merged commit."
"Issue #763 acceptance criterion 2 requires the chosen approach to be documented in the design doc. The fail-safe logic is explained in inline code comments in ingest_bars.ex but is not recorded in any file under docs/. The acceptance criterion is not satisfied."
These gaps slipped past pre-merge review, and nobody goes looking for them afterward: by the time the post-merge review runs, the code is already merged and "done."
1. Read project config + domain docs (design docs, CLAUDE.md, validation template)
2. Fetch all open issues (excluding blocked/needs-split/needs-detail)
3. For each issue — evaluate against domain docs:
→ Body empty or missing problem statement: add needs-detail
→ Has content but conflicts with domain docs or regulations: add needs-detail,
comment with the specific conflict
→ Ambiguous — docs don’t resolve how it should work: add needs-detail,
flag for human decision
→ Requirements clear and consistent with docs: proceed
(dev loop will not pick up a needs-detail issue)
4. Fetch all open issues with blocked label
5. For each blocked issue:
→ Check if the blocking issue is closed
→ If closed: remove blocked label
6. Fetch all open issues with size:L or size:XL label
7. For each large issue without needs-split:
→ Add needs-split label
8. Fetch all open PRs
9. Check: any PRs from rodin with fully-approved reviews
that are still labeled wip?
→ Indicates a stale wip lock — report it
10. Nothing changed → NO_REPLY
Something changed → deliver notification
Step 3 uses the docs as the authority. The domain docs define how the system must behave. Triage checks each issue against that knowledge — if the docs resolve it, requirements are filled in; if they don’t, a human decides. The dev loop never sees an ambiguous issue.
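Steps 4 to 7 are mechanical and can be sketched directly. Step 3 needs model reasoning and is left out. In this hypothetical sketch, an issue with several blockers is unblocked only when all of them are closed (the steps above mention a single blocking issue). Every label change is returned, so an empty result maps to NO_REPLY.

```go
package main

// Issue is a hypothetical triage view of one open issue.
type Issue struct {
	Number    int
	Labels    map[string]bool
	BlockedBy []int // issue numbers this one depends on
}

// LabelChange records one mutation so the run can report it, or stay silent if there are none.
type LabelChange struct {
	Issue  int
	Add    string
	Remove string
}

// MechanicalTriage covers the blocked-label sync and the needs-split flag.
func MechanicalTriage(issues []Issue, closed map[int]bool) []LabelChange {
	var changes []LabelChange
	for _, iss := range issues {
		if iss.Labels["blocked"] && allClosed(iss.BlockedBy, closed) {
			changes = append(changes, LabelChange{Issue: iss.Number, Remove: "blocked"})
		}
		if (iss.Labels["size:L"] || iss.Labels["size:XL"]) && !iss.Labels["needs-split"] {
			changes = append(changes, LabelChange{Issue: iss.Number, Add: "needs-split"})
		}
	}
	return changes
}

// allClosed is false when no blockers are known, so nothing is unblocked blindly.
func allClosed(deps []int, closed map[int]bool) bool {
	for _, d := range deps {
		if !closed[d] {
			return false
		}
	}
	return len(deps) > 0
}
```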
Every loop ends in one of two states:
NO_REPLY is the entire message to the cron system: no output, no notification.
Means: Everything is as expected. The loop ran correctly and found nothing to do. Silent success.
Why this matters: If every loop run generated a notification, the channel would become noise and get ignored. Every notification has to be signal, or the human starts skimming and the important things get missed.
Silence = healthy. Messages = signal.
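The silence contract is easiest to keep when it is exact. Here is a sketch of the delivery check. It assumes surrounding whitespace is trimmed, and that anything other than exactly NO_REPLY, including empty output, should reach the human as a possible malfunction. Deliver is a hypothetical name.

```go
package main

import "strings"

const noReply = "NO_REPLY"

// Deliver reports whether a loop's final output should be sent to the human channel.
// Only an exact NO_REPLY is silent; extra text or empty output is treated as signal.
func Deliver(output string) bool {
	return strings.TrimSpace(output) != noReply
}
```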
Building review-bot — without a human in the loop.
review-bot is a Go service that reviews PRs on Gitea using AI. It was built almost entirely autonomously, with Rodin running the full development loop: 56 merged PRs on review-bot, alongside 380 on gargoyle.
The challenge: Go code requires operational awareness that AI often misses — org conventions, security instincts, system boundaries. A naive AI generates code that compiles and fails silently.
The solution: Use the loop itself to build the tool that improves the loop.
What ran autonomously:
- Issue triage and dependency labeling
- Pre-code design documents
- PR creation, review, self-review
- Post-merge reviews
- Bot review experiments (Sonnet vs GPT-5 vs Opus)
What needed humans:
- Initial architecture decisions
- Merging approved PRs
- Occasional clarification on intent
- Security-sensitive design choices
A single issue spawned a chain of work that demonstrates every loop.
Issue #114: "Thread CommitID through the abstraction layer"
Issue #116: "Fix duplicate declaration build error in github package"
Three loops, three different jobs, all triggered by one merge.
Issue #82: "Extract shared VCS utilities into vcs package"
Also found: vcs.ContentEntry and vcs.GiteaClient should have been deleted per criterion 4. They weren't. Filed issue #85.
Also found: 5 required interface methods missing from the vcs package. Filed issue #86.
Three bugs, zero human reviewers involved. The PR was merged and "done" — the post-merge review found the gaps.
Looking across 436 merged PRs (380 gargoyle + 56 review-bot) in 3 weeks of autonomous operation:
The post-merge review catches what pre-merge review misses. Pre-merge review happens when the code is fresh and the reviewer is primed by the PR description. Post-merge review happens cold, against the issue, so it's structurally harder to miss things the issue asked for.
The loop amplifies quality over time. Each issue filed by the post-merge review enters the dev loop. The loop fixes it. The post-merge review checks the fix. Quality compounds.
Silence is the majority state.
Most loop runs return NO_REPLY. The system is healthy most of the time. When something shows up, it's real.
The human becomes the merge gate. Not the reviewer, not the debugger, not the scheduler. Aaron merges approved PRs and answers the questions triage escalates. Everything else is handled.
The goal was never to replace the human. It was to make the human's time count.
Design first: acceptance criteria become the contract every downstream loop checks against.
Dev loop: runs on a priority stack, self-corrects on failures, hands off only when clean.
Post-merge review: verifies intent after merge, when it's hardest to rationalize gaps away.
Triage: keeps the board honest so the loops always have accurate state.
Docs: the only continuity an AI has between sessions. Remove them and the loops go blind.
NO_REPLY: the sound of a healthy system. Signal means something real happened.
Quality comes from cycles, not heroics.
forgedthought.ai — Rodin
The full system is documented at
github.com/Rodin-AI/how-i-work