Autonomous Agentic Research Swarm is a repo-native framework for executing research through explicit contracts, task state, review gates, and git-scoped work isolation.
This repository should be read framework-first. It currently includes an empirical research project that has been used as a reference implementation to exercise the framework end to end, but the repository itself is not defined by that single project.
The framework is designed to support three research modes:
- empirical
- modeling
- hybrid
Its central operating idea is simple: the repository is the shared memory.
Agents do not rely on hidden conversational context to coordinate. They coordinate through task files, contracts, manifests, review logs, release artifacts, and git history. That makes the workflow inspectable, reproducible, and reviewable.
Most agentic workflows break down on the same points:
- work scope widens implicitly
- state lives in chat instead of durable files
- review and provenance are bolted on after the fact
- parallel work collides because ownership is unclear
- release outputs exist without a clean chain of evidence
This framework addresses those failures directly by treating the repository as both the execution substrate and the control plane.
The framework is built from a small set of explicit architectural pieces.
The local control plane lives under .orchestrator/. Task markdown files carry the authoritative State: field, while folder placement under backlog, active, blocked, and done is only a projection maintained by tooling.
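The relationship between the authoritative State: field and folder placement can be sketched as follows. This is a minimal illustration, not the repository's tooling: it assumes the field appears as a single line such as `State: active`, that state names map one-to-one onto the folder names listed above, and the helper names `read_state` and `projection_is_stale` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: the task file carries one line of the form "State: <name>".
STATE_LINE = re.compile(r"^State:\s*(\w+)\s*$", re.MULTILINE)


def read_state(task_markdown: str) -> str:
    """Return the state declared inside the task file (the authoritative value)."""
    match = STATE_LINE.search(task_markdown)
    if match is None:
        raise ValueError("task file has no State: field")
    return match.group(1).lower()


def projection_is_stale(task_path: str, task_markdown: str) -> bool:
    """True when the containing folder disagrees with the declared state.

    The folder is treated as a derived view: on disagreement the file wins
    and the folder placement is what needs to be corrected.
    """
    folder = PurePosixPath(task_path).parent.name
    return folder != read_state(task_markdown)
```

A task whose file says `State: active` but still sits under a `backlog` folder would be reported as stale, which signals that the projection should be refreshed rather than the state edited.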
Relevant surfaces:
The framework separates framework policy from project policy.
- contracts/framework.json defines framework capabilities, roles, states, task semantics, execution engines, and release policy.
- contracts/project.yaml defines the currently instantiated research project.
- docs/protocol.md locks empirical definitions when the current project is empirical.
- contracts/model_spec.md is the modeling specification surface.
- contracts/hybrid_interface_v1.yaml defines the only allowed empirical-to-modeling bridge.
That separation is deliberate. The framework is meant to be reusable across projects; the current project contract is only one instantiation.
The operating model is built around four formal roles:
- Planner: scopes work, writes tasks, maintains workstreams, and manages lifecycle projection
- Worker: executes exactly one task in one isolated branch/worktree and edits only the allowed scope
- Judge: reruns declared gates, verifies outputs and provenance, and is the only role that can mark scientific work done
- Operator: owns preflight, supervision, repair handling, release assembly, and shared operational surfaces
The role model is enforced through AGENTS.md, contracts/framework.json, and the prompt templates under docs/prompts/.
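As an illustration of the one role rule stated explicitly above (only the Judge may mark scientific work done), a guard could look like the sketch below. The `Role` enum and `can_set_state` function are hypothetical names, and every transition other than the move to done is left permissive here because the text does not specify it.

```python
from enum import Enum


class Role(Enum):
    PLANNER = "Planner"
    WORKER = "Worker"
    JUDGE = "Judge"
    OPERATOR = "Operator"


def can_set_state(role: Role, new_state: str) -> bool:
    """Check a proposed state change against the Judge-only rule.

    Assumption: only the transition to "done" is restricted in this sketch;
    the real policy is defined elsewhere in the repository.
    """
    if new_state == "done":
        return role is Role.JUDGE
    return True
```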
The framework assumes strict task isolation:
- one task
- one branch
- one worktree
Tasks declare bounded ownership through fields such as:
- allowed_paths
- outputs
- gates
- stop_conditions
This keeps parallelism tractable and makes it clear when the correct action is to block rather than improvise.
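One way to picture the block-rather-than-improvise decision is a scope check over the files a Worker has touched. The sketch below assumes that allowed_paths holds shell-style glob patterns; that format, the example paths, and the `out_of_scope` helper are illustrative assumptions rather than the framework's actual schema.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths: list[str], allowed_paths: list[str]) -> list[str]:
    """Return the changed paths that match none of the allowed patterns.

    A non-empty result means the task has left its declared scope, and the
    correct response is to stop and block instead of widening the task.
    """
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowed_paths)
    ]


# Hypothetical task scope and change set.
allowed = ["src/etl/*", "reports/validation/*"]
changed = ["src/etl/load.py", "contracts/project.yaml"]
violations = out_of_scope(changed, allowed)
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform, which keeps the check deterministic.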
The merge firewall is designed to stay offline and deterministic by default.
Primary runtime and gate surfaces:
Durable runtime and review artifacts live under:
The framework treats these review bundles as first-class outputs, not optional metadata.
The framework defines two execution paths.
- The default path is the local swarm runtime: scripts/swarm.py plus .orchestrator/.
- The high-stakes path is the reviewed staged-workflow-runner, reserved for major replans, architecture rewrites, and release assessments under Operator control.
This separation prevents ordinary task execution from being overloaded with high-consequence synthesis work.
The framework supports the modes declared in contracts/framework.json.
Empirical mode is for workflows that move from source acquisition to processed datasets, validation artifacts, analytical outputs, manuscript source, and release surfaces.
Relevant framework surfaces include:
Modeling mode is for solver, simulation, optimization, or proof-oriented workflows that require explicit instance and experiment definitions rather than informal inputs.
Relevant framework surfaces include:
Hybrid mode is for workflows where empirical outputs are transformed into declared modeling instances through a contract-bound interface.
Relevant framework surfaces include:
The key rule is that modeling work consumes declared instance manifests, not arbitrary empirical data paths.
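A minimal sketch of that rule, assuming instance manifests live under contracts/instances/ (the directory appears in this repository's layout, but the file names below are hypothetical, as is the `is_declared_instance` helper):

```python
from pathlib import PurePosixPath

# Assumption: declared modeling instances are stored under this directory.
INSTANCES_DIR = PurePosixPath("contracts/instances")


def is_declared_instance(input_path: str) -> bool:
    """Accept only repo-relative inputs located under the instances directory.

    Paths containing ".." are rejected so a modeling task cannot reach
    arbitrary empirical data by traversing out of the directory.
    """
    path = PurePosixPath(input_path)
    if ".." in path.parts:
        return False
    return INSTANCES_DIR in path.parents
```

A modeling task given `data/processed/panel.parquet` as an input would fail this check, while `contracts/instances/demo.yaml` would pass.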
This repository currently contains both:
- the framework itself
- one active reference project instance
The current project instance, defined in contracts/project.yaml, is an empirical research project on L2-to-L1 rent. It should be understood as a real end-to-end validation of the framework's empirical path, not as the definition of the framework.
At the current state of this repository, the strongest operational evidence is:
- the repo-native control-plane model is working
- the local swarm runtime is working
- the deterministic gate and Judge review path is working
- the empirical mode has been exercised through release assembly on a real project
What is present architecturally but not yet exercised to the same depth:
- full modeling runtime maturity on a populated model specification and live instance set
- full hybrid runtime maturity beyond the current bridge contract
That distinction matters. The framework supports empirical, modeling, and hybrid work by design, but the current deepest evidence comes from the empirical reference implementation.
- AGENTS.md: role boundaries and operating rules
- .orchestrator/: task lifecycle, templates, handoffs, and control-plane state
- contracts/: framework policy, project contract, model spec, hybrid interface, schemas, instances, and experiments
- docs/: runbooks, prompts, and protocol documents
- scripts/: swarm runtime, quality gates, lifecycle sweep, and release assembly
- src/: implementation surfaces for ETL, validation, analysis, and modeling
- registry/: registry surfaces for empirical projects
- data/: raw, processed, sample, and manifest-backed datasets
- reports/: validation outputs, figures, tables, models, paper artifacts, release manifests, and review logs
- tests/: fast offline verification
You can use this repository in two different ways.
Use the repo as-is to inspect or extend the current empirical project that has been used to exercise the framework.
Reuse the framework structure and replace the project-specific contract surfaces with a different project instance:
- update contracts/project.yaml
- update the relevant empirical, modeling, or hybrid contracts
- define the task queue under .orchestrator/
- keep the same role, state, gate, and review semantics
The framework is intended to generalize. The current empirical project is only one concrete instantiation.
- Python 3.11
- git
- pip
Useful optional tools:
- quarto
- tmux
- gh
python -m pip install .
make gate
make test
python scripts/swarm.py plan --remote origin --base-branch main

- AGENTS.md
- contracts/framework.json
- .orchestrator/workstreams.md
- docs/runbook_swarm.md
- docs/runbook_swarm_automation.md
- the current project contract in contracts/project.yaml
For modeling or hybrid work, continue with:
- contracts/model_spec.md
- contracts/hybrid_interface_v1.yaml
- contracts/instances/
- contracts/experiments/
The framework is built around a small set of strong rules:
- the repository is the shared memory
- task-file state is authoritative
- folder placement is only a projection
- contracts outrank chat
- one task executes in one isolated worktree
- gates stay deterministic and offline by default
- review and release artifacts are required outputs
- agents should stop on ambiguity instead of widening scope informally
This repository now demonstrates a complete empirical reference run through figures, tables, manuscript source, paper build, catalog, and release manifest surfaces. That is evidence that the framework can carry a real research project end to end.
It is not yet evidence that every supported mode has equal runtime maturity. The framework is broader than the current reference project, and the README is written to reflect that boundary explicitly.