Deterministic, offline-first LLM orchestration via audited git patches and human approvals.
llm-orchestrator is a CLI tool for coordinating local Large Language Models (LLMs) to plan and execute tasks in a controlled, auditable way. Instead of chat-based coding, it enforces structured planning, diff-only execution, and human-in-the-loop approvals.
The system is designed to work fully offline with local models (e.g. Ollama), producing validated git patches that can be safely reviewed and applied.
Current LLM tooling tends to be:
- chat-based
- non-deterministic
- hard to audit
- difficult to integrate into real engineering workflows
llm-orchestrator replaces that with:
- explicit plans
- strict execution contracts
- reproducible runs
- verifiable artifacts
This tool is built for developers who want control, not autonomy.
- Deterministic — Fixed prompts, fixed schemas, fixed generation parameters.
- Offline-first — No cloud access required. No telemetry. Local models only.
- Diff-first execution — All code changes are delivered as validated unified git diffs.
- Human approval gates — Nothing progresses without explicit approval.
- Auditability — Every decision, artifact, and normalization step is recorded.
- Model-agnostic — Works with any local model exposed via an OpenAI-compatible API (e.g. Ollama).
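The determinism and model-agnosticism claims above can be sketched together: a fixed request payload against a local OpenAI-compatible endpoint. This is an illustrative sketch, not the orchestrator's actual code; the function and type names are invented here, while `temperature` and `seed` are real parameters of OpenAI-compatible chat APIs (Ollama supports both).

```typescript
// Sketch: a pinned request payload for a local OpenAI-compatible endpoint
// (Ollama serves one at http://127.0.0.1:11434/v1 by default).
// Names here are illustrative, not the orchestrator's API.
interface PlannerRequest {
  model: string;
  messages: { role: "system" | "user"; content: string }[];
  temperature: number; // fixed at 0 for reproducible output
  seed: number;        // fixed seed; supported by Ollama's OpenAI-compatible API
}

function buildPlannerRequest(
  model: string,
  systemPrompt: string,
  spec: string
): PlannerRequest {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: spec },
    ],
    temperature: 0,
    seed: 42,
  };
}
```

Because the prompt, schema, and generation parameters are all fixed, two runs with the same spec and model produce the same request byte-for-byte.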
- A local-first LLM orchestration CLI
- A planning -> execution pipeline
- A safe way to generate code changes using LLMs
- An auditable alternative to "AI agents"
- Not an autonomous agent
- Not a Copilot replacement
- Not a chat UI
- Not cloud-dependent
- Not self-modifying without approval
A run is a fully isolated execution instance:
- fixed spec
- fixed models
- fixed prompts
- fixed artifacts
Runs are reproducible and auditable.
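The "fixed" properties above are what the run manifest pins. As a hedged sketch (field names are illustrative; see `manifest.json` in the run layout for the real schema), reproducibility reduces to an equality check over the pinned inputs:

```typescript
// Sketch: the kind of data a run manifest could pin. Illustrative only;
// the actual schema lives in runs/<run_id>/manifest.json.
interface RunManifest {
  runId: string;
  specPath: string;
  plannerModel: string;
  executorModel: string;
  promptPack: string; // e.g. a pack manifest path
}

// Two runs are comparable only if every pinned input matches.
function sameInputs(a: RunManifest, b: RunManifest): boolean {
  return (
    a.specPath === b.specPath &&
    a.plannerModel === b.plannerModel &&
    a.executorModel === b.executorModel &&
    a.promptPack === b.promptPack
  );
}
```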
The Planner model:
- reads a spec
- produces a structured plan (milestones + tasks)
- defines expected artifacts (always `patch.diff` for code tasks)
The Executor model:
- executes one approved task at a time
- produces only unified git diffs for code changes
- never writes raw files directly
Each step requires human approval:
- approve plan
- approve task execution
- reject or revise when necessary
The Executor sees only explicitly approved context:
- extracted from approved task diffs
- or manually added via `add-context`
This prevents hallucination and accidental drift.
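The allowlist gate above can be sketched as a simple filter. This is an illustration of the principle, not the tool's implementation; the `Allowlist` shape and function name are invented here (the real allowlist lives in `context/allowlist.json`):

```typescript
// Sketch: gating Executor-visible files through an explicit allowlist.
// Shapes and names are illustrative.
interface Allowlist {
  files: string[]; // paths added via add-context or approved task diffs
}

function visibleContext(allowlist: Allowlist, requested: string[]): string[] {
  const allowed = new Set(allowlist.files);
  // Anything not explicitly approved is dropped, never silently included.
  return requested.filter((path) => allowed.has(path));
}
```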
- Node.js 20+
- Ollama installed and running
- At least one local model pulled (e.g. `qwen2.5:14b`)
```sh
npm install
npm run build
```

```sh
node dist/cli/main.js init \
  --spec ./examples/todo-cli/spec.md \
  --planner-model qwen2.5:14b \
  --executor-model qwen2.5-coder:14b
```

```sh
node dist/cli/main.js plan --run <run_id>
node dist/cli/main.js approve-plan --run <run_id>
node dist/cli/main.js run-task --run <run_id> --task T001
git apply --check runs/<run_id>/executions/T001/artifacts/patch.diff
node dist/cli/main.js approve-task --run <run_id> --task T001
```

The `doctor` command validates your local environment and optionally checks a specific run configuration:
```sh
# Check global environment (Ollama reachability, available models)
node dist/cli/main.js doctor

# Check a specific run (validates manifest, pack files, model availability)
node dist/cli/main.js doctor --run <run_id>
```

Output format:
- ✅ Success — Check passed
- ⚠️ Warning — Check passed, but with warnings (e.g. model digest differs)
- ❌ Failure — Check failed (an actionable error message is included)
Exit codes:
- `0` — all checks passed, or warnings only
- `1` — at least one failure occurred
Example output:
```
✅ Ollama reachable: http://127.0.0.1:11434
✅ Ollama models found: qwen2.5:14b, qwen2.5-coder:14b
✅ Run manifest found: 2026-01-12_143159_spec
✅ Pack manifest: packs/core/manifest.json
✅ Planner prompt: packs/core/planner.system.txt
✅ Executor prompt: packs/core/executor.system.txt
✅ Planner model found: qwen2.5:14b
✅ Executor model found: qwen2.5-coder:14b
```
```sh
node dist/cli/main.js add-context --run <run_id> --file ./src/index.ts
node dist/cli/main.js list-context --run <run_id>
```

Context sources:
- Approved task artifacts
- Explicit allowlist files
The orchestrator automatically generates human-readable Markdown views alongside canonical JSON files:
- plan.md — Generated after successful `plan` or `revise-plan` commands
  - Includes run metadata, plan summary, milestones, and task status table
- exec.md — Generated after successful `run-task` commands
  - Includes execution metadata, artifacts, normalizations, and next actions
- patch.md — Generated after successful `run-task` commands (if `patch.diff` exists)
  - Includes diff statistics and formatted patch content
Markdown files are derived artifacts:
- Read-only and deterministic
- Generated only after JSON validation
- Never used as source of truth
- JSON remains canonical
If markdown generation fails, a warning is logged to audit.log and the command continues.
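As a sketch of the "derived, never canonical" rule, a Markdown view such as the task status table can be rendered from the validated JSON in one pure pass. The `Task` shape and function name below are illustrative, not the tool's schema:

```typescript
// Sketch: deriving a Markdown task table from canonical plan JSON.
// JSON stays the source of truth; this output is regenerated, never edited.
interface Task {
  id: string;
  title: string;
  status: "pending" | "approved" | "done";
}

function renderTaskTable(tasks: Task[]): string {
  const header = "| Task | Title | Status |\n| --- | --- | --- |";
  const rows = tasks.map((t) => `| ${t.id} | ${t.title} | ${t.status} |`);
  return [header, ...rows].join("\n");
}
```

Because the renderer is a pure function of the JSON, regenerating a view can never drift from the canonical data.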
If a model produces a malformed diff:
- the raw diff is preserved (`patch.raw.diff`)
- a normalized version is generated (`patch.diff`)
- all fixes are logged in `exec.response.json`

Typical fixes include:
- missing trailing newline
- incorrect hunk counts
No data is silently altered.
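The two normalizations named above can be sketched as follows. This is an illustration of the idea, not the orchestrator's implementation, and it assumes the simple `@@ -a,b +c,d @@` hunk-header form; the real tool records every fix in `exec.response.json`:

```typescript
// Sketch: normalize a malformed unified diff while recording every fix.
function normalizeDiff(raw: string): { diff: string; fixes: string[] } {
  const fixes: string[] = [];
  let diff = raw;

  // Fix 1: ensure the patch ends with a newline, as `git apply` expects.
  if (!diff.endsWith("\n")) {
    diff += "\n";
    fixes.push("added missing trailing newline");
  }

  // Fix 2: recompute hunk line counts from the hunk body.
  const lines = diff.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = lines[i].match(/^@@ -(\d+),\d+ \+(\d+),\d+ @@(.*)$/);
    if (!m) continue;
    let oldCount = 0;
    let newCount = 0;
    for (let j = i + 1; j < lines.length && !lines[j].startsWith("@@"); j++) {
      const c = lines[j][0];
      if (c === "-") oldCount++;
      else if (c === "+") newCount++;
      else if (c === " ") { oldCount++; newCount++; }
      else break; // end of hunk body
    }
    const fixed = `@@ -${m[1]},${oldCount} +${m[2]},${newCount} @@${m[3]}`;
    if (fixed !== lines[i]) {
      lines[i] = fixed;
      fixes.push(`corrected hunk counts at line ${i + 1}`);
    }
  }
  return { diff: lines.join("\n"), fixes };
}
```

Every change is appended to `fixes`, so nothing is altered without a matching log entry.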
```
runs/<run_id>/
├── manifest.json
├── spec.md
├── plan/
│   ├── plan.request.json
│   ├── plan.response.json
│   ├── plan.approval.json
│   └── plan.md              # Markdown view (generated)
├── tasks/
│   ├── T001.task.json
│   └── T002.task.json
├── executions/
│   └── T001/
│       ├── exec.request.json
│       ├── exec.response.json
│       ├── exec.md          # Markdown view (generated)
│       ├── approval.json
│       └── artifacts/
│           ├── patch.diff
│           ├── patch.raw.diff
│           ├── patch.md     # Markdown view (generated)
│           └── notes.md
├── context/
│   └── allowlist.json
└── audit.log
```
- No network calls beyond local model API
- No telemetry
- No background execution
- No silent code changes
- No unapproved context access
Apache-2.0
v0.1.1 — dogfood-ready
The system is stable for:
- personal use
- local automation
- controlled engineering workflows
Future work will focus on incremental, opt-in improvements that preserve the core guarantees of the system:
- Packaging and distribution (single, offline-friendly binaries)
- Additional local or OpenAI-compatible providers
- Improved plan and schema validation
- Optional editor integrations (e.g. VS Code extension)
- Better workflow ergonomics and diagnostics
The following are intentionally out of scope:
- Autonomous agents
- Background execution loops
- Cloud dependencies by default
- Parallel task execution
- Self-modifying orchestrator behavior
- Unapproved or opaque code changes