Ralph is an autonomous ML engineering agent designed to help teams iterate on machine learning projects by automating the experiment loop and accumulating knowledge. It works like an experienced Machine Learning Engineer, reasoning carefully, documenting decisions, and respecting user guidance.
The project consists of two complementary components:
- `ml-ralph` (TypeScript/Bun) - Terminal user interface
- `ml-ralph-agent` (Python) - Core agent logic and cognitive loop
A terminal user interface (TUI) built with Ink (React for terminals) that provides two interaction modes for working with the Ralph agent.
- Runtime: Bun
- TUI Framework: Ink (React for terminals)
- Language: TypeScript
- State Management: Zustand
- Agent Integration: Claude Code (via subprocess)
- Experiment Tracking: Weights & Biases (W&B)
- Markdown Rendering: ink-markdown
Ralph follows a clean layered architecture:
```
┌─────────────────────────────────────────────────────────┐
│                        UI Layer                         │
│   (Ink/React Components - Planning & Monitor screens)   │
├─────────────────────────────────────────────────────────┤
│                    Application Layer                    │
│          (AgentOrchestrator, UIState, Commands)         │
├─────────────────────────────────────────────────────────┤
│                      Domain Layer                       │
│     (Pure types, validation, story selection logic)     │
├─────────────────────────────────────────────────────────┤
│                  Infrastructure Layer                   │
│   (FileStore, ClaudeCodeClient, WandBClient, Process)   │
└─────────────────────────────────────────────────────────┘
```
- UI Layer (`src/ui/`)
  - Planning and Monitor screens
  - Widgets: chat panel, metrics, learnings, research, stories
  - All state flows down via props/hooks
- Application Layer (`src/application/`)
  - AgentOrchestrator: the "brain" that manages the agent loop and story selection
  - UIState: manages what the UI needs to display
  - Commands: handle user actions and state transitions
- Domain Layer (`src/domain/`)
  - Pure types and validation
  - Story selection logic
  - No external dependencies
- Infrastructure Layer (`src/infrastructure/`)
  - FileStore: reads/writes `.ml-ralph/` configuration files
  - ClaudeCodeClient: spawns and communicates with Claude Code
  - WandBClient: fetches metrics from Weights & Biases
  - ProcessManager: manages training jobs
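The domain layer's story selection can be pictured as a pure function over the backlog. A minimal sketch follows; the function name and the trimmed-down story shape are illustrative, not the actual implementation:

```typescript
// Trimmed-down story shape for illustration (the real Story interface has more fields).
type StoryState = "pending" | "in_progress" | "done" | "superseded";

interface BacklogStory {
  id: string;
  title: string;
  state: StoryState;
}

// Pure and dependency-free, as the domain layer requires:
// resume any in-progress story first, otherwise take the first pending one.
function selectNextStory(backlog: BacklogStory[]): BacklogStory | undefined {
  return (
    backlog.find((s) => s.state === "in_progress") ??
    backlog.find((s) => s.state === "pending")
  );
}
```

Keeping this logic free of I/O is what lets the domain layer stay dependency-free while the orchestrator wires it to the file store.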
- Chat with Claude Code to create or refine your PRD
- View accumulated learnings from past iterations
- Review research the agent has gathered
- See your story backlog and prioritize work
- Watch the agent execute stories in real-time
- View experiment metrics and training curves via W&B
- See current story and hypothesis being tested
- Control the agent (start/stop) and training jobs
| Key | Action |
|---|---|
| `Tab` | Switch between Planning and Monitor modes |
| `1`/`2`/`3` | Switch tabs in Planning mode |
| `i` or `Enter` | Enter chat input mode |
| `Esc` | Exit input mode / dismiss errors |
| `s` | Start/Stop agent |
| `t` | Stop training job (Monitor mode) |
| `w` | Open W&B dashboard |
| `q` | Quit |
```
src/
├── ui/
│   ├── app.tsx              # Main app component
│   ├── screens/
│   │   ├── planning.tsx     # Planning mode screen
│   │   └── monitor.tsx      # Monitor mode screen
│   ├── widgets/
│   │   ├── chat-panel.tsx   # Claude Code chat interface
│   │   ├── learnings.tsx    # Learnings display
│   │   ├── research.tsx     # Research findings
│   │   ├── stories.tsx      # Story backlog
│   │   └── metrics.tsx      # W&B metrics display
│   └── hooks/
│       └── index.ts         # React hooks
├── application/
│   └── orchestrator/
│       ├── orchestrator.ts  # Agent orchestration logic
│       └── types.ts         # Orchestrator types
├── domain/
│   └── types.ts             # Core domain types
└── infrastructure/
    ├── file-store/          # File system operations
    ├── wandb/               # W&B integration
    └── process/             # Process management
```
The core agent logic, which runs a structured cognitive loop and integrates with the Claude Code or Codex CLIs.
Repository: github.com/pentoai/ML-Ralph
Package: `pip install ml-ralph` or `uv tool install ml-ralph`
Version: 0.3.0
- CLI Framework: Typer
- Terminal Output: Rich
- Python: 3.10+
- Integration: Claude/Codex CLI via subprocess streaming
Initialize Ralph in a project:
- Creates the `.ml-ralph/` directory structure
- Sets up templates (RALPH.md, prd.json placeholders)
- Installs skills for the Claude Code and Codex CLIs
- Copies configuration files to the project root
Execute the autonomous loop:
- Runs up to N iterations (default: 100)
- Integrates with Claude Code CLI via streaming JSON protocol
- Displays tool invocations and progress in real-time
- Automatically handles exit conditions
Ralph works through a structured cycle, looping from DECIDE back to HYPOTHESIZE:
```
ORIENT → RESEARCH → HYPOTHESIZE → EXECUTE → ANALYZE → VALIDATE → DECIDE
                         ↑                                          │
                         └──────────────────────────────────────────┘
```
- ORIENT - Understand the problem, constraints, failure modes
- RESEARCH - Learn from existing knowledge, find SOTA approaches
- HYPOTHESIZE - Form testable bets with expected outcomes
- EXECUTE - Implement minimal changes, run experiments
- ANALYZE - Examine results, find failure patterns
- VALIDATE - Check for leakage, ensure results are trustworthy
- DECIDE - Keep/revert/pivot, update PRD based on evidence
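The cycle above can be sketched as a phase-ordering function. This is illustrative only; the agent's actual state machine lives in the Python runner, with the current phase recorded in `ralph.json`:

```typescript
// The seven phases in order. Per the diagram, after DECIDE the loop
// wraps back to HYPOTHESIZE rather than restarting from ORIENT.
const PHASES = [
  "ORIENT",
  "RESEARCH",
  "HYPOTHESIZE",
  "EXECUTE",
  "ANALYZE",
  "VALIDATE",
  "DECIDE",
] as const;

type Phase = (typeof PHASES)[number];

function nextPhase(current: Phase): Phase {
  if (current === "DECIDE") return "HYPOTHESIZE";
  return PHASES[PHASES.indexOf(current) + 1];
}
```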
- Triggered when no `prd.json` exists
- Agent asks clarifying questions one at a time
- User provides ML problem context
- Agent writes `.ml-ralph/prd.json` when ready
- Status transitions to "approved" when the user says `/start`
- Triggered when `prd.json` exists with `status: "approved"`
- Agent works autonomously through the cognitive loop
- Each iteration: reads state files, executes one phase, updates logs
- Checks `inbox.json` for user commands each iteration
- Updates the PRD based on evidence (refinement)
- Stops when all success criteria are met → outputs `<promise>COMPLETE</promise>`
- Evidence over intuition - Log what you observed, not what you expected
- One hypothesis at a time - No simultaneous testing
- Minimal changes - Smallest experiment to test hypothesis
- Skepticism - Metrics suspicious until proven trustworthy
- Error-driven - Find patterns in failures
Commands written to `inbox.json`:
- `hint` - Provide guidance without stopping
- `pause` - Pause execution
- `resume` - Resume execution
- `redirect` - Change direction
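For example, a hint could be dropped into the inbox like this. The entry shape here is an assumption; only the command names come from the list above:

```typescript
import { writeFileSync } from "node:fs";

type InboxCommand = "hint" | "pause" | "resume" | "redirect";

// Hypothetical inbox entry shape; the real schema may differ.
function buildInboxEntry(command: InboxCommand, message?: string) {
  return { command, message, timestamp: new Date().toISOString() };
}

// Ralph checks inbox.json once per iteration, so writing the file is enough.
function sendCommand(command: InboxCommand, message?: string): void {
  writeFileSync(
    ".ml-ralph/inbox.json",
    JSON.stringify(buildInboxEntry(command, message), null, 2),
  );
}
```

Usage: `sendCommand("hint", "try a smaller learning rate first")` steers the agent without interrupting the current iteration.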
```typescript
interface PRD {
  projectName: string;
  description: string;
  goals: string[];
  successCriteria: SuccessCriterion[];
  constraints: string[];
  scope: {
    inScope: string[];
    outOfScope: string[];
  };
  dataSources: DataSource[];
  evaluationStrategy: EvaluationStrategy;
  stories: Story[];
  status: "draft" | "approved" | "completed";
}

interface Story {
  id: string;
  type: "discovery" | "experiment" | "evaluation" | "implementation" | "ops";
  title: string;
  description: string;
  hypothesis?: string; // Format: "If X, then Y because Z"
  state: "pending" | "in_progress" | "done" | "superseded";
  createdAt: string;
  completedAt?: string;
}

interface Learning {
  id: string;
  insight: string;
  implications: string[];
  category:
    | "data"
    | "model"
    | "evaluation"
    | "infrastructure"
    | "domain"
    | "process";
  impact: "high" | "medium" | "low";
  confidence: "proven" | "likely" | "speculative";
  sourceStory?: string;
  sourceExperiment?: string;
  wandbRunId?: string;
  createdAt: string;
}

interface ProgressEntry {
  iteration: number;
  phase: string;
  hypothesis: string;
  assumptions: string[];
  changes: string[];
  metrics: {
    baseline: Record<string, number>;
    result: Record<string, number>;
  };
  decision: "keep" | "revert" | "investigate";
  evidence: {
    wandbArtifacts?: string[];
    logs?: string[];
    commits?: string[];
  };
  timestamp: string;
}

interface Research {
  id: string;
  type:
    | "paper"
    | "documentation"
    | "tutorial"
    | "stackoverflow"
    | "blog"
    | "repo";
  title: string;
  url?: string;
  summary: string;
  keyTakeaways: string[];
  codeSnippets?: string[];
  relatedStories: string[];
  createdAt: string;
}

interface TrainingJob {
  id: string;
  pid: number;
  command: string;
  logPath: string;
  wandbRunId?: string;
  wandbUrl?: string;
  status: "running" | "completed" | "failed" | "stopped";
  startedAt: string;
  endedAt?: string;
}
```

| File | Format | Purpose |
|---|---|---|
| `config.json` | JSON | Project configuration |
| `prd.json` | JSON | Current PRD (the contract) |
| `ralph.json` | JSON | Current execution state (phase, iteration, stats) |
| `backlog.json` | JSON | Queue of hypotheses to test |
| `learnings.jsonl` | JSONL | Extracted insights (one per line) |
| `research.jsonl` | JSONL | Research findings |
| `progress.jsonl` | JSONL | Iteration logs (thinking log) |
| `chat.jsonl` | JSONL | Conversation history |
| `inbox.json` | JSON | User commands (hint, pause, redirect, resume) |
| `runs/active.json` | JSON | Currently running training jobs |
| `runs/history.jsonl` | JSONL | Completed training jobs |
| `chat/prd-session.jsonl` | JSONL | PRD chat session history |
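The `.jsonl` files follow the usual JSON Lines convention: one independent JSON object per line, which makes append-only logging cheap. A reader is only a few lines. This is a sketch, not the actual FileStore code:

```typescript
import { readFileSync } from "node:fs";

// Parse JSON Lines text: one JSON object per non-empty line.
function parseJsonl<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}

// e.g. readJsonl<Learning>(".ml-ralph/learnings.jsonl")
function readJsonl<T>(path: string): T[] {
  return parseJsonl<T>(readFileSync(path, "utf8"));
}
```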
- Initialize: `ml-ralph init` creates the `.ml-ralph/` directory with config
- Plan: Chat with Claude Code in Planning Mode to refine your PRD
- Execute: Agent autonomously works through stories, runs experiments
- Monitor: Watch execution in Monitor Mode, see metrics and learnings
- Learn: Accumulated insights inform the next iteration's decisions
Ralph handles training jobs that outlast individual iterations:
- Detach processes via `nohup`, `setsid`, or `tmux`
- Track via `outputs/logs/active_runs.json` with PID, log path, W&B URL
- Next iterations enter "monitoring mode" to observe and decide
- Jobs survive agent exit/iteration boundaries
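In Node/Bun terms, detaching looks roughly like this. This is a sketch under the assumption that the TUI uses `child_process`; the agent side may shell out to `nohup`/`setsid` instead, and the returned record's shape is an assumption:

```typescript
import { spawn } from "node:child_process";
import { openSync } from "node:fs";

// Launch a training command in its own process group so it survives
// the agent process exiting. The returned record mirrors the kind of
// bookkeeping the active-runs file would need.
function launchDetached(command: string, args: string[], logPath: string) {
  const log = openSync(logPath, "a"); // append stdout/stderr to a log file
  const child = spawn(command, args, {
    detached: true,                   // new process group: outlives the parent
    stdio: ["ignore", log, log],
  });
  child.unref();                      // don't keep the parent alive for this child
  return {
    pid: child.pid,
    command: [command, ...args].join(" "),
    logPath,
    startedAt: new Date().toISOString(),
  };
}
```

Because the child is `detached` and `unref`'d, the next iteration can exit freely and later rediscover the job by PID from the tracking file.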
- Spawned as a subprocess with `--output-format stream-json`
- Restricted tools: `Bash`, `Read`, `Write`, `Edit`, `Glob`, `Grep`
- Real-time streaming of tool invocations and responses
- Fetch experiment metrics
- Display training curves
- Link runs to stories and learnings
- Open dashboard via keyboard shortcut
When Ralph runs experiments, it defaults to:
- `uv` - Package management
- `ruff` - Linting & formatting
- `wandb` - Experiment tracking
- `pydantic` - Data models
- `loguru` - Logging
- `typer` - CLI building
- `src/ui/app.tsx` - Main application entry
- `src/ui/screens/planning.tsx` - Planning mode screen
- `src/application/orchestrator/orchestrator.ts` - Agent orchestration
- `package.json` - Dependencies and scripts
- `ml_ralph/cli.py` - CLI entry points
- `ml_ralph/runner.py` - Agent execution loop
- `templates/RALPH.md` - Full agent instructions
- `templates/CLAUDE.md` - Project instructions template