Commit 7ad065e
docs: update autoresearch README
1 parent 84c418d

1 file changed: 15 additions & 31 deletions

autoresearch/README.md
@@ -2,45 +2,26 @@
 
 Autonomous ML research agent that iteratively improves a GPT pretraining script to minimize validation bits-per-byte (val_bpb). Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch).
 
-The key difference: everything runs **inside a single Docker container** on an [Ocean](https://dashboard.oncompute.ai/) GPU node (H200, 141GB VRAM) with a **local open-source LLM** — no API keys needed.
+The key difference: everything runs **inside a single Docker container** on [Ocean](https://dashboard.oncompute.ai/) GPU nodes with a **local open-source LLM** — no API keys needed. The current setup uses 2×H200 GPUs: one dedicated to the agent LLM, the other to training.
 
 ## From Karpathy's Experiment to Ocean
 
-Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) uses the Claude API to drive an agent loop that iteratively improves a GPT training script. It's a brilliant idea — let an LLM be the researcher — but it requires API keys, costs money per token, and runs on your own machine. Here's how we adapted it to run fully self-contained on Ocean Network:
+Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) uses the Claude API to drive the agent loop. We adapted it to run fully self-contained on Ocean Network:
 
-### 1. Replace the API with a local LLM
+1. **Local LLM instead of API** — Replaced Claude API calls with **Qwen3.5-27B** served via **vLLM** (unquantized bf16, ~54GB). No API keys, no per-token costs.
+2. **Dedicated GPUs** — GPU 0 runs the agent LLM, GPU 1 runs training. Each gets the full 141GB — no memory-sharing complexity. (A single-GPU variant using Qwen3-32B-AWQ is available as `algo_qwen3-32B.py`.)
+3. **Single Docker container** — Everything packaged in one container: PyTorch, vLLM, Flash Attention 3, data pipeline. Ocean runs it on remote GPU nodes via a symlink to `/app/data/transformations/algorithm`.
+4. **Self-bootstrapping data** — `prepare.py` downloads HuggingFace data shards and trains a BPE tokenizer at container startup, so nothing needs to persist between runs.
 
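The dedicated-GPU split (agent LLM pinned to GPU 0, training on GPU 1) comes down to setting `CUDA_VISIBLE_DEVICES` on the training subprocess. A minimal sketch; `run_training`, `parse_val_bpb`, and the `val_bpb:` log format are illustrative assumptions, not `algo.py`'s actual code:

```python
import os
import re
import subprocess


def parse_val_bpb(stdout: str) -> float:
    """Pull the final val_bpb out of training output.

    The 'val_bpb: <float>' log line is an assumed format for this sketch.
    """
    matches = re.findall(r"val_bpb:\s*([0-9.]+)", stdout)
    if not matches:
        raise ValueError("no val_bpb found in training output")
    return float(matches[-1])


def run_training(train_py: str = "train.py", gpu: str = "1", budget_s: int = 300) -> float:
    """Run a candidate training script pinned to its own GPU, so it never
    shares memory with the agent LLM resident on GPU 0."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    proc = subprocess.run(
        ["python", train_py],
        env=env,
        capture_output=True,
        text=True,
        timeout=budget_s + 60,  # small grace period past the training budget
    )
    return parse_val_bpb(proc.stdout)
```

Because each process only ever sees one device, neither side needs memory-utilization tuning.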

-The original calls Claude via the Anthropic API. We replaced that with **Qwen3-32B-AWQ** served locally through **vLLM**. The AWQ 4-bit quantization brings the model down to ~18GB VRAM, leaving the rest of the H200's 141GB for training. vLLM loads once and stays resident for all 200 iterations — no network calls, no API keys, no per-token costs.
+> **Alternative**: You can also use the Claude API (or any LLM API) from inside the container by passing an API key as an environment variable. Stronger model, but adds cost and network dependency.
 
-### 2. Share one GPU between the agent and training
-
-This is the core engineering challenge. The agent LLM and the training run need to coexist on the same GPU. We configure vLLM with `gpu_memory_utilization=0.25` (~35GB for weights + KV cache), leaving ~100GB for PyTorch training. The agent generates code, then training runs as a subprocess — they never compete for memory simultaneously because inference finishes before training starts.
-
-### 3. Package everything in a single Docker container
-
-Ocean's compute-to-data model runs a Docker container on a remote GPU node. We built a container on `nvidia/cuda:12.8.0-devel-ubuntu22.04` that includes PyTorch, vLLM, Flash Attention 3 (via `kernels`), and all dependencies. The entire pipeline — data download, tokenizer training, LLM loading, and the 200-iteration research loop — runs from a single entrypoint (`algo.py`).
-
-### 4. Adapt the data pipeline for container execution
-
-Karpathy's setup assumes a persistent local environment. In a container, nothing persists between runs. `prepare.py` handles this by downloading HuggingFace data shards and training a BPE tokenizer from scratch at container startup, caching everything under `~/.cache/autoresearch/` for the duration of the job.
-
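The container-startup caching that `prepare.py` performs (download shards, train the tokenizer, cache under `~/.cache/autoresearch/`) boils down to a build-if-missing pattern. A sketch; the `ensure_cached` helper is illustrative, not `prepare.py`'s actual API:

```python
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "autoresearch"


def ensure_cached(name: str, build, cache_dir: Path = CACHE_DIR) -> Path:
    """Return a cached artifact, building it (downloading a data shard,
    training the tokenizer, ...) only on first use in this container."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    if not path.exists():
        path.write_bytes(build())  # build() returns the artifact's bytes
    return path
```

A fresh container rebuilds everything through this path; within one job, repeated lookups reuse the cache.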
-### 5. Wire into Ocean's orchestrator
-
-Ocean expects the algorithm at `/app/data/transformations/algorithm`. A symlink in the Dockerfile (`ln -sf /app/algo.py /app/data/transformations/algorithm`) bridges this. Results are written to `/data/outputs/results.json` so they're downloadable from the Ocean dashboard when the job completes.
-
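The orchestrator wiring amounts to a couple of Dockerfile lines. A sketch only, assuming the base image and entrypoint named in this README; the actual Dockerfile is not shown here:

```dockerfile
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

# ... install PyTorch, vLLM, Flash Attention 3; copy algo.py, train.py, prepare.py ...

# Ocean looks for the algorithm at this fixed path; a symlink bridges it.
RUN mkdir -p /app/data/transformations \
 && ln -sf /app/algo.py /app/data/transformations/algorithm
```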
-### Alternative: use an API instead of a local LLM
-
-You could also use the Claude API (or any other LLM API) from inside the container — just pass the API key as an environment variable and swap the vLLM calls for Anthropic SDK calls. This frees up the ~35GB reserved for the agent model, giving training the full GPU, and a stronger model like Claude Sonnet would likely produce fewer crashes and smarter changes. The tradeoff is API costs and a dependency on network access.
-
-### The result
-
-A few clicks give you an autonomous ML researcher that runs for hours on an H200 GPU, costs nothing beyond the compute rental, and produces a `results.json` with the full experiment history and winning code.
+A few clicks give you an autonomous ML researcher that runs for hours on H200 GPUs, costs nothing beyond the compute rental, and produces a `results.json` with the full experiment history and winning code.
 
 ## How It Works
 
 1. **Data prep** — Downloads HuggingFace data shards, trains a BPE tokenizer (`prepare.py`)
-2. **Load agent LLM** — Qwen3-32B-AWQ via vLLM (~18GB VRAM, stays resident)
-3. **Baseline run** — Runs the original `train.py` (5-min training budget), records val_bpb
+2. **Load agent LLM** — Qwen3.5-27B via vLLM on GPU 0 (~54GB VRAM, stays resident)
+3. **Baseline run** — Runs the original `train.py` on GPU 1 (5-min training budget), records val_bpb
 4. **Agent loop** (up to 200 iterations):
    - LLM reads experiment history + current best `train.py`
    - Generates a hypothesis + complete new `train.py`
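Stripped of LLM and GPU details, the agent loop is a propose-evaluate-keep-best search. A sketch with the propose and evaluate steps stubbed out; the function names and history schema are illustrative, not `algo.py`'s actual code:

```python
def research_loop(baseline_code: str, propose, evaluate, iterations: int = 200) -> dict:
    """Iteratively ask the agent LLM for a new train.py and keep whichever
    version achieves the lowest val_bpb (lower is better)."""
    history = []
    best_code, best_bpb = baseline_code, evaluate(baseline_code)
    for i in range(iterations):
        # propose() sees the full history plus the current best script
        hypothesis, candidate = propose(history, best_code)
        try:
            bpb = evaluate(candidate)  # e.g. runs training in a subprocess
        except Exception as err:       # a crashed candidate is recorded, not fatal
            history.append({"iter": i, "hypothesis": hypothesis, "error": str(err)})
            continue
        history.append({"iter": i, "hypothesis": hypothesis, "val_bpb": bpb})
        if bpb < best_bpb:
            best_code, best_bpb = candidate, bpb
    return {"best": {"train_py": best_code, "val_bpb": best_bpb}, "history": history}
```

Catching failures instead of aborting is what lets a 200-iteration run survive the inevitable broken scripts.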
@@ -54,7 +35,8 @@ The user extracts `results["best"]["train_py"]` to get the winning code.
 
 | File | Description |
 |------|-------------|
-| `algo.py` | Core agent loop — orchestrates LLM inference and training |
+| `algo.py` | Core agent loop — orchestrates LLM inference (GPU 0) and training (GPU 1) |
+| `algo_qwen3-32B.py` | Previous single-GPU variant using Qwen3-32B-AWQ |
 | `train.py` | GPT pretraining script (the file the agent modifies) |
 | `prepare.py` | Data download, tokenizer, dataloader, evaluation (read-only) |
 | `program.md` | Instructions for the agent LLM |
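As noted above, the winning code lives at `results["best"]["train_py"]` in the output file. A minimal sketch of pulling it out of a downloaded `results.json`; keys beyond `best.train_py` are assumptions about the schema:

```python
import json
from pathlib import Path


def extract_best(results_path: str, out_path: str = "best_train.py") -> float:
    """Write the winning train.py to disk and return its val_bpb.

    Assumes results["best"]["train_py"] as described in this README;
    the "val_bpb" key alongside it is an assumption.
    """
    results = json.loads(Path(results_path).read_text())
    Path(out_path).write_text(results["best"]["train_py"])
    return results["best"].get("val_bpb", float("nan"))
```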
@@ -64,7 +46,7 @@ The user extracts `results["best"]["train_py"]` to get the winning code.
 ## Usage
 
 1. Go to [dashboard.oncompute.ai](https://dashboard.oncompute.ai/)
-2. Select an **H200 GPU** environment
+2. Select a **2×H200 GPU** environment (or a single H200 with `algo_qwen3-32B.py`)
 3. Configure the job and add payment
 4. Open the **Ocean Orchestrator** in VS Code / your editor
 5. Open this directory in the orchestrator and run the job — the container builds and executes `algo.py` autonomously
@@ -77,6 +59,8 @@ python plot_progress.py path/to/results.json progress.png
 
 ## Results
 
+All results below are from the single-GPU setup (Qwen3-32B-AWQ on one H200). Results with the 2×H200 / Qwen3.5-27B setup are pending.
+
 ### Qwen3-32B-AWQ — 0.7 Temperature (First Run)
 
 ![Qwen3-32B first run](assets/images/qwen32B_first_run_progress.png)
