
Commit 4a37759

feat: add autoresearch files
1 parent c575091 commit 4a37759

12 files changed

Lines changed: 3776 additions & 0 deletions

autoresearch/Dockerfile

Lines changed: 38 additions & 0 deletions
```dockerfile
FROM nvidia/cuda:12.8.0-devel-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# System dependencies
RUN apt-get update && apt-get install -y \
    python3 python3-dev python3-venv python3-pip \
    git curl wget \
    && rm -rf /var/lib/apt/lists/* \
    && ln -sf /usr/bin/python3 /usr/bin/python

# Install uv for fast package management
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"

# Install Python dependencies (cached unless these lines change)
RUN uv pip install --system --no-cache-dir \
    torch==2.9.1 --index-url https://download.pytorch.org/whl/cu128

# Version specifiers are quoted so the shell doesn't treat ">=" as a redirection
RUN uv pip install --system --no-cache-dir \
    vllm \
    "kernels>=0.11.7" \
    "rustbpe>=0.1.0" \
    "tiktoken>=0.11.0" \
    "pyarrow>=21.0.0" \
    huggingface-hub \
    "requests>=2.32.0" \
    "numpy>=2.2.6"

# Copy application code (changes here won't invalidate install layers)
WORKDIR /app
COPY . /app

# Ocean platform runs: python /app/data/transformations/algorithm
RUN mkdir -p /app/data/transformations && ln -sf /app/algo.py /app/data/transformations/algorithm

CMD ["python", "algo.py"]
```

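For local testing outside the Ocean platform, the image can be built and run with standard Docker commands (a sketch; the `autoresearch` tag is arbitrary, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host):

```bash
# Build from the autoresearch/ directory
docker build -t autoresearch .

# Run with GPU access; the agent loop starts immediately via CMD
docker run --gpus all autoresearch
```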
autoresearch/README.md

Lines changed: 55 additions & 0 deletions
# Autoresearch on Ocean Network

Autonomous ML research agent that iteratively improves a GPT pretraining script to minimize validation bits-per-byte (val_bpb). Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch).

The key difference: everything runs **inside a single Docker container** on an [Ocean](https://dashboard.oncompute.ai/) GPU node (H200, 141GB VRAM) with a **local open-source LLM** — no API keys needed.
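The target metric, val_bpb, is validation cross-entropy rescaled to bits per byte of raw text, which makes runs comparable across tokenizers. A minimal sketch of the conversion (the exact normalization in `prepare.py` may differ):

```python
import math

def bits_per_byte(ce_loss_nats: float, tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) into bits per byte of raw text."""
    total_bits = ce_loss_nats * tokens / math.log(2)  # nats -> bits over the whole split
    return total_bits / total_bytes

# e.g. 1.4 nats/token on a split averaging ~4 bytes per token
print(round(bits_per_byte(1.4, tokens=1_000, total_bytes=4_000), 4))
```

Lower is better: a model that compresses the validation bytes more tightly scores a smaller val_bpb.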
## How It Works

1. **Data prep** — downloads HuggingFace data shards and trains a BPE tokenizer (`prepare.py`)
2. **Load agent LLM** — Qwen3-32B-AWQ via vLLM (~18GB VRAM, stays resident)
3. **Baseline run** — runs the original `train.py` (5-minute training budget) and records val_bpb
4. **Agent loop** (up to 200 iterations):
   - LLM reads the experiment history + current best `train.py`
   - Generates a hypothesis + a complete new `train.py`
   - Syntax check → train (5 min) → evaluate val_bpb
   - If improved: keep it. If not: revert to the best.
   - `results.json` is saved after every iteration

The user extracts `results["best"]["train_py"]` to get the winning code.
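The improve-or-revert loop above can be sketched as follows. `propose_train_py` stands in for the LLM call and `run_training` for the 5-minute train-and-evaluate step; both names are hypothetical (the real orchestration lives in `algo.py`):

```python
import json

def agent_loop(baseline_bpb, baseline_code, propose_train_py, run_training,
               iterations=200, results_path="results.json"):
    """Greedy loop: keep a candidate train.py only if it lowers val_bpb."""
    best = {"val_bpb": baseline_bpb, "train_py": baseline_code}
    history = []
    for i in range(iterations):
        candidate = propose_train_py(history, best["train_py"])
        try:
            compile(candidate, "train.py", "exec")   # cheap syntax check first
            val_bpb = run_training(candidate)        # train 5 min, then evaluate
        except (SyntaxError, RuntimeError) as e:
            history.append({"iter": i, "crash": str(e)})
            continue                                 # revert: best stays unchanged
        history.append({"iter": i, "val_bpb": val_bpb})
        if val_bpb < best["val_bpb"]:                # lower bits-per-byte is better
            best = {"val_bpb": val_bpb, "train_py": candidate}
        with open(results_path, "w") as f:           # persisted every iteration
            json.dump({"best": best, "history": history}, f)
    return best
```

Saving `results.json` inside the loop means a crash at any iteration still leaves the best code recovered so far on disk.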
## Files

| File | Description |
|------|-------------|
| `algo.py` | Core agent loop — orchestrates LLM inference and training |
| `train.py` | GPT pretraining script (the file the agent modifies) |
| `prepare.py` | Data download, tokenizer, dataloader, evaluation (read-only) |
| `program.md` | Instructions for the agent LLM |
| `Dockerfile` | Container build (CUDA 12.8, Python, PyTorch, vLLM) |
| `plot_progress.py` | Generates progress charts from results |
## Usage

1. Go to [dashboard.oncompute.ai](https://dashboard.oncompute.ai/)
2. Select an **H200 GPU** environment
3. Configure the job and add payment
4. Open the **Ocean Orchestrator** in VS Code / your editor
5. Open this directory in the orchestrator and run the job — the container builds and executes `algo.py` autonomously
6. Download `results.json` from the outputs when complete

To plot results after a run:

```bash
python plot_progress.py path/to/results.json progress.png
```
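A progress chart of this kind is essentially a best-so-far curve over `results.json`. A minimal sketch of that reduction, assuming the history schema sketched above (the real script's field names may differ):

```python
import json

def best_so_far(results_path):
    """Running minimum of val_bpb per iteration; crashed runs carry the previous best."""
    with open(results_path) as f:
        history = json.load(f)["history"]
    curve, best = [], float("inf")
    for entry in history:
        if "val_bpb" in entry:        # crashed iterations record no score
            best = min(best, entry["val_bpb"])
        curve.append(best)
    return curve
```

The resulting list can be handed straight to any plotting library as the y-axis.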
## Results

### Qwen3-32B-AWQ — First Run

![Qwen3-32B first run](assets/images/qwen32B_first_run_progress.png)

- **Baseline**: 1.0077 val_bpb
- **Best**: 0.9818 val_bpb (2.6% improvement)
- **201 iterations** over 5.5 hours; 30 successful runs (85% crash rate)
- Key improvements: increased model depth (8 → 10 layers), late-stage hyperparameter tuning