Commit 6d26bba

[DEBUG]: rm shell files and update readme

Parent: b96e65f
8 files changed: 61 additions and 287 deletions

File tree

.vscode/settings.json

Lines changed: 0 additions & 11 deletions
This file was deleted.

stage_advantage/README.md

Lines changed: 36 additions & 41 deletions
@@ -9,7 +9,7 @@ This module implements a pipeline for training an **Advantage Estimator** and us
 │ Stage 0: GT Labeling (annotation/gt_labeling.sh + gt_label.py) │
 │ Compute advantage (from progress or from Stage 2 output) → task_index │
 ├──────────────────────────────────────────────────────────────────────────┤
-│ Stage 1: Train Advantage Estimator (annotation/train_estimator.sh)
+│ Stage 1: Train Advantage Estimator (scripts/train_pytorch.py)
 │ Fine-tune pi0 model to predict advantage from observations │
 ├──────────────────────────────────────────────────────────────────────────┤
 │ Stage 2: Advantage Estimation on New Data (annotation/eval.py) │
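The Stage 0 idea above (progress → advantage → task_index) can be sketched in a few lines. This is an editor's illustration only, not the actual `gt_label.py` logic; the horizon, the binning rule, and the function names are assumptions:

```python
# Minimal sketch of Stage 0: derive an advantage signal from per-frame task
# progress and bin it into a coarse task_index. Horizon and threshold are
# illustrative, not the real gt_label.py parameters.

def advantage_from_progress(progress, horizon=2):
    """Advantage ~ how much progress the next `horizon` frames achieve."""
    n = len(progress)
    return [progress[min(i + horizon, n - 1)] - progress[i] for i in range(n)]

def bin_task_index(advantages, threshold=0.0):
    """Map each advantage to a coarse bin: 1 = high advantage, 0 = low."""
    return [1 if a > threshold else 0 for a in advantages]

progress = [0.0, 0.1, 0.3, 0.3, 0.6, 1.0]   # per-frame progress for one episode
adv = advantage_from_progress(progress)
bins = bin_task_index(adv)
```

In Stage 0 the resulting bin index is what gets written back into the parquet frames as `task_index`.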
@@ -103,8 +103,6 @@ See `gt_labeling.sh` for batch labeling examples across multiple dataset variant
 
 **Goal**: Fine-tune a pi0-based model to predict advantage values from observations (images + state), producing a learned Advantage Estimator.
 
-**Script**: `annotation/train_estimator.sh`
-
 **Configs**: `ADVANTAGE_TORCH_PI06_FLATTEN_FOLD` or `ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD` (defined in `src/openpi/training/config.py`)
 
 ### How it works
@@ -143,35 +141,28 @@ TrainConfig(
 
 ### Usage
 
+From the **repository root**:
+
 ```bash
 # Single GPU (KAI0 or PI06)
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  bash stage_advantage/annotation/train_estimator.sh
-RUNNAME=ADVANTAGE_TORCH_PI06_FLATTEN_FOLD RUNTIME=run1 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Multi-GPU (8 GPUs on a single node)
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 NPROC_PER_NODE=8 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Multi-Node (2 nodes x 8 GPUs)
-# On node 0 (master):
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  WORLD_SIZE=2 RANK=0 NPROC_PER_NODE=8 \
-  MASTER_ADDR=<master_ip> MASTER_PORT=12345 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# On node 1:
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  WORLD_SIZE=2 RANK=1 NPROC_PER_NODE=8 \
-  MASTER_ADDR=<master_ip> MASTER_PORT=12345 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Resume from a previous checkpoint
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 RESUME=1 \
-  bash stage_advantage/annotation/train_estimator.sh
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_PI06_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+
+# Multi-GPU (e.g. 8 GPUs on one node)
+uv run torchrun --standalone --nproc_per_node=8 scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD \
+  --exp_name=run1 --save_interval 10000
+
+# Multi-node (e.g. 2 nodes × 8 GPUs): on master node set WORLD_SIZE=2, RANK=0, MASTER_ADDR, MASTER_PORT;
+# on worker set RANK=1, then:
+uv run torchrun --nnodes=2 --nproc_per_node=8 --node_rank=$RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT \
+  scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+
+# Resume from latest checkpoint
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --resume
 ```
 
+Logs and checkpoints go to `experiment/<config_name>/` and `experiment/<config_name>/log/<exp_name>.log`. Redirect to a log file if desired, e.g. `2>&1 | tee experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/log/run1.log`.
+
 ### Training Outputs
 
 ```
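For the multi-node launch in the hunk above, torchrun assigns each process a global rank from the node rank and local rank (standard torchrun convention: `global_rank = node_rank * nproc_per_node + local_rank`). A quick sanity check of that arithmetic for the 2 × 8 layout, with no torch dependency:

```python
# Global-rank layout for a 2-node x 8-GPU torchrun job.
# Standard convention: global_rank = node_rank * nproc_per_node + local_rank.

def global_rank(node_rank, local_rank, nproc_per_node=8):
    return node_rank * nproc_per_node + local_rank

world_size = 2 * 8  # nnodes * nproc_per_node = 16 processes total
ranks = [global_rank(n, l) for n in range(2) for l in range(8)]
```

So node 0 hosts global ranks 0-7 and node 1 hosts ranks 8-15, which is why only `RANK` (the node rank) differs between the two launch commands.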
@@ -194,7 +185,7 @@ experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/ # or ADVANTAGE_TORCH_PI06_FLATTE
 
 **Goal**: Use the trained Advantage Estimator to label new/unseen datasets with predicted advantage values.
 
-**Script**: `annotation/eval.sh` (calls `annotation/eval.py`, which uses `annotation/evaluator.py`)
+**Script**: `annotation/eval.py` (uses `annotation/evaluator.py`)
 
 ### How it works
 
@@ -221,17 +212,24 @@ experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/ # or ADVANTAGE_TORCH_PI06_FLATTE
 
 ### Usage
 
+From the **repository root** (or ensure Python can import the project and paths are correct):
+
 ```bash
-# Evaluate using the Flatten-Fold KAI0 model on a dataset
-bash stage_advantage/annotation/eval.sh Flatten-Fold KAI0 /path/to/dataset
+uv run python stage_advantage/annotation/eval.py <model_type> <model_name> <repo_id>
+```
 
-# Evaluate using the PI06 model
-bash stage_advantage/annotation/eval.sh Flatten-Fold PI06 /path/to/dataset
+Examples:
 
-# Or call eval.py directly
-python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
+```bash
+# KAI0 (two-timestep) on a dataset
+uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
+
+# PI06 (single-timestep)
+uv run python stage_advantage/annotation/eval.py Flatten-Fold PI06 /path/to/dataset
 ```
 
+`<model_type>` is a key in `eval.py`’s `MODELS_CONFIG_MAP` (e.g. `Flatten-Fold`); `<model_name>` is `PI06` or `KAI0`; `<repo_id>` is the path to the LeRobot dataset. Results are written under `<repo_id>/data_<model_name>_<ckpt_steps>/`.
+
 ### Evaluation Outputs
 
 Results are saved alongside the original data directory:
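The argument resolution described in the hunk above can be sketched as follows. The actual fields of `MODELS_CONFIG_MAP` in `eval.py` are not shown in this diff, so the dictionary shape and field names here are assumptions; only the output directory convention `data_<model_name>_<ckpt_steps>/` comes from the README:

```python
import os

# Hypothetical shape of eval.py's MODELS_CONFIG_MAP: <model_type> selects a
# task family, <model_name> selects a model/checkpoint. Field names assumed.
MODELS_CONFIG_MAP = {
    "Flatten-Fold": {
        "KAI0": {"config": "ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD", "ckpt_steps": 100000},
        "PI06": {"config": "ADVANTAGE_TORCH_PI06_FLATTEN_FOLD", "ckpt_steps": 100000},
    },
}

def output_dir(repo_id, model_name, model_type="Flatten-Fold"):
    """Results land under <repo_id>/data_<model_name>_<ckpt_steps>/ (per README)."""
    steps = MODELS_CONFIG_MAP[model_type][model_name]["ckpt_steps"]
    return os.path.join(repo_id, f"data_{model_name}_{steps}")
```

This matches the `data_KAI0_100000/` / `data_PI06_100000/` directory names mentioned in the AWBC prerequisites later in this commit.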
@@ -306,12 +304,9 @@ stage_advantage/
 ├── annotation/           # Stages 0–2: labeling & estimator training
 │   ├── README.md
 │   ├── gt_label.py       # Core labeling script (progress → advantage → task_index)
-│   ├── gt_labeling.sh    # Batch labeling for PI06 / KAI0 variants
-│   ├── train_estimator.sh # Training script for the Advantage Estimator
+│   ├── gt_labeling.sh    # Batch labeling for PI06 / KAI0 variants (only .sh kept here)
 │   ├── eval.py           # Evaluate trained estimator on datasets
-│   ├── eval.sh           # Shell wrapper for eval.py
 │   └── evaluator.py      # SimpleValueEvaluator: batched GPU inference
-└── awbc/                 # Stage 3: AWBC (see Usage above; optional train_awbc.sh)
-    ├── README.md
-    └── train_awbc.sh
+└── awbc/                 # Stage 3: AWBC (commands in README)
+    └── README.md
 ```
Lines changed: 17 additions & 15 deletions
@@ -1,34 +1,36 @@
-## Annotation: GT Data Labeling, Advantage Estimator Training & Evaluation
+## Annotation: Stage 0–2 (Labeling, Estimator Training, Eval)
 
-This directory handles **Stage 0** (GT data labeling), **Stage 1** (advantage estimator training), and **Stage 2** (advantage estimation on new data).
+This directory contains **Stage 0** (GT labeling with `gt_label.py` / `gt_labeling.sh`), **Stage 1** (advantage estimator training via `scripts/train_pytorch.py`), and **Stage 2** (advantage estimation on new data via `eval.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
 
 ### Quick Start
 
 ```bash
 # Step 1: Label a dataset with advantage-based task_index (GT labels from progress)
-bash gt_labeling.sh
+# Edit DATA_PATH in gt_labeling.sh, then from repo root:
+bash stage_advantage/annotation/gt_labeling.sh
 
-# Step 2: Train the Advantage Estimator (update config.py paths first!)
-# Use ADVANTAGE_TORCH_PI06_FLATTEN_FOLD or ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 bash train_estimator.sh
+# Step 2: Train the Advantage Estimator (update config.py repo_id / pytorch_weight_path first)
+# From repo root:
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+# Or: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_PI06_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
 
 # Step 3: Evaluate the trained estimator on new data (PI06 or KAI0)
-bash eval.sh Flatten-Fold KAI0 /path/to/dataset
+# From repo root:
+uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
 
 # Step 4: Use the advantage-labeled data for AWBC (Stage 3)
-# Run gt_label.py with --advantage-source absolute_advantage on the Stage 2 output,
-# then: XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
+# After Stage 2, run gt_labeling.sh with DATA_PATH = eval repo (or gt_label.py --advantage-source absolute_advantage).
+# Then from repo root:
+XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
 ```
 
 ### File Descriptions
 
 | File | Stage | Description |
 |---|---|---|
-| `gt_label.py` | 0 | Core script: computes advantage from progress and assigns `task_index` to parquet frames |
-| `gt_labeling.sh` | 0 | Batch labeling script: prepares dataset directories and runs `gt_label.py` |
-| `train_estimator.sh` | 1 | Launches PyTorch training of the Advantage Estimator (single/multi-GPU) |
+| `gt_label.py` | 0 | Core script: computes advantage from progress/absolute_advantage and assigns `task_index` to parquet frames |
+| `gt_labeling.sh` | 0 | Batch labeling: prepares dataset dirs and runs `gt_label.py` (only .sh in this dir) |
 | `eval.py` | 2 | Evaluates a trained estimator on a dataset, writing predicted advantages to new parquets |
-| `eval.sh` | 2 | Shell wrapper for `eval.py` with environment setup |
-| `evaluator.py` | 2 | `SimpleValueEvaluator` class: batched GPU inference with parallel video loading and prefetching |
+| `evaluator.py` | 2 | `SimpleValueEvaluator`: batched GPU inference with parallel video loading and prefetching |
 
-See the [parent README](../README.md) for the full pipeline overview.
+For Stage 0 parameters, Stage 1 config fields, Stage 2 `MODELS_CONFIG_MAP`, and end-to-end AWBC order, see the [parent README](../README.md).

stage_advantage/annotation/eval.sh

Lines changed: 0 additions & 70 deletions
This file was deleted.

stage_advantage/annotation/gt_labeling.sh

Lines changed: 2 additions & 2 deletions
@@ -67,6 +67,6 @@ echo " All datasets labeled successfully!"
 echo ""
 echo " Output directory: ${dir_name}"
 echo ""
-echo " Next step: update the training config in config.py with"
-echo " the target dataset path, then run train_estimator.sh"
+echo " Next step: set repo_id in config.py to the target dataset path,"
+echo " then run: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_* --exp_name=run1 --save_interval 10000"
 echo "============================================================"

stage_advantage/annotation/train_estimator.sh

Lines changed: 0 additions & 71 deletions
This file was deleted.

stage_advantage/awbc/README.md

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
 # Stage 3: AWBC (Advantage-Weighted Behavior Cloning)
 
-Train a policy on **advantage-labeled** data so that the prompt conditions the policy on the advantage bin (e.g. high vs low advantage). This is implemented by setting **`prompt_from_task=True`** in the data config: each sample’s `task_index` is mapped to a prompt string via `meta/tasks.jsonl`, and that prompt is fed to the policy as language conditioning.
+Train a policy on **advantage-labeled** data so that the prompt conditions the policy on the advantage bin (e.g. high vs low advantage). This is implemented by setting **`prompt_from_task=True`** in the data config: each sample’s `task_index` is mapped to a prompt string via `meta/tasks.jsonl`, and that prompt is fed to the policy as language conditioning. Full pipeline (Stage 0 → 1 → 2 → 0 → 3) is in the [parent README](../README.md).
 
 ## Configs
 
@@ -16,11 +16,11 @@ Each uses `base_config=DataConfig(prompt_from_task=True)` so that the dataset’s
 
 ## Prerequisites
 
-1. **Stage 0 + Stage 2**
-   Produce an advantage-labeled LeRobot dataset:
-   - Run the Advantage Estimator (Stage 2) on your data to get parquets with `absolute_advantage` (and optionally `relative_advantage`).
-   - Run `gt_label.py` with `--advantage-source absolute_advantage` (and e.g. `--stage-nums 2` for KAI0) to compute `task_index` and write `meta/tasks.jsonl`.
-   - Place that dataset under e.g. `./data/FlattenFold/advantage` (or your chosen path).
+1. **Advantage dataset**
+   The data must have `task_index` in each parquet and `meta/tasks.jsonl` (prompt strings per `task_index`). To build it:
+   - Run **Stage 2** (eval) on your dataset → get `data_PI06_100000/` or `data_KAI0_100000/` with advantage columns.
+   - Run **Stage 0** on that output: `gt_label.py --advantage-source absolute_advantage` (or `gt_labeling.sh` with `DATA_PATH` = the eval repo). The resulting directory (with `data/`, `meta/tasks.jsonl`, `videos/`) is your advantage dataset.
+   - Place or link it at e.g. `./data/FlattenFold/advantage` and set `repo_id` in config to that path.
 
 2. **Config paths**
    In `src/openpi/training/config.py`, for the AWBC config(s) you use:
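The `prompt_from_task=True` mechanism described in this hunk can be illustrated with a minimal in-memory `meta/tasks.jsonl`. The prompt strings below are invented for illustration; the real file is produced by `gt_label.py`:

```python
import io
import json

# Minimal illustration of prompt_from_task=True: each sample's task_index is
# looked up in meta/tasks.jsonl to get the prompt string fed to the policy.
# The prompt wording is made up; only the jsonl shape (one record per line,
# task_index -> task string) is assumed here.
tasks_jsonl = io.StringIO(
    '{"task_index": 0, "task": "fold the cloth (low advantage)"}\n'
    '{"task_index": 1, "task": "fold the cloth (high advantage)"}\n'
)

prompts = {}
for line in tasks_jsonl:
    rec = json.loads(line)
    prompts[rec["task_index"]] = rec["task"]

def prompt_for(task_index):
    """Resolve a sample's task_index to its language-conditioning prompt."""
    return prompts[task_index]
```

Under this scheme, swapping the advantage bin in `task_index` changes only the prompt seen by the policy, which is what lets inference-time prompting select high-advantage behavior.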
