```bash
uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --resume
```
Logs and checkpoints go to `experiment/<config_name>/` and `experiment/<config_name>/log/<exp_name>.log`. Redirect to a log file if desired, e.g. `2>&1 | tee experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/log/run1.log`.
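The log layout above can be captured in a tiny path helper (the function itself is hypothetical; only the `experiment/<config_name>/log/<exp_name>.log` convention comes from this README):

```python
from pathlib import Path

def log_path(config_name: str, exp_name: str) -> Path:
    # Hypothetical helper; mirrors the documented layout
    # experiment/<config_name>/log/<exp_name>.log
    return Path("experiment") / config_name / "log" / f"{exp_name}.log"

print(log_path("ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD", "run1"))
# → experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/log/run1.log
```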
### Training Outputs
```
experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/   # or ADVANTAGE_TORCH_PI06_FLATTEN_FOLD/
├── ...                                         # checkpoints
└── log/
    └── run1.log                                # <exp_name>.log
```

## Stage 2: Advantage Estimation
**Goal**: Use the trained Advantage Estimator to label new/unseen datasets with predicted advantage values.
**Script**: `annotation/eval.sh` (calls `annotation/eval.py`, which uses `annotation/evaluator.py`)
```bash
# KAI0
uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset

# PI06 (single-timestep)
uv run python stage_advantage/annotation/eval.py Flatten-Fold PI06 /path/to/dataset
```
`<model_type>` is a key in `eval.py`’s `MODELS_CONFIG_MAP` (e.g. `Flatten-Fold`); `<model_name>` is `PI06` or `KAI0`; `<repo_id>` is the path to the LeRobot dataset. Results are written under `<repo_id>/data_<model_name>_<ckpt_steps>/`.
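Where results land can be sketched as a small helper (the function name is hypothetical; the naming convention is the one stated above):

```python
from pathlib import Path

def eval_output_dir(repo_id: str, model_name: str, ckpt_steps: int) -> Path:
    # Hypothetical helper; results are written under
    # <repo_id>/data_<model_name>_<ckpt_steps>/
    return Path(repo_id) / f"data_{model_name}_{ckpt_steps}"

print(eval_output_dir("/path/to/dataset", "PI06", 100000))
# → /path/to/dataset/data_PI06_100000
```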
### Evaluation Outputs
Results are saved alongside the original data directory:
```
<repo_id>/
└── data_<model_name>_<ckpt_steps>/   # e.g. data_PI06_100000/
```

Relevant project layout:

```
stage_advantage/
├── annotation/   # Stages 0–2: labeling & estimator training
└── ...
```

This directory contains **Stage 0** (GT labeling with `gt_label.py` / `gt_labeling.sh`), **Stage 1** (advantage estimator training via `scripts/train_pytorch.py`), and **Stage 2** (advantage estimation on new data via `eval.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
### Quick Start
```bash
# Step 1: Label a dataset with advantage-based task_index (GT labels from progress)
# Edit DATA_PATH in gt_labeling.sh, then from repo root:
bash stage_advantage/annotation/gt_labeling.sh
```
Train a policy on **advantage-labeled** data so that the prompt conditions the policy on the advantage bin (e.g. high vs low advantage). This is implemented by setting **`prompt_from_task=True`** in the data config: each sample’s `task_index` is mapped to a prompt string via `meta/tasks.jsonl`, and that prompt is fed to the policy as language conditioning. Full pipeline (Stage 0 → 1 → 2 → 0 → 3) is in the [parent README](../README.md).
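A minimal sketch of that `task_index` → prompt mapping, assuming LeRobot-style `meta/tasks.jsonl` lines with `task_index` and `task` keys (the example prompt strings below are invented):

```python
import io
import json

def load_task_prompts(fp):
    # Build a task_index -> prompt mapping from a tasks.jsonl-style stream.
    prompts = {}
    for line in fp:
        line = line.strip()
        if line:
            entry = json.loads(line)
            prompts[entry["task_index"]] = entry["task"]
    return prompts

# Invented rows standing in for a real meta/tasks.jsonl:
fp = io.StringIO(
    '{"task_index": 0, "task": "flatten and fold the towel [high advantage]"}\n'
    '{"task_index": 1, "task": "flatten and fold the towel [low advantage]"}\n'
)
prompts = load_task_prompts(fp)
print(prompts[1])  # language conditioning for samples with task_index == 1
```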
## Configs
Each uses `base_config=DataConfig(prompt_from_task=True)` so that the dataset’s per-sample `task_index` is resolved to a prompt string via `meta/tasks.jsonl`.
## Prerequisites
1. **Advantage dataset**

   The data must have `task_index` in each parquet and `meta/tasks.jsonl` (prompt strings per `task_index`). To build it:

   - Run **Stage 2** (eval) on your dataset → get `data_PI06_100000/` or `data_KAI0_100000/` with advantage columns.
   - Run **Stage 0** on that output: `gt_label.py --advantage-source absolute_advantage` (or `gt_labeling.sh` with `DATA_PATH` set to the eval output). The resulting directory (with `data/`, `meta/tasks.jsonl`, `videos/`) is your advantage dataset.
   - Place or link it at e.g. `./data/FlattenFold/advantage` and set `repo_id` in the config to that path.
2. **Config paths**

   In `src/openpi/training/config.py`, for the AWBC config(s) you use: