Commit 6d26bba

[DEBUG]: rm shell files and update readme

Parent: b96e65f
8 files changed: 61 additions and 287 deletions

File tree

.vscode/settings.json

Lines changed: 0 additions & 11 deletions
This file was deleted.

stage_advantage/README.md

Lines changed: 36 additions & 41 deletions
@@ -9,7 +9,7 @@ This module implements a pipeline for training an **Advantage Estimator** and us
 │ Stage 0: GT Labeling (annotation/gt_labeling.sh + gt_label.py) │
 │ Compute advantage (from progress or from Stage 2 output) → task_index │
 ├──────────────────────────────────────────────────────────────────────────┤
-│ Stage 1: Train Advantage Estimator (annotation/train_estimator.sh)
+│ Stage 1: Train Advantage Estimator (scripts/train_pytorch.py)
 │ Fine-tune pi0 model to predict advantage from observations │
 ├──────────────────────────────────────────────────────────────────────────┤
 │ Stage 2: Advantage Estimation on New Data (annotation/eval.py) │
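The Stage 0 idea above (progress → advantage → task_index) can be sketched in a few lines. This is an editor's illustration only, not the actual `gt_label.py` logic; the horizon, the binning rule, and the function names are assumptions:

```python
# Minimal sketch of Stage 0: derive an advantage signal from per-frame task
# progress and bin it into a coarse task_index. Horizon and threshold are
# illustrative, not the real gt_label.py parameters.

def advantage_from_progress(progress, horizon=2):
    """Advantage ~ how much progress the next `horizon` frames achieve."""
    n = len(progress)
    return [progress[min(i + horizon, n - 1)] - progress[i] for i in range(n)]

def bin_task_index(advantages, threshold=0.0):
    """Map each advantage to a coarse bin: 1 = high advantage, 0 = low."""
    return [1 if a > threshold else 0 for a in advantages]

progress = [0.0, 0.1, 0.3, 0.3, 0.6, 1.0]   # per-frame progress for one episode
adv = advantage_from_progress(progress)
bins = bin_task_index(adv)
```

In Stage 0 the resulting bin index is what gets written back into the parquet frames as `task_index`.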
@@ -103,8 +103,6 @@ See `gt_labeling.sh` for batch labeling examples across multiple dataset variant
 
 **Goal**: Fine-tune a pi0-based model to predict advantage values from observations (images + state), producing a learned Advantage Estimator.
 
-**Script**: `annotation/train_estimator.sh`
-
 **Configs**: `ADVANTAGE_TORCH_PI06_FLATTEN_FOLD` or `ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD` (defined in `src/openpi/training/config.py`)
 
 ### How it works
@@ -143,35 +141,28 @@ TrainConfig(
 
 ### Usage
 
+From the **repository root**:
+
 ```bash
 # Single GPU (KAI0 or PI06)
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  bash stage_advantage/annotation/train_estimator.sh
-RUNNAME=ADVANTAGE_TORCH_PI06_FLATTEN_FOLD RUNTIME=run1 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Multi-GPU (8 GPUs on a single node)
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 NPROC_PER_NODE=8 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Multi-Node (2 nodes x 8 GPUs)
-# On node 0 (master):
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  WORLD_SIZE=2 RANK=0 NPROC_PER_NODE=8 \
-  MASTER_ADDR=<master_ip> MASTER_PORT=12345 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# On node 1:
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 \
-  WORLD_SIZE=2 RANK=1 NPROC_PER_NODE=8 \
-  MASTER_ADDR=<master_ip> MASTER_PORT=12345 \
-  bash stage_advantage/annotation/train_estimator.sh
-
-# Resume from a previous checkpoint
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 RESUME=1 \
-  bash stage_advantage/annotation/train_estimator.sh
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_PI06_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+
+# Multi-GPU (e.g. 8 GPUs on one node)
+uv run torchrun --standalone --nproc_per_node=8 scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD \
+  --exp_name=run1 --save_interval 10000
+
+# Multi-node (e.g. 2 nodes × 8 GPUs): on master node set WORLD_SIZE=2, RANK=0, MASTER_ADDR, MASTER_PORT;
+# on worker set RANK=1, then:
+uv run torchrun --nnodes=2 --nproc_per_node=8 --node_rank=$RANK --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT \
+  scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+
+# Resume from latest checkpoint
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --resume
 ```
 
+Logs and checkpoints go to `experiment/<config_name>/` and `experiment/<config_name>/log/<exp_name>.log`. Redirect to a log file if desired, e.g. `2>&1 | tee experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/log/run1.log`.
+
 ### Training Outputs
 
 ```
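For the multi-node launch in the hunk above, torchrun assigns each process a global rank from the node rank and local rank (standard torchrun convention: `global_rank = node_rank * nproc_per_node + local_rank`). A quick sanity check of that arithmetic for the 2 × 8 layout, with no torch dependency:

```python
# Global-rank layout for a 2-node x 8-GPU torchrun job.
# Standard convention: global_rank = node_rank * nproc_per_node + local_rank.

def global_rank(node_rank, local_rank, nproc_per_node=8):
    return node_rank * nproc_per_node + local_rank

world_size = 2 * 8  # nnodes * nproc_per_node = 16 processes total
ranks = [global_rank(n, l) for n in range(2) for l in range(8)]
```

So node 0 hosts global ranks 0-7 and node 1 hosts ranks 8-15, which is why only `RANK` (the node rank) differs between the two launch commands.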
@@ -194,7 +185,7 @@ experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/ # or ADVANTAGE_TORCH_PI06_FLATTE
 
 **Goal**: Use the trained Advantage Estimator to label new/unseen datasets with predicted advantage values.
 
-**Script**: `annotation/eval.sh` (calls `annotation/eval.py`, which uses `annotation/evaluator.py`)
+**Script**: `annotation/eval.py` (uses `annotation/evaluator.py`)
 
 ### How it works
 
@@ -221,17 +212,24 @@ experiment/ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD/ # or ADVANTAGE_TORCH_PI06_FLATTE
 
 ### Usage
 
+From the **repository root** (or ensure Python can import the project and paths are correct):
+
 ```bash
-# Evaluate using the Flatten-Fold KAI0 model on a dataset
-bash stage_advantage/annotation/eval.sh Flatten-Fold KAI0 /path/to/dataset
+uv run python stage_advantage/annotation/eval.py <model_type> <model_name> <repo_id>
+```
 
-# Evaluate using the PI06 model
-bash stage_advantage/annotation/eval.sh Flatten-Fold PI06 /path/to/dataset
+Examples:
 
-# Or call eval.py directly
-python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
+```bash
+# KAI0 (two-timestep) on a dataset
+uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
+
+# PI06 (single-timestep)
+uv run python stage_advantage/annotation/eval.py Flatten-Fold PI06 /path/to/dataset
 ```
 
+`<model_type>` is a key in `eval.py`’s `MODELS_CONFIG_MAP` (e.g. `Flatten-Fold`); `<model_name>` is `PI06` or `KAI0`; `<repo_id>` is the path to the LeRobot dataset. Results are written under `<repo_id>/data_<model_name>_<ckpt_steps>/`.
+
 ### Evaluation Outputs
 
 Results are saved alongside the original data directory:
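The argument resolution described in the hunk above can be sketched as follows. The actual fields of `MODELS_CONFIG_MAP` in `eval.py` are not shown in this diff, so the dictionary shape and field names here are assumptions; only the output directory convention `data_<model_name>_<ckpt_steps>/` comes from the README:

```python
import os

# Hypothetical shape of eval.py's MODELS_CONFIG_MAP: <model_type> selects a
# task family, <model_name> selects a model/checkpoint. Field names assumed.
MODELS_CONFIG_MAP = {
    "Flatten-Fold": {
        "KAI0": {"config": "ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD", "ckpt_steps": 100000},
        "PI06": {"config": "ADVANTAGE_TORCH_PI06_FLATTEN_FOLD", "ckpt_steps": 100000},
    },
}

def output_dir(repo_id, model_name, model_type="Flatten-Fold"):
    """Results land under <repo_id>/data_<model_name>_<ckpt_steps>/ (per README)."""
    steps = MODELS_CONFIG_MAP[model_type][model_name]["ckpt_steps"]
    return os.path.join(repo_id, f"data_{model_name}_{steps}")
```

This matches the `data_KAI0_100000/` / `data_PI06_100000/` directory names mentioned in the AWBC prerequisites later in this commit.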
@@ -306,12 +304,9 @@ stage_advantage/
 ├── annotation/           # Stages 0–2: labeling & estimator training
 │   ├── README.md
 │   ├── gt_label.py       # Core labeling script (progress → advantage → task_index)
-│   ├── gt_labeling.sh    # Batch labeling for PI06 / KAI0 variants
-│   ├── train_estimator.sh # Training script for the Advantage Estimator
+│   ├── gt_labeling.sh    # Batch labeling for PI06 / KAI0 variants (only .sh kept here)
 │   ├── eval.py           # Evaluate trained estimator on datasets
-│   ├── eval.sh           # Shell wrapper for eval.py
 │   └── evaluator.py      # SimpleValueEvaluator: batched GPU inference
-└── awbc/                 # Stage 3: AWBC (see Usage above; optional train_awbc.sh)
-    ├── README.md
-    └── train_awbc.sh
+└── awbc/                 # Stage 3: AWBC (commands in README)
+    └── README.md
 ```
Lines changed: 17 additions & 15 deletions
@@ -1,34 +1,36 @@
-## Annotation: GT Data Labeling, Advantage Estimator Training & Evaluation
+## Annotation: Stage 0–2 (Labeling, Estimator Training, Eval)
 
-This directory handles **Stage 0** (GT data labeling), **Stage 1** (advantage estimator training), and **Stage 2** (advantage estimation on new data).
+This directory contains **Stage 0** (GT labeling with `gt_label.py` / `gt_labeling.sh`), **Stage 1** (advantage estimator training via `scripts/train_pytorch.py`), and **Stage 2** (advantage estimation on new data via `eval.py`). All commands below assume you are at the **repository root** unless noted. Full pipeline and options are in the [parent README](../README.md).
 
 ### Quick Start
 
 ```bash
 # Step 1: Label a dataset with advantage-based task_index (GT labels from progress)
-bash gt_labeling.sh
+# Edit DATA_PATH in gt_labeling.sh, then from repo root:
+bash stage_advantage/annotation/gt_labeling.sh
 
-# Step 2: Train the Advantage Estimator (update config.py paths first!)
-# Use ADVANTAGE_TORCH_PI06_FLATTEN_FOLD or ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD
-RUNNAME=ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD RUNTIME=run1 bash train_estimator.sh
+# Step 2: Train the Advantage Estimator (update config.py repo_id / pytorch_weight_path first)
+# From repo root:
+uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
+# Or: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_PI06_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
 
 # Step 3: Evaluate the trained estimator on new data (PI06 or KAI0)
-bash eval.sh Flatten-Fold KAI0 /path/to/dataset
+# From repo root:
+uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
 
 # Step 4: Use the advantage-labeled data for AWBC (Stage 3)
-# Run gt_label.py with --advantage-source absolute_advantage on the Stage 2 output,
-# then: XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
+# After Stage 2, run gt_labeling.sh with DATA_PATH = eval repo (or gt_label.py --advantage-source absolute_advantage).
+# Then from repo root:
+XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
 ```
 
 ### File Descriptions
 
 | File | Stage | Description |
 |---|---|---|
-| `gt_label.py` | 0 | Core script: computes advantage from progress and assigns `task_index` to parquet frames |
-| `gt_labeling.sh` | 0 | Batch labeling script: prepares dataset directories and runs `gt_label.py` |
-| `train_estimator.sh` | 1 | Launches PyTorch training of the Advantage Estimator (single/multi-GPU) |
+| `gt_label.py` | 0 | Core script: computes advantage from progress/absolute_advantage and assigns `task_index` to parquet frames |
+| `gt_labeling.sh` | 0 | Batch labeling: prepares dataset dirs and runs `gt_label.py` (only .sh in this dir) |
 | `eval.py` | 2 | Evaluates a trained estimator on a dataset, writing predicted advantages to new parquets |
-| `eval.sh` | 2 | Shell wrapper for `eval.py` with environment setup |
-| `evaluator.py` | 2 | `SimpleValueEvaluator` class: batched GPU inference with parallel video loading and prefetching |
+| `evaluator.py` | 2 | `SimpleValueEvaluator`: batched GPU inference with parallel video loading and prefetching |
 
-See the [parent README](../README.md) for the full pipeline overview.
+For Stage 0 parameters, Stage 1 config fields, Stage 2 `MODELS_CONFIG_MAP`, and end-to-end AWBC order, see the [parent README](../README.md).

stage_advantage/annotation/eval.sh

Lines changed: 0 additions & 70 deletions
This file was deleted.

stage_advantage/annotation/gt_labeling.sh

Lines changed: 2 additions & 2 deletions
@@ -67,6 +67,6 @@ echo " All datasets labeled successfully!"
 echo ""
 echo " Output directory: ${dir_name}"
 echo ""
-echo " Next step: update the training config in config.py with"
-echo " the target dataset path, then run train_estimator.sh"
+echo " Next step: set repo_id in config.py to the target dataset path,"
+echo " then run: uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_* --exp_name=run1 --save_interval 10000"
 echo "============================================================"

stage_advantage/annotation/train_estimator.sh

Lines changed: 0 additions & 71 deletions
This file was deleted.

stage_advantage/awbc/README.md

Lines changed: 6 additions & 6 deletions
@@ -1,6 +1,6 @@
 # Stage 3: AWBC (Advantage-Weighted Behavior Cloning)
 
-Train a policy on **advantage-labeled** data so that the prompt conditions the policy on the advantage bin (e.g. high vs low advantage). This is implemented by setting **`prompt_from_task=True`** in the data config: each sample’s `task_index` is mapped to a prompt string via `meta/tasks.jsonl`, and that prompt is fed to the policy as language conditioning.
+Train a policy on **advantage-labeled** data so that the prompt conditions the policy on the advantage bin (e.g. high vs low advantage). This is implemented by setting **`prompt_from_task=True`** in the data config: each sample’s `task_index` is mapped to a prompt string via `meta/tasks.jsonl`, and that prompt is fed to the policy as language conditioning. Full pipeline (Stage 0 → 1 → 2 → 0 → 3) is in the [parent README](../README.md).
 
 ## Configs
 
@@ -16,11 +16,11 @@ Each uses `base_config=DataConfig(prompt_from_task=True)` so that the dataset’s
 
 ## Prerequisites
 
-1. **Stage 0 + Stage 2**
-   Produce an advantage-labeled LeRobot dataset:
-   - Run the Advantage Estimator (Stage 2) on your data to get parquets with `absolute_advantage` (and optionally `relative_advantage`).
-   - Run `gt_label.py` with `--advantage-source absolute_advantage` (and e.g. `--stage-nums 2` for KAI0) to compute `task_index` and write `meta/tasks.jsonl`.
-   - Place that dataset under e.g. `./data/FlattenFold/advantage` (or your chosen path).
+1. **Advantage dataset**
+   The data must have `task_index` in each parquet and `meta/tasks.jsonl` (prompt strings per `task_index`). To build it:
+   - Run **Stage 2** (eval) on your dataset → get `data_PI06_100000/` or `data_KAI0_100000/` with advantage columns.
+   - Run **Stage 0** on that output: `gt_label.py --advantage-source absolute_advantage` (or `gt_labeling.sh` with `DATA_PATH` = the eval repo). The resulting directory (with `data/`, `meta/tasks.jsonl`, `videos/`) is your advantage dataset.
+   - Place or link it at e.g. `./data/FlattenFold/advantage` and set `repo_id` in config to that path.
 
 2. **Config paths**
    In `src/openpi/training/config.py`, for the AWBC config(s) you use:
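The `prompt_from_task=True` mechanism described in this hunk can be illustrated with a minimal in-memory `meta/tasks.jsonl`. The prompt strings below are invented for illustration; the real file is produced by `gt_label.py`:

```python
import io
import json

# Minimal illustration of prompt_from_task=True: each sample's task_index is
# looked up in meta/tasks.jsonl to get the prompt string fed to the policy.
# The prompt wording is made up; only the jsonl shape (one record per line,
# task_index -> task string) is assumed here.
tasks_jsonl = io.StringIO(
    '{"task_index": 0, "task": "fold the cloth (low advantage)"}\n'
    '{"task_index": 1, "task": "fold the cloth (high advantage)"}\n'
)

prompts = {}
for line in tasks_jsonl:
    rec = json.loads(line)
    prompts[rec["task_index"]] = rec["task"]

def prompt_for(task_index):
    """Resolve a sample's task_index to its language-conditioning prompt."""
    return prompts[task_index]
```

Under this scheme, swapping the advantage bin in `task_index` changes only the prompt seen by the policy, which is what lets inference-time prompting select high-advantage behavior.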
