Commit bf793ff — Update README.md
1 parent 5cf2c69

1 file changed (README.md): 49 additions & 5 deletions
@@ -20,7 +20,7 @@
χ₀ addresses the systematic distributional shift among the human demonstration distribution ($P_\text{train}$), the inductive bias learned by the policy ($Q_\text{model}$), and the test-time execution distribution ($P_\text{test}$) through three technical modules:

- **[Model Arithmetic](#model-arithmetic)**: A weight-space merging strategy that combines models trained on different data subsets, efficiently capturing diverse knowledge without architectural complexity. **[Released]**
-- **[Stage Advantage](#stage-advantage-coming-soon)**: A stage-aware advantage estimator that provides stable, dense progress signals for policy training. **[Coming Soon]**
+- **[Stage Advantage](#stage-advantage)**: A stage-aware advantage estimator that provides stable, dense progress signals for policy training. **[Released]**
- **[Train-Deploy Alignment](#train-deploy-alignment-coming-soon)**: Bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. **[Coming Soon]**

χ₀ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation — flattening, folding, and hanging — surpassing the state-of-the-art $\pi_{0.5}$ baseline by approximately 250% in success rate, with `only 20 hours of data and 8 A100 GPUs`.
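Weight-space merging of the kind the Model Arithmetic bullet describes can be sketched as a convex combination of checkpoints. The snippet below is a minimal illustration only, with plain Python dicts standing in for state dicts; the function name and fixed mixing weights are assumptions, not the repository's API:

```python
def merge_checkpoints(state_dicts, weights):
    """Weight-space model arithmetic: a convex combination of checkpoints.

    state_dicts: list of {param_name: list of floats}, identical keys and shapes.
    weights: one mixing coefficient per checkpoint (summing to 1 for interpolation).
    """
    assert len(state_dicts) == len(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Two toy "checkpoints" trained on different data subsets, averaged 50/50.
ckpt_a = {"layer.w": [1.0, 2.0]}
ckpt_b = {"layer.w": [3.0, 6.0]}
merged = merge_checkpoints([ckpt_a, ckpt_b], [0.5, 0.5])
print(merged["layer.w"])  # [2.0, 4.0]
```

The same arithmetic applies unchanged to real PyTorch or JAX parameter trees, since it only walks keys and combines values elementwise.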
@@ -46,14 +46,15 @@ https://github.com/user-attachments/assets/3f5f0c48-ff3f-4b9b-985b-59ad0b2ea97c
- [Model Arithmetic](#model-arithmetic)
- [Workflow](#workflow)
- [Quick Start](#quick-start)
-- [Stage Advantage (Coming Soon)](#stage-advantage-coming-soon)
+- [Stage Advantage](#stage-advantage)
- [Train-Deploy Alignment (Coming Soon)](#train-deploy-alignment-coming-soon)
- [Citation](#licenseandcitation)
- [Troubleshooting](#troubleshooting)
- [Links and Community](#links-and-community)

## Update

+- [Feb 14 2026] Release of the **Stage Advantage** module: advantage estimator training, evaluation, GT labeling, and AWBC training pipeline.
- [Feb 10 2026] Initial release of the **Model Arithmetic** module with support for both JAX and PyTorch checkpoints (not yet thoroughly tested).
- [Feb 10 2026] χ₀ paper released.

@@ -208,7 +209,7 @@ Checkpoints are written to the config’s checkpoint directory. You can then use
- [x] kai0 oracle: training and inference code with non-advantage data of three tasks
- [x] Model Arithmetic: code of different baselines for weight-space interpolation
-- [ ] Stage Advantage: code, data (advantage labels), and checkpoints — **Feb 15**
+- [x] Stage Advantage: code, data (advantage labels), and checkpoints
- [ ] HuggingFace & ModelScope: upload Stage Advantage data and checkpoints — **Feb 14**
- [ ] Train-Deploy Alignment — **Feb 14**
214215

@@ -265,11 +266,54 @@ python model_arithmetic/arithmetic_torch.py \
For gradient-based optimization, dataset splitting, and all other methods, see the full documentation in [`model_arithmetic/README.md`](model_arithmetic/README.md).

-## Stage Advantage (Coming Soon)
+## Stage Advantage

Stage Advantage decomposes long-horizon tasks into semantic stages and provides stage-aware advantage signals for policy training. It addresses the numerical instability of prior non-stage approaches by computing advantage as progress differentials within each stage, yielding smoother and more stable supervision.
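The progress-differential idea can be sketched as follows: within each semantic stage, advantage at a step is the change in estimated progress over a short horizon, and lookaheads that cross a stage boundary contribute no signal, so the values never spike when progress resets between stages. This is an illustrative reconstruction from the description above, not the repository's implementation:

```python
def stage_advantages(stage_ids, progress, horizon=1):
    """Advantage as a within-stage progress differential.

    stage_ids: per-step semantic stage index for one trajectory.
    progress:  per-step progress estimate in [0, 1] within its stage.
    Steps whose lookahead crosses a stage boundary get zero advantage,
    keeping the signal dense yet stable.
    """
    adv = []
    for t in range(len(progress)):
        u = t + horizon
        if u < len(progress) and stage_ids[u] == stage_ids[t]:
            adv.append(progress[u] - progress[t])
        else:
            adv.append(0.0)
    return adv

# Two stages; progress resets at the boundary without a spurious negative spike.
stages =   [0, 0, 0, 1, 1]
progress = [0.2, 0.5, 0.9, 0.1, 0.6]
print(stage_advantages(stages, progress))
```

A non-stage estimator would see the 0.9 → 0.1 reset as a large negative advantage; masking boundary-crossing differentials is what keeps the supervision stable.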
-**This module is currently under refinement and will be released soon.**
+The full pipeline has four stages:

```
Stage 0: GT Labeling → Stage 1: Train Advantage Estimator → Stage 2: Advantage Estimation → Stage 3: AWBC Training
```
### Quick Start

**Stage 0 — GT Data Labeling**: Compute advantage values and discretize them into `task_index` labels.

```bash
cd stage_advantage/annotation
python gt_label.py <dataset_path> \
    --threshold 30 --chunk-size 50 --discretion-type binary \
    --advantage-source absolute_advantage
```

For batch labeling across multiple dataset variants, see `stage_advantage/annotation/gt_labeling.sh`.
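For intuition, binary discretization of this kind maps each chunk's continuous advantage to a 0/1 label by thresholding. The sketch below shows the idea only; the function name and the threshold semantics (a raw value cutoff) are assumptions, not the actual behavior of `gt_label.py`:

```python
def binarize_advantages(advantages, threshold):
    """Discretize continuous advantage values into binary task_index labels:
    1 for chunks at or above the threshold, 0 for chunks below it."""
    return [1 if a >= threshold else 0 for a in advantages]

# Toy per-chunk absolute-advantage values with a cutoff of 30.
chunk_advantages = [12.0, 45.5, 30.0, 7.3]
print(binarize_advantages(chunk_advantages, 30))  # [0, 1, 1, 0]
```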
**Stage 1 — Train Advantage Estimator**: Fine-tune a pi0-based model to predict advantage from observations.

```bash
uv run python scripts/train_pytorch.py ADVANTAGE_TORCH_KAI0_FLATTEN_FOLD --exp_name=run1 --save_interval 10000
```

For a ready-to-use script with environment setup (conda/venv activation, DDP configuration) and automatic log management, see `stage_advantage/annotation/train_estimator.sh`.
**Stage 2 — Advantage Estimation on New Data**: Use the trained estimator to label datasets with predicted advantage values.

```bash
uv run python stage_advantage/annotation/eval.py Flatten-Fold KAI0 /path/to/dataset
```

For a ready-to-use script with environment setup and status logging, see `stage_advantage/annotation/eval.sh`.
**Stage 3 — AWBC Training**: Train a policy with Advantage-Weighted Behavior Cloning.

```bash
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run scripts/train.py pi05_flatten_fold_awbc --exp_name=run1
```

For a ready-to-use script with environment setup and automatic log management, see `stage_advantage/awbc/train_awbc.sh`.
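Advantage-Weighted Behavior Cloning re-weights each sample's imitation loss by a monotone function of its advantage, commonly exp(A/β) with clipping for stability. The snippet is a generic sketch of that weighting; the temperature, clip value, and function names are assumptions rather than this config's actual values:

```python
import math

def awbc_weights(advantages, beta=1.0, w_max=10.0):
    """Per-sample AWBC weights: exp(A / beta), clipped for numerical stability.
    Each sample's BC loss is multiplied by its weight before averaging."""
    return [min(math.exp(a / beta), w_max) for a in advantages]

def awbc_loss(per_sample_bc_losses, advantages, beta=1.0):
    """Advantage-weighted behavior cloning objective: mean of weighted losses."""
    w = awbc_weights(advantages, beta)
    return sum(wi * li for wi, li in zip(w, per_sample_bc_losses)) / len(w)

# High-advantage samples dominate the objective; low-advantage ones are down-weighted.
losses = [1.0, 1.0]
advs = [1.0, -1.0]
print(awbc_loss(losses, advs))
```

The down-weighting (rather than hard filtering) is what lets AWBC keep using all demonstrations while still steering the policy toward high-progress behavior.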
For the full pipeline details, configuration instructions, and all parameters, see [`stage_advantage/README.md`](stage_advantage/README.md).

## Train-Deploy Alignment (Coming Soon)