- 2025.05: We have released ImagineBench, a benchmark for evaluating RL from LLM-imaginary rollouts, with more diverse tasks.
- 2024.09: KALM has been accepted by NeurIPS 2024! Check out our project page for more details.
## Installation
- Update the `prefix` parameter in `environment.yml`
- Build the Python environment with the following command:
  ```shell
  conda env create -f environment.yml
  ```
- Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Move the downloaded Llama-2-7b-chat-hf to the path `base_models/llama2-hf-chat-7b`
- Move the OfflineRL dataset to the path `data/${offlinerl_dataset_name}`. We provide 2 toy datasets for testing: `data/clevr_robot.npy` and `data/meta_world.npy`

## LLM Fine-tuning
- Update the `num_processes` parameter in `config/ds_clevr.yaml` and `config/ds_meta.yaml` to the number of GPUs you want to use
- Update the paths and `CUDA_VISIBLE_DEVICES` in `scripts/train_clevr.sh` and `scripts/train_meta.sh`
- Fine-tune the LLM
  - CLEVR-Robot:
    ```shell
    bash scripts/train_clevr.sh
    ```
  - Meta-World:
    ```shell
    bash scripts/train_meta.sh
    ```
- Move the fine-tuned LLM to the path `finetuned_models/${model_name}`

## Rollout Generation
- Generate rollouts with the fine-tuned LLM
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_generate.py --model_path ${model_path} --prompt_path ${prompt_path} --output_path ${output_path} --level ${level}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level ${level}
    ```
- We provide 1 toy instruction prompt dataset for testing (generation on Meta-World does not need a dataset): `data/clevr_rephrase_prompt.npy`
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_generate.py --model_path ${model_path} --prompt_path data/clevr_rephrase_prompt.npy --output_path ${output_path} --level rephrase_level
    ```
  - Meta-World:
    ```shell
    python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level rephrase_level
    ```
- Move the imaginary dataset to the path `data/${imaginary_dataset_name}`

## OfflineRL Training
- Train the OfflineRL policy with the OfflineRL dataset and the imaginary dataset
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
    ```
- We provide 2 toy OfflineRL datasets for testing: `data/clevr_robot.hdf5` and `data/meta_world.hdf5`
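As a quick sanity check, an HDF5 dataset like these can be inspected with `h5py`. This is a minimal sketch, not the repo's documented schema: the key names (`observations`, `actions`, `rewards`) are assumptions for illustration, and the snippet writes a tiny synthetic file so it runs without the real data.

```python
import numpy as np
import h5py

# Write a tiny synthetic dataset; the key layout here is an assumed
# example, not the actual schema of the provided toy datasets.
with h5py.File("toy_dataset.hdf5", "w") as f:
    f.create_dataset("observations", data=np.zeros((8, 4), dtype=np.float32))
    f.create_dataset("actions", data=np.zeros((8, 2), dtype=np.float32))
    f.create_dataset("rewards", data=np.zeros((8,), dtype=np.float32))

# Inspect: list top-level keys and per-key shapes, as you would for
# data/clevr_robot.hdf5 or data/meta_world.hdf5.
with h5py.File("toy_dataset.hdf5", "r") as f:
    keys = sorted(f.keys())
    shapes = {k: f[k].shape for k in keys}

print(keys)    # ['actions', 'observations', 'rewards']
print(shapes)
```

Running the inspection half against the provided `.hdf5` files will show their real key layout.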
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/clevr_robot.hdf5 --device ${device} --seed ${seed}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/meta_world.hdf5 --device ${device} --seed ${seed}
    ```

## Evaluation
- Train an OfflineRL policy as described in the OfflineRL Training section
- Test the OfflineRL policy
  - CLEVR-Robot
    - Download the test dataset from https://box.nju.edu.cn/f/3eb76652b51b449d8617/?dl=1
    - Unzip the folder and move all the files to the `data` directory
    - Run the following script:
      ```shell
      python3 src/clevr_offline_test.py --agent_name ${agent_name} --model_path ${model_path} --device ${device} --seed ${seed}
      ```
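The download-and-unzip steps for the CLEVR-Robot test data can also be scripted; a sketch, assuming the link serves a zip archive and that `curl` and `unzip` are available:

```shell
# Download the CLEVR-Robot test dataset (URL from the step above)
curl -L -o clevr_test_data.zip "https://box.nju.edu.cn/f/3eb76652b51b449d8617/?dl=1"
# Unzip into the data directory; if the archive contains a top-level
# folder, move its contents up into data/ afterwards.
mkdir -p data
unzip -o clevr_test_data.zip -d data/
```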
  - Meta-World
    ```shell
    python3 src/meta_offline_test.py --agent_name ${agent_name} --model_path ${model_path} --device ${device} --seed ${seed}
    ```

## Citation
If you find KALM useful in your research, please consider citing the following paper:
```bibtex
@inproceedings{pang2024kalm,
  title={KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts},
  author={Jing-Cheng Pang and Si-Hang Yang and Kaiyuan Li and Xiong-Hui Chen and Nan Tang and Yang Yu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```