- 2025.05: We have released ImagineBench, a benchmark for evaluating RL from LLM-imaginary rollouts, with more diverse tasks.
- 2024.09: KALM has been accepted by NeurIPS 2024! Check out our project page for more details.
## Installation
- Update the `prefix` parameter in `environment.yml`
- Build the Python environment with the following command:
  ```shell
  conda env create -f environment.yml
  ```
- Download Llama-2-7b-chat-hf from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
- Move the downloaded Llama-2-7b-chat-hf to the path `base_models/llama2-hf-chat-7b`
- Move the OfflineRL dataset to the path `data/${offlinerl_dataset_name}`. We provide 2 toy datasets for testing: `data/clevr_robot.npy` and `data/meta_world.npy`

## LLM Fine-tuning
- Update the `num_processes` parameter in `config/ds_clevr.yaml` and `config/ds_meta.yaml` to the number of GPUs you want to use
- Update the paths and `CUDA_VISIBLE_DEVICES` in `scripts/train_clevr.sh` and `scripts/train_meta.sh`
- Fine-tune the LLM
  - CLEVR-Robot:
    ```shell
    bash scripts/train_clevr.sh
    ```
  - Meta-World:
    ```shell
    bash scripts/train_meta.sh
    ```
- Move the fine-tuned LLM to the path `finetuned_models/${model_name}`

## Rollout Generation
- Generate rollouts with the fine-tuned LLM
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_generate.py --model_path ${model_path} --prompt_path ${prompt_path} --output_path ${output_path} --level ${level}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level ${level}
    ```
- We provide 1 toy instruction prompt dataset for testing (generation on Meta-World does not need a dataset): `data/clevr_rephrase_prompt.npy`
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_generate.py --model_path ${model_path} --prompt_path data/clevr_rephrase_prompt.npy --output_path ${output_path} --level rephrase_level
    ```
  - Meta-World:
    ```shell
    python3 src/meta_generate.py --model_path ${model_path} --output_path ${output_path} --level rephrase_level
    ```
- Move the imaginary dataset to the path `data/${imaginary_dataset_name}`

## OfflineRL Training
- Train the OfflineRL policy with the OfflineRL dataset and the imaginary dataset
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_offline_train.py --ds_type ${ds_type} --agent_name ${agent_name} --dataset_path ${dataset_path} --device ${device} --seed ${seed}
    ```
- We provide 2 toy OfflineRL datasets for testing: `data/clevr_robot.hdf5` and `data/meta_world.hdf5`
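As a quick sanity check, an HDF5 dataset like these can be inspected with `h5py`. This is a minimal sketch, not the repo's documented schema: the key names (`observations`, `actions`, `rewards`) are assumptions for illustration, and the snippet writes a tiny synthetic file so it runs without the real data.

```python
import numpy as np
import h5py

# Write a tiny synthetic dataset; the key layout here is an assumed
# example, not the actual schema of the provided toy datasets.
with h5py.File("toy_dataset.hdf5", "w") as f:
    f.create_dataset("observations", data=np.zeros((8, 4), dtype=np.float32))
    f.create_dataset("actions", data=np.zeros((8, 2), dtype=np.float32))
    f.create_dataset("rewards", data=np.zeros((8,), dtype=np.float32))

# Inspect: list top-level keys and per-key shapes, as you would for
# data/clevr_robot.hdf5 or data/meta_world.hdf5.
with h5py.File("toy_dataset.hdf5", "r") as f:
    keys = sorted(f.keys())
    shapes = {k: f[k].shape for k in keys}

print(keys)    # ['actions', 'observations', 'rewards']
print(shapes)
```

Running the inspection half against the provided `.hdf5` files will show their real key layout.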
  - CLEVR-Robot:
    ```shell
    python3 src/clevr_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/clevr_robot.hdf5 --device ${device} --seed ${seed}
    ```
  - Meta-World:
    ```shell
    python3 src/meta_offline_train.py --ds_type rephrase_level --agent_name ${agent_name} --dataset_path data/meta_world.hdf5 --device ${device} --seed ${seed}
    ```

## Evaluation
- Train an OfflineRL policy as described in the OfflineRL Training section
- Test the OfflineRL policy
  - CLEVR-Robot
    - Download the test dataset from https://box.nju.edu.cn/f/3eb76652b51b449d8617/?dl=1
    - Unzip the folder and move all the files to the `data` directory
    - Run the following script:
      ```shell
      python3 src/clevr_offline_test.py --agent_name ${agent_name} --model_path ${model_path} --device ${device} --seed ${seed}
      ```
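The download-and-unzip steps for the CLEVR-Robot test data can also be scripted; a sketch, assuming the link serves a zip archive and that `curl` and `unzip` are available:

```shell
# Download the CLEVR-Robot test dataset (URL from the step above)
curl -L -o clevr_test_data.zip "https://box.nju.edu.cn/f/3eb76652b51b449d8617/?dl=1"
# Unzip into the data directory; if the archive contains a top-level
# folder, move its contents up into data/ afterwards.
mkdir -p data
unzip -o clevr_test_data.zip -d data/
```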
  - Meta-World
    ```shell
    python3 src/meta_offline_test.py --agent_name ${agent_name} --model_path ${model_path} --device ${device} --seed ${seed}
    ```

## Citation
If you find KALM useful in your research, please consider citing the following paper:
```bibtex
@inproceedings{pang2024kalm,
  title={KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts},
  author={Jing-Cheng Pang and Si-Hang Yang and Kaiyuan Li and Xiong-Hui Chen and Nan Tang and Yang Yu},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}
```