<p align="center">
  <img src="docs/sys.png" width="900" style="border-radius:10%">
  <h1 align="center">One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation</h1>
  <h3 align="center">
    <a href="https://www.kth.se/profile/flbusch?l=en">Finn Lukas Busch</a>,
    <a href="https://www.kth.se/profile/timonh">Timon Homberger</a>,
    <a href="https://www.kth.se/profile/jgop">Jesús Ortega-Peimbert</a>,
    <a href="https://www.kth.se/profile/quantao?l=en">Quantao Yang</a>,
    <a href="https://www.kth.se/profile/olovand" style="white-space: nowrap;">Olov Andersson</a>
  </h3>
  <p align="center">
    <a href="https://www.finnbusch.com/OneMap/">Project Website</a>, <a href="https://arxiv.org/pdf/2409.11764">Paper (arXiv)</a>
  </p>
</p>

This repository contains the code for the paper "One Map to Find Them All: Real-time Open-Vocabulary Mapping for
Zero-shot Multi-Object Navigation". We provide a [dockerized environment](#setup-docker) to run the code, or
you can [run it locally](#setup-local-without-docker).

In summary, we open-source:
- The OneMap mapping and navigation code
- The evaluation code for single- and multi-object navigation
- The multi-object navigation dataset and benchmark
- The multi-object navigation dataset generation code, so that you can generate your own datasets

## Abstract
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot
applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation
methods that allow a robot to search for an arbitrary object without prior training. However, these
zero-shot methods have so far treated the environment as unknown for each consecutive query.
In this paper, we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage
information gathered from previous searches to more efficiently find new objects. To address this problem, we build a
reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic
map update that mitigates common sources of errors in semantic feature extraction and leverage this semantic uncertainty
for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation
and on a real robot, running in real time on a Jetson Orin AGX. We demonstrate that it outperforms existing
state-of-the-art approaches on both single- and multi-object navigation tasks.

## Setup (Docker)
### 0. Docker
You will need Docker installed on your system. Follow the [official instructions](https://docs.docker.com/engine/install/ubuntu/) to install it.
You will also need the [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
installed and configured as the Docker runtime on your system.
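If the NVIDIA runtime is not yet registered with Docker, the toolkit's install guide typically has you configure it roughly as follows (shown here only as a reminder; defer to the linked guide for your system):
```
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```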

### 1. Clone the repository
```
git clone https://github.com/finnBsch/MON.git OneMap
cd OneMap/
```
### 2. Build the Docker Image
The Docker image build process will build habitat-sim and download model weights. You can either let the container
download the habitat scenes during the build, or, if you have them already downloaded, set `HM3D=LOCAL` and provide
the absolute `HM3D_PATH` to the `versioned_data` directory on your machine in the `.env` file in the root of the repository.

If you want the container to download the scenes for you, set `HM3D=FULL` in the `.env` file and provide your
Matterport credentials. You can request access to the Matterport data for free [here](https://matterport.com/partners/meta).
In that case, you do not need to provide an `HM3D_PATH`.
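For reference, a minimal `.env` for the local-scene option could look like the sketch below. Only `HM3D` and `HM3D_PATH` are documented above; the credential variables used with `HM3D=FULL` follow whatever names the compose setup expects:
```
# Example .env (HM3D=LOCAL variant); HM3D_PATH must be an absolute path
HM3D=LOCAL
HM3D_PATH=/absolute/path/to/versioned_data
```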
Having configured the `.env` file, you can build the Docker image from the root of the repository with:
```
docker compose build
```
The build will take a while as `habitat-sim` is built from source. You can launch the Docker container with:
```
bash run_docker.sh
```
and open a new terminal in the container with:
```
docker exec -it onemap-onemap-1 bash
```
## Setup (Local, without Docker)

### 1. Clone the repository
```
git clone https://github.com/finnBsch/MON.git OneMap
cd OneMap/
```
### 2. Install dependencies
```
python3 -m pip install gdown torch torchvision torchaudio meson
python3 -m pip install -r requirements.txt
```
Manually install a newer `timm` version (quoted so the shell does not interpret `>=` as a redirection):
```
python3 -m pip install --upgrade "timm>=1.0.7"
```
Clone YOLOv7:
```
git clone https://github.com/WongKinYiu/yolov7
```
Build the planning utilities:
```
python3 -m pip install ./planning_cpp/
```
### 3. Download the model weights
```
mkdir -p weights/
```
CLIP weights extracted from SED:
```
gdown 1D_RE4lvA-CiwrP75wsL8Iu1a6NrtrP9T -O weights/clip.pth
```
YOLOv7 and MobileSAM weights:
```
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt -O weights/yolov7-e6e.pt
wget https://github.com/ChaoningZhang/MobileSAM/raw/refs/heads/master/weights/mobile_sam.pt -O weights/mobile_sam.pt
```
### 4. Download the habitat data

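This step is not spelled out here; as a rough guide, the HM3D scenes are usually fetched with habitat-sim's dataset downloader using your Matterport API credentials (the exact dataset UIDs, splits, and target path expected by this code may differ):
```
python3 -m habitat_sim.utils.datasets_download --uids hm3d_val_v0.2 \
    --username <matterport-api-token-id> --password <matterport-api-token-secret> \
    --data-path data/
```
The downloader places the scenes under a `versioned_data` directory inside the chosen data path, which matches the `HM3D_PATH` directory referenced in the Docker setup.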
## Running the code
### 1. Run the example
You can run the code on an example, visualized in [rerun.io](https://rerun.io/):
#### Docker
You will need [rerun.io](https://rerun.io/) installed on the host for visualization.
Ensure the container is running and you have a terminal inside it as described in the [Docker setup](#setup-docker). Then launch
the rerun viewer with:
```
rerun
```
and launch the example in the container with:
```
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
```
#### Local
Open the rerun viewer and launch the example from the root of the repository with:
```
rerun
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
```
### 2. Run the evaluation
You can reproduce the evaluation results from the paper for single- and multi-object navigation.
#### Single-object navigation
```
python3 eval_habitat.py --config config/mon/eval_conf.yaml
```
This will run the evaluation and save the results in the `results/` directory. You can read the results with:
```
python3 read_results.py --config config/mon/eval_conf.yaml
```
#### Multi-object navigation
```
python3 eval_habitat_multi.py --config config/mon/eval_multi_conf.yaml
```
This will run the evaluation and save the results in the `results_multi/` directory. You can read the results with:
```
python3 read_results_multi.py --config config/mon/eval_multi_conf.yaml
```
#### Dataset generation
While we provide the generated dataset for the multi-object navigation evaluation, we also release the code to
generate datasets with varying parameters. You can generate a dataset with
```
python3 eval/dataset_utils/gen_multiobject_dataset.py
```
and adjust parameters such as the number of objects per episode directly in that file, as sketched below.
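The variable names below are purely illustrative (check `gen_multiobject_dataset.py` for the actual ones); they only indicate the kind of settings you would edit before regenerating the dataset:
```python
# Hypothetical parameter names -- the generation script may use different ones.
NUM_OBJECTS_PER_EPISODE = 3   # goal objects per generated episode
EPISODES_PER_SCENE = 50       # episodes generated per HM3D scene
```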

## Citation
If you use this code in your research, please cite our paper:
```
@misc{busch2024mapallrealtimeopenvocabulary,
      title={One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation},
      author={Finn Lukas Busch and Timon Homberger and Jesús Ortega-Peimbert and Quantao Yang and Olov Andersson},
      year={2024},
      eprint={2409.11764},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2409.11764},
}
```