
Commit 37fc9d3

Merge branch SeFlow bib into DeFlow_public
2 parents 1bff622 + 2142663

6 files changed: 119 additions & 7 deletions


README.md (8 additions & 5 deletions)

````diff
@@ -90,8 +90,8 @@ unzip demo_data.zip -p /home/kin/data/av2
 
 #### Prepare raw data
 
-Checking more information (step for downloading raw data, storage size, #frame etc) in [dataprocess/README.md](dataprocess/README.md). Extract all data to unified h5 format.
-[Runtime: Normally need 10 mins finished run following commands totally in my desktop, 45 mins for the cluster I used]
+Check more information (steps for downloading raw data, storage size, #frames, etc.) in [dataprocess/README.md](dataprocess/README.md). Extract all data to the unified `.h5` format.
+[Runtime: normally about 45 minutes to run all the following commands in the setup described in our paper.]
 ```bash
 python dataprocess/extract_av2.py --av2_type sensor --data_mode train --argo_dir /home/kin/data/av2 --output_dir /home/kin/data/av2/preprocess_v2
 python dataprocess/extract_av2.py --av2_type sensor --data_mode val --mask_dir /home/kin/data/av2/3d_scene_flow
@@ -185,11 +185,14 @@ https://github.com/KTH-RPL/DeFlow/assets/35365764/9b265d56-06a9-4300-899c-96047a
 pages={2105-2111},
 doi={10.1109/ICRA57147.2024.10610278}
 }
-@article{zhang2024seflow,
+@inproceedings{zhang2024seflow,
 author={Zhang, Qingwen and Yang, Yi and Li, Peizheng and Andersson, Olov and Jensfelt, Patric},
 title={{SeFlow}: A Self-Supervised Scene Flow Method in Autonomous Driving},
-journal={arXiv preprint arXiv:2407.01702},
-year={2024}
+booktitle={European Conference on Computer Vision (ECCV)},
+year={2024},
+pages={353--369},
+organization={Springer},
+doi={10.1007/978-3-031-73232-4_20},
 }
 ```
 
````
assets/cuda/chamfer3D/README.md (70 additions & 0 deletions)

````diff
@@ -75,6 +75,76 @@ Chamfer Distance Cal time: 1.814 ms
 loss: tensor(0.1710, device='cuda:0', grad_fn=<AddBackward0>)
 ```
 
+## Misc
+
+### Note for CUDA ChamferDis
+
+I wrote this two months ago and can no longer follow it; the problem was that the result was always off by about 0.0003 in precision (I am a bit obsessive about precision).
+At first I assumed I had a bug, but it turned out that with this kind of block-level parallelization, different thread block sizes change the order of CUDA floating-point operations, so a small precision gap is expected. If that bothers you, you can use the pytorch3d version instead (about 4x slower, from 15 ms to 80 ms).
+
+To restate how shared memory is used here:
+1. Each point first gets its global index via `int tid = blockIdx.x * blockDim.x + threadIdx.x;`. Note that every point is handled independently, because finding the nearest pc1 neighbor of a pc0 point does not depend on any other pc0 point.
+2. Inside the kernel we declare a `__shared__` buffer for pc1; since shared memory is limited, it holds only THREADS_PER_BLOCK points at a time.
+3. Loading those THREADS_PER_BLOCK points is also done cooperatively by the threads; before comparing distances we call `__syncthreads();` to make sure all THREADS_PER_BLOCK pc1 points have arrived.
+4. Next, we compare against the `num_elems` points of this chunk and update the running best.
+5. Finally, the per-point best is written to the global `result`.
+
+Note that this aggressive parallelization slightly affects precision. If you are curious, you can tune `#define THREADS_PER_BLOCK 256`: different threads-per-block settings change the result (the effect is small: the ground truth is 0.1710, while the CUDA result lands between 0.1711 and 0.1713).
+
+The following explanation is from ChatGPT:
+One reason for the precision difference may be that the order of floating-point operations changes with the thread block size. Since floating-point addition is not associative (i.e. (a + b) + c may not equal a + (b + c)), changing the order of operations can cause slight differences in the result.
+
+This kind of precision variation is very common in GPU computing, especially with large datasets and many floating-point operations. Eliminating it entirely is very hard, because even tiny implementation changes (a different thread block size, a restructured loop, or even different GPU hardware or CUDA versions) can slightly change the floating-point operation order.
+
+If you need reproducible results, consider the following:
+
+1. Fixed thread block size: pick one thread block size and always use it.
+
+2. Double precision: use `double` instead of `float` for higher precision, at the cost of more memory and possibly lower performance.
+
+3. Numerically stable algorithms: prefer numerically stable algorithms, although they can be complex and slower on the GPU.
+
+4. Less parallelism: reduce the degree of parallelism to reduce differences caused by thread execution order, usually at the cost of performance.
+
+The shared-memory copy part of the code:
+```cpp
+
+for (int i = 0; i < pc1_n; i += THREADS_PER_BLOCK) {
+    // Copy a block of pc1 to shared memory
+    int pc1_idx = i + threadIdx.x;
+    if (pc1_idx < pc1_n) {
+        shared_pc1[threadIdx.x * 3 + 0] = pc1_xyz[pc1_idx * 3 + 0];
+        shared_pc1[threadIdx.x * 3 + 1] = pc1_xyz[pc1_idx * 3 + 1];
+        shared_pc1[threadIdx.x * 3 + 2] = pc1_xyz[pc1_idx * 3 + 2];
+    }
+
+    __syncthreads();
+
+    // Compute the distance between pc0[tid] and the points in shared_pc1
+    // NOTE(Qingwen): since after two months I forgot what I did here, I write some notes for future me
+    // 0. One reason for the difference in precision may be due to the changing order of floating point
+    //    operations at different thread block sizes. But I think it's fine to lose 0.0001 precision for a 4x speedup.
+    // 1. Since we use shared memory to store pc1, every BLOCK gets a new shared_pc1 starting from 0.
+    // 2. We use THREADS_PER_BLOCK to loop over pc1, so we need to check if the last block is not full.
+    // 3. Based on the CUDA documentation, this second __syncthreads() is not necessary here, but we keep it for safety.
+    // 4. After one pass, we go to the next block of pc1 and find the best in that batch.
+
+    int num_elems = min(THREADS_PER_BLOCK, pc1_n - i);
+    for (int j = 0; j < num_elems; j++) {
+        float x1 = shared_pc1[j * 3 + 0];
+        float y1 = shared_pc1[j * 3 + 1];
+        float z1 = shared_pc1[j * 3 + 2];
+        float d = (x1 - x0) * (x1 - x0) + (y1 - y0) * (y1 - y0) + (z1 - z0) * (z1 - z0);
+        if (d < best) {
+            best = d;
+            best_i = j + i;
+        }
+    }
+    __syncthreads();
+}
+```
 
 ## Other issues
 In cluster when build cuda things, you may occur problem:
````
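The non-associativity described in the note above is easy to reproduce even on the CPU; this minimal Python sketch is purely illustrative and independent of the CUDA kernel:

```python
# Floating-point addition is not associative: (a + b) + c can differ from
# a + (b + c). This is why changing THREADS_PER_BLOCK (and hence the order
# in which partial results are combined) shifts the loss in the last digits.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False

# Accumulating many terms in different orders drifts the same way:
print(sum([0.1] * 10) == 1.0)  # False
```

The GPU case is the same effect multiplied across thousands of threads and reduction orders, which is why 0.1710 vs. 0.1711-0.1713 is expected behavior rather than a bug.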

dataprocess/README.md (7 additions & 0 deletions)

````diff
@@ -12,6 +12,13 @@ We've updated the process dataset for:
 - [x] Waymo: check [here](#waymo-dataset). The process script was involved from [SeFlow](https://github.com/KTH-RPL/SeFlow).
 - [ ] nuScenes: done coding, public after review. Will be involved later by another paper.
 
+If you want to use all of the datasets above, there is a dedicated processing environment in [envprocess.yml](../envprocess.yml) that installs all the necessary packages, since the Waymo package has a different configuration and conflicts with the main environment. Set it up with the following commands:
+
+```bash
+conda env create -f envprocess.yml
+conda activate dataprocess
+```
+
 ## Download
 
 ### Argoverse 2.0
````

dataprocess/extract_av2.py (4 additions & 2 deletions)

```diff
@@ -208,6 +208,8 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
                        for file in os.listdir(data_dir / log_id / "sensors/lidar")
                        if file.endswith('.feather')])
 
+    gt_flow_flag = False if not (data_dir / log_id / "annotations.feather").exists() else True
+
     # if n is not None:
     #     iter_bar = tqdm(zip(timestamps, timestamps[1:]), leave=False,
     #                     total=len(timestamps) - 1, position=n,
@@ -222,7 +224,7 @@ def create_group_data(group, pc, gm, pose, flow_0to1=None, flow_valid=None, flow
         if pc0.shape[0] < 256:
             print(f'{log_id}/{ts0} has less than 256 points, skip this scenarios. Please check the data if needed.')
             break
-        if cnt == len(timestamps) - 1:
+        if cnt == len(timestamps) - 1 or not gt_flow_flag:
             create_group_data(group, pc0, is_ground_0.astype(np.bool_), pose0.transform_matrix.astype(np.float32))
         else:
             ts1 = timestamps[cnt + 1]
@@ -269,7 +271,7 @@ def main(
     argo_dir: str = "/home/kin/data/av2",
     output_dir: str ="/home/kin/data/av2/preprocess",
     av2_type: str = "sensor",
-    data_mode: str = "test",
+    data_mode: str = "val",
     mask_dir: str = "/home/kin/data/av2/3d_scene_flow",
     nproc: int = (multiprocessing.cpu_count() - 1),
     only_index: bool = False,
```
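The `gt_flow_flag` added above just records whether a log ships ground-truth annotations, so logs without `annotations.feather` fall back to storing pose and ground data only. A minimal standalone sketch of that check (the directory layout is illustrative and `has_gt_flow` is a hypothetical helper, not a function from this repo); note that `False if not p.exists() else True` reduces to `p.exists()`:

```python
from pathlib import Path
import tempfile

def has_gt_flow(log_dir: Path) -> bool:
    # Same condition as the commit's
    # `False if not (data_dir / log_id / "annotations.feather").exists() else True`,
    # written directly: ground-truth flow is available iff the file exists.
    return (log_dir / "annotations.feather").exists()

with tempfile.TemporaryDirectory() as d:
    log_dir = Path(d)
    print(has_gt_flow(log_dir))   # False: no annotations, store pose/ground only
    (log_dir / "annotations.feather").touch()
    print(has_gt_flow(log_dir))   # True: flow between consecutive frames can be stored
```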

environment.yaml (2 additions & 0 deletions)

```diff
@@ -27,10 +27,12 @@ dependencies:
   - open3d==0.18.0
   - dztimer
   - av2==0.2.1
+  - dufomap==1.0.0
 
 # Reason about the version fixed:
 # setuptools==68.5.1: https://github.com/aws-neuron/aws-neuron-sdk/issues/893
 # mkl==2024.0.0: https://github.com/pytorch/pytorch/issues/123097#issue-2218541307
 # av2==0.2.1: in case other version deleted some functions.
 # lightning==2.0.1: https://stackoverflow.com/questions/76647518/how-to-fix-error-cannot-import-name-modelmetaclass-from-pydantic-main
 # open3d==0.18.0: because 0.17.0 have bug on set the view json file
+# dufomap==1.0.0: in case a later update is not compatible with the code.
```

envprocess.yaml (28 additions & 0 deletions)

```diff
@@ -0,0 +1,28 @@
+name: dataprocess
+channels:
+  - conda-forge
+  - pytorch
+dependencies:
+  - python=3.8
+  - pytorch::pytorch=2.0.0
+  - pytorch::torchvision
+  - numba
+  - numpy==1.22
+  - pandas
+  - pip
+  - scipy
+  - tqdm
+  - scikit-learn
+  - fire
+  - pip:
+    - nuscenes-devkit
+    - av2==0.2.1
+    - waymo-open-dataset-tf-2.11.0==1.5.0
+    - open3d==0.16.0
+    - linefit
+    - dztimer
+    - dufomap==1.0.0
+    - evalai
+
+# Reason about the version fixed:
+# numpy==1.22: package conflicts; need numpy >= 1.22
```
