Commit 38bc290

Update comparison with DuckDB on a MacMini w/ M4 Pro, 24GB RAM
1 parent 8062d4a commit 38bc290

1 file changed

Lines changed: 54 additions & 46 deletions

bench/indexing/blosc2-vs-duckdb-indexes.md

@@ -10,6 +10,9 @@ The goal is not to claim a universal winner, but to document the current observe
 - total storage footprint
 - sensitivity to query shape
 
+The latest width-1 single-value figures below come from a fresh run on a Mac mini with an M4 Pro CPU
+and 24 GB of RAM.
+
 
 ## Benchmark Setup
 
@@ -56,7 +59,7 @@ There are two different DuckDB query shapes that matter a lot:
 - single-value form:
   - `id = value`
 
-For Blosc2, switching between a collapsed width-1 range and `==` makes almost no practical difference.
+For Blosc2, switching between a collapsed width-1 range and `==` makes only a small difference in practice.
 
 For DuckDB, this difference is very important:
 
@@ -167,22 +170,24 @@ python index_query_bench.py \
 Observed results:
 
 - `light`
-  - cold lookup: `1.463 ms`
-  - warm lookup: `1.286 ms`
+  - cold lookup: `0.841 ms`
+  - warm lookup: `0.184 ms`
 - `medium`
-  - cold lookup: `1.089 ms`
-  - warm lookup: `0.986 ms`
+  - cold lookup: `0.564 ms`
+  - warm lookup: `0.168 ms`
 - `full`
-  - cold lookup: `0.618 ms`
-  - warm lookup: `0.544 ms`
+  - cold lookup: `0.554 ms`
+  - warm lookup: `0.167 ms`
 
 ### Interpretation
 
-With the generic range form, Blosc2 is much faster than DuckDB:
+With the generic width-1 range form, Blosc2 is much faster than DuckDB:
 
-- Blosc2 `light` is already about `9x` faster than DuckDB `zonemap`
-- Blosc2 exact indexes (`medium`, `full`) are much faster still
+- Blosc2 `light` is already much faster than DuckDB `zonemap`, and comfortably faster than the
+  generic-range DuckDB `art-index` behavior
+- Blosc2 `medium` and `full` are in a different regime on warm hits, at about `0.17 ms`
 - DuckDB `art-index` does not show its real point-lookup behavior in this predicate form
+- Blosc2 warm reuse changes the picture substantially for repeated lookups
 
 
 ## Width-1 Comparison: Single-Value Predicate
@@ -205,13 +210,15 @@ python duckdb_query_bench.py \
 Observed results:
 
 - `zonemap`
-  - build: `1193.665 ms`
-  - filtered lookup: `8.646 ms`
+  - build: `509.338 ms`
+  - cold lookup: `4.595 ms`
+  - warm lookup: `2.857 ms`
   - DB size: `56,111,104` bytes
 - `art-index`
-  - build: `2849.869 ms`
-  - filtered lookup: `0.755 ms`
-  - DB size: `478,687,232` bytes
+  - build: `2000.316 ms`
+  - cold lookup: `0.613 ms`
+  - warm lookup: `0.246 ms`
+  - DB size: `478,425,088` bytes
 
 ### Blosc2
 
@@ -230,42 +237,42 @@ python index_query_bench.py \
 Observed results:
 
 - `light`
-  - build: `1225.637 ms`
-  - cold lookup: `1.290 ms`
-  - warm lookup: `2.351 ms`
+  - build: `960.048 ms`
+  - cold lookup: `2.489 ms`
+  - warm lookup: `0.172 ms`
   - index sidecars: `27,497,393` bytes
 - `medium`
-  - build: `5511.863 ms`
-  - cold lookup: `1.081 ms`
-  - warm lookup: `0.964 ms`
+  - build: `4745.880 ms`
+  - cold lookup: `2.202 ms`
+  - warm lookup: `0.147 ms`
   - index sidecars: `37,645,201` bytes
 - `full`
-  - build: `10954.844 ms`
-  - cold lookup: `0.603 ms`
-  - warm lookup: `0.525 ms`
+  - build: `9539.843 ms`
+  - cold lookup: `1.753 ms`
+  - warm lookup: `0.144 ms`
   - index sidecars: `29,888,673` bytes
 
 ### Interpretation
 
 Once DuckDB is allowed to use the more planner-friendly single-value predicate:
 
 - `art-index` becomes very fast
-- `art-index` is now faster than Blosc2 `light`
-- Blosc2 `full` still remains slightly faster than DuckDB `art-index` on this measured point-lookup case
+- `art-index` is clearly faster than Blosc2 on cold point lookups in this run
+- Blosc2 is clearly faster on warm repeated point lookups across `light`, `medium`, and `full`
 
 However, the storage costs are very different:
 
-- DuckDB `art-index` database size: about `478.7 MB`
+- DuckDB `art-index` database size: about `478.4 MB`
 - DuckDB zonemap baseline size: about `56.1 MB`
-- estimated ART overhead over baseline: about `422.6 MB`
+- estimated ART overhead over baseline: about `422.3 MB`
 - Blosc2 `full` base + index footprint: about `31 MB + 29.9 MB = 60.9 MB`
 
 So for true point lookups:
 
-- DuckDB `art-index` is competitive on latency
-- Blosc2 `full` is still faster in the measured run
-- Blosc2 `full` is much smaller overall
-- DuckDB `art-index` is much faster to build than Blosc2 `full`
+- DuckDB `art-index` wins on cold point-lookup latency in this measurement
+- Blosc2 `full` remains much smaller overall
+- Blosc2 `light`, `medium`, and `full` all become faster than DuckDB `art-index` on warm repeated hits
+- DuckDB `art-index` still has a very large storage premium over both Blosc2 `light` and `full`
 
 
 ## Blosc2 Light vs DuckDB Zonemap
@@ -280,7 +287,8 @@ Main observations:
 - Blosc2 base + `light`: about `58 MB`
 - Blosc2 `light` lookup speed is much better
   - width `50`: about `6.25 ms` vs `13.33 ms`
-  - width `1`: about `1.3-1.5 ms` vs `8.6-12.6 ms`
+  - width `1` range: about `0.18 ms` warm vs `12.61 ms` generic-range DuckDB
+  - width `1` equality: about `0.17 ms` warm vs `2.94 ms` DuckDB zonemap warm
 
 Conclusion:
 
@@ -295,20 +303,21 @@ This is the most relevant exact-index comparison.
 Main observations:
 
 - point-lookup latency
-  - DuckDB `art-index`: `0.755 ms`
-  - Blosc2 `full`: `0.603 ms` cold, `0.525 ms` warm
+  - DuckDB `art-index`: `0.613 ms` cold, `0.245 ms` warm
+  - Blosc2 `full`: `1.753 ms` cold, `0.144 ms` warm
 - build time
-  - DuckDB `art-index`: `2849.869 ms`
-  - Blosc2 `full`: `10954.844 ms`
+  - DuckDB `art-index`: `2000.316 ms`
+  - Blosc2 `full`: `9539.843 ms`
 - footprint
-  - DuckDB `art-index` DB: about `478.7 MB`
+  - DuckDB `art-index` DB: about `478.4 MB`
   - Blosc2 `full` base + index: about `60.9 MB`
 
 Conclusion:
 
-- DuckDB ART wins on build time
 - Blosc2 `full` wins on storage efficiency
-- Blosc2 `full` was slightly faster on the measured point lookup
+- DuckDB `art-index` wins on cold point-lookup latency
+- Warm repeated point lookups favor Blosc2 `full` more clearly
+- DuckDB `art-index` is much faster to build than Blosc2 `full`
 - DuckDB ART is much more sensitive to predicate shape
 
 
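The conclusions above are stated qualitatively ("wins on cold", "favor ... more clearly", "much faster to build"). As a rough cross-check, the ratios behind them can be recomputed from the updated `art-index` and `full` figures quoted in this section. This is only an illustrative sketch: the dictionary and variable names are made up here and do not come from the benchmark scripts.

```python
# Updated figures quoted in this section, in milliseconds.
art_index = {"cold_ms": 0.613, "warm_ms": 0.245, "build_ms": 2000.316}
blosc2_full = {"cold_ms": 1.753, "warm_ms": 0.144, "build_ms": 9539.843}

# Each ratio is "slower system / faster system", so values above 1 show
# by how much the faster side wins.
cold_ratio = blosc2_full["cold_ms"] / art_index["cold_ms"]     # art-index faster when cold
warm_ratio = art_index["warm_ms"] / blosc2_full["warm_ms"]     # Blosc2 full faster when warm
build_ratio = blosc2_full["build_ms"] / art_index["build_ms"]  # art-index faster to build

print(f"cold lookup: art-index is about {cold_ratio:.1f}x faster")
print(f"warm lookup: Blosc2 full is about {warm_ratio:.1f}x faster")
print(f"build time:  art-index is about {build_ratio:.1f}x faster")
```

These are single-run figures from one machine, so the ratios are best read as orders of magnitude rather than precise speedups.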
@@ -317,7 +326,7 @@ Conclusion:
 Observed behavior:
 
 - Blosc2:
-  - width-1 range form and `==` are nearly equivalent in performance
+  - width-1 range form and `==` are close, with `==` giving a small but measurable improvement
 - DuckDB:
   - width-1 range form was much slower than `id = value`
 
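For readers unfamiliar with the two predicate shapes, the sketch below spells them out and checks that they select the same rows. It rests on two assumptions: the collapsed width-1 range is taken to be a closed interval with equal bounds (the benchmark's exact spelling is not visible in this diff), and SQLite from the Python standard library is used purely as a stand-in engine. It shows semantic equivalence only and says nothing about how DuckDB or Blosc2 plan or time either form. The helper names are hypothetical.

```python
import sqlite3


def range_predicate(column: str, value: int) -> str:
    # Assumed spelling of a collapsed width-1 range: both bounds are equal.
    return f"{column} >= {value} AND {column} <= {value}"


def equality_predicate(column: str, value: int) -> str:
    # Single-value form, as in `id = value`.
    return f"{column} = {value}"


def matching_ids(conn: sqlite3.Connection, predicate: str) -> list:
    # Predicates are built from trusted integers here; real code should bind parameters.
    rows = conn.execute(f"SELECT id FROM t WHERE {predicate} ORDER BY id").fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

range_rows = matching_ids(conn, range_predicate("id", 421))
equality_rows = matching_ids(conn, equality_predicate("id", 421))
print(range_rows == equality_rows)
```

Because both forms return identical rows, any latency gap between them reflects how an engine plans the predicate, not a difference in the result.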
@@ -343,10 +352,9 @@ Practical implication:
 1. Blosc2 `light` is very competitive against DuckDB zonemap-like pruning.
 2. Blosc2 `light` offers much faster selective lookups than DuckDB zonemap at a similar total storage cost.
 3. DuckDB `art-index` becomes strong only when queries are written as true equality predicates.
-4. Blosc2 `full` compares very well against DuckDB `art-index` on point lookups:
-   - slightly faster in the measured run
-   - much smaller on disk
-   - slower to build
-5. Query-shape sensitivity is a major difference:
+4. On true point lookups, DuckDB `art-index` wins on cold latency in the current M4 Pro run, but
+   Blosc2 exact indexes are markedly better on warm repeated lookups.
+5. Blosc2 exact indexes remain dramatically smaller on disk than DuckDB `art-index`.
+6. Query-shape sensitivity is a major difference:
    - small for Blosc2
    - large for DuckDB ART
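
The storage figures behind the "dramatically smaller on disk" point can be re-derived from the byte counts reported in this commit. The sketch below assumes decimal megabytes (1 MB = 1,000,000 bytes), which matches the rounded values in the document, and treats the Blosc2 base size as the approximate `31 MB` quoted in the text. Variable names are illustrative only.

```python
# Byte counts reported in the updated results.
ART_DB_BYTES = 478_425_088
ZONEMAP_DB_BYTES = 56_111_104
FULL_SIDECAR_BYTES = 29_888_673
BLOSC2_BASE_BYTES = 31_000_000  # approximate: the text only says "about 31 MB"

MB = 1_000_000  # decimal megabytes (assumption)

# ART overhead is estimated as the index DB size minus the zonemap baseline.
overhead_bytes = ART_DB_BYTES - ZONEMAP_DB_BYTES
overhead_mb = overhead_bytes / MB

# Blosc2 `full` footprint is the base array plus its index sidecars.
full_total_mb = (BLOSC2_BASE_BYTES + FULL_SIDECAR_BYTES) / MB

print(f"ART overhead over baseline: about {overhead_mb:.1f} MB")
print(f"Blosc2 full base + index:   about {full_total_mb:.1f} MB")
print(f"art-index DB vs Blosc2 full: about {ART_DB_BYTES / MB / full_total_mb:.1f}x larger")
```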
