@@ -10,6 +10,9 @@ The goal is not to claim a universal winner, but to document the current observe
1010- total storage footprint
1111- sensitivity to query shape
1212
13+ The latest width-1 single-value figures below come from a fresh run on a Mac mini with an M4 Pro CPU
14+ and 24 GB of RAM.
15+
1316
1417## Benchmark Setup
1518
@@ -56,7 +59,7 @@ There are two different DuckDB query shapes that matter a lot:
5659- single-value form:
5760 - ` id = value `
5861
59- For Blosc2, switching between a collapsed width-1 range and ` == ` makes almost no practical difference.
62+ For Blosc2, switching between a collapsed width-1 range and ` == ` makes only a small difference in practice.
6063
6164For DuckDB, this difference is very important:
6265
@@ -167,22 +170,24 @@ python index_query_bench.py \
167170Observed results:
168171
169172- ` light `
170- - cold lookup: ` 1.463 ms`
171- - warm lookup: ` 1.286 ms`
173+ - cold lookup: ` 0.841 ms`
174+ - warm lookup: ` 0.184 ms`
172175- ` medium `
173- - cold lookup: ` 1.089 ms`
174- - warm lookup: ` 0.986 ms `
176+ - cold lookup: ` 0.564 ms`
177+ - warm lookup: ` 0.168 ms `
175178- ` full `
176- - cold lookup: ` 0.618 ms `
177- - warm lookup: ` 0.544 ms `
179+ - cold lookup: ` 0.554 ms `
180+ - warm lookup: ` 0.167 ms `
178181
179182### Interpretation
180183
181- With the generic range form, Blosc2 is much faster than DuckDB:
184+ With the generic width-1 range form, Blosc2 is much faster than DuckDB:
182185
183- - Blosc2 ` light ` is already about ` 9x ` faster than DuckDB ` zonemap `
184- - Blosc2 exact indexes (` medium ` , ` full ` ) are much faster still
186+ - Blosc2 ` light ` is already much faster than DuckDB ` zonemap ` , and comfortably faster than the
187+ generic-range DuckDB ` art-index ` behavior
188+ - Blosc2 ` medium ` and ` full ` are in a different regime on warm hits, at about ` 0.17 ms `
185189- DuckDB ` art-index ` does not show its real point-lookup behavior in this predicate form
190+ - Blosc2 warm reuse changes the picture substantially for repeated lookups
186191
187192
188193## Width-1 Comparison: Single-Value Predicate
@@ -205,13 +210,15 @@ python duckdb_query_bench.py \
205210Observed results:
206211
207212- ` zonemap `
208- - build: ` 1193.665 ms `
209- - filtered lookup: ` 8.646 ms `
213+ - build: ` 509.338 ms `
214+ - cold lookup: ` 4.595 ms `
215+ - warm lookup: ` 2.857 ms `
210216 - DB size: ` 56,111,104 ` bytes
211217- ` art-index `
212- - build: ` 2849.869 ms `
213- - filtered lookup: ` 0.755 ms `
214- - DB size: ` 478,687,232 ` bytes
218+ - build: ` 2000.316 ms `
219+ - cold lookup: ` 0.613 ms `
220+ - warm lookup: ` 0.246 ms `
221+ - DB size: ` 478,425,088 ` bytes
215222
216223### Blosc2
217224
@@ -230,42 +237,42 @@ python index_query_bench.py \
230237Observed results:
231238
232239- ` light `
233- - build: ` 1225.637 ms`
234- - cold lookup: ` 1.290 ms`
235- - warm lookup: ` 2.351 ms`
240+ - build: ` 960.048 ms`
241+ - cold lookup: ` 2.489 ms`
242+ - warm lookup: ` 0.172 ms`
236243 - index sidecars: ` 27,497,393 ` bytes
237244- ` medium `
238- - build: ` 5511.863 ms`
239- - cold lookup: ` 1.081 ms`
240- - warm lookup: ` 0.964 ms `
245+ - build: ` 4745.880 ms`
246+ - cold lookup: ` 2.202 ms`
247+ - warm lookup: ` 0.147 ms `
241248 - index sidecars: ` 37,645,201 ` bytes
242249- ` full `
243- - build: ` 10954.844 ms`
244- - cold lookup: ` 0.603 ms`
245- - warm lookup: ` 0.525 ms `
250+ - build: ` 9539.843 ms`
251+ - cold lookup: ` 1.753 ms`
252+ - warm lookup: ` 0.144 ms `
246253 - index sidecars: ` 29,888,673 ` bytes
247254
248255### Interpretation
249256
250257Once DuckDB is allowed to use the more planner-friendly single-value predicate:
251258
252259- ` art-index ` becomes very fast
253- - ` art-index ` is now faster than Blosc2 ` light `
254- - Blosc2 ` full ` still remains slightly faster than DuckDB ` art-index ` on this measured point-lookup case
260+ - ` art-index ` is clearly faster than Blosc2 on cold point lookups in this run
261+ - Blosc2 is clearly faster on warm repeated point lookups across ` light ` , ` medium ` , and ` full `
255262
256263However, the storage costs are very different:
257264
258- - DuckDB ` art-index ` database size: about ` 478.7 MB `
265+ - DuckDB ` art-index ` database size: about ` 478.4 MB `
259266- DuckDB zonemap baseline size: about ` 56.1 MB `
260- - estimated ART overhead over baseline: about ` 422.6 MB `
267+ - estimated ART overhead over baseline: about ` 422.3 MB `
261268- Blosc2 ` full ` base + index footprint: about ` 31 MB + 29.9 MB = 60.9 MB `
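The storage figures above follow from simple arithmetic on the reported byte counts. A minimal sketch, assuming decimal megabytes (10^6 bytes, which is what the rounded figures imply) and taking the Blosc2 base size as the rounded `31 MB` quoted above rather than an exact byte count; the variable names are illustrative only:

```python
MB = 1_000_000  # decimal megabytes, matching the rounded figures in the text

# Byte counts copied from the observed results.
duckdb_art_bytes = 478_425_088
duckdb_zonemap_bytes = 56_111_104
blosc2_full_index_bytes = 29_888_673

# Assumption: the base array size is only quoted as "about 31 MB".
blosc2_base_bytes = 31 * MB

# Estimated ART overhead: indexed database minus the zonemap-only baseline.
art_overhead_mb = (duckdb_art_bytes - duckdb_zonemap_bytes) / MB

# Total Blosc2 `full` footprint: base array plus index sidecars.
blosc2_full_total_mb = (blosc2_base_bytes + blosc2_full_index_bytes) / MB

# How many times larger the DuckDB art-index database is.
size_ratio = duckdb_art_bytes / (blosc2_base_bytes + blosc2_full_index_bytes)

print(f"ART overhead over baseline: {art_overhead_mb:.1f} MB")
print(f"Blosc2 full base + index:   {blosc2_full_total_mb:.1f} MB")
print(f"DuckDB art-index / Blosc2 full: {size_ratio:.1f}x")
```

Because the base size is rounded, the final ratio should be read as approximate.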
262269
263270So for true point lookups:
264271
265- - DuckDB ` art-index ` is competitive on latency
266- - Blosc2 ` full ` is still faster in the measured run
267- - Blosc2 ` full ` is much smaller overall
268- - DuckDB ` art-index ` is much faster to build than Blosc2 ` full `
272+ - DuckDB ` art-index ` wins on cold point-lookup latency in this measurement
273+ - Blosc2 ` full ` remains much smaller overall
274+ - Blosc2 ` light ` , ` medium ` , and ` full ` all become faster than DuckDB ` art-index ` on warm repeated hits
275+ - DuckDB ` art-index ` still has a very large storage premium over both Blosc2 ` light ` and ` full `
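To put rough numbers on the cold versus warm split, the latency ratios can be computed directly from the width-1 single-value results listed above. This is only a sketch over single measured values from one run, so the ratios are indicative rather than statistically robust; the dictionary names are illustrative:

```python
# Width-1 single-value latencies in milliseconds, copied from the observed results.
duckdb_art = {"cold": 0.613, "warm": 0.246}
blosc2 = {
    "light": {"cold": 2.489, "warm": 0.172},
    "medium": {"cold": 2.202, "warm": 0.147},
    "full": {"cold": 1.753, "warm": 0.144},
}

ratios = {}
for kind, times in blosc2.items():
    ratios[kind] = {
        # Cold: how many times slower Blosc2 is than DuckDB art-index.
        "cold_slowdown": times["cold"] / duckdb_art["cold"],
        # Warm: how many times faster Blosc2 is than DuckDB art-index.
        "warm_speedup": duckdb_art["warm"] / times["warm"],
    }
    print(
        f"{kind:>6}: cold {ratios[kind]['cold_slowdown']:.2f}x slower, "
        f"warm {ratios[kind]['warm_speedup']:.2f}x faster than art-index"
    )
```

Both ratios are above 1 for every Blosc2 index kind, which is the crossover described in the list above.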
269276
270277
271278## Blosc2 Light vs DuckDB Zonemap
@@ -280,7 +287,8 @@ Main observations:
280287 - Blosc2 base + ` light ` : about ` 58 MB `
281288- Blosc2 ` light ` lookup speed is much better
282289 - width ` 50 ` : about ` 6.25 ms ` vs ` 13.33 ms `
283- - width ` 1 ` : about ` 1.3-1.5 ms ` vs ` 8.6-12.6 ms `
290+ - width ` 1 ` range: about ` 0.18 ms ` warm vs ` 12.61 ms ` generic-range DuckDB
291+ - width ` 1 ` equality: about ` 0.17 ms ` warm vs ` 2.94 ms ` DuckDB zonemap warm
284292
285293Conclusion:
286294
@@ -295,20 +303,21 @@ This is the most relevant exact-index comparison.
295303Main observations:
296304
297305- point-lookup latency
298- - DuckDB ` art-index ` : ` 0.755 ms `
299- - Blosc2 ` full ` : ` 0.603 ms` cold, ` 0.525 ms ` warm
306+ - DuckDB ` art-index ` : ` 0.613 ms ` cold, ` 0.246 ms ` warm
307+ - Blosc2 ` full ` : ` 1.753 ms` cold, ` 0.144 ms ` warm
300308- build time
301- - DuckDB ` art-index ` : ` 2849.869 ms`
302- - Blosc2 ` full ` : ` 10954.844 ms`
309+ - DuckDB ` art-index ` : ` 2000.316 ms`
310+ - Blosc2 ` full ` : ` 9539.843 ms`
303311- footprint
304- - DuckDB ` art-index ` DB: about ` 478.7 MB `
312+ - DuckDB ` art-index ` DB: about ` 478.4 MB `
305313 - Blosc2 ` full ` base + index: about ` 60.9 MB `
306314
307315Conclusion:
308316
309- - DuckDB ART wins on build time
310317- Blosc2 ` full ` wins on storage efficiency
311- - Blosc2 ` full ` was slightly faster on the measured point lookup
318+ - DuckDB ` art-index ` wins on cold point-lookup latency
319+ - Blosc2 ` full ` wins on warm repeated point lookups
320+ - DuckDB ` art-index ` is much faster to build than Blosc2 ` full `
312321- DuckDB ART is much more sensitive to predicate shape
313322
314323
@@ -317,7 +326,7 @@ Conclusion:
317326Observed behavior:
318327
319328- Blosc2:
320- - width-1 range form and ` == ` are nearly equivalent in performance
329+ - width-1 range form and ` == ` are close, with ` == ` giving a small but measurable improvement
321330- DuckDB:
322331 - width-1 range form was much slower than ` id = value `
323332
@@ -343,10 +352,9 @@ Practical implication:
3433521 . Blosc2 ` light ` is very competitive against DuckDB zonemap-like pruning.
3443532 . Blosc2 ` light ` offers much faster selective lookups than DuckDB zonemap at a similar total storage cost.
3453543 . DuckDB ` art-index ` becomes strong only when queries are written as true equality predicates.
346- 4 . Blosc2 ` full ` compares very well against DuckDB ` art-index ` on point lookups:
347- - slightly faster in the measured run
348- - much smaller on disk
349- - slower to build
350- 5 . Query-shape sensitivity is a major difference:
355+ 4 . On true point lookups, DuckDB ` art-index ` wins on cold latency in the current M4 Pro run, but
356+ Blosc2 exact indexes are markedly better on warm repeated lookups.
357+ 5 . Blosc2 exact indexes remain dramatically smaller on disk than DuckDB ` art-index ` .
358+ 6 . Query-shape sensitivity is a major difference:
351359 - small for Blosc2
352360 - large for DuckDB ART