Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter #1768
mkmkme wants to merge 1 commit into
…-null-check Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
il9ue
left a comment
LGTM, seems like a clean backport ✅
Diff matches upstream #102628. Two fixes in Reader.cpp:
- Gate `use_filter_in_decoder` on `!column.need_null_map`. The fast path processes all rows through `processDefLevelsForInnermostColumn` and applies the filter at encoded-value indices, which don't line up 1:1 with row indices when nulls are present. Falling back to the standard row-range path is correct.
- Inverted `memchr`: `0` → `1`. ClickHouse convention is `1 = NULL`, so the old check cleared the null_map whenever any non-null existed, which is exactly backwards. With all-NULL filtered rows into a non-Nullable column at `null_as_default=0`, this silently dropped the map instead of raising `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN`, then crashed downstream on the row-count mismatch.
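To make the two points above concrete, here is a minimal standalone sketch. It is not the code from Reader.cpp: the helper names `rowToValueIndex`, `hasAnyNull`, and `hasAnyNullInverted` are hypothetical, and the only things taken from the review are the `1 = NULL` convention and the `memchr` search byte changing from `0` to `1`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper (not from Reader.cpp): for each row, the index of its
// encoded value. NULL rows carry no encoded value, so they map to nullopt.
// With any NULL present, value indices fall behind row indices.
std::vector<std::optional<std::size_t>> rowToValueIndex(const std::vector<std::uint8_t> & null_map)
{
    std::vector<std::optional<std::size_t>> result;
    result.reserve(null_map.size());
    std::size_t next_value = 0;
    for (std::uint8_t is_null : null_map)
    {
        if (is_null)
            result.push_back(std::nullopt);
        else
            result.push_back(next_value++);
    }
    return result;
}

// Correct check under the 1 = NULL convention: search for a byte equal to 1.
bool hasAnyNull(const std::vector<std::uint8_t> & null_map)
{
    return !null_map.empty()
        && std::memchr(null_map.data(), 1, null_map.size()) != nullptr;
}

// Inverted check: searching for 0 answers "is there any non-NULL row?",
// so an all-NULL map is reported as having no NULLs.
bool hasAnyNullInverted(const std::vector<std::uint8_t> & null_map)
{
    return !null_map.empty()
        && std::memchr(null_map.data(), 0, null_map.size()) != nullptr;
}
```

For a null map `{0, 1, 0, 1, 0}`, row 2 maps to value index 1 and row 4 to value index 2, which is why a filter expressed in row indices cannot be applied directly to encoded values. For an all-NULL map `{1, 1, 1}`, the inverted check returns false, matching the case the review describes as silently dropping the map.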
Test covers all four meaningful cases (nullable output, null_as_default=1, the formerly-crashing path, and a no-nulls control). Correctly gated on input_format_parquet_use_native_reader_v3=1.
CI selection appropriate (Parquet/Iceberg/S3 Export, ASAN kept). Approve once green.
Audit: PR #1768 — Fix
| Scope reviewed | decodePrimitiveColumn, readRowsInPage, processDefLevelsForInnermostColumn, expandDataByMask, need_null_map initialization, new test cases |
|---|---|
| Categories failed (pre-fix) | filter-in-decoder/nullable alignment, inverted null-marker, downstream expand invariant |
| Categories passed | concurrency (fields are read-only during decode), need_null_map=false path (statistics-driven, unaffected) |
| Limits | static analysis only; offset-index path (data_pages non-empty) is out of scope — the existing data_pages.empty() guard already disables the optimization there |
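The "downstream expand invariant" in the table can be illustrated with a small sketch. This is an assumption-laden stand-in, not ClickHouse's `expandDataByMask`: the function name `expandByNullMap`, the `int` value type, and the exception messages are all hypothetical. The only point it shows is that the number of decoded values must equal the number of non-NULL rows in the map, so a dropped or misaligned map surfaces as a row-count error.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an expand-by-mask step (not the ClickHouse function).
// `values` holds one entry per non-NULL row; `null_map` uses 1 = NULL.
// Returns one entry per row, filling NULL rows with `default_value`.
std::vector<int> expandByNullMap(
    const std::vector<int> & values,
    const std::vector<std::uint8_t> & null_map,
    int default_value = 0)
{
    std::vector<int> rows;
    rows.reserve(null_map.size());
    std::size_t next_value = 0;
    for (std::uint8_t is_null : null_map)
    {
        if (is_null)
        {
            rows.push_back(default_value);
        }
        else
        {
            // Invariant: every non-NULL row must have a decoded value.
            if (next_value == values.size())
                throw std::logic_error("fewer values than non-NULL rows");
            rows.push_back(values[next_value++]);
        }
    }
    // Invariant: no decoded value may be left without a row.
    if (next_value != values.size())
        throw std::logic_error("more values than non-NULL rows");
    return rows;
}
```

With values `{10, 20}` and null map `{0, 1, 0}` the result is `{10, 0, 20}`. Passing the same values with a map of `{1, 1, 1}` throws, which is the kind of mismatch the pre-fix code could reach.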
PR #1768 CI Triage
PR: #1768 - Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
PR Change Scope
Changed files:
This PR is narrowly scoped to a Parquet reader null-check crash fix and an associated stateless regression test.
Summary
Evidence from CI Metadata
Root Cause Classification
1)
Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix LOGICAL_ERROR crash "Unexpected number of rows in column subchunk" in native Parquet V3 reader when reading nullable columns with a WHERE filter (ClickHouse#102628 by @groeneai).
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: