
Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter #1768

Open

mkmkme wants to merge 1 commit into antalya-26.3 from backports/antalya-26.3/102628

Conversation

@mkmkme (Collaborator) commented May 9, 2026

Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix LOGICAL_ERROR crash "Unexpected number of rows in column subchunk" in native Parquet V3 reader when reading nullable columns with a WHERE filter (ClickHouse#102628 by @groeneai).

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)


Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
@github-actions bot commented May 9, 2026

Workflow [PR], commit [6d92950]

@il9ue left a comment

LGTM — seems like a clean backport ✅

Diff matches upstream #102628. Two fixes in Reader.cpp (a condensed sketch follows the list):

  1. Gate use_filter_in_decoder on !column.need_null_map. The fast path processes all rows through processDefLevelsForInnermostColumn and applies the filter at encoded-value indices, which don't line up 1:1 with row indices when nulls are present. Falling back to the standard row-range path is correct.

  2. Inverted memchr sentinel: 0 → 1. The ClickHouse convention is 1 = NULL, so the old check threw whenever any non-null value existed, which is exactly backwards. And with all-NULL filtered rows going into a non-Nullable column at null_as_default=0, it silently dropped the null_map instead of raising CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN, then crashed downstream on the row-count mismatch.
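A condensed sketch of both changes, with names taken from the review text (need_null_map, data_pages, null_map); this is not the literal Reader.cpp code, and the exact guard expression upstream may differ:

```cpp
#include <cstring>

// Hypothetical standalone condensation of the two fixes.
struct ColumnInfo { bool need_null_map; };

bool shouldUseFilterInDecoder(bool has_filter, const ColumnInfo & column,
                              bool data_pages_empty, bool dictionary_encoded)
{
    // Fix 1: never push the row filter into the decoder for nullable
    // columns. Null rows emit no encoded value, so encoded-value indices
    // and row indices diverge and the filter lands on wrong positions.
    return has_filter
        && !column.need_null_map
        && data_pages_empty
        && !dictionary_encoded;
}

bool outputWouldContainNull(const unsigned char * null_map, size_t size)
{
    // Fix 2: the null_map convention is 1 = NULL, 0 = NOT NULL, so the
    // check must search for byte 1 (the old code searched for 0).
    return std::memchr(null_map, 1, size) != nullptr;
}
```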

Test covers all four meaningful cases (nullable output, null_as_default=1, the formerly-crashing path, and a no-nulls control). Correctly gated on input_format_parquet_use_native_reader_v3=1.

CI selection appropriate (Parquet/Iceberg/S3 Export, ASAN kept). Approve once green.

@Selfeer (Collaborator) commented May 14, 2026

Audit: PR #1768 — Fix LOGICAL_ERROR in Parquet reader for nullable columns with filter

PR: #1768
Upstream: ClickHouse#102628
Scope: src/Processors/Formats/Impl/Parquet/Reader.cpp (2 independent changes), 2 new test files


Confirmed Defects (pre-fix, both resolved by this PR)

High — Filter-in-decoder misalignment for nullable columns

Anchor: Reader.cpp / decodePrimitiveColumn, use_filter_in_decoder decision block

Trigger: nullable Parquet column (need_null_map=true) + non-empty WHERE filter + page not
dictionary-encoded + no offset index (data_pages is empty)

Root cause:
When use_filter_in_decoder=true, readRowsInPage is called with the full subgroup end row as
end_row_idx, so processDefLevelsForInnermostColumn populates null_map for all rows in the
subgroup — both rows that pass the filter and rows that do not. At the same time, the decoder
receives the row-indexed filter and applies it at encoded-value positions. Because null rows
produce no encoded value, the two indices diverge: the filter is applied at wrong positions, and
null_map.size() ends up as rows_total while column->size() reflects a mismatched number of
decoded values.

expandDataByMask then throws:

LOGICAL_ERROR: Not enough bytes in mask

or, depending on the exact filter pattern, "Too many bytes in mask". If the filter happens to
produce zero decoded values, the size assertion at the end of decodePrimitiveColumn fires instead:

LOGICAL_ERROR: Unexpected number of rows in column subchunk 0 1

Impact: Any SELECT through the native Parquet V3 reader (input_format_parquet_use_native_reader_v3=1) that reads a nullable column with a WHERE filter crashes with LOGICAL_ERROR.

Fix: !column.need_null_map added to the use_filter_in_decoder guard. Falls back to the
standard row-range iteration path, which correctly reads only rows in passing filter ranges and
produces a null_map sized to rows_pass.
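A minimal numeric illustration of the misalignment, using hypothetical data (a four-row subgroup where the filter selects the two NULL rows); the loop is a rough model of the pre-fix fast path, not the actual decoder code:

```cpp
#include <cstdio>
#include <vector>

int main()
{
    // Hypothetical subgroup: logical rows {10, NULL, 20, NULL}.
    std::vector<int> encoded = {10, 20};                  // null rows emit no encoded value
    std::vector<unsigned char> row_filter = {0, 1, 0, 1}; // WHERE passes rows 1 and 3 (the NULLs)
    std::vector<unsigned char> null_map = {0, 1, 0, 1};   // populated for all 4 rows (1 = NULL)

    // Pre-fix fast path, roughly modeled: the row-indexed filter is
    // applied at encoded-value positions. Only value positions 0 and 1
    // exist, so filter bit 1 wrongly selects encoded[1] == 20 for a row
    // that is actually NULL, and filter bits 2..3 address values that
    // were never encoded.
    std::vector<int> selected;
    for (std::size_t pos = 0; pos < encoded.size(); ++pos)
        if (row_filter[pos])        // wrong index space: pos is a value index
            selected.push_back(encoded[pos]);

    std::size_t rows_pass = 2;      // rows 1 and 3 pass the filter
    std::printf("column size=%zu, rows_pass=%zu, null_map size=%zu\n",
                selected.size(), rows_pass, null_map.size()); // 1 vs 2 vs 4: the mismatch
}
```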


High — Inverted memchr sentinel in null-value check

Anchor: Reader.cpp / decodePrimitiveColumn, null check block (line ~1420 pre-fix)

Trigger: non-nullable output schema + null_as_default=0 + Parquet column that can contain
nulls (need_null_map=true) + WHERE filter

Root cause:
ClickHouse null_map convention is 1 = NULL, 0 = NOT NULL — established in
processDefLevelsForInnermostColumn:

is_null = def[i] != max_def;
out_null_map->push_back(is_null);   // bool: 1 for null, 0 for not-null

and confirmed by the expand(null_map, /*inverted*/ true) call, where inverted=true means
"insert default where mask is 1".

The old check searched for the wrong byte:

if (memchr(null_map.data(), 0, null_map.size()) != nullptr)   // ← searches for NOT-NULL
    throw Exception(... CANNOT_INSERT_NULL ...);

This produces two wrong outcomes:

  1. False positive: if the filtered result contains any non-null row (byte 0 exists in the
    map), the exception is thrown even though there are no actual NULLs in the output — a valid
    read is rejected.

  2. Silent pass → LOGICAL_ERROR: if the filter selects only null rows, the map contains all
    1s and memchr(..., 0, ...) returns nullptr — no exception is thrown, null_map is
    cleared, the column has 0 rows, but rows_pass > 0. The size assertion then fires:

    LOGICAL_ERROR: Unexpected number of rows in column subchunk 0 1
    

Impact: For non-nullable output with null_as_default=0, valid reads throw CANNOT_INSERT_NULL
incorrectly, and reads where all filtered values are NULL crash with LOGICAL_ERROR instead of
the expected CANNOT_INSERT_NULL.

Fix: Sentinel changed from 0 to 1 — the check now correctly detects NULL markers.
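A tiny self-contained demonstration of why the old sentinel silently passes an all-NULL map while the fixed one fires (the map contents are illustrative):

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
    // ClickHouse null_map convention: 1 = NULL, 0 = NOT NULL.
    std::vector<unsigned char> null_map = {1, 1, 1};   // every filtered row is NULL

    // Old (inverted) check: searches for a NOT-NULL byte, finds none,
    // so no exception would be raised despite the output being all NULLs.
    bool old_fires = std::memchr(null_map.data(), 0, null_map.size()) != nullptr;

    // Fixed check: searches for the NULL marker and correctly fires.
    bool new_fires = std::memchr(null_map.data(), 1, null_map.size()) != nullptr;

    std::printf("old check fires: %d, fixed check fires: %d\n", old_fires, new_fires); // 0, 1
}
```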


How the Two Bugs Interact

The LOGICAL_ERROR: Unexpected number of rows in column subchunk from the PR title requires both
bugs together:

  1. Bug 1 puts use_filter_in_decoder=true for a nullable column, causing the decoder to be given
    a filter — but null rows consume no encoded value, so encoded-value index ≠ row index.
  2. With a filter that selects only null rows, the decoder finds no encoded values to write
    (column->size() == 0).
  3. Bug 2's inverted check finds no byte 0 in an all-null map, so no exception is thrown and
    null_map is cleared.
  4. expand() is skipped (null_map is null); the final size assertion fires: 0 ≠ rows_pass.

Each bug is also independently harmful — Bug 1 corrupts values or crashes on any nullable+filtered
read; Bug 2 produces false CANNOT_INSERT_NULL or crashes on all-null filtered reads with
non-nullable output.


Coverage Summary

  • Scope reviewed: decodePrimitiveColumn, readRowsInPage, processDefLevelsForInnermostColumn, expandDataByMask, need_null_map initialization, new test cases
  • Categories failed (pre-fix): filter-in-decoder/nullable alignment, inverted null-marker, downstream expand invariant
  • Categories passed: concurrency (fields are read-only during decode), need_null_map=false path (statistics-driven, unaffected)
  • Limits: static analysis only; offset-index path (data_pages non-empty) is out of scope — the existing data_pages.empty() guard already disables the optimization there

@Selfeer (Collaborator) commented May 14, 2026

PR #1768 CI Triage

PR: #1768 - Antalya 26.3 Backport of #102628 - Fix LOGICAL_ERROR crash in Parquet reader for nullable columns with filter
CI report: ci_run_report.html
Date: 2026-05-14

PR Change Scope

Changed files:

  • src/Processors/Formats/Impl/Parquet/Reader.cpp
  • tests/queries/0_stateless/04099_parquet_nullable_null_check_fix.reference
  • tests/queries/0_stateless/04099_parquet_nullable_null_check_fix.sh

This PR is narrowly scoped to a Parquet reader null-check crash fix and an associated stateless regression test.

Summary

  • PR-caused regression: 0
  • Unrelated regression-suite failures: 8 (Regression {parquet, s3_export_part, s3_export_partition, swarms}, on both x86_64 and aarch64)
  • Unrelated flaky/perf check failure: 1 (Stateless tests (arm_binary, sequential))
  • Non-test policy failure: 1 (DCO)

Evidence from CI Metadata

  • CI report New Fails in PR: Nothing to report.
  • CI report Checks New Fails: one test: 00157_cache_dictionary in Stateless tests (arm_binary, sequential).
  • Failure set mostly matches recurring regression-suite patterns already seen on nearby PRs in this branch.

Root Cause Classification

1) Regression* / Parquet / parquet (x86_64 + aarch64) — Unrelated

Observed signatures:

  • snapshot drift such as Date -> Date32
  • snapshot drift such as DateTime -> DateTime64(0)

These are known snapshot/type-behavior mismatches in parquet regression tests. They are not specific to nullable null-check crash handling in Parquet/Reader.cpp.

2) Regression* / S3Export (partition) / s3_export_partition (x86_64 + aarch64) — Unrelated

Observed signature:

  • Code: 344 ... Exporting merge tree partition is experimental ... allow_experimental_export_merge_tree_partition ... (SUPPORT_IS_DISABLED)

This is a regression environment/config flag issue in export-partition tests, unrelated to Parquet reader code.

3) Regression* / S3Export (part) / s3_export_part (x86_64 + aarch64) — Unrelated

Observed signatures:

  • Code: 44 ... storage S3 doesn't support columns with dynamic structure (ILLEGAL_COLUMN) for JSON columns
  • occasional export consistency mismatch in concurrent alter scenario (part_log vs destination parts)

These failures are in S3 export-part test behavior and do not intersect with the Parquet nullable filter fix.

4) Regression* / Swarms / swarms (x86_64 + aarch64) — Unrelated

Observed signatures include:

  • ExpectTimeoutError: Timeout 600.000s
  • UNKNOWN_DATABASE
  • join/assertion mismatches in swarm join scenarios

This is swarms/distributed instability and does not map to changed Parquet reader logic.

5) Stateless tests (arm_binary, sequential) — Unrelated flaky/perf

CI report marks one new fail:

  • 00157_cache_dictionary

This test is historically flaky/slow (frequent timeout/perf behavior in prior triage runs) and does not align with PR #1768 changes (Parquet/Reader.cpp + 04099 test only).

6) DCO — Unrelated to runtime behavior

DCO is ACTION_REQUIRED, which is policy metadata and not a functional regression.

Verdict

There is no evidence that PR #1768 caused or is related to the observed CI failures.

The failing checks are attributable to existing regression-suite instability/configuration gaps (parquet, s3_export_part, s3_export_partition, swarms) plus one unrelated stateless flaky/perf test (00157_cache_dictionary).

@Selfeer added the verified (Approved for release) label on May 14, 2026