Skip to content

GH-49697: [C++][CI] Check IPC file body bounds are in sync with decoder outcome#49698

Merged
pitrou merged 1 commit intoapache:mainfrom
pitrou:gh49697-ipc-file-validation
Apr 13, 2026
Merged

GH-49697: [C++][CI] Check IPC file body bounds are in sync with decoder outcome#49698
pitrou merged 1 commit intoapache:mainfrom
pitrou:gh49697-ipc-file-validation

Conversation

@pitrou
Copy link
Copy Markdown
Member

@pitrou pitrou commented Apr 9, 2026

Rationale for this change

When we read an IPC message from an IPC file, we validate its associated body size against the amount required by the streaming decoder. However, we're currently only checking that the body size is large enough, not that it's exactly as expected.

An invalid IPC file might advertise in its footer a metaDataLength that's larger than the actual serialized Flatbuffers payload. In that case, the associated body would start before the offset computed from the IPC file footer.

What changes are included in this PR?

  1. Strengthen body size check against expected decoder read, to ensure that the metadata length advertised in the IPC file footer is consistent with the actual size of the Flatbuffers-serialized metadata.
  2. Refactor RecordBatch IPC loading to reduce code duplication.
  3. (as a consequence of item 2 above) Fix latent bug where the IPC file reader did not apply the ensure_alignment option to buffers read from IPC.

Are these changes tested?

By additional fuzz regression file.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the awaiting review Awaiting review label Apr 9, 2026
@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Apr 9, 2026

@github-actions crossbow submit -g cpp

@github-actions

This comment was marked as outdated.

@pitrou pitrou force-pushed the gh49697-ipc-file-validation branch from 44711e3 to 08d81ea Compare April 13, 2026 10:17
@pitrou pitrou force-pushed the gh49697-ipc-file-validation branch 2 times, most recently from ad0aab4 to 3b2d101 Compare April 13, 2026 11:32
@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Apr 13, 2026

@github-actions crossbow submit -g cpp

@pitrou pitrou marked this pull request as ready for review April 13, 2026 12:45
@github-actions
Copy link
Copy Markdown

Revision: 3b2d101

Submitted crossbow builds: ursacomputing/crossbow @ actions-265034e4bd

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Apr 13, 2026

@lidavidm Could you perhaps take a look?

@pitrou
Copy link
Copy Markdown
Member Author

pitrou commented Apr 13, 2026

@raulcd This may be a good last-minute fix for 24.0.0.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting review Awaiting review labels Apr 13, 2026
@pitrou pitrou force-pushed the gh49697-ipc-file-validation branch from 3b2d101 to c2daf38 Compare April 13, 2026 13:47
@pitrou pitrou merged commit 982797c into apache:main Apr 13, 2026
56 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label Apr 13, 2026
@pitrou pitrou deleted the gh49697-ipc-file-validation branch April 13, 2026 14:45
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 982797c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

raulcd pushed a commit that referenced this pull request Apr 14, 2026
…er outcome (#49698)

### Rationale for this change

When we read an IPC message from an IPC file, we validate its associated body size against the amount required by the streaming decoder. However, we're currently only checking that the body size is large enough, not that it's exactly as expected.

An invalid IPC file might advertise in its footer a `metaDataLength` that's larger than the actual serialized Flatbuffers payload. In that case, the associated body would start before the offset computed from the IPC file footer.

### What changes are included in this PR?

1. Strengthen body size check against expected decoder read, to ensure that the metadata length advertised in the IPC file footer is consistent with the actual size of the Flatbuffers-serialized metadata.
2. Refactor RecordBatch IPC loading to reduce code duplication.
3. (as a consequence of item 2 above) Fix latent bug where the IPC file reader did not apply the `ensure_alignment` option to buffers read from IPC.

### Are these changes tested?

By additional fuzz regression file.

### Are there any user-facing changes?

No.

* GitHub Issue: #49697

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants