fix(puffin): implement LZ4 footer compression#2438
Open
1fanwang wants to merge 2 commits into
Open
Conversation
The PuffinWriter exposed a `compress_footer: bool` flag that set the
FooterPayloadCompressed bit in the file footer and recorded LZ4 as the
codec, but `CompressionCodec::Lz4.compress` returned `FeatureUnsupported`,
so calling `PuffinWriter::new(_, _, true)` always failed at close time:
thread '...' panicked at crates/iceberg/src/puffin/writer.rs:350:
called `Result::unwrap()` on an `Err` value:
FeatureUnsupported => LZ4 compression is not supported currently
The reader side had the same hole: a Puffin file written by another
implementation with FooterPayloadCompressed=1 was unreadable.
Wire LZ4 through `CompressionCodec::{compress,decompress}` using
`lz4_flex`'s frame encoder/decoder, matching the Puffin spec requirement
of "LZ4 single compression frame with content size present" by setting
`FrameInfo::content_size(Some(len))` on the encoder. The reader path is
symmetric — `FrameDecoder::read_to_end` consumes the same frame.
Tests:
- `test_compress_empty_footer_lz4_succeeds` — direct reproducer from
the issue (close() with no blobs and compress_footer=true).
- `test_compress_footer_lz4_round_trips` — full encode/decode loop with
one blob and file properties.
- `test_write_lz4_compressed_metric_data` — was previously asserting the
error string; now asserts round-trip of two blobs.
- `test_lz4_compressed_footer_is_decoded` (metadata) — verifies the
reader honors the FooterPayloadCompressed flag.
- `test_compression_codec_lz4_roundtrip` — round-trip on empty + mixed
payloads, and asserts the frame begins with the LZ4 magic 0x184D2204.
Closes apache#2419.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The PuffinWriter exposed a
compress_footer: boolflag that set the FooterPayloadCompressed bit in the file footer and recorded LZ4 as the codec, butCompressionCodec::Lz4.compressreturnedFeatureUnsupported, so callingPuffinWriter::new(_, _, true)always failed at close time:The reader side had the same hole: a Puffin file written by another implementation with FooterPayloadCompressed=1 was unreadable.
This change wires LZ4 through
CompressionCodec::{compress,decompress}usinglz4_flex's frame encoder/decoder, matching the Puffin spec requirement of "LZ4 single compression frame with content size present" by settingFrameInfo::content_size(Some(len))on the encoder. The reader path is symmetric —FrameDecoder::read_to_endconsumes the same frame.lz4_flexis already in the lock file as a transitive dep ofparquet, so no new dependency tree is introduced.Are these changes tested?
Yes:
test_compress_empty_footer_lz4_succeeds— direct adaptation of the reproducer in the issue (close() with no blobs, compress_footer=true).test_compress_footer_lz4_round_trips— full encode + decode loop with one blob and non-empty file properties.test_write_lz4_compressed_metric_data— previously asserted the error string; now asserts a two-blob round-trip.test_lz4_compressed_footer_is_decoded(metadata) — verifies the reader honors the FooterPayloadCompressed flag.test_compression_codec_lz4_roundtrip— round-trip on empty and mixed payloads, plus a magic-number check that the frame begins with0x184D2204.Closes #2419