Skip to content

fix(puffin): implement LZ4 footer compression#2438

Open
1fanwang wants to merge 2 commits into
apache:mainfrom
1fanwang:fix/2419-puffin-footer-lz4
Open

fix(puffin): implement LZ4 footer compression#2438
1fanwang wants to merge 2 commits into
apache:mainfrom
1fanwang:fix/2419-puffin-footer-lz4

Conversation

@1fanwang
Copy link
Copy Markdown

@1fanwang 1fanwang commented May 12, 2026

The PuffinWriter exposed a compress_footer: bool flag that set the FooterPayloadCompressed bit in the file footer and recorded LZ4 as the codec, but CompressionCodec::Lz4.compress returned FeatureUnsupported, so calling PuffinWriter::new(_, _, true) always failed at close time:

thread '...' panicked at crates/iceberg/src/puffin/writer.rs:350:
called `Result::unwrap()` on an `Err` value:
FeatureUnsupported => LZ4 compression is not supported currently

The reader side had the same hole: a Puffin file written by another implementation with FooterPayloadCompressed=1 was unreadable.

This change wires LZ4 through CompressionCodec::{compress,decompress} using lz4_flex's frame encoder/decoder, matching the Puffin spec requirement of "LZ4 single compression frame with content size present" by setting FrameInfo::content_size(Some(len)) on the encoder. The reader path is symmetric — FrameDecoder::read_to_end consumes the same frame. lz4_flex is already in the lock file as a transitive dep of parquet, so no new dependency tree is introduced.

Are these changes tested?

Yes:

  • test_compress_empty_footer_lz4_succeeds — direct adaptation of the reproducer in the issue (close() with no blobs, compress_footer=true).
  • test_compress_footer_lz4_round_trips — full encode + decode loop with one blob and non-empty file properties.
  • test_write_lz4_compressed_metric_data — previously asserted the error string; now asserts a two-blob round-trip.
  • test_lz4_compressed_footer_is_decoded (metadata) — verifies the reader honors the FooterPayloadCompressed flag.
  • test_compression_codec_lz4_roundtrip — round-trip on empty and mixed payloads, plus a magic-number check that the frame begins with 0x184D2204.

Closes #2419

1fanwang added 2 commits May 12, 2026 01:21
The PuffinWriter exposed a `compress_footer: bool` flag that set the
FooterPayloadCompressed bit in the file footer and recorded LZ4 as the
codec, but `CompressionCodec::Lz4.compress` returned `FeatureUnsupported`,
so calling `PuffinWriter::new(_, _, true)` always failed at close time:

    thread '...' panicked at crates/iceberg/src/puffin/writer.rs:350:
    called `Result::unwrap()` on an `Err` value:
    FeatureUnsupported => LZ4 compression is not supported currently

The reader side had the same hole: a Puffin file written by another
implementation with FooterPayloadCompressed=1 was unreadable.

Wire LZ4 through `CompressionCodec::{compress,decompress}` using
`lz4_flex`'s frame encoder/decoder, matching the Puffin spec requirement
of "LZ4 single compression frame with content size present" by setting
`FrameInfo::content_size(Some(len))` on the encoder. The reader path is
symmetric — `FrameDecoder::read_to_end` consumes the same frame.

Tests:
- `test_compress_empty_footer_lz4_succeeds` — direct reproducer from
  the issue (close() with no blobs and compress_footer=true).
- `test_compress_footer_lz4_round_trips` — full encode/decode loop with
  one blob and file properties.
- `test_write_lz4_compressed_metric_data` — was previously asserting the
  error string; now asserts round-trip of two blobs.
- `test_lz4_compressed_footer_is_decoded` (metadata) — verifies the
  reader honors the FooterPayloadCompressed flag.
- `test_compression_codec_lz4_roundtrip` — round-trip on empty + mixed
  payloads, and asserts the frame begins with the LZ4 magic 0x184D2204.

Closes apache#2419.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: failed to write compressed puffin footer

1 participant