libutil: emit multi-frame zstd for parallel-decodable output #15550
Merged
Mic92 merged 1 commit into NixOS:master on Apr 13, 2026
Member, author: Some performance results. Interestingly, this reduces memory usage for the producer. For consuming, I couldn't measure any change.
Member: Could we make … The default for compression could also be flipped to …
Mic92 commented on Mar 31, 2026.
xokdvium reviewed on Apr 11, 2026.
xokdvium approved these changes on Apr 13, 2026.
Contributor: I guess it needs some formatting.
Libarchive's zstd filter always produces a single frame, so decompression of large NARs is stuck on one core. Replace the libarchive zstd compression path with a direct-libzstd sink that cuts a new frame every 16 MiB of uncompressed input, each with an exact pledged size so that Frame_Content_Size is written to the frame header.

RFC 8878 §3.1 defines Zstandard data as one or more concatenated frames, so existing Nix binaries, libarchive, and the zstd CLI decode the result unchanged. Nix still decompresses serially today, but the independent, sized frames let a future parallel decoder exploit data already on disk.

Per-frame compression uses up to 4 zstd workers (bounded by getMaxCPU()/hardware_concurrency). parallel-compression now defaults to true for zstd; xz keeps its default of false.

As a side effect, peak RSS during compression drops substantially (from ~600 MiB to ~100 MiB for a 1 GiB store path) with an effectively unchanged compression ratio.
Member, author: Fixed the formatting.
Libarchive's zstd filter produces a single frame regardless of ZSTD_c_nbWorkers, so decompression is stuck on one core. For large NARs (e.g. 11 GiB), that means ~9 seconds of single-threaded zstd decompression while everything else waits.
Replace the libarchive zstd path with a direct-libzstd sink that cuts a new frame every 16 MiB of uncompressed input. Each frame buffers its input and is compressed in one shot with an exact pledged size (ZSTD_CCtx_setPledgedSrcSize), so Frame_Content_Size is written to every frame header.
RFC 8878 §3.1 defines Zstandard data as one or more concatenated frames, so existing Nix binaries, libarchive, and the zstd CLI all decode the result unchanged. Nix currently still decompresses serially, but the independent frames with known content sizes mean a future parallel decoder can exploit them without any change to the compressed data already on disk.
When parallel=true, nbWorkers is set from std::thread::hardware_concurrency() for multithreaded compression within each frame; with parallel=false, compression is single-threaded but still emits independent frames.
The 16 MiB frame size matches zstd's 8 MiB default window well (minimal ratio loss from not referencing across boundaries) and gives ample frame counts for parallel decode of large NARs.
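As a quick check of the frame count for the 11 GiB NAR mentioned above (taking 1 GiB = 1024 MiB):

```latex
\frac{11\,\mathrm{GiB}}{16\,\mathrm{MiB}}
  = \frac{11 \cdot 1024\,\mathrm{MiB}}{16\,\mathrm{MiB}}
  = 704\ \text{frames}
```

That is far more independent frames than the core count of a typical machine.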