Summary
ZstdCompressionReader.readinto(buf) and .readinto1(buf) correctly copy compressed bytes into the caller's buffer, but the reader's internal bytesCompressed counter is not updated. As a result, stream_reader.tell() returns 0 after any number of successful readinto() calls, even though bytes were in fact written. The equivalent .read() path correctly advances the counter.
Impact
- Severity: silent data-correctness bug. There is no crash and no exception, just a wrong value from tell(). Any caller that relies on tell() to measure progress, compute offsets, or compare positions will silently get wrong results.
- Reachability: standard io.RawIOBase idioms, i.e. any use of readinto() / readinto1() on a stream_reader. This pattern is common in performance-sensitive streaming pipelines that reuse a pre-allocated buffer.
- Version: 0.25.0 (commit 7a77a75).
- Platform: platform-independent.
Reproducer
```python
import io

import zstandard

data = b'hello world ' * 10000

# read(): tell() works correctly
comp1 = zstandard.ZstdCompressor()
with comp1.stream_reader(io.BytesIO(data)) as r1:
    while r1.read(1024):
        pass
    print("read tell:", r1.tell())  # 29 (correct: total compressed bytes)

# readinto(): tell() stuck at 0
comp2 = zstandard.ZstdCompressor()
with comp2.stream_reader(io.BytesIO(data)) as r2:
    buf = bytearray(1024)
    while r2.readinto(buf):
        pass
    print("readinto tell:", r2.tell())  # 0, BUG (should match the read() path)
```
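Until a fix lands, a caller that needs the position after readinto() can count the bytes itself instead of trusting tell(). This is a minimal sketch, assuming the caller controls the read loop; drain_with_readinto is a hypothetical helper, not part of zstandard. It is shown on io.BytesIO so it runs without zstandard installed; passed the readinto() reader from the reproducer, its return value is the count that tell() should have reported.

```python
import io


def drain_with_readinto(reader, bufsize: int = 1024) -> int:
    """Hypothetical caller-side helper: consume `reader` via readinto() and
    track the position from readinto()'s return values instead of tell()."""
    buf = bytearray(bufsize)
    total = 0
    while True:
        n = reader.readinto(buf)
        if not n:  # 0 signals EOF
            break
        total += n  # buf[:n] holds this call's output; process it here
    return total


src = io.BytesIO(b"x" * 2500)
print(drain_with_readinto(src), src.tell())  # 2500 2500
```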
Root cause
readinto / readinto1 build a ZSTD_outBuffer on the stack that wraps the caller's buffer:
```c
ZSTD_outBuffer output = {dest, dest_size, 0};
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);
```
After the call, output.pos holds the number of bytes written to dest. The reader then updates its position by reading from self->output.pos — the persistent struct, which the local-struct call never touched. So self->output.pos stays at zero, and bytesCompressed never advances.
The read() path uses self->output directly (not a local copy), so the persistent field is updated by ZSTD_compressStream2. That's why tell() works after read() but not after readinto() or readinto1().
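The aliasing problem can be reproduced without the C extension in a small pure-Python model. This is a sketch under assumptions, not the extension's code: OutBuffer, fake_compress_stream2, and BuggyReader are hypothetical stand-ins for ZSTD_outBuffer, ZSTD_compressStream2, and the reader. The stand-in compress call advances the pos of whichever struct it is handed, so readinto() fills the caller's buffer but adds the untouched persistent pos to the counter.

```python
from dataclasses import dataclass


@dataclass
class OutBuffer:
    """Stand-in for ZSTD_outBuffer: destination, capacity, write cursor."""
    dst: bytearray
    size: int
    pos: int = 0


def fake_compress_stream2(output: OutBuffer, chunk: bytes) -> None:
    """Hypothetical stand-in for ZSTD_compressStream2: writes into output.dst
    at output.pos and advances output.pos (excess bytes are dropped here)."""
    n = min(len(chunk), output.size - output.pos)
    output.dst[output.pos:output.pos + n] = chunk[:n]
    output.pos += n


class BuggyReader:
    """Hypothetical model of the reader's bookkeeping, not the real C code."""

    def __init__(self, chunk: bytes, internal_size: int = 64) -> None:
        self.chunk = chunk  # bytes each modeled compress call emits
        self.output = OutBuffer(bytearray(internal_size), internal_size)  # persistent struct
        self.bytes_compressed = 0

    def tell(self) -> int:
        return self.bytes_compressed

    def read(self) -> bytes:
        # read() path: the persistent struct is passed, so its pos advances.
        self.output.pos = 0
        fake_compress_stream2(self.output, self.chunk)
        self.bytes_compressed += self.output.pos
        return bytes(self.output.dst[:self.output.pos])

    def readinto(self, buf: bytearray) -> int:
        # readinto() path: a local struct wraps the caller's buffer...
        output = OutBuffer(buf, len(buf))
        fake_compress_stream2(output, self.chunk)
        # ...but the counter is advanced from the persistent struct, which this
        # path never updates (its pos is 0 on a fresh reader).
        self.bytes_compressed += self.output.pos
        return output.pos


r1 = BuggyReader(b"abc")
r1.read()
r1.read()
print(r1.tell())  # 6

r2 = BuggyReader(b"abc")
buf = bytearray(16)
r2.readinto(buf)
r2.readinto(buf)
print(r2.tell())  # 0, although each readinto() call returned 3
```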
Affected sites
c-ext/compressionreader.c — readinto and readinto1 method bodies.
(Same pattern may warrant a look on the decompression side as well, although the main analysis only flagged compression.)
Suggested fix
Two options; either is minimal.
Option A — advance from the local struct
Advance bytesCompressed by the local output.pos before the local struct goes out of scope:
```c
ZSTD_outBuffer output = {dest, dest_size, 0};
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);
/* ... */
self->bytesCompressed += output.pos; /* was: effectively += self->output.pos, i.e. 0 */
```
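Option A can be checked in the same style of pure-Python model. This is a sketch under assumptions: OutBuffer, fake_compress_stream2, and OptionAReader are hypothetical stand-ins for ZSTD_outBuffer, ZSTD_compressStream2, and the C reader. The only change from the buggy path is that the counter advances by the local struct's pos.

```python
from dataclasses import dataclass


@dataclass
class OutBuffer:
    """Stand-in for ZSTD_outBuffer: destination, capacity, write cursor."""
    dst: bytearray
    size: int
    pos: int = 0


def fake_compress_stream2(output: OutBuffer, chunk: bytes) -> None:
    """Hypothetical stand-in for ZSTD_compressStream2 (excess bytes dropped)."""
    n = min(len(chunk), output.size - output.pos)
    output.dst[output.pos:output.pos + n] = chunk[:n]
    output.pos += n


class OptionAReader:
    """Hypothetical model of readinto() with the Option A fix applied."""

    def __init__(self, chunk: bytes) -> None:
        self.chunk = chunk  # bytes each modeled compress call emits
        self.bytes_compressed = 0

    def tell(self) -> int:
        return self.bytes_compressed

    def readinto(self, buf: bytearray) -> int:
        output = OutBuffer(buf, len(buf))  # still a local wrapper around buf
        fake_compress_stream2(output, self.chunk)
        self.bytes_compressed += output.pos  # Option A: advance from the local pos
        return output.pos


reader = OptionAReader(b"abc")
buf = bytearray(16)
total = reader.readinto(buf) + reader.readinto(buf)
print(total, reader.tell())  # 6 6
```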
Option B — share the persistent struct
If you'd rather the two methods share the read() bookkeeping path, use self->output directly instead of a stack-local struct, and update self->output.dst / self->output.size to point at the caller's buffer before the call. Slightly more invasive but avoids duplicating the position-update logic.
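Option B can be modeled the same way. This is a sketch under assumptions: OutBuffer, fake_compress_stream2, and OptionBReader are hypothetical, and the model re-points the persistent struct in both read() and readinto() so neither method depends on what the other left behind. That choice likely matters in C as well, since a pointer into the caller's buffer should not be reused once readinto() returns.

```python
from dataclasses import dataclass


@dataclass
class OutBuffer:
    """Stand-in for ZSTD_outBuffer: destination, capacity, write cursor."""
    dst: bytearray
    size: int
    pos: int = 0


def fake_compress_stream2(output: OutBuffer, chunk: bytes) -> None:
    """Hypothetical stand-in for ZSTD_compressStream2 (excess bytes dropped)."""
    n = min(len(chunk), output.size - output.pos)
    output.dst[output.pos:output.pos + n] = chunk[:n]
    output.pos += n


class OptionBReader:
    """Hypothetical model: read() and readinto() share one persistent struct."""

    def __init__(self, chunk: bytes, internal_size: int = 64) -> None:
        self.chunk = chunk  # bytes each modeled compress call emits
        self.internal = bytearray(internal_size)  # buffer read() writes into
        self.output = OutBuffer(self.internal, internal_size)  # persistent struct
        self.bytes_compressed = 0

    def tell(self) -> int:
        return self.bytes_compressed

    def _compress_into_output(self) -> int:
        # Shared bookkeeping: the only place the counter advances.
        self.output.pos = 0
        fake_compress_stream2(self.output, self.chunk)
        self.bytes_compressed += self.output.pos
        return self.output.pos

    def read(self) -> bytes:
        # Point the persistent struct back at the internal buffer.
        self.output.dst, self.output.size = self.internal, len(self.internal)
        n = self._compress_into_output()
        return bytes(self.internal[:n])

    def readinto(self, buf: bytearray) -> int:
        # Option B: point the persistent struct at the caller's buffer.
        self.output.dst, self.output.size = buf, len(buf)
        return self._compress_into_output()


reader = OptionBReader(b"abc")
buf = bytearray(16)
reader.read()
reader.readinto(buf)
print(reader.tell())  # 6
```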
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). Reproducer verified live on a CPython 3.14.3 debug build: the read() path returns tell() == 29 (matching the compressed-output length), while the readinto() path returns tell() == 0 after the exact same compressed sequence is consumed. Happy to open a PR.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605