
Silent data-correctness bug: readinto() / readinto1() on stream_reader leave tell() == 0 after successful reads #295

@devdanzin

Summary

ZstdCompressionReader.readinto(buf) and .readinto1(buf) correctly copy compressed bytes into the caller's buffer, but the reader's internal bytesCompressed counter is not updated. As a result, stream_reader.tell() returns 0 after any number of successful readinto() calls, even though bytes were in fact written. The equivalent .read() path correctly advances the counter.

Impact

  • Severity: Silent data-correctness bug — no crash, no exception, just a wrong value from tell(). Any caller that relies on tell() to measure progress, compute offsets, or compare positions will silently malfunction.
  • Reachability: Standard io.RawIOBase idioms — any use of readinto() / readinto1() on a stream_reader. Common in performance-sensitive decode pipelines that reuse a pre-allocated buffer.
  • Version: 0.25.0 (commit 7a77a75).
  • Platform: Platform-independent.
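To illustrate the kind of caller the Impact bullets describe, the sketch below copies a stream through one reused, pre-allocated buffer and reports progress from the reader's tell(). copy_with_progress and its on_progress callback are hypothetical names, and the helper only assumes an object with readinto() and tell(). It runs here on io.BytesIO. Per the behaviour reported in this issue, every progress value would be 0 if the reader were an affected stream_reader.

```python
import io
from typing import BinaryIO, Callable


def copy_with_progress(
    reader,
    sink: BinaryIO,
    bufsize: int = 1024,
    on_progress: Callable[[int], None] = lambda pos: None,
) -> int:
    """Copy `reader` into `sink` through a single reused buffer.

    Hypothetical helper: `reader` only needs readinto() and tell().
    Returns the total number of bytes copied.
    """
    buf = bytearray(bufsize)   # allocated once, reused for every chunk
    view = memoryview(buf)
    total = 0
    while True:
        n = reader.readinto(buf)
        if not n:              # 0 signals end of stream
            break
        sink.write(view[:n])   # only the bytes filled by this call
        total += n
        on_progress(reader.tell())  # assumes tell() advances with readinto()
    return total


src = io.BytesIO(b"hello world " * 100)
dst = io.BytesIO()
positions = []
copied = copy_with_progress(src, dst, bufsize=256, on_progress=positions.append)
```

Writing view[:n] rather than the whole buffer matters because the final readinto() usually fills only part of it.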

Reproducer

import io
import zstandard

data = b'hello world ' * 10000

# read(): tell() works correctly
with zstandard.ZstdCompressor().stream_reader(io.BytesIO(data)) as r1:
    while r1.read(1024):
        pass
    print("read     tell:", r1.tell())       # 29 (correct: total compressed bytes)

# readinto(): tell() stuck at 0
with zstandard.ZstdCompressor().stream_reader(io.BytesIO(data)) as r2:
    buf = bytearray(1024)
    while r2.readinto(buf):
        pass
    print("readinto tell:", r2.tell())       # 0, BUG (should match the read() path)
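The reproducer can also be phrased as a self-checking helper that drains two fresh readers, one through read() and one through readinto(), and returns both tell() values. This is a sketch: compare_tell_paths and make_reader are hypothetical names, and make_reader is assumed to return a new, unread stream on every call. It is exercised here on io.BytesIO, where the two paths agree.

```python
import io
from typing import Any, Callable, Tuple


def compare_tell_paths(make_reader: Callable[[], Any], bufsize: int = 1024) -> Tuple[int, int]:
    """Return (tell() after draining with read(), tell() after draining with readinto()).

    Hypothetical helper: make_reader() must return a fresh stream each call.
    """
    with make_reader() as r:
        while r.read(bufsize):
            pass
        via_read = r.tell()

    with make_reader() as r:
        buf = bytearray(bufsize)   # one reused buffer, as in the reproducer
        while r.readinto(buf):
            pass
        via_readinto = r.tell()

    return via_read, via_readinto


data = b"hello world " * 10000
via_read, via_readinto = compare_tell_paths(lambda: io.BytesIO(data))
```

To apply it to the bug, pass a factory such as lambda: zstandard.ZstdCompressor().stream_reader(io.BytesIO(data)). Per the report, the second value is 0 on 0.25.0, and the two should match once the counter is fixed.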

Root cause

readinto / readinto1 build a ZSTD_outBuffer on the stack that wraps the caller's buffer:

ZSTD_outBuffer output = {dest, dest_size, 0};
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);

After the call, output.pos holds the number of bytes written to dest. The reader, however, advances its position using self->output.pos, which belongs to the persistent struct that the call on the local struct never touches. So self->output.pos stays at zero, and bytesCompressed never advances.

The read() path uses self->output directly (not a local copy), so the persistent field is updated by ZSTD_compressStream2. That's why tell() works after read() but not after readinto() / readinto1().
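To make the bookkeeping difference concrete, here is a pure-Python model of the pattern described above. It is not the C extension's code: OutBuffer only mirrors the dst/size/pos fields of ZSTD_outBuffer, fake_compress_stream is a hypothetical stand-in for ZSTD_compressStream2 that simply copies input bytes, and ModelReader is a hypothetical reduction of the reader's state. The stand-in advances pos on whichever buffer it is handed, so reading self.output.pos after writing through a local buffer always yields 0.

```python
from dataclasses import dataclass


@dataclass
class OutBuffer:
    """Mirrors the dst/size/pos fields of ZSTD_outBuffer (model only)."""
    dst: bytearray
    size: int
    pos: int = 0


def fake_compress_stream(out: OutBuffer, src: bytes, src_pos: int) -> int:
    """Hypothetical stand-in for ZSTD_compressStream2: copies input bytes into
    out.dst at out.pos and advances out.pos. Returns the new input position."""
    n = min(out.size - out.pos, len(src) - src_pos)
    out.dst[out.pos:out.pos + n] = src[src_pos:src_pos + n]
    out.pos += n
    return src_pos + n


class ModelReader:
    """Hypothetical reduction of the reader, with the buggy readinto()."""

    def __init__(self, data: bytes):
        self.data = data
        self.src_pos = 0
        self.output = OutBuffer(bytearray(), 0)  # persistent struct
        self.bytes_compressed = 0

    def read(self, size: int = 1024) -> bytes:
        # read() path: the persistent struct itself is handed to the stand-in.
        self.output.dst = bytearray(size)
        self.output.size = size
        self.output.pos = 0
        self.src_pos = fake_compress_stream(self.output, self.data, self.src_pos)
        self.bytes_compressed += self.output.pos
        return bytes(self.output.dst[:self.output.pos])

    def readinto(self, buf: bytearray) -> int:
        output = OutBuffer(buf, len(buf))  # local struct wrapping caller's buffer
        self.src_pos = fake_compress_stream(output, self.data, self.src_pos)
        self.bytes_compressed += self.output.pos  # BUG: persistent pos, still 0
        return output.pos

    def tell(self) -> int:
        return self.bytes_compressed


data = b"x" * 3000

via_read = ModelReader(data)
while via_read.read(1024):
    pass

via_readinto = ModelReader(data)
buf = bytearray(1024)
while via_readinto.readinto(buf):
    pass
```

In this model, via_read.tell() reaches 3000 while via_readinto.tell() stays at 0, even though both consumed the same input.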

Affected sites

  • c-ext/compressionreader.c: readinto and readinto1 method bodies.

(Same pattern may warrant a look on the decompression side as well, although the main analysis only flagged compression.)

Suggested fix

Two options; either is minimal.

Option A — advance from the local struct

Advance bytesCompressed by the local output.pos before the struct goes out of scope:

ZSTD_outBuffer output = {dest, dest_size, 0};
zresult = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_continue);
/* ... */
self->bytesCompressed += output.pos;   /* was: effectively + self->output.pos, i.e. 0 */

Option B — share the persistent struct

If you'd rather the two methods share the read() bookkeeping path, use self->output directly instead of a stack-local struct, and set self->output.dst and self->output.size to the caller's buffer and its length before the call. Slightly more invasive but avoids duplicating the position-update logic.
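Both options can be sketched with a small pure-Python model of the reader's bookkeeping. Everything here is hypothetical: OutBuffer mirrors ZSTD_outBuffer's dst/size/pos fields, fake_compress_stream stands in for ZSTD_compressStream2 by copying input, and FixedModelReader and drain are illustrative names. readinto_a follows Option A and accumulates from the local buffer's pos. readinto_b follows Option B and re-points the persistent self.output at the caller's buffer, so the same bookkeeping line as read() applies. In the model, Option B also resets pos to 0 before each call; whether the C code needs the same step is an assumption.

```python
from dataclasses import dataclass


@dataclass
class OutBuffer:
    """Mirrors the dst/size/pos fields of ZSTD_outBuffer (model only)."""
    dst: bytearray
    size: int
    pos: int = 0


def fake_compress_stream(out: OutBuffer, src: bytes, src_pos: int) -> int:
    """Hypothetical stand-in for ZSTD_compressStream2: copies input bytes into
    out.dst at out.pos and advances out.pos. Returns the new input position."""
    n = min(out.size - out.pos, len(src) - src_pos)
    out.dst[out.pos:out.pos + n] = src[src_pos:src_pos + n]
    out.pos += n
    return src_pos + n


class FixedModelReader:
    """Hypothetical reader model showing both suggested fixes."""

    def __init__(self, data: bytes):
        self.data = data
        self.src_pos = 0
        self.output = OutBuffer(bytearray(), 0)  # persistent struct
        self.bytes_compressed = 0

    def readinto_a(self, buf: bytearray) -> int:
        # Option A: keep the local struct, but advance from *its* pos.
        output = OutBuffer(buf, len(buf))
        self.src_pos = fake_compress_stream(output, self.data, self.src_pos)
        self.bytes_compressed += output.pos
        return output.pos

    def readinto_b(self, buf: bytearray) -> int:
        # Option B: point the persistent struct at the caller's buffer.
        self.output.dst = buf
        self.output.size = len(buf)
        self.output.pos = 0  # model assumption: start each call at offset 0
        self.src_pos = fake_compress_stream(self.output, self.data, self.src_pos)
        self.bytes_compressed += self.output.pos  # same line as the read() path
        return self.output.pos

    def tell(self) -> int:
        return self.bytes_compressed


def drain(readinto, bufsize: int = 1024) -> int:
    """Call a readinto-style method until it returns 0; return bytes read."""
    buf = bytearray(bufsize)
    total = 0
    while (n := readinto(buf)):
        total += n
    return total


data = b"x" * 3000
reader_a = FixedModelReader(data)
reader_b = FixedModelReader(data)
total_a = drain(reader_a.readinto_a)
total_b = drain(reader_b.readinto_b)
```

One thing the model makes visible: after readinto_b returns, self.output still refers to the caller's buffer, so any path that later reuses self.output has to re-point it first. Option A avoids that aliasing at the cost of a separate bookkeeping line.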

Methodology

Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). Reproducer verified live on CPython 3.14.3 debug build — read() path returns tell() == 29 (matches the compressed-output length); readinto() path returns tell() == 0 after the exact same compressed sequence is consumed. Happy to open a PR.

Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.

Full report

Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
