Summary
Several streaming paths call PyBytes_AsStringAndSize(result, &readBuffer, &readSize) on the value returned by a caller-supplied source.read() and never check that call's return code. When read() returns a non-bytes object (e.g., str, None, bytearray), the call fails and readBuffer and readSize keep their prior/uninitialized contents; the next ZSTD_compressStream2 call (or its decompression counterpart on the decompressor paths) then reads from that stale address. Observed: SEGV on release builds, _Py_CheckFunctionResult abort on debug builds.
Impact
- Severity: SEGV on release builds; assertion abort on debug builds.
- Reachability: Any caller-supplied source.read() whose return is not bytes. A trivial wrapper around a text file or a mistakenly-returned bytearray triggers it.
- Version: 0.25.0 (commit 7a77a75).
- Platform: Confirmed on Linux x86_64 / CPython 3.14 debug; the bug is platform-independent.
Reproducers
SEGV on release — via copy_stream:
import zstandard, io

class BadSource:
    def read(self, size):
        return 'not bytes'  # str, not bytes

comp = zstandard.ZstdCompressor()
comp.copy_stream(BadSource(), io.BytesIO())
# Segmentation fault
Assertion abort on debug — via iterator:
import zstandard

class BadSource:
    def read(self, size):
        return 'not bytes'

comp = zstandard.ZstdCompressor()
it = comp.read_to_iter(BadSource())
next(it)
# Fatal Python error: _Py_CheckFunctionResult: a function returned a result with an exception set
# TypeError: expected bytes, str found
Root cause
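PyBytes_AsStringAndSize returns -1 and sets a TypeError when its argument is not a bytes object. On failure, the by-address output parameters readBuffer / readSize are not written. zstandard ignores the return code and passes the stale values to ZSTD_compressStream2, which reads up to readSize bytes from whatever address readBuffer happens to hold (leftover stack or register contents). The TypeError also stays pending, which is why the debug build aborts in _Py_CheckFunctionResult once the iterator returns a result with an exception still set.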
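The failure mode described above can be observed from Python without the extension. The following is a minimal sketch, assuming CPython (it relies on ctypes.pythonapi) and a current implementation of PyBytes_AsStringAndSize that performs its type check before touching the output parameters. The names probe and SENTINEL are hypothetical and exist only for this illustration.

```python
import ctypes

# Bind the real C-API function through ctypes (CPython only).
_as_string_and_size = ctypes.pythonapi.PyBytes_AsStringAndSize
_as_string_and_size.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_char_p),
    ctypes.POINTER(ctypes.c_ssize_t),
]
_as_string_and_size.restype = ctypes.c_int

SENTINEL = -12345  # hypothetical stand-in for "whatever was there before"


def probe(obj):
    """Return (status, size_after_call) for one PyBytes_AsStringAndSize call."""
    buf = ctypes.c_char_p()
    size = ctypes.c_ssize_t(SENTINEL)
    try:
        _as_string_and_size(obj, ctypes.byref(buf), ctypes.byref(size))
    except TypeError:
        # ctypes.pythonapi raises the pending exception for us; C code that
        # ignores the -1 return has no such safety net and keeps going.
        return "error", size.value
    return "ok", size.value


print(probe(b"abc"))        # bytes: the size output is filled in
print(probe("not bytes"))   # str: the call fails, size still holds SENTINEL
```

If the size output still holds the sentinel after the failed call, that is the same stale value the affected sites would go on to hand to zstd.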
Affected sites
| File | Line | Function |
| --- | --- | --- |
| c-ext/compressor.c | 349 | copy_stream |
| c-ext/decompressor.c | 202 | decompressor_copy_stream |
| c-ext/compressoriterator.c | 98 | ZstdCompressorIterator_iternext |
| c-ext/decompressoriterator.c | 134 | ZstdDecompressorIterator_iternext |
Plus 2 additional sites reported in the full analysis (in read_compressor_input and the decompressor equivalent).
Suggested fix
Mechanical — add the standard error check after every call:
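if (PyBytes_AsStringAndSize(result, &readBuffer, &readSize) < 0) {
    /* TypeError is already set: release the read() result and bail out. */
    /* Drop this DECREF at any site whose cleanup label already releases result. */
    Py_DECREF(result);
    goto finally;
}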
Optionally: tighten the documented API contract on source.read() to specify that the return must be a bytes object (the C code already expects this). Enforcing it at the boundary would be a small additional cleanup.
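Until the check lands, callers can enforce the same contract on their side. The following is a minimal sketch of such a guard; BytesOnlySource is a hypothetical helper, not part of zstandard, and it only assumes that the wrapped object exposes read(size).

```python
import io


class BytesOnlySource:
    """Hypothetical wrapper: reject non-bytes read() results before C code sees them."""

    def __init__(self, source):
        self._source = source

    def read(self, size=-1):
        data = self._source.read(size)
        if not isinstance(data, bytes):
            # Fail in Python, where the error is a normal exception.
            raise TypeError(
                f"read() must return bytes, got {type(data).__name__}"
            )
        return data


# A binary source passes through unchanged.
print(BytesOnlySource(io.BytesIO(b"payload")).read(3))
```

Wrapping the source before handing it to copy_stream or read_to_iter turns a str, None, or bytearray return into an ordinary TypeError raised from Python code.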
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). SEGV on release verified at the copy_stream site; assertion abort on debug verified at the iterator site. Four sites confirmed via direct reproducer; two more confirmed via static review. Happy to open a PR — the fix is a ~8-line diff.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605