Summary
~45 sites across module init, type creation, and the decompression state machine either don't check the return value of an allocating API, or check it in a way that returns NULL without setting an exception. On OOM (or memory pressure in constrained environments), the resulting behavior ranges from SystemError: returned NULL without setting an exception (fatal on debug builds) to dereferencing NULL pointers and crashing.
The most catastrophic of these is Category 1: ZstdError going NULL at module init. Every subsequent PyErr_Format(ZstdError, ...) across ~50 error paths would then pass NULL as the exception class. The failure is silent at import time (see Category 2).
Impact
- Severity: Crashes /
SystemError on OOM. Latent on programs that don't hit OOM during normal operation.
- Reachability: OOM conditions, memory-constrained containers, processes with ulimit, small-VM CI, memory-pressure chaos-testing harnesses.
- Version: 0.25.0 (commit
7a77a75).
- Platform: Platform-independent.
Category 1: ZstdError NULL at module init — catastrophic
If PyErr_NewException("zstandard.ZstdError", ...) fails during module init (OOM), ZstdError stays NULL. Every subsequent PyErr_Format(ZstdError, ...) across ~50 error sites then passes NULL as the exception class → crash. Coupled with Category 2 (failure doesn't propagate out of module init), import appears to succeed and the crash deferred until the first error surfaces.
Site: c-ext/constants.c:28.
Fix:
ZstdError = PyErr_NewException("zstandard.ZstdError", NULL, NULL);
if (!ZstdError) {
return -1; /* propagate out of constants_module_init */
}
Plus every caller of constants_module_init up the chain must check and propagate.
Category 2: zstd_module_init continues after sub-init failures
16 type-init sub-calls in zstd_module_init; if any fails the module still loads. The features set also leaks on early returns from module init.
Fix pattern:
if (compressor_module_init(m) < 0) return NULL;
if (decompressor_module_init(m) < 0) return NULL;
/* ... 14 more ... */
Category 3: 19 PyModule_AddObject without ref-guard
PyModule_AddObject steals the reference on success and leaks it on failure. pythoncapi_compat.h (already included in the codebase) provides PyModule_AddObjectRef which doesn't steal.
Fix:
/* was: */ PyModule_AddObject(m, "X", X);
/* use: */ if (PyModule_AddObjectRef(m, "X", X) < 0) { Py_DECREF(X); goto err; }
Py_DECREF(X); /* AddObjectRef doesn't steal */
For type-registration sites specifically, PyModule_AddType(m, &T) (3.10+) is simpler.
Category 4: 3 unchecked PyType_FromSpec in bufferutil_module_init
On OOM, PyType_FromSpec returns NULL. On Python < 3.9 the code immediately dereferences NULL (->tp_as_buffer); on 3.9+ NULL is passed to PyType_Ready which also crashes.
Sites: c-ext/bufferutil.c:531, 544, 557.
Fix:
BufferWithSegments_Type = (PyTypeObject *)PyType_FromSpec(&BufferWithSegments_Spec);
if (!BufferWithSegments_Type) return -1;
Category 5: Unchecked PyTuple_New in constants_module_init
PyTuple_New(0) returns NULL on OOM; PyTuple_SET_ITEM then writes to NULL.
Site: c-ext/constants.c.
Fix: NULL-check the return before PyTuple_SET_ITEM.
Category 6: 4 PyMem_Malloc / PyMem_Realloc without PyErr_NoMemory
When PyMem_Malloc / PyMem_Realloc returns NULL, the code takes goto finally which returns NULL without setting an exception → SystemError.
Sites: c-ext/decompressor.c:704, 739 (PyMem_Malloc), :800, 839 (PyMem_Realloc) — all inside decompress_content_dict_chain.
Fix:
buf = PyMem_Malloc(size);
if (!buf) { PyErr_NoMemory(); goto finally; }
new_buf = PyMem_Realloc(buf, new_size);
if (!new_buf) { PyErr_NoMemory(); goto finally; }
buf = new_buf;
Category 7: Py_DECREF(NULL) via PyLong_FromSsize_t OOM in copy_stream
PyLong_FromSsize_t(totalRead) can return NULL on OOM. Current code stores NULL into a tuple via PyTuple_SET_ITEM(result, 0, totalRead); the subsequent Py_DECREF(result) traverses the tuple and dereferences NULL. Same pattern in the decompressor copy_stream.
Sites: c-ext/compressor.c:411-418; c-ext/decompressor.c:247-251.
Fix:
PyObject *totalReadObj = PyLong_FromSsize_t(totalRead);
PyObject *totalWriteObj = PyLong_FromSsize_t(totalWrite);
if (!totalReadObj || !totalWriteObj) {
Py_XDECREF(totalReadObj);
Py_XDECREF(totalWriteObj);
goto except;
}
result = PyTuple_New(2);
if (!result) {
Py_DECREF(totalReadObj);
Py_DECREF(totalWriteObj);
goto except;
}
PyTuple_SET_ITEM(result, 0, totalReadObj);
PyTuple_SET_ITEM(result, 1, totalWriteObj);
Category 8: Unchecked PyBytes_FromStringAndSize for unused_data
Silent data loss on OOM.
Site: c-ext/decompressobj.c:97.
Fix: NULL-check the return and propagate PyErr_NoMemory / return NULL with exception set.
Suggested PR shape
Most impactful is Category 1 (it cascades into ~50 other error paths). Categories 2-8 are independent and small. Happy to do either one big hardening PR or one PR per category — probably the big PR reads better because OOM hardening is best evaluated holistically.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). All 8 categories reviewed statically; faithfully reproducing OOM requires _testcapi.set_nomemory or libfiu-style allocator injection, both of which are feasible but out of scope for the live-reproducer pass that covered this analysis. Happy to open a PR — the fixes are mechanical and collectively ~100 lines of diff.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605
Summary
~45 sites across module init, type creation, and the decompression state machine either don't check the return value of an allocating API, or check it in a way that returns
NULLwithout setting an exception. On OOM (or memory pressure in constrained environments), the resulting behavior ranges fromSystemError: returned NULL without setting an exception(fatal on debug builds) to dereferencing NULL pointers and crashing.The most catastrophic of these is Category 1:
ZstdErrorgoing NULL at module init. Every subsequentPyErr_Format(ZstdError, ...)across ~50 error paths would then pass NULL as the exception class. The failure is silent at import time (see Category 2).Impact
SystemErroron OOM. Latent on programs that don't hit OOM during normal operation.7a77a75).Category 1:
ZstdErrorNULL at module init — catastrophicIf
PyErr_NewException("zstandard.ZstdError", ...)fails during module init (OOM),ZstdErrorstaysNULL. Every subsequentPyErr_Format(ZstdError, ...)across ~50 error sites then passes NULL as the exception class → crash. Coupled with Category 2 (failure doesn't propagate out of module init), import appears to succeed and the crash deferred until the first error surfaces.Site:
c-ext/constants.c:28.Fix:
Plus every caller of
constants_module_initup the chain must check and propagate.Category 2:
zstd_module_initcontinues after sub-init failures16 type-init sub-calls in
zstd_module_init; if any fails the module still loads. Thefeaturesset also leaks on early returns from module init.Fix pattern:
Category 3: 19
PyModule_AddObjectwithout ref-guardPyModule_AddObjectsteals the reference on success and leaks it on failure.pythoncapi_compat.h(already included in the codebase) providesPyModule_AddObjectRefwhich doesn't steal.Fix:
For type-registration sites specifically,
PyModule_AddType(m, &T)(3.10+) is simpler.Category 4: 3 unchecked
PyType_FromSpecinbufferutil_module_initOn OOM,
PyType_FromSpecreturns NULL. On Python < 3.9 the code immediately dereferences NULL (->tp_as_buffer); on 3.9+ NULL is passed toPyType_Readywhich also crashes.Sites:
c-ext/bufferutil.c:531, 544, 557.Fix:
Category 5: Unchecked
PyTuple_Newinconstants_module_initPyTuple_New(0)returns NULL on OOM;PyTuple_SET_ITEMthen writes to NULL.Site:
c-ext/constants.c.Fix: NULL-check the return before
PyTuple_SET_ITEM.Category 6: 4
PyMem_Malloc/PyMem_ReallocwithoutPyErr_NoMemoryWhen
PyMem_Malloc/PyMem_Reallocreturns NULL, the code takesgoto finallywhich returns NULL without setting an exception →SystemError.Sites:
c-ext/decompressor.c:704, 739(PyMem_Malloc),:800, 839(PyMem_Realloc) — all insidedecompress_content_dict_chain.Fix:
Category 7:
Py_DECREF(NULL)viaPyLong_FromSsize_tOOM incopy_streamPyLong_FromSsize_t(totalRead)can return NULL on OOM. Current code stores NULL into a tuple viaPyTuple_SET_ITEM(result, 0, totalRead); the subsequentPy_DECREF(result)traverses the tuple and dereferences NULL. Same pattern in the decompressorcopy_stream.Sites:
c-ext/compressor.c:411-418;c-ext/decompressor.c:247-251.Fix:
Category 8: Unchecked
PyBytes_FromStringAndSizeforunused_dataSilent data loss on OOM.
Site:
c-ext/decompressobj.c:97.Fix: NULL-check the return and propagate
PyErr_NoMemory/ return NULL with exception set.Suggested PR shape
Most impactful is Category 1 (it cascades into ~50 other error paths). Categories 2-8 are independent and small. Happy to do either one big hardening PR or one PR per category — probably the big PR reads better because OOM hardening is best evaluated holistically.
Methodology
Found via cext-review-toolkit (Tree-sitter-based static analysis with structured naive/informed review passes). All 8 categories reviewed statically; faithfully reproducing OOM requires
_testcapi.set_nomemoryor libfiu-style allocator injection, both of which are feasible but out of scope for the live-reproducer pass that covered this analysis. Happy to open a PR — the fixes are mechanical and collectively ~100 lines of diff.Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605