Summary
The pure-Python CFFI backend has several small-but-real parity gaps with the C extension backend. The most visible is that several ZstdError constructors are called with positional args as if %-formatting were going to happen, but it doesn't — so user-facing error messages look like garbled tuples. Two additional gaps affect resource-lifecycle invariants and parameter validation.
Impact
- Severity: User-visible garbled error messages (main issue); resource-lifecycle invariant violation in __exit__ (secondary); small parameter-validation drift.
- Reachability: Any user running the CFFI backend — typically PyPy, or environments where the C extension can't be built.
- Version: 0.25.0 (commit 7a77a75).
- Note: This report was produced without a live CFFI environment; everything below is based on static review of zstandard/backend_cffi.py. Happy to verify with a concrete reproducer if you set up a PyPy / CFFI-only test environment.
Gap 1: Tuple-args to ZstdError — garbled error messages
Pattern: raise ZstdError("... %s", error) passes two positional args to the exception constructor, so .args == ("... %s", error) and no %-formatting happens. str() of the exception then renders the args tuple, e.g. ('... %s', <something>), instead of the intended interpolated string.
Sites: zstandard/backend_cffi.py:1531, 1575, 1682.
Fix: raise ZstdError("... %s" % error) or an f-string.
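A minimal sketch of the pattern, using a stand-in exception class rather than the real zstandard.ZstdError (assumed here to be a plain Exception subclass). The message text and the detail string are hypothetical:

```python
class ZstdError(Exception):
    """Stand-in for zstandard.ZstdError (assumed: plain Exception subclass)."""


detail = "Src size is incorrect"  # hypothetical error text

# Buggy pattern: two positional args, so no %-formatting is applied.
garbled = ZstdError("cannot compress: %s", detail)

# Fixed pattern: interpolate before constructing the exception.
fixed = ZstdError("cannot compress: %s" % detail)

print(garbled.args)  # the format string and the detail stay separate
print(str(garbled))  # ('cannot compress: %s', 'Src size is incorrect')
print(str(fixed))    # cannot compress: Src size is incorrect
```

With more than one positional argument, str() on an exception returns the string form of the whole args tuple, which is where the tuple-looking message comes from.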
Gap 2: __exit__ ordering mismatch with the C backend
C backend's __exit__ calls close() first, then clears the compressor/decompressor field. CFFI backend sets _compressor = None first, then calls close() — which runs with the invariants already broken. Any reference to self._compressor inside close() observes None.
Sites: zstandard/backend_cffi.py:1413, 3161.
Fix: Reorder to match the C backend:
def __exit__(self, exc_type, exc_value, tb):
    self.close()              # first
    self._compressor = None   # then clear
    return False
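To illustrate why the order matters, here is a sketch with hypothetical stand-in classes (FakeCompressor, Stream, ClearFirst, CloseFirst are not names from zstandard). It assumes close() needs the wrapped object; whether the real close() touches self._compressor has not been verified here.

```python
class FakeCompressor:
    """Hypothetical stand-in for the object a stream wraps."""

    def __init__(self):
        self.flushed = False

    def flush(self):
        self.flushed = True


class Stream:
    """Hypothetical stream whose close() needs the compressor."""

    def __init__(self, compressor):
        self._compressor = compressor
        self.closed = False

    def __enter__(self):
        return self

    def close(self):
        if self.closed:
            return
        self._compressor.flush()  # fails if the field was already cleared
        self.closed = True


class ClearFirst(Stream):
    def __exit__(self, exc_type, exc_value, tb):
        self._compressor = None  # invariant broken before close() runs
        self.close()
        return False


class CloseFirst(Stream):
    def __exit__(self, exc_type, exc_value, tb):
        self.close()             # close() still sees a valid compressor
        self._compressor = None
        return False


def run(cls):
    compressor = FakeCompressor()
    try:
        with cls(compressor):
            pass
    except AttributeError:
        return "close() saw None"
    return "flushed" if compressor.flushed else "not flushed"


print(run(ClearFirst))  # close() saw None
print(run(CloseFirst))  # flushed
```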
Gap 3: Minor parameter-validation drift
- Off-by-one in level-validation error message. CFFI says "less than 22"; C says "less than 23". ZSTD_maxCLevel() returns 22, so if the C backend's boundary is correct, both messages should say "less than 23" (strictly-less-than formulation) or "more than 22" (symmetric formulation).
- compressobj(size=-1) is accepted by CFFI but raises OverflowError in C, which suggests a signed/unsigned mismatch somewhere in the CFFI argument handling.
- Free-threading: Py_MOD_GIL_NOT_USED does not apply to the CFFI backend (it's pure Python), but the backend has no free-threading story either; worth either documenting or gating free-threaded wheels to C-backend-only.
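Since none of this was live-reproduced, a small parity check could confirm or refute the validation drift. The sketch below is one possible shape. The two functions are hypothetical stand-ins that only mimic the behaviour reported above (they are not the real backends), and outcome / parity_gaps are illustrative helper names:

```python
def outcome(fn, *args, **kwargs):
    """Return ('ok', None) or ('raises', exception type name) for one call."""
    try:
        fn(*args, **kwargs)
    except Exception as exc:
        return ("raises", type(exc).__name__)
    return ("ok", None)


# Hypothetical stand-ins mimicking the reported behaviour of each backend.
def c_like_compressobj(size=0):
    if size < 0:
        raise OverflowError("negative value for an unsigned parameter")
    return object()


def cffi_like_compressobj(size=0):
    return object()  # accepts any size, including negative ones


def parity_gaps(fn_a, fn_b, cases):
    """List the keyword-argument cases where the two callables disagree."""
    return [kw for kw in cases if outcome(fn_a, **kw) != outcome(fn_b, **kw)]


cases = [{"size": 0}, {"size": 1024}, {"size": -1}]
print(parity_gaps(c_like_compressobj, cffi_like_compressobj, cases))
# [{'size': -1}]
```

In a real test, the stand-ins would be replaced by the same call made against each backend. The check only reports disagreement; it does not decide which backend's behaviour is the intended one.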
Suggested PR shape
All three gaps fit in one small PR (pure-Python fixes, likely ~20 lines of diff). If you'd prefer to keep C-side and CFFI-side PRs separate I can do that; otherwise one combined parity PR seems cleanest.
Methodology
Found via cext-review-toolkit — the parity-checker agent identifies places where two implementations of the same interface diverge. Gaps 1 and 2 were flagged via structural pattern matching against the C backend; Gap 3 via argument-validation diffing. The CFFI environment wasn't available during the analysis, so none of the above was live-reproduced; verification is recommended but the diffs are small and the patterns are unambiguous on inspection.
Discovery, root-cause analysis, and issue drafting were performed by Claude Code and reviewed by a human before filing.
Full report
Complete multi-agent analysis (48 FIX findings across 13 categories, plus a reproducer appendix): https://gist.github.com/devdanzin/b86039ac097141579590c1a0f3a43605