Skip to content

Fix null pointer handling in edit_distance#2263

Open
THE-Amrit-mahto-05 wants to merge 2 commits intoCCExtractor:masterfrom
THE-Amrit-mahto-05:fix/null-pointer-edit-distance
Open

Fix null pointer handling in edit_distance#2263
THE-Amrit-mahto-05 wants to merge 2 commits intoCCExtractor:masterfrom
THE-Amrit-mahto-05:fix/null-pointer-edit-distance

Conversation

@THE-Amrit-mahto-05
Copy link
Copy Markdown
Contributor

@THE-Amrit-mahto-05 THE-Amrit-mahto-05 commented Apr 12, 2026

In raising this pull request, I confirm the following (please check boxes):

Reason for this PR:

  • This PR adds new functionality.
  • This PR fixes a bug that I have personally experienced or that a real user has reported and for which a sample exists.
  • This PR is porting code from C to Rust.

Sanity check:

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • If the PR adds new functionality, I've added it to the changelog. If it's just a bug fix, I have NOT added it to the changelog.
  • I am NOT adding new C code unless it's to fix an existing, reproducible bug.

Problem

The function edit_distance is part of the Rust FFI layer and is exposed to C code. In C usage, NULL pointers are valid inputs, but the current implementation does not safely handle them.

Issue

  • Passing NULL word1 or word2 leads to unsafe dereference via CStr::from_ptr
  • This results in undefined behavior and potential segmentation faults

Proof

The issue can be reproduced by calling the function with NULL pointers from the FFI boundary.

A crash was observed during test execution due to a NULL pointer dereference (SIGSEGV).


Steps to reproduce

  1. Call:
    edit_distance(word1 = NULL, word2 = "abc", len1 = 0, len2 = 3)
  2. Call:
    edit_distance(word1 = "abc", word2 = NULL, len1 = 3, len2 = 0)

Expected behavior

The function should safely handle NULL inputs and return:

  • Length of the non-null string
  • 0 if both inputs are NULL

Actual behavior (before fix)

Unsafe pointer dereference occurs when converting NULL pointers using CStr::from_ptr, leading to undefined behavior or crash.


Fix

  • Added NULL pointer checks before any CStr::from_ptr usage
  • Safe handling logic introduced:
    • both NULL → return 0
    • word1 NULL → return len2
    • word2 NULL → return len1
  • Prevents unsafe dereferencing at the FFI boundary

In the C FFI boundary, NULL pointers are treated as empty inputs when passed to edit_distance.
When one input is NULL, the function returns the length of the other string, treating it as a full deletion/insertion cost.
This behavior is consistent with the existing interpretation of edit distance and ensures safe handling without undefined behavior.


Testing

  • Ran: cargo test --features hardsubx_ocr
  • Result: 406 tests passed, 0 failures
  • No regressions introduced in edit_distance

Impact

This change improves memory safety and prevents undefined behavior in Rust bindings exposed to C via FFI.

@ccextractor-bot
Copy link
Copy Markdown
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit ad4886e...:
Report Name Tests Passed
Broken 9/13
CEA-708 1/14
DVB 2/7
DVD 3/3
DVR-MS 2/2
General 22/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 76/86
Teletext 20/21
WTV 13/13
XDS 31/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --datastreamtype 2 --streamtype 2 c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Copy link
Copy Markdown
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit ad4886e...:
Report Name Tests Passed
Broken 9/13
CEA-708 1/14
DVB 3/7
DVD 3/3
DVR-MS 2/2
General 22/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 81/86
Teletext 20/21
WTV 13/13
XDS 31/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants