Use simdutf for faster string encoding/decoding #976

Merged
boorad merged 21 commits into margelo:main from wh201906:wh201906/simdutf
Apr 18, 2026

@wh201906
Contributor

wh201906 commented Apr 6, 2026

This PR migrates some string encoding/decoding functions in HybridUtils from OpenSSL/manual implementations to simdutf:

These changes significantly improve performance when encoding/decoding large base64 payloads. I ran the benchmark on an old Android device and here are the results:
(I added some test cases with 1kB data, but this PR doesn't include them)
base64 1MB encode throughput compared with CraftzdogBuffer: From 4.92x to 9.16x
base64 1MB decode throughput compared with CraftzdogBuffer: From 169.07x to 398.76x

  • old: [screenshot of benchmark results]
  • new: [screenshot of benchmark results]

@wh201906
Contributor Author

wh201906 commented Apr 6, 2026

There are some ongoing optimizations in simdutf, which can also be utilized in RNQC once implemented.
simdutf/simdutf#925
simdutf/simdutf#565

@boorad
Collaborator

boorad commented Apr 8, 2026

I mean... let's find a better way to download a pre-built library. I'm not adding 82K LOC to the repo.

@wh201906
Contributor Author

wh201906 commented Apr 8, 2026

I've added simdutf as a submodule and modified the CMakeLists.txt for Android and QuickCrypto.podspec for iOS. I tested the builds for the two mobile platforms and both of them passed.
Would you mind testing it via the CI? I'm a first-time contributor, so the CI won't run for this PR without approval.

@boorad
Collaborator

boorad commented Apr 14, 2026

Claude review 😇

The Good

  1. simdutf is a solid choice — it's the same library Node.js itself uses for string encoding. SIMD-accelerated base64 on ARM (NEON) is a real win for mobile, and the benchmark numbers (4.9x → 9.2x encode, 169x → 399x decode) are impressive.

  2. Removing decodeBase64Url as a separate function is smart — simdutf's base64_default_or_url_accept_garbage option handles both base64 and base64url in one decoder, which simplifies the code and matches Node.js behavior (where Buffer.from(str, 'base64') also accepts URL-safe characters).

  3. Good test additions — the new tests for cross-format acceptance (base64 accepting URL-safe chars and vice versa) and padding/trailing-data behavior match Node.js semantics.

  4. Latin1 encoding via simdutf is a clean replacement for the manual byte-by-byte loop.

ArrayBuffer::move — Yes, It's a New Nitro API

ArrayBuffer::move(std::vector<uint8_t>&&) was added in a newer Nitro Modules version (the project is on 0.29.1, it appears around 0.34-0.35). It takes ownership of the vector's heap allocation via move semantics, so there is no copy. This is strictly better than the old ToNativeArrayBuffer(vec) pattern, which always allocated a new uint8_t[] and memcpy'd the data into it.

However, this PR would not compile against the current Nitro version (0.29.1). It requires a Nitro upgrade. The PR doesn't mention this dependency. Also note the ArrayBuffer::copy call for the utf8 path — that one is fine, it exists in 0.29.1.

Concerns / Questions

  1. Nitro version dependency is undeclared — this is a breaking change if merged without bumping Nitro. Should be called out.

  2. Submodule adds significant weight — simdutf is a large library (~hundreds of KB of generated SIMD code). The author mentions no significant APK size increase, which is good, but the single-header limited-feature generation option mentioned in the PR description should probably be used to keep it lean (we only need base64 + latin1→utf8).

  3. base64_default_or_url_accept_garbage — the name says "accept garbage," which means it's lenient. This matches Node.js behavior, but it's worth verifying edge cases where malformed input should throw vs silently decode. The tests cover the padding-stop behavior, which is good.

  4. encodeLatin1 zero-check may be wrong: simdutf::convert_latin1_to_utf8 returns 0 on error, but also returns 0 for empty input. The early len == 0 return handles it, but it's worth noting the function contract.

  5. cmake_minimum_required bump from 3.10 to 3.15 — probably needed by simdutf's CMakeLists, but worth confirming it doesn't break any CI/older Android NDK versions.

  6. Our ToNativeArrayBuffer helpers become partially dead code — the string and vector overloads are no longer used in HybridUtils. If we're going to adopt ArrayBuffer::move/::copy project-wide, we should migrate the other callsites too (ECDH, cipher, etc.) and remove the old helpers. Otherwise it's inconsistent.

Summary

Good PR with real perf wins, clean code reduction, and correct Node.js-aligned semantics. The main issue is the undeclared Nitro Modules version dependency — this needs a Nitro upgrade to compile. Worth asking the contributor if they tested against a newer Nitro version, and whether we should bundle the Nitro upgrade into this PR or do it separately first.

@wh201906
Contributor Author

ArrayBuffer::move(std::vector<uint8_t>&&) was added in a newer Nitro Modules version (the project is on 0.29.1, it appears around 0.34-0.35).

I checked the code of nitro and found it is available in v0.31.2. We only need to bump nitro to that version.
mrousavy/nitro@f2ddc45#diff-7bf06be05b9debc132d1aa90c40244197990ea37eeceecb88e0fbcc099c43b9cR35-R39

Alternatively, since the implementation of move() is only a few lines, I can also implement it in RNQC without bumping nitro.

Which way do you prefer?

@wh201906
Contributor Author

wh201906 commented Apr 15, 2026

but the single-header limited-feature generation option mentioned in the PR description should probably be used to keep it lean

This involves running a Python script from simdutf when building RNQC. I'm not sure it's a good idea to require Python as a dev dependency, with extra build steps, given how little binary size we would save.

@wh201906
Contributor Author

base64_default_or_url_accept_garbage — the name says "accept garbage," which means it's lenient. This matches Node.js behavior, but it's worth verifying edge cases where malformed input should throw vs silently decode. The tests cover the padding-stop behavior, which is good.

I can port more Buffer.from()/Buffer.toString() test cases from Node.js if you wish. So far I have only ensured that this PR doesn't break any existing test cases.

@wh201906
Contributor Author

encodeLatin1 zero-check may be wrong — simdutf::convert_latin1_to_utf8 returns 0 on error, but also returns 0 for empty input. The early len == 0 return handles it, but it's worth noting the function contract.

This is more like a fast path. If the input length is 0, no need to calculate the output length and enter the conversion.
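To illustrate the contract under discussion, here is a scalar TypeScript sketch of Latin-1 to UTF-8 conversion with the same empty-input fast path. It is not the PR's code (which calls simdutf::convert_latin1_to_utf8 from C++), and `latin1ToUtf8` is a hypothetical name. It shows why the output length has to be computed before converting: bytes below 0x80 stay one byte, while every byte from 0x80 to 0xFF becomes a two-byte UTF-8 sequence.

```typescript
// Hypothetical scalar sketch of Latin-1 -> UTF-8 conversion (not simdutf's code).
function latin1ToUtf8(latin1: Uint8Array): Uint8Array {
  // Fast path: empty input needs neither a length pass nor a conversion pass.
  if (latin1.length === 0) return new Uint8Array(0);

  // Length pass: ASCII bytes stay 1 byte, bytes >= 0x80 expand to 2 bytes.
  let outLen = latin1.length;
  for (const b of latin1) {
    if (b >= 0x80) outLen++;
  }

  // Conversion pass.
  const out = new Uint8Array(outLen);
  let j = 0;
  for (const b of latin1) {
    if (b < 0x80) {
      out[j++] = b;
    } else {
      out[j++] = 0xc0 | (b >> 6); // lead byte 110000xx
      out[j++] = 0x80 | (b & 0x3f); // continuation byte 10xxxxxx
    }
  }
  return out;
}
```

With the early return in place, an empty input never reaches the conversion step, which is why the `len == 0` check sidesteps the ambiguous 0 return value mentioned in the review.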

@boorad
Collaborator

boorad commented Apr 16, 2026

@wh201906

  1. Definitely bump nitro. I was thinking we keep it to its lowest level (0.31.2) for widest compatibility with other modules, but open to suggestions / bumping it all the way to latest.

  2. yeah, drop the single-header idea

  3. More test cases would be good - especially for malformed base64 edge cases, since accept_garbage is lenient by design. Maybe a few Node.js-aligned edge case tests.

  4. 👍 fast path

  5. cmake minimum required bump?

  6. follow-up ticket to clean up ToNativeArrayBuffer() callsites

@wh201906
Contributor Author

wh201906 commented Apr 16, 2026

I was thinking we keep it to its lowest level (0.31.2) for widest compatibility with other modules

I also prefer this over bumping to the latest. ❤compatibility
Should I bump it in this PR? I can also make a new PR for bumping nitro then rebase this one.

More test cases would be good

I'm gonna add some test cases from Node.js

cmake minimum required bump?

Yes. The simdutf requires 3.15. Not necessary now.
https://github.com/simdutf/simdutf/blob/cfe3bfdf415fcead80a9281f7515db2ab5562291/CMakeLists.txt#L1
I guess bumping CMake won't cause big compatibility problems?

@wh201906
Contributor Author

Yes. The simdutf requires 3.15.

I'm trying to build it without the CMakeLists.txt from simdutf. If it works we don't need to bump the CMake required version.

@wh201906
Contributor Author

wh201906 commented Apr 18, 2026

I'm trying to build it without the CMakeLists.txt from simdutf.

This is fixed in 96021e6, and I have reverted the change to the CMake minimum version.

I'm gonna add some test cases from Node.js

I extracted 23 related test cases from Node.js v24.15.0 (the latest LTS version). The 19 passing test cases are added in ddc0d08.
("Related" means test cases that apply to the current definition of the native encoding.)


Test toString('base64')

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test toString(\'base64\')', () => {
  expect(bufferToString(stringToBuffer('Man', 'utf8'), 'base64')).to.equal(
    'TWFu',
  );
  expect(bufferToString(stringToBuffer('Woman', 'utf8'), 'base64')).to.equal(
    'V29tYW4=',
  );
});

Original Node.js (test/parallel/test-buffer-alloc.js):

//
// Test toString('base64')
//
assert.strictEqual((Buffer.from('Man')).toString('base64'), 'TWFu');
assert.strictEqual((Buffer.from('Woman')).toString('base64'), 'V29tYW4=');

Test that regular and URL-safe base64 both work both ways

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Test that regular and URL-safe base64 both work both ways',
  () => {
    const expected = new Uint8Array([
      0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff,
    ]);

    expect(toU8(stringToBuffer('//++/++/++//', 'base64'))).to.deep.equal(
      expected,
    );
    expect(toU8(stringToBuffer('__--_--_--__', 'base64'))).to.deep.equal(
      expected,
    );
    expect(toU8(stringToBuffer('//++/++/++//', 'base64url'))).to.deep.equal(
      expected,
    );
    expect(toU8(stringToBuffer('__--_--_--__', 'base64url'))).to.deep.equal(
      expected,
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Test that regular and URL-safe base64 both work both ways
  const expected = [0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff];
  assert.deepStrictEqual(Buffer.from('//++/++/++//', 'base64'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('__--_--_--__', 'base64'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('//++/++/++//', 'base64url'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('__--_--_--__', 'base64url'),
                         Buffer.from(expected));
}

Test that regular and URL-safe base64 both work both ways with padding

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Test that regular and URL-safe base64 both work both ways with padding',
  () => {
    const expected = new Uint8Array([
      0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff, 0xfb,
    ]);

    expect(toU8(stringToBuffer('//++/++/++//+w==', 'base64'))).to.deep.equal(
      expected,
    );
    expect(toU8(stringToBuffer('//++/++/++//+w==', 'base64url'))).to.deep.equal(
      expected,
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Test that regular and URL-safe base64 both work both ways with padding
  const expected = [0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff, 0xfb];
  assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64url'),
                         Buffer.from(expected));
  assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64url'),
                         Buffer.from(expected));
}

Check that the base64 decoder ignores whitespace

Current encoding_tests.ts:

test(SUITE, '[Node.js] Check that the base64 decoder ignores whitespace', () => {
  const quote =
    'Man is distinguished, not only by his reason, but by this ' +
    'singular passion from other animals, which is a lust ' +
    'of the mind, that by a perseverance of delight in the ' +
    'continued and indefatigable generation of knowledge, ' +
    'exceeds the short vehemence of any carnal pleasure.';
  const expected =
    'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBi' +
    'eSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBp' +
    'cyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVs' +
    'aWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24g' +
    'b2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNh' +
    'cm5hbCBwbGVhc3VyZS4=';
  const base64flavors = ['base64', 'base64url'] as const;

  base64flavors.forEach(encoding => {
    const expectedWhite =
      `${expected.slice(0, 60)} \n` +
      `${expected.slice(60, 120)} \n` +
      `${expected.slice(120, 180)} \n` +
      `${expected.slice(180, 240)} \n` +
      `${expected.slice(240, 300)}\n` +
      `${expected.slice(300, 360)}\n`;
    const decoded = bufferToString(stringToBuffer(expectedWhite, encoding), 'utf8');
    expect(decoded).to.equal(quote);
  });
});

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // big example
  const quote = 'Man is distinguished, not only by his reason, but by this ' +
                'singular passion from other animals, which is a lust ' +
                'of the mind, that by a perseverance of delight in the ' +
                'continued and indefatigable generation of knowledge, ' +
                'exceeds the short vehemence of any carnal pleasure.';
  const expected = 'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb' +
                   '24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlci' +
                   'BhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQ' +
                   'gYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu' +
                   'dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZ' +
                   'GdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm' +
                   '5hbCBwbGVhc3VyZS4=';
  assert.strictEqual(Buffer.from(quote).toString('base64'), expected);
  assert.strictEqual(
    Buffer.from(quote).toString('base64url'),
    expected.replaceAll('+', '-').replaceAll('/', '_').replaceAll('=', '')
  );

  base64flavors.forEach((encoding) => {
    let b = Buffer.allocUnsafe(1024);
    let bytesWritten = b.write(expected, 0, encoding);
    assert.strictEqual(quote.length, bytesWritten);
    assert.strictEqual(quote, b.toString('ascii', 0, quote.length));

    // Check that the base64 decoder ignores whitespace
    const expectedWhite = `${expected.slice(0, 60)} \n` +
                          `${expected.slice(60, 120)} \n` +
                          `${expected.slice(120, 180)} \n` +
                          `${expected.slice(180, 240)} \n` +
                          `${expected.slice(240, 300)}\n` +
                          `${expected.slice(300, 360)}\n`;
    b = Buffer.allocUnsafe(1024);
    bytesWritten = b.write(expectedWhite, 0, encoding);
    assert.strictEqual(quote.length, bytesWritten);
    assert.strictEqual(quote, b.toString('ascii', 0, quote.length));

    // Check that the base64 decoder on the constructor works
    // even in the presence of whitespace.
    b = Buffer.from(expectedWhite, encoding);
    assert.strictEqual(quote.length, b.length);
    assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
  });
}

Check that the base64 decoder ignores illegal chars

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Check that the base64 decoder ignores illegal chars',
  () => {
    const quote =
      'Man is distinguished, not only by his reason, but by this ' +
      'singular passion from other animals, which is a lust ' +
      'of the mind, that by a perseverance of delight in the ' +
      'continued and indefatigable generation of knowledge, ' +
      'exceeds the short vehemence of any carnal pleasure.';
    const expected =
      'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBi' +
      'eSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBp' +
      'cyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVs' +
      'aWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24g' +
      'b2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNh' +
      'cm5hbCBwbGVhc3VyZS4=';
    const base64flavors = ['base64', 'base64url'] as const;

    base64flavors.forEach(encoding => {
      const expectedIllegal =
        expected.slice(0, 60) +
        ' \x80' +
        expected.slice(60, 120) +
        ' \xff' +
        expected.slice(120, 180) +
        ' \x00' +
        expected.slice(180, 240) +
        ' \x98' +
        expected.slice(240, 300) +
        '\x03' +
        expected.slice(300, 360);
      const decoded = bufferToString(
        stringToBuffer(expectedIllegal, encoding),
        'utf8',
      );
      expect(decoded).to.equal(quote);
    });
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // big example
  const quote = 'Man is distinguished, not only by his reason, but by this ' +
                'singular passion from other animals, which is a lust ' +
                'of the mind, that by a perseverance of delight in the ' +
                'continued and indefatigable generation of knowledge, ' +
                'exceeds the short vehemence of any carnal pleasure.';
  const expected = 'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb' +
                   '24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlci' +
                   'BhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQ' +
                   'gYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu' +
                   'dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZ' +
                   'GdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm' +
                   '5hbCBwbGVhc3VyZS4=';
  assert.strictEqual(Buffer.from(quote).toString('base64'), expected);
  assert.strictEqual(
    Buffer.from(quote).toString('base64url'),
    expected.replaceAll('+', '-').replaceAll('/', '_').replaceAll('=', '')
  );

  base64flavors.forEach((encoding) => {
    // Check that the base64 decoder ignores illegal chars
    const expectedIllegal = expected.slice(0, 60) + ' \x80' +
                            expected.slice(60, 120) + ' \xff' +
                            expected.slice(120, 180) + ' \x00' +
                            expected.slice(180, 240) + ' \x98' +
                            expected.slice(240, 300) + '\x03' +
                            expected.slice(300, 360);
    b = Buffer.from(expectedIllegal, encoding);
    assert.strictEqual(quote.length, b.length);
    assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
  });
}
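These tests pin down how lenient the native decoder is meant to be. As an illustration only, here is a minimal scalar TypeScript sketch of that behaviour; it is not simdutf's algorithm or the PR's C++ code, and `lenientBase64Decode` is a hypothetical name. Its rules are assumptions read off the tests in this thread: characters from both alphabets (`+/` and `-_`) are accepted in either mode, any other character (whitespace, `\x80`, `\x00` and so on) is skipped, decoding stops at the first `=`, and leftover bits that do not fill a whole byte are dropped.

```typescript
// Hypothetical sketch of lenient base64 decoding; rules assumed from the tests.
const BASE64_CORE =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
const SEXTET = new Map<string, number>();
for (let i = 0; i < BASE64_CORE.length; i++) SEXTET.set(BASE64_CORE[i], i);
// Accept both the standard and the URL-safe alphabet in every mode.
SEXTET.set('+', 62).set('-', 62).set('/', 63).set('_', 63);

function lenientBase64Decode(input: string): Uint8Array {
  const out: number[] = [];
  let acc = 0; // bit buffer; only the low 16 bits are kept
  let bits = 0; // number of bits in acc not yet emitted as a byte
  for (const ch of input) {
    if (ch === '=') break; // assumption: padding ends decoding
    const v = SEXTET.get(ch);
    if (v === undefined) continue; // skip whitespace and other garbage
    acc = ((acc << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
    }
  }
  // Fewer than 8 leftover bits cannot form a byte and are discarded.
  return Uint8Array.from(out);
}
```

With these rules, `'K'` and `'A'` decode to zero bytes (a single character carries only 6 bits), and `'=bad'.repeat(1e4)` decodes to nothing, which is what the Node.js regression tests further down expect.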

Handle padding graciously, multiple-of-4 or not

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Handle padding graciously, multiple-of-4 or not',
  () => {
    const base64flavors = ['base64', 'base64url'] as const;

    base64flavors.forEach(encoding => {
      expect(bufferToString(stringToBuffer('', encoding), 'utf8')).to.equal('');
      expect(bufferToString(stringToBuffer('K', encoding), 'utf8')).to.equal('');

      expect(bufferToString(stringToBuffer('Kg==', encoding), 'utf8')).to.equal(
        '*',
      );
      expect(bufferToString(stringToBuffer('Kio=', encoding), 'utf8')).to.equal(
        '*'.repeat(2),
      );
      expect(bufferToString(stringToBuffer('Kioq', encoding), 'utf8')).to.equal(
        '*'.repeat(3),
      );
      expect(
        bufferToString(stringToBuffer('KioqKg==', encoding), 'utf8'),
      ).to.equal('*'.repeat(4));
      expect(
        bufferToString(stringToBuffer('KioqKio=', encoding), 'utf8'),
      ).to.equal('*'.repeat(5));
      expect(
        bufferToString(stringToBuffer('KioqKioq', encoding), 'utf8'),
      ).to.equal('*'.repeat(6));
      expect(
        bufferToString(stringToBuffer('KioqKioqKg==', encoding), 'utf8'),
      ).to.equal('*'.repeat(7));
      expect(
        bufferToString(stringToBuffer('KioqKioqKio=', encoding), 'utf8'),
      ).to.equal('*'.repeat(8));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioq', encoding), 'utf8'),
      ).to.equal('*'.repeat(9));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKg==', encoding), 'utf8'),
      ).to.equal('*'.repeat(10));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKio=', encoding), 'utf8'),
      ).to.equal('*'.repeat(11));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioq', encoding), 'utf8'),
      ).to.equal('*'.repeat(12));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioqKg==', encoding), 'utf8'),
      ).to.equal('*'.repeat(13));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioqKio=', encoding), 'utf8'),
      ).to.equal('*'.repeat(14));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioqKioq', encoding), 'utf8'),
      ).to.equal('*'.repeat(15));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKg==', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(16));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKio=', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(17));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKioq', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(18));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKioqKg==', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(19));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKioqKio=', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(20));

      expect(bufferToString(stringToBuffer('Kg', encoding), 'utf8')).to.equal(
        '*',
      );
      expect(bufferToString(stringToBuffer('Kio', encoding), 'utf8')).to.equal(
        '*'.repeat(2),
      );
      expect(
        bufferToString(stringToBuffer('KioqKg', encoding), 'utf8'),
      ).to.equal('*'.repeat(4));
      expect(
        bufferToString(stringToBuffer('KioqKio', encoding), 'utf8'),
      ).to.equal('*'.repeat(5));
      expect(
        bufferToString(stringToBuffer('KioqKioqKg', encoding), 'utf8'),
      ).to.equal('*'.repeat(7));
      expect(
        bufferToString(stringToBuffer('KioqKioqKio', encoding), 'utf8'),
      ).to.equal('*'.repeat(8));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKg', encoding), 'utf8'),
      ).to.equal('*'.repeat(10));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKio', encoding), 'utf8'),
      ).to.equal('*'.repeat(11));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioqKg', encoding), 'utf8'),
      ).to.equal('*'.repeat(13));
      expect(
        bufferToString(stringToBuffer('KioqKioqKioqKioqKio', encoding), 'utf8'),
      ).to.equal('*'.repeat(14));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKg', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(16));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKio', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(17));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKioqKg', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(19));
      expect(
        bufferToString(
          stringToBuffer('KioqKioqKioqKioqKioqKioqKio', encoding),
          'utf8',
        ),
      ).to.equal('*'.repeat(20));
    });

    expect(
      stringToBuffer(
        '72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw==',
        'base64',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        '72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw==',
        'base64url',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        '72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw=',
        'base64',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        '72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw=',
        'base64url',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        '72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw',
        'base64',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        '72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw',
        'base64url',
      ).byteLength,
    ).to.equal(32);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==',
        'base64',
      ).byteLength,
    ).to.equal(31);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==',
        'base64url',
      ).byteLength,
    ).to.equal(31);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=',
        'base64',
      ).byteLength,
    ).to.equal(31);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=',
        'base64url',
      ).byteLength,
    ).to.equal(31);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg',
        'base64',
      ).byteLength,
    ).to.equal(31);
    expect(
      stringToBuffer(
        'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg',
        'base64url',
      ).byteLength,
    ).to.equal(31);
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

const base64flavors = ['base64', 'base64url'];

base64flavors.forEach((encoding) => {
  assert.strictEqual(Buffer.from('', encoding).toString(), '');
  assert.strictEqual(Buffer.from('K', encoding).toString(), '');

  // multiple-of-4 with padding
  assert.strictEqual(Buffer.from('Kg==', encoding).toString(), '*');
  assert.strictEqual(Buffer.from('Kio=', encoding).toString(), '*'.repeat(2));
  assert.strictEqual(Buffer.from('Kioq', encoding).toString(), '*'.repeat(3));
  assert.strictEqual(
    Buffer.from('KioqKg==', encoding).toString(), '*'.repeat(4));
  assert.strictEqual(
    Buffer.from('KioqKio=', encoding).toString(), '*'.repeat(5));
  assert.strictEqual(
    Buffer.from('KioqKioq', encoding).toString(), '*'.repeat(6));
  assert.strictEqual(Buffer.from('KioqKioqKg==', encoding).toString(),
                     '*'.repeat(7));
  assert.strictEqual(Buffer.from('KioqKioqKio=', encoding).toString(),
                     '*'.repeat(8));
  assert.strictEqual(Buffer.from('KioqKioqKioq', encoding).toString(),
                     '*'.repeat(9));
  assert.strictEqual(Buffer.from('KioqKioqKioqKg==', encoding).toString(),
                     '*'.repeat(10));
  assert.strictEqual(Buffer.from('KioqKioqKioqKio=', encoding).toString(),
                     '*'.repeat(11));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioq', encoding).toString(),
                     '*'.repeat(12));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKg==', encoding).toString(),
                     '*'.repeat(13));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKio=', encoding).toString(),
                     '*'.repeat(14));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioq', encoding).toString(),
                     '*'.repeat(15));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKg==', encoding).toString(),
    '*'.repeat(16));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKio=', encoding).toString(),
    '*'.repeat(17));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKioq', encoding).toString(),
    '*'.repeat(18));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKioqKg==',
                                 encoding).toString(),
                     '*'.repeat(19));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKioqKio=',
                                 encoding).toString(),
                     '*'.repeat(20));

  // No padding, not a multiple of 4
  assert.strictEqual(Buffer.from('Kg', encoding).toString(), '*');
  assert.strictEqual(Buffer.from('Kio', encoding).toString(), '*'.repeat(2));
  assert.strictEqual(Buffer.from('KioqKg', encoding).toString(), '*'.repeat(4));
  assert.strictEqual(
    Buffer.from('KioqKio', encoding).toString(), '*'.repeat(5));
  assert.strictEqual(Buffer.from('KioqKioqKg', encoding).toString(),
                     '*'.repeat(7));
  assert.strictEqual(Buffer.from('KioqKioqKio', encoding).toString(),
                     '*'.repeat(8));
  assert.strictEqual(Buffer.from('KioqKioqKioqKg', encoding).toString(),
                     '*'.repeat(10));
  assert.strictEqual(Buffer.from('KioqKioqKioqKio', encoding).toString(),
                     '*'.repeat(11));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKg', encoding).toString(),
                     '*'.repeat(13));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKio', encoding).toString(),
                     '*'.repeat(14));
  assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKg', encoding).toString(),
                     '*'.repeat(16));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKio', encoding).toString(),
    '*'.repeat(17));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKioqKg', encoding).toString(),
    '*'.repeat(19));
  assert.strictEqual(
    Buffer.from('KioqKioqKioqKioqKioqKioqKio', encoding).toString(),
    '*'.repeat(20));
});

// Handle padding graciously, multiple-of-4 or not
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw==', 'base64').length,
  32
);
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw==', 'base64url')
    .length,
  32
);
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw=', 'base64').length,
  32
);
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw=', 'base64url')
    .length,
  32
);
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw', 'base64').length,
  32
);
assert.strictEqual(
  Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw', 'base64url')
    .length,
  32
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==', 'base64').length,
  31
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==', 'base64url')
    .length,
  31
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=', 'base64').length,
  31
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=', 'base64url')
    .length,
  31
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg', 'base64').length,
  31
);
assert.strictEqual(
  Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg', 'base64url').length,
  31
);
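The byte counts in these assertions follow from simple arithmetic: each accepted base64 character carries 6 bits, and bits that do not complete a byte are discarded, so the decoded length is floor(6 × characters / 8). The `72INjk…` inputs have 43 data characters before any `=`, giving floor(258 / 8) = 32 bytes; the `w69jAC…` inputs have 42, giving floor(252 / 8) = 31. The helper below is a hypothetical sketch of that calculation, assuming (as these tests do) that both alphabets are accepted, other characters are skipped, and counting stops at the first `=`.

```typescript
// Hypothetical helper: predicts decoded length from the number of data characters.
function decodedByteLength(input: string): number {
  let sextets = 0;
  for (const ch of input) {
    if (ch === '=') break; // assumption: padding ends the data
    if (/^[A-Za-z0-9+\/_-]$/.test(ch)) sextets++; // either alphabet counts
  }
  // 6 bits per character; a partial trailing byte is dropped.
  return Math.floor((sextets * 6) / 8);
}
```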

Test single base64 char encodes as 0.

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test single base64 char encodes as 0.', () => {
  expect(toU8(stringToBuffer('A', 'base64'))).to.deep.equal(new Uint8Array([]));
});

Original Node.js (test/parallel/test-buffer-alloc.js):

// Test single base64 char encodes as 0.
assert.strictEqual(Buffer.from('A', 'base64').length, 0);

Return empty output for invalid base64 with repeated leading padding (nodejs/node#3496)

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Return empty output for invalid base64 with repeated leading padding (nodejs/node#3496)',
  () => {
    expect(toU8(stringToBuffer('=bad'.repeat(1e4), 'base64'))).to.deep.equal(
      new Uint8Array([]),
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

// Regression test for https://github.com/nodejs/node/issues/3496.
assert.strictEqual(Buffer.from('=bad'.repeat(1e4), 'base64').length, 0);
Ignore trailing whitespace in base64 input (nodejs/node#11987)

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Ignore trailing whitespace in base64 input (nodejs/node#11987)',
  () => {
    expect(toU8(stringToBuffer('w0  ', 'base64'))).to.deep.equal(
      toU8(stringToBuffer('w0', 'base64')),
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

// Regression test for https://github.com/nodejs/node/issues/11987.
assert.deepStrictEqual(Buffer.from('w0  ', 'base64'),
                       Buffer.from('w0', 'base64'));
Ignore leading whitespace in base64 input (nodejs/node#13657)

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Ignore leading whitespace in base64 input (nodejs/node#13657)',
  () => {
    expect(toU8(stringToBuffer(' YWJvcnVtLg', 'base64'))).to.deep.equal(
      toU8(stringToBuffer('YWJvcnVtLg', 'base64')),
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

// Regression test for https://github.com/nodejs/node/issues/13657.
assert.deepStrictEqual(Buffer.from(' YWJvcnVtLg', 'base64'),
                       Buffer.from('YWJvcnVtLg', 'base64'));
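For reference, the base64 decode cases above describe a fairly lenient decoder. Padding is optional. Both alphabets are accepted: the Node.js docs say `base64` also accepts the URL-safe alphabet and `base64url` also accepts regular base64. Whitespace is ignored. Leftover bits that cannot fill a byte are dropped, so a lone `A` decodes to nothing. In the model below, decoding also stops at the first `=`, which makes the `'=bad'` regression case empty. This is a simplified TypeScript model of those rules. `decodeBase64Lenient` is a hypothetical helper. Skipping every non-alphabet character and treating `=` as a terminator is an assumption that is enough for these tests; it is not the Node.js, simdutf, or RNQC implementation.

```typescript
// Hypothetical sketch: a simplified model of lenient base64 decoding as the
// tests above expect it. Not the Node.js, simdutf, or RNQC implementation.

// Map both alphabets (standard "+/" and URL-safe "-_") to 6-bit values.
const B64_VALUES: Map<string, number> = (() => {
  const std = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  const m = new Map<string, number>();
  for (let i = 0; i < std.length; i++) m.set(std.charAt(i), i);
  m.set('-', 62); // base64url equivalent of '+'
  m.set('_', 63); // base64url equivalent of '/'
  return m;
})();

function decodeBase64Lenient(input: string): Uint8Array {
  const out: number[] = [];
  let acc = 0; // pending bits, right-aligned
  let bits = 0; // number of pending bits in acc
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === '=') break; // assumption: any '=' ends decoding
    const v = B64_VALUES.get(ch);
    if (v === undefined) continue; // skip whitespace and other non-alphabet chars
    acc = (acc << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
      acc &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  // Fewer than 8 leftover bits cannot form a byte and are dropped.
  return new Uint8Array(out);
}
```

With this model, 43 significant characters give 258 bits and therefore 32 bytes whether or not the `=` padding is present. That matches the `72INjkR5...` cases above.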
Test toString('base64url')

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test toString(\'base64url\')', () => {
  expect(bufferToString(stringToBuffer('Man', 'utf8'), 'base64url')).to.equal(
    'TWFu',
  );
  expect(
    bufferToString(stringToBuffer('Woman', 'utf8'), 'base64url'),
  ).to.equal('V29tYW4');
});

Original Node.js (test/parallel/test-buffer-alloc.js):

//
// Test toString('base64url')
//
assert.strictEqual((Buffer.from('Man')).toString('base64url'), 'TWFu');
assert.strictEqual((Buffer.from('Woman')).toString('base64url'), 'V29tYW4');
This string encodes single '.' character in UTF-16

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] This string encodes single \'.\' character in UTF-16',
  () => {
    const dot = new Uint8Array([0xff, 0xfe, 0x2e, 0x00]).buffer as ArrayBuffer;
    expect(bufferToString(dot, 'base64')).to.equal('//4uAA==');
    expect(bufferToString(dot, 'base64url')).to.equal('__4uAA');
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
// This string encodes single '.' character in UTF-16
  const dot = Buffer.from('//4uAA==', 'base64');
  assert.strictEqual(dot[0], 0xff);
  assert.strictEqual(dot[1], 0xfe);
  assert.strictEqual(dot[2], 0x2e);
  assert.strictEqual(dot[3], 0x00);
  assert.strictEqual(dot.toString('base64'), '//4uAA==');
}

{
// This string encodes single '.' character in UTF-16
  const dot = Buffer.from('//4uAA', 'base64url');
  assert.strictEqual(dot[0], 0xff);
  assert.strictEqual(dot[1], 0xfe);
  assert.strictEqual(dot[2], 0x2e);
  assert.strictEqual(dot[3], 0x00);
  assert.strictEqual(dot.toString('base64url'), '__4uAA');
}
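On the encoding side, the `base64url` expectations (`'TWFu'`, `'V29tYW4'`, `'__4uAA'`) are standard base64 with `+`/`/` replaced by `-`/`_` and the trailing `=` padding removed. Here is a small TypeScript sketch of that relationship. `encodeBase64` is a hypothetical helper, not the RNQC or simdutf API.

```typescript
// Hypothetical sketch: standard base64 (RFC 4648 section 4) and unpadded
// base64url (RFC 4648 section 5). Not the RNQC or simdutf API.
const STD_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function encodeBase64(bytes: Uint8Array, url = false): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to three bytes into a 24-bit group (missing bytes count as 0).
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2;
    // Output chars that carry real data: 2, 3, or 4.
    const chars = Math.min(4, Math.ceil(((bytes.length - i) * 8) / 6));
    for (let j = 0; j < chars; j++) {
      out += STD_ALPHABET.charAt((group >> (18 - 6 * j)) & 0x3f);
    }
    // Standard base64 pads the final group to 4 chars with '='.
    if (!url) out += '='.repeat(4 - chars);
  }
  // base64url swaps the two non-alphanumeric characters.
  return url ? out.replace(/\+/g, '-').replace(/\//g, '_') : out;
}
```

For the UTF-16 dot above, `encodeBase64(new Uint8Array([0xff, 0xfe, 0x2e, 0x00]))` yields `'//4uAA=='`, and passing `true` as the second argument yields `'__4uAA'`.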
Test for proper UTF-8 Encoding

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test for proper UTF-8 Encoding', () => {
  expect(toU8(stringToBuffer('\u00fcber', 'utf8'))).to.deep.equal(
    new Uint8Array([195, 188, 98, 101, 114]),
  );
});

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Test for proper UTF-8 Encoding
  const e = Buffer.from('über');
  assert.deepStrictEqual(e, Buffer.from([195, 188, 98, 101, 114]));
}
Test UTF-8 string includes null character

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test UTF-8 string includes null character', () => {
  expect(toU8(stringToBuffer('\0', 'utf8'))).to.deep.equal(
    new Uint8Array([0x00]),
  );
  expect(toU8(stringToBuffer('\0\0', 'utf8'))).to.deep.equal(
    new Uint8Array([0x00, 0x00]),
  );
});

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // https://github.com/nodejs/node-v0.x-archive/pull/1210
  // Test UTF-8 string includes null character
  let buf = Buffer.from('\0');
  assert.strictEqual(buf.length, 1);
  buf = Buffer.from('\0\0');
  assert.strictEqual(buf.length, 2);
}
Test unmatched surrogates not producing invalid utf8 output

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Test unmatched surrogates not producing invalid utf8 output',
  () => {
    expect(toU8(stringToBuffer('ab\ud800cd', 'utf8'))).to.deep.equal(
      new Uint8Array([0x61, 0x62, 0xef, 0xbf, 0xbd, 0x63, 0x64]),
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Test unmatched surrogates not producing invalid utf8 output
  // ef bf bd = utf-8 representation of unicode replacement character
  // see https://codereview.chromium.org/121173009/
  const buf = Buffer.from('ab\ud800cd', 'utf8');
  assert.strictEqual(buf[0], 0x61);
  assert.strictEqual(buf[1], 0x62);
  assert.strictEqual(buf[2], 0xef);
  assert.strictEqual(buf[3], 0xbf);
  assert.strictEqual(buf[4], 0xbd);
  assert.strictEqual(buf[5], 0x63);
  assert.strictEqual(buf[6], 0x64);
}
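The UTF-8 cases above follow the standard encoding rules plus one policy. An unpaired UTF-16 surrogate, such as the lone `\ud800` in `'ab\ud800cd'`, is written as U+FFFD, whose UTF-8 form is `ef bf bd`. Here is a minimal TypeScript sketch of such an encoder. `encodeUtf8` is a hypothetical helper meant only to make the rule concrete; it is not RNQC's native code path.

```typescript
// Hypothetical sketch: UTF-16 string -> UTF-8 bytes, replacing unpaired
// surrogates with U+FFFD (EF BF BD) as the tests above expect.
function encodeUtf8(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff) {
      // High surrogate: combine with a following low surrogate if present.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      } else {
        cp = 0xfffd; // unpaired high surrogate
      }
    } else if (cp >= 0xdc00 && cp <= 0xdfff) {
      cp = 0xfffd; // unpaired low surrogate
    }
    if (cp < 0x80) {
      out.push(cp); // 1 byte, includes '\0'
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return new Uint8Array(out);
}
```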
latin1 encoding should write only one byte per character.

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] latin1 encoding should write only one byte per character.',
  () => {
    expect(
      toU8(stringToBuffer(String.fromCharCode(0xffff), 'latin1')),
    ).to.deep.equal(new Uint8Array([0xff]));
    expect(
      toU8(stringToBuffer(String.fromCharCode(0xaaee), 'latin1')),
    ).to.deep.equal(new Uint8Array([0xee]));
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // latin1 encoding should write only one byte per character.
  const b = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
  let s = String.fromCharCode(0xffff);
  b.write(s, 0, 'latin1');
  assert.strictEqual(b[0], 0xff);
  assert.strictEqual(b[1], 0xad);
  assert.strictEqual(b[2], 0xbe);
  assert.strictEqual(b[3], 0xef);
  s = String.fromCharCode(0xaaee);
  b.write(s, 0, 'latin1');
  assert.strictEqual(b[0], 0xee);
  assert.strictEqual(b[1], 0xad);
  assert.strictEqual(b[2], 0xbe);
  assert.strictEqual(b[3], 0xef);
}
Binary encoding should write only one byte per character.

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Binary encoding should write only one byte per character.',
  () => {
    expect(
      toU8(stringToBuffer(String.fromCharCode(0xffff), 'binary')),
    ).to.deep.equal(new Uint8Array([0xff]));
    expect(
      toU8(stringToBuffer(String.fromCharCode(0xaaee), 'binary')),
    ).to.deep.equal(new Uint8Array([0xee]));
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Binary encoding should write only one byte per character.
  const b = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
  let s = String.fromCharCode(0xffff);
  b.write(s, 0, 'latin1');
  assert.strictEqual(b[0], 0xff);
  assert.strictEqual(b[1], 0xad);
  assert.strictEqual(b[2], 0xbe);
  assert.strictEqual(b[3], 0xef);
  s = String.fromCharCode(0xaaee);
  b.write(s, 0, 'latin1');
  assert.strictEqual(b[0], 0xee);
  assert.strictEqual(b[1], 0xad);
  assert.strictEqual(b[2], 0xbe);
  assert.strictEqual(b[3], 0xef);
}
ASCII conversion in node.js simply masks off the high bits, it doesn't do transliteration.

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] ASCII conversion in node.js simply masks off the high bits, it doesn\'t do transliteration.',
  () => {
    expect(
      bufferToString(stringToBuffer('h\u00e9rit\u00e9', 'utf8'), 'ascii'),
    ).to.equal('hC)ritC)');
  },
);

Original Node.js (test/parallel/test-buffer-ascii.js):

// ASCII conversion in node.js simply masks off the high bits,
// it doesn't do transliteration.
assert.strictEqual(Buffer.from('hérité').toString('ascii'), 'hC)ritC)');
Test ASCII decoding of UTF-8 multibyte characters at every byte offset.

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Test ASCII decoding of UTF-8 multibyte characters at every byte offset.',
  () => {
    const input =
      'C\u2019est, graphiquement, la r\u00e9union d\u2019un accent aigu ' +
      'et d\u2019un accent grave.';

    const expected =
      'Cb\u0000\u0019est, graphiquement, la rC)union ' +
      'db\u0000\u0019un accent aigu et db\u0000\u0019un ' +
      'accent grave.';

    const bytes = toU8(stringToBuffer(input, 'utf8'));

    for (let i = 0; i < expected.length; ++i) {
      const slice = bytes.slice(i);
      expect(bufferToString(slice.buffer as ArrayBuffer, 'ascii')).to.equal(
        expected.slice(i),
      );
    }
  },
);

Original Node.js (test/parallel/test-buffer-ascii.js):

// 71 characters, 78 bytes. The ’ character is a triple-byte sequence.
const input = 'C’est, graphiquement, la réunion d’un accent aigu ' +
              'et d’un accent grave.';

const expected = 'Cb\u0000\u0019est, graphiquement, la rC)union ' +
                 'db\u0000\u0019un accent aigu et db\u0000\u0019un ' +
                 'accent grave.';

const buf = Buffer.from(input);

for (let i = 0; i < expected.length; ++i) {
  assert.strictEqual(buf.slice(i).toString('ascii'), expected.slice(i));

  // Skip remainder of multi-byte sequence.
  if (input.charCodeAt(i) > 65535) ++i;
  if (input.charCodeAt(i) > 127) ++i;
}

@wh201906
Contributor Author

wh201906 commented Apr 18, 2026

Four of the ported Node.js test cases failed. In 73e75c1 I changed the code to match Node.js and removed the incompatible test cases. You can revert these changes if you prefer.


Test single hex character is discarded.

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test single hex character is discarded.', () => {
  expect(toU8(stringToBuffer('A', 'hex'))).to.deep.equal(new Uint8Array([]));
});

Original Node.js (test/parallel/test-buffer-alloc.js):

// Test single hex character is discarded.
assert.strictEqual(Buffer.from('A', 'hex').length, 0);
Test that if a trailing character is discarded, rest of string is processed.

Current encoding_tests.ts:

test(
  SUITE,
  '[Node.js] Test that if a trailing character is discarded, rest of string is processed.',
  () => {
    expect(toU8(stringToBuffer('Abx', 'hex'))).to.deep.equal(
      new Uint8Array([0xab]),
    );
    expect(toU8(stringToBuffer('abc', 'hex'))).to.deep.equal(
      new Uint8Array([0xab]),
    );
  },
);

Original Node.js (test/parallel/test-buffer-alloc.js):

// Test that if a trailing character is discarded, rest of string is processed.
assert.deepStrictEqual(Buffer.from('Abx', 'hex'), Buffer.from('Ab', 'hex'));
Test hex strings and bad hex strings

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test hex strings and bad hex strings', () => {
  expect(toU8(stringToBuffer('abcdxx', 'hex'))).to.deep.equal(
    new Uint8Array([0xab, 0xcd]),
  );
  expect(toU8(stringToBuffer('xxabcd', 'hex'))).to.deep.equal(
    new Uint8Array([]),
  );
  expect(toU8(stringToBuffer('cdxxab', 'hex'))).to.deep.equal(
    new Uint8Array([0xcd]),
  );

  const bytes = new Uint8Array(256);
  for (let i = 0; i < 256; i++) {
    bytes[i] = i;
  }

  const hex = bufferToString(bytes.buffer as ArrayBuffer, 'hex');
  const badHex = `${hex.slice(0, 256)}xx${hex.slice(256, 510)}`;
  expect(toU8(stringToBuffer(badHex, 'hex'))).to.deep.equal(
    bytes.slice(0, 128),
  );
});

Original Node.js (test/parallel/test-buffer-badhex.js):

// Test hex strings and bad hex strings
{
  const buf = Buffer.alloc(4);
  assert.strictEqual(buf.length, 4);
  assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
  assert.strictEqual(buf.write('abcdxx', 0, 'hex'), 2);
  assert.deepStrictEqual(buf, Buffer.from([0xab, 0xcd, 0x00, 0x00]));
  assert.strictEqual(buf.toString('hex'), 'abcd0000');
  assert.strictEqual(buf.write('abcdef01', 0, 'hex'), 4);
  assert.deepStrictEqual(buf, Buffer.from([0xab, 0xcd, 0xef, 0x01]));
  assert.strictEqual(buf.toString('hex'), 'abcdef01');

  const copy = Buffer.from(buf.toString('hex'), 'hex');
  assert.strictEqual(buf.toString('hex'), copy.toString('hex'));
}

{
  const buf = Buffer.alloc(5);
  assert.strictEqual(buf.write('abcdxx', 1, 'hex'), 2);
  assert.strictEqual(buf.toString('hex'), '00abcd0000');
}

{
  const buf = Buffer.alloc(4);
  assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
  assert.strictEqual(buf.write('xxabcd', 0, 'hex'), 0);
  assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
  assert.strictEqual(buf.write('xxab', 1, 'hex'), 0);
  assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
  assert.strictEqual(buf.write('cdxxab', 0, 'hex'), 1);
  assert.deepStrictEqual(buf, Buffer.from([0xcd, 0, 0, 0]));
}

{
  const buf = Buffer.alloc(256);
  for (let i = 0; i < 256; i++)
    buf[i] = i;

  const hex = buf.toString('hex');
  assert.deepStrictEqual(Buffer.from(hex, 'hex'), buf);

  const badHex = `${hex.slice(0, 256)}xx${hex.slice(256, 510)}`;
  assert.deepStrictEqual(Buffer.from(badHex, 'hex'), buf.slice(0, 128));
}
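All of the hex expectations above follow from one rule. Read the input two characters at a time, stop at the first pair that is not valid hex, and drop an odd trailing character. Here is a TypeScript sketch of that rule. `decodeHexLenient` and `hexNibble` are hypothetical helpers that mirror the behaviour observed in these tests, not RNQC's native code.

```typescript
// Hypothetical sketch of Node.js-style lenient hex decoding.
function hexNibble(code: number): number {
  if (code >= 0x30 && code <= 0x39) return code - 0x30; // '0'-'9'
  if (code >= 0x41 && code <= 0x46) return code - 0x41 + 10; // 'A'-'F'
  if (code >= 0x61 && code <= 0x66) return code - 0x61 + 10; // 'a'-'f'
  return -1; // not a hex digit
}

function decodeHexLenient(s: string): Uint8Array {
  // An odd trailing character can never complete a byte, so it is ignored.
  const pairs = Math.floor(s.length / 2);
  const out: number[] = [];
  for (let i = 0; i < pairs; i++) {
    const hi = hexNibble(s.charCodeAt(2 * i));
    const lo = hexNibble(s.charCodeAt(2 * i + 1));
    if (hi < 0 || lo < 0) break; // stop at the first invalid pair
    out.push((hi << 4) | lo);
  }
  return new Uint8Array(out);
}
```

This explains why `'cdxxab'` yields only `[0xcd]`: decoding stops at `xx`, and the valid `ab` after it is never reached.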
Test for proper ascii Encoding, length should be 4

Current encoding_tests.ts:

test(SUITE, '[Node.js] Test for proper ascii Encoding, length should be 4', () => {
  expect(toU8(stringToBuffer('\u00fcber', 'ascii'))).to.deep.equal(
    new Uint8Array([252, 98, 101, 114]),
  );
});

Original Node.js (test/parallel/test-buffer-alloc.js):

{
  // Test for proper ascii Encoding, length should be 4
  const f = Buffer.from('über', 'ascii');
  assert.deepStrictEqual(f, Buffer.from([252, 98, 101, 114]));
}

In summary, Node.js accepts more "invalid" hex strings than we assumed and tries to produce a meaningful result from them.
For ascii, the Node.js docs only mention masking the high bit when decoding a buffer into a string:
https://nodejs.org/docs/latest-v24.x/api/buffer.html#buffers-and-character-encodings
The source confirms that bytes are not masked when encoding a string into a buffer:
https://github.com/nodejs/node/blob/v24.15.0/src/string_bytes.cc#L526-L539 (buffer to string: calls nbytes::ForceAscii())
https://github.com/nodejs/node/blob/v24.15.0/src/string_bytes.cc#L250-L265 (string to buffer: plain copy/memcpy(), no masking)
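Taken together, the ascii and latin1 cases follow one asymmetric rule, consistent with the Node.js docs and source linked above. Encoding a string keeps only the low byte of each UTF-16 code unit, and `'ascii'` behaves exactly like `'latin1'` in that direction. Decoding `'ascii'` clears the high bit of each byte before mapping it to a character. Here is a minimal TypeScript sketch. `encodeLatin1` and `decodeAscii` are hypothetical helper names, not RNQC functions.

```typescript
// Hypothetical sketch of the asymmetric Node.js 'ascii' / 'latin1' behaviour.

// String -> bytes for both 'latin1' and 'ascii': keep the low 8 bits of each
// UTF-16 code unit. No high-bit masking happens in this direction.
function encodeLatin1(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0xff;
  return out;
}

// Bytes -> string for 'ascii': clear the high bit, then map byte to char.
function decodeAscii(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i] & 0x7f);
  }
  return out;
}
```

This asymmetry explains why `'\u00fcber'` encodes to `[252, 98, 101, 114]` with `'ascii'`. It also explains why the UTF-8 bytes of `'hérité'` decode to `'hC)ritC)'`: `0xc3 & 0x7f` is `'C'` and `0xa9 & 0x7f` is `')'`.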

hex roundtrip all byte values
-> [Node.js] Test hex strings and bad hex strings

latin1 decode truncates code points above 0xFF to low byte
-> [Node.js] latin1 encoding should write only one byte per character.

ascii encode strips high bit
-> [Node.js] ASCII conversion in node.js simply masks off the high bits, it doesn't do transliteration.
+ [Node.js] Test ASCII decoding of UTF-8 multibyte characters at every byte offset.
@wh201906
Contributor Author

In d7b1ee8 I also removed the duplicate test cases introduced in 73e75c1, since the ported Node.js tests already cover them.
In e9e8b9a I removed some test cases from the main branch that are likewise covered by the Node.js tests.

@wh201906
Contributor Author

@boorad Hi, could you please review these commits? I think the other commits should be fine.

73e75c1: This makes our native converter match the behavior of Node.js, but it is a breaking change compared with RNQC v1.0.17.
e9e8b9a: The removed test cases should be covered by the Node.js test cases, but since I'm removing the old ones, it would be good to confirm that.

If they are all fine, the only remaining step is bumping Nitro to 0.31.2, which I'd prefer to do in a separate PR.

@boorad
Collaborator

boorad commented Apr 18, 2026

@wh201906 These commits look good to me. I think this breaking change needs documenting, and it probably warrants a minor version bump to v1.1.0.

Compared to v1.0.17:

  1. stringToBuffer('abc', 'hex') — used to throw, now returns 1 byte [0xab]
  2. stringToBuffer('zzzz', 'hex') — used to throw, now returns empty
  3. stringToBuffer('\u00fcber', 'ascii') — used to be [0x7C, 0x62, 0x65, 0x72], now [0xFC, 0x62, 0x65, 0x72]

Anyone relying on throws from hex validation as input validation will now silently get partial/empty results. Anyone who expected ASCII to sanitize high-bit chars will now get latin1-like output.
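Callers affected by these two changes can validate or mask explicitly around `stringToBuffer`. Below is a sketch with hypothetical helper names (`assertStrictHex`, `maskToAscii7`); neither is part of RNQC. The strictness rule (even length, hex digits only) is an assumption about what "valid" hex meant to those callers.

```typescript
// Hypothetical helpers, not part of RNQC.

// Reject anything the lenient hex decoder would silently truncate:
// odd length, or any non-hex character.
function assertStrictHex(s: string): void {
  if (s.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(s)) {
    throw new TypeError(`invalid hex string: ${JSON.stringify(s)}`);
  }
}

// Strip the high bit from bytes produced by the new 'ascii' encode, which
// reproduces the old output for the 'über' example above.
function maskToAscii7(bytes: Uint8Array): Uint8Array {
  return bytes.map((b) => b & 0x7f);
}
```

A caller could run `assertStrictHex(input)` before `stringToBuffer(input, 'hex')`. If `stringToBuffer` returns an `ArrayBuffer` (the `toU8(...)` wrapping in the tests suggests it does), a caller could also apply `maskToAscii7(new Uint8Array(buf))` to an `'ascii'` result. Whether this masking matches v1.0.17 for every input, such as characters above U+00FF, is not verified here.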

I'm good with the Nitro bump in another PR 👍

@boorad merged commit 7b9c096 into margelo:main Apr 18, 2026
7 checks passed
@wh201906 deleted the wh201906/simdutf branch April 19, 2026 01:51

Labels

None yet

Projects

None yet


2 participants