Use simdutf for faster string encoding/decoding #976
simdutf.cpp: From https://github.com/simdutf/simdutf/releases/download/v8.2.0/simdutf.cpp
simdutf.h: From https://github.com/simdutf/simdutf/releases/download/v8.2.0/simdutf.h
LICENSE-MIT: From https://raw.githubusercontent.com/simdutf/simdutf/v8.2.0/LICENSE-MIT
Specify simdutf::base64_default in encodeBase64() explicitly
simdutf supports a hybrid decoding format
Aligned with Buffer.from() behavior in Node
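To illustrate what a "hybrid" decode that matches `Buffer.from()` means in practice, here is a minimal, non-SIMD reference sketch. It is not simdutf's API and not the code in this PR; `decodeBase64Lenient` and `sextetOf` are hypothetical names. The sketch assumes three rules that the Node.js test cases in this thread exercise: both alphabets (`+/` and `-_`) are accepted regardless of the requested flavor, characters outside the alphabet are skipped, and the first `=` ends the payload.

```typescript
// Reference sketch only: a lenient base64 decoder that accepts both alphabets.
// Names are hypothetical; the real work in this PR is done natively by simdutf.
const STANDARD_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function sextetOf(ch: string): number {
  if (ch === '-') return 62; // URL-safe alias of '+'
  if (ch === '_') return 63; // URL-safe alias of '/'
  return STANDARD_ALPHABET.indexOf(ch); // -1 for anything outside the alphabet
}

function decodeBase64Lenient(input: string): Uint8Array {
  const out: number[] = [];
  let acc = 0; // bit accumulator
  let bits = 0; // number of valid bits currently held in acc
  for (let i = 0; i < input.length; i++) {
    const ch = input.charAt(i);
    if (ch === '=') break; // assumption: the first padding char ends the payload
    const value = sextetOf(ch);
    if (value < 0) continue; // skip whitespace and other non-alphabet characters
    acc = (acc << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
      acc &= (1 << bits) - 1; // keep only the leftover low bits
    }
  }
  // Leftover bits that do not fill a whole byte are dropped.
  return Uint8Array.from(out);
}
```

With these rules, `'//++/++/++//'` and `'__--_--_--__'` decode to the same nine bytes, and a lone `'K'` decodes to an empty result because six bits never complete a byte.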
|
There are some ongoing optimizations in simdutf, which can also be utilized in RNQC once implemented. |
169f380 to 28487d7
|
I mean... let's find a better way to download a pre-built library. I'm not adding 82K LOC to the repo. |
|
I've added simdutf as a submodule and modified the CMakeLists.txt for Android and QuickCrypto.podspec for iOS. I tested the builds for the two mobile platforms and both of them passed. |
|
Claude review 😇 The Good
|
I checked the code of nitro and found it is available in v0.31.2. We only need to bump nitro to that version. Alternatively, considering the implementation of Which way do you prefer? |
This involves running a Python script from simdutf when building RNQC. I'm not sure it's a good idea to require Python as a dev dependency with extra build steps, considering how little binary size we would save. |
I can add more test cases of |
This is more like a fast path. If the input length is 0, there is no need to calculate the output length or enter the conversion.
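The fast path and the output-length calculation can be sketched as follows. This is a plain TypeScript illustration, not the native code under review; `base64EncodedLength` and `encodeBase64Sketch` are hypothetical names. The length formulas are the standard ones: `4 * ceil(n / 3)` characters with padding and `ceil(4n / 3)` without.

```typescript
// Illustration only: length calculation plus an empty-input fast path.
// Function names are hypothetical and do not exist in RNQC or simdutf.
const ENCODE_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

function base64EncodedLength(byteLength: number, padded: boolean): number {
  return padded
    ? 4 * Math.ceil(byteLength / 3) // every 3-byte group becomes 4 chars
    : Math.ceil((byteLength * 4) / 3); // no '=' filler on the last group
}

function encodeBase64Sketch(bytes: Uint8Array, padded: boolean): string {
  // Fast path: nothing to size, allocate, or convert.
  if (bytes.length === 0) return '';

  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or >= 3 bytes left in this group
    const triple =
      ((bytes[i] ?? 0) << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    out += ENCODE_ALPHABET.charAt((triple >> 18) & 63);
    out += ENCODE_ALPHABET.charAt((triple >> 12) & 63);
    out += remaining > 1 ? ENCODE_ALPHABET.charAt((triple >> 6) & 63) : padded ? '=' : '';
    out += remaining > 2 ? ENCODE_ALPHABET.charAt(triple & 63) : padded ? '=' : '';
  }
  return out;
}
```

The early return changes no results, since an empty input encodes to an empty string either way; it only avoids the sizing and conversion work.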
|
I also prefer this rather than bumping to the latest. ❤compatibility
I'm gonna add some test cases from Node.js
|
I'm trying to build it without the CMakeLists.txt from simdutf. If it works we don't need to bump the CMake required version. |
This is fixed in 96021e6, and I have reverted the change to the CMake minimum version.
I extracted 23 related test cases from Node.js v24.15.0 (latest LTS version). The 19 passing test cases were added in ddc0d08.
Test toString('base64')
Current
test(SUITE, '[Node.js] Test toString(\'base64\')', () => {
expect(bufferToString(stringToBuffer('Man', 'utf8'), 'base64')).to.equal(
'TWFu',
);
expect(bufferToString(stringToBuffer('Woman', 'utf8'), 'base64')).to.equal(
'V29tYW4=',
);
});
Original Node.js (
//
// Test toString('base64')
//
assert.strictEqual((Buffer.from('Man')).toString('base64'), 'TWFu');
assert.strictEqual((Buffer.from('Woman')).toString('base64'), 'V29tYW4=');
Test that regular and URL-safe base64 both work both ways
Current
test(
SUITE,
'[Node.js] Test that regular and URL-safe base64 both work both ways',
() => {
const expected = new Uint8Array([
0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff,
]);
expect(toU8(stringToBuffer('//++/++/++//', 'base64'))).to.deep.equal(
expected,
);
expect(toU8(stringToBuffer('__--_--_--__', 'base64'))).to.deep.equal(
expected,
);
expect(toU8(stringToBuffer('//++/++/++//', 'base64url'))).to.deep.equal(
expected,
);
expect(toU8(stringToBuffer('__--_--_--__', 'base64url'))).to.deep.equal(
expected,
);
},
);
Original Node.js (
{
// Test that regular and URL-safe base64 both work both ways
const expected = [0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff];
assert.deepStrictEqual(Buffer.from('//++/++/++//', 'base64'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('__--_--_--__', 'base64'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('//++/++/++//', 'base64url'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('__--_--_--__', 'base64url'),
Buffer.from(expected));
}
Test that regular and URL-safe base64 both work both ways with padding
Current
test(
SUITE,
'[Node.js] Test that regular and URL-safe base64 both work both ways with padding',
() => {
const expected = new Uint8Array([
0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff, 0xfb,
]);
expect(toU8(stringToBuffer('//++/++/++//+w==', 'base64'))).to.deep.equal(
expected,
);
expect(toU8(stringToBuffer('//++/++/++//+w==', 'base64url'))).to.deep.equal(
expected,
);
},
);
Original Node.js (
{
// Test that regular and URL-safe base64 both work both ways with padding
const expected = [0xff, 0xff, 0xbe, 0xff, 0xef, 0xbf, 0xfb, 0xef, 0xff, 0xfb];
assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64url'),
Buffer.from(expected));
assert.deepStrictEqual(Buffer.from('//++/++/++//+w==', 'base64url'),
Buffer.from(expected));
}
Check that the base64 decoder ignores whitespace
Current
test(SUITE, '[Node.js] Check that the base64 decoder ignores whitespace', () => {
const quote =
'Man is distinguished, not only by his reason, but by this ' +
'singular passion from other animals, which is a lust ' +
'of the mind, that by a perseverance of delight in the ' +
'continued and indefatigable generation of knowledge, ' +
'exceeds the short vehemence of any carnal pleasure.';
const expected =
'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBi' +
'eSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBp' +
'cyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVs' +
'aWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24g' +
'b2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNh' +
'cm5hbCBwbGVhc3VyZS4=';
const base64flavors = ['base64', 'base64url'] as const;
base64flavors.forEach(encoding => {
const expectedWhite =
`${expected.slice(0, 60)} \n` +
`${expected.slice(60, 120)} \n` +
`${expected.slice(120, 180)} \n` +
`${expected.slice(180, 240)} \n` +
`${expected.slice(240, 300)}\n` +
`${expected.slice(300, 360)}\n`;
const decoded = bufferToString(stringToBuffer(expectedWhite, encoding), 'utf8');
expect(decoded).to.equal(quote);
});
});
Original Node.js (
{
// big example
const quote = 'Man is distinguished, not only by his reason, but by this ' +
'singular passion from other animals, which is a lust ' +
'of the mind, that by a perseverance of delight in the ' +
'continued and indefatigable generation of knowledge, ' +
'exceeds the short vehemence of any carnal pleasure.';
const expected = 'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb' +
'24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlci' +
'BhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQ' +
'gYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu' +
'dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZ' +
'GdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm' +
'5hbCBwbGVhc3VyZS4=';
assert.strictEqual(Buffer.from(quote).toString('base64'), expected);
assert.strictEqual(
Buffer.from(quote).toString('base64url'),
expected.replaceAll('+', '-').replaceAll('/', '_').replaceAll('=', '')
);
base64flavors.forEach((encoding) => {
let b = Buffer.allocUnsafe(1024);
let bytesWritten = b.write(expected, 0, encoding);
assert.strictEqual(quote.length, bytesWritten);
assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
// Check that the base64 decoder ignores whitespace
const expectedWhite = `${expected.slice(0, 60)} \n` +
`${expected.slice(60, 120)} \n` +
`${expected.slice(120, 180)} \n` +
`${expected.slice(180, 240)} \n` +
`${expected.slice(240, 300)}\n` +
`${expected.slice(300, 360)}\n`;
b = Buffer.allocUnsafe(1024);
bytesWritten = b.write(expectedWhite, 0, encoding);
assert.strictEqual(quote.length, bytesWritten);
assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
// Check that the base64 decoder on the constructor works
// even in the presence of whitespace.
b = Buffer.from(expectedWhite, encoding);
assert.strictEqual(quote.length, b.length);
assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
});
}
Check that the base64 decoder ignores illegal chars
Current
test(
SUITE,
'[Node.js] Check that the base64 decoder ignores illegal chars',
() => {
const quote =
'Man is distinguished, not only by his reason, but by this ' +
'singular passion from other animals, which is a lust ' +
'of the mind, that by a perseverance of delight in the ' +
'continued and indefatigable generation of knowledge, ' +
'exceeds the short vehemence of any carnal pleasure.';
const expected =
'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBi' +
'eSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBp' +
'cyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVs' +
'aWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24g' +
'b2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNh' +
'cm5hbCBwbGVhc3VyZS4=';
const base64flavors = ['base64', 'base64url'] as const;
base64flavors.forEach(encoding => {
const expectedIllegal =
expected.slice(0, 60) +
' \x80' +
expected.slice(60, 120) +
' \xff' +
expected.slice(120, 180) +
' \x00' +
expected.slice(180, 240) +
' \x98' +
expected.slice(240, 300) +
'\x03' +
expected.slice(300, 360);
const decoded = bufferToString(
stringToBuffer(expectedIllegal, encoding),
'utf8',
);
expect(decoded).to.equal(quote);
});
},
);
Original Node.js (
{
// big example
const quote = 'Man is distinguished, not only by his reason, but by this ' +
'singular passion from other animals, which is a lust ' +
'of the mind, that by a perseverance of delight in the ' +
'continued and indefatigable generation of knowledge, ' +
'exceeds the short vehemence of any carnal pleasure.';
const expected = 'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb' +
'24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlci' +
'BhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQ' +
'gYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu' +
'dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZ' +
'GdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm' +
'5hbCBwbGVhc3VyZS4=';
assert.strictEqual(Buffer.from(quote).toString('base64'), expected);
assert.strictEqual(
Buffer.from(quote).toString('base64url'),
expected.replaceAll('+', '-').replaceAll('/', '_').replaceAll('=', '')
);
base64flavors.forEach((encoding) => {
// Check that the base64 decoder ignores illegal chars
const expectedIllegal = expected.slice(0, 60) + ' \x80' +
expected.slice(60, 120) + ' \xff' +
expected.slice(120, 180) + ' \x00' +
expected.slice(180, 240) + ' \x98' +
expected.slice(240, 300) + '\x03' +
expected.slice(300, 360);
b = Buffer.from(expectedIllegal, encoding);
assert.strictEqual(quote.length, b.length);
assert.strictEqual(quote, b.toString('ascii', 0, quote.length));
});
}
Handle padding graciously, multiple-of-4 or not
Current
test(
SUITE,
'[Node.js] Handle padding graciously, multiple-of-4 or not',
() => {
const base64flavors = ['base64', 'base64url'] as const;
base64flavors.forEach(encoding => {
expect(bufferToString(stringToBuffer('', encoding), 'utf8')).to.equal('');
expect(bufferToString(stringToBuffer('K', encoding), 'utf8')).to.equal('');
expect(bufferToString(stringToBuffer('Kg==', encoding), 'utf8')).to.equal(
'*',
);
expect(bufferToString(stringToBuffer('Kio=', encoding), 'utf8')).to.equal(
'*'.repeat(2),
);
expect(bufferToString(stringToBuffer('Kioq', encoding), 'utf8')).to.equal(
'*'.repeat(3),
);
expect(
bufferToString(stringToBuffer('KioqKg==', encoding), 'utf8'),
).to.equal('*'.repeat(4));
expect(
bufferToString(stringToBuffer('KioqKio=', encoding), 'utf8'),
).to.equal('*'.repeat(5));
expect(
bufferToString(stringToBuffer('KioqKioq', encoding), 'utf8'),
).to.equal('*'.repeat(6));
expect(
bufferToString(stringToBuffer('KioqKioqKg==', encoding), 'utf8'),
).to.equal('*'.repeat(7));
expect(
bufferToString(stringToBuffer('KioqKioqKio=', encoding), 'utf8'),
).to.equal('*'.repeat(8));
expect(
bufferToString(stringToBuffer('KioqKioqKioq', encoding), 'utf8'),
).to.equal('*'.repeat(9));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKg==', encoding), 'utf8'),
).to.equal('*'.repeat(10));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKio=', encoding), 'utf8'),
).to.equal('*'.repeat(11));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioq', encoding), 'utf8'),
).to.equal('*'.repeat(12));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioqKg==', encoding), 'utf8'),
).to.equal('*'.repeat(13));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioqKio=', encoding), 'utf8'),
).to.equal('*'.repeat(14));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioqKioq', encoding), 'utf8'),
).to.equal('*'.repeat(15));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKg==', encoding),
'utf8',
),
).to.equal('*'.repeat(16));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKio=', encoding),
'utf8',
),
).to.equal('*'.repeat(17));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKioq', encoding),
'utf8',
),
).to.equal('*'.repeat(18));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKioqKg==', encoding),
'utf8',
),
).to.equal('*'.repeat(19));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKioqKio=', encoding),
'utf8',
),
).to.equal('*'.repeat(20));
expect(bufferToString(stringToBuffer('Kg', encoding), 'utf8')).to.equal(
'*',
);
expect(bufferToString(stringToBuffer('Kio', encoding), 'utf8')).to.equal(
'*'.repeat(2),
);
expect(
bufferToString(stringToBuffer('KioqKg', encoding), 'utf8'),
).to.equal('*'.repeat(4));
expect(
bufferToString(stringToBuffer('KioqKio', encoding), 'utf8'),
).to.equal('*'.repeat(5));
expect(
bufferToString(stringToBuffer('KioqKioqKg', encoding), 'utf8'),
).to.equal('*'.repeat(7));
expect(
bufferToString(stringToBuffer('KioqKioqKio', encoding), 'utf8'),
).to.equal('*'.repeat(8));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKg', encoding), 'utf8'),
).to.equal('*'.repeat(10));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKio', encoding), 'utf8'),
).to.equal('*'.repeat(11));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioqKg', encoding), 'utf8'),
).to.equal('*'.repeat(13));
expect(
bufferToString(stringToBuffer('KioqKioqKioqKioqKio', encoding), 'utf8'),
).to.equal('*'.repeat(14));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKg', encoding),
'utf8',
),
).to.equal('*'.repeat(16));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKio', encoding),
'utf8',
),
).to.equal('*'.repeat(17));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKioqKg', encoding),
'utf8',
),
).to.equal('*'.repeat(19));
expect(
bufferToString(
stringToBuffer('KioqKioqKioqKioqKioqKioqKio', encoding),
'utf8',
),
).to.equal('*'.repeat(20));
});
expect(
stringToBuffer(
'72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw==',
'base64',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw==',
'base64url',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw=',
'base64',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw=',
'base64url',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw',
'base64',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw',
'base64url',
).byteLength,
).to.equal(32);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==',
'base64',
).byteLength,
).to.equal(31);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==',
'base64url',
).byteLength,
).to.equal(31);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=',
'base64',
).byteLength,
).to.equal(31);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=',
'base64url',
).byteLength,
).to.equal(31);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg',
'base64',
).byteLength,
).to.equal(31);
expect(
stringToBuffer(
'w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg',
'base64url',
).byteLength,
).to.equal(31);
},
);
Original Node.js (
const base64flavors = ['base64', 'base64url'];
base64flavors.forEach((encoding) => {
assert.strictEqual(Buffer.from('', encoding).toString(), '');
assert.strictEqual(Buffer.from('K', encoding).toString(), '');
// multiple-of-4 with padding
assert.strictEqual(Buffer.from('Kg==', encoding).toString(), '*');
assert.strictEqual(Buffer.from('Kio=', encoding).toString(), '*'.repeat(2));
assert.strictEqual(Buffer.from('Kioq', encoding).toString(), '*'.repeat(3));
assert.strictEqual(
Buffer.from('KioqKg==', encoding).toString(), '*'.repeat(4));
assert.strictEqual(
Buffer.from('KioqKio=', encoding).toString(), '*'.repeat(5));
assert.strictEqual(
Buffer.from('KioqKioq', encoding).toString(), '*'.repeat(6));
assert.strictEqual(Buffer.from('KioqKioqKg==', encoding).toString(),
'*'.repeat(7));
assert.strictEqual(Buffer.from('KioqKioqKio=', encoding).toString(),
'*'.repeat(8));
assert.strictEqual(Buffer.from('KioqKioqKioq', encoding).toString(),
'*'.repeat(9));
assert.strictEqual(Buffer.from('KioqKioqKioqKg==', encoding).toString(),
'*'.repeat(10));
assert.strictEqual(Buffer.from('KioqKioqKioqKio=', encoding).toString(),
'*'.repeat(11));
assert.strictEqual(Buffer.from('KioqKioqKioqKioq', encoding).toString(),
'*'.repeat(12));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKg==', encoding).toString(),
'*'.repeat(13));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKio=', encoding).toString(),
'*'.repeat(14));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioq', encoding).toString(),
'*'.repeat(15));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKg==', encoding).toString(),
'*'.repeat(16));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKio=', encoding).toString(),
'*'.repeat(17));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKioq', encoding).toString(),
'*'.repeat(18));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKioqKg==',
encoding).toString(),
'*'.repeat(19));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKioqKio=',
encoding).toString(),
'*'.repeat(20));
// No padding, not a multiple of 4
assert.strictEqual(Buffer.from('Kg', encoding).toString(), '*');
assert.strictEqual(Buffer.from('Kio', encoding).toString(), '*'.repeat(2));
assert.strictEqual(Buffer.from('KioqKg', encoding).toString(), '*'.repeat(4));
assert.strictEqual(
Buffer.from('KioqKio', encoding).toString(), '*'.repeat(5));
assert.strictEqual(Buffer.from('KioqKioqKg', encoding).toString(),
'*'.repeat(7));
assert.strictEqual(Buffer.from('KioqKioqKio', encoding).toString(),
'*'.repeat(8));
assert.strictEqual(Buffer.from('KioqKioqKioqKg', encoding).toString(),
'*'.repeat(10));
assert.strictEqual(Buffer.from('KioqKioqKioqKio', encoding).toString(),
'*'.repeat(11));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKg', encoding).toString(),
'*'.repeat(13));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKio', encoding).toString(),
'*'.repeat(14));
assert.strictEqual(Buffer.from('KioqKioqKioqKioqKioqKg', encoding).toString(),
'*'.repeat(16));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKio', encoding).toString(),
'*'.repeat(17));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKioqKg', encoding).toString(),
'*'.repeat(19));
assert.strictEqual(
Buffer.from('KioqKioqKioqKioqKioqKioqKio', encoding).toString(),
'*'.repeat(20));
});
// Handle padding graciously, multiple-of-4 or not
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw==', 'base64').length,
32
);
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw==', 'base64url')
.length,
32
);
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw=', 'base64').length,
32
);
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw=', 'base64url')
.length,
32
);
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9+VgdGPFJDxUBFR5/rMFsghgxADiw', 'base64').length,
32
);
assert.strictEqual(
Buffer.from('72INjkR5fchcxk9-VgdGPFJDxUBFR5_rMFsghgxADiw', 'base64url')
.length,
32
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==', 'base64').length,
31
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg==', 'base64url')
.length,
31
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=', 'base64').length,
31
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg=', 'base64url')
.length,
31
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg', 'base64').length,
31
);
assert.strictEqual(
Buffer.from('w69jACy6BgZmaFvv96HG6MYksWytuZu3T1FvGnulPg', 'base64url').length,
31
);
Test single base64 char encodes as 0.
Current
test(SUITE, '[Node.js] Test single base64 char encodes as 0.', () => {
expect(toU8(stringToBuffer('A', 'base64'))).to.deep.equal(new Uint8Array([]));
});
Original Node.js (
// Test single base64 char encodes as 0.
assert.strictEqual(Buffer.from('A', 'base64').length, 0);
Return empty output for invalid base64 with repeated leading padding (nodejs/node#3496)
Current
test(
SUITE,
'[Node.js] Return empty output for invalid base64 with repeated leading padding (nodejs/node#3496)',
() => {
expect(toU8(stringToBuffer('=bad'.repeat(1e4), 'base64'))).to.deep.equal(
new Uint8Array([]),
);
},
);
Original Node.js (
// Regression test for https://github.com/nodejs/node/issues/3496.
assert.strictEqual(Buffer.from('=bad'.repeat(1e4), 'base64').length, 0);
Ignore trailing whitespace in base64 input (nodejs/node#11987)
Current
test(
SUITE,
'[Node.js] Ignore trailing whitespace in base64 input (nodejs/node#11987)',
() => {
expect(toU8(stringToBuffer('w0 ', 'base64'))).to.deep.equal(
toU8(stringToBuffer('w0', 'base64')),
);
},
);
Original Node.js (
// Regression test for https://github.com/nodejs/node/issues/11987.
assert.deepStrictEqual(Buffer.from('w0 ', 'base64'),
Buffer.from('w0', 'base64'));
Ignore leading whitespace in base64 input (nodejs/node#13657)
Current
test(
SUITE,
'[Node.js] Ignore leading whitespace in base64 input (nodejs/node#13657)',
() => {
expect(toU8(stringToBuffer(' YWJvcnVtLg', 'base64'))).to.deep.equal(
toU8(stringToBuffer('YWJvcnVtLg', 'base64')),
);
},
);
Original Node.js (
// Regression test for https://github.com/nodejs/node/issues/13657.
assert.deepStrictEqual(Buffer.from(' YWJvcnVtLg', 'base64'),
Buffer.from('YWJvcnVtLg', 'base64'));
Test toString('base64url')
Current
test(SUITE, '[Node.js] Test toString(\'base64url\')', () => {
expect(bufferToString(stringToBuffer('Man', 'utf8'), 'base64url')).to.equal(
'TWFu',
);
expect(
bufferToString(stringToBuffer('Woman', 'utf8'), 'base64url'),
).to.equal('V29tYW4');
});
Original Node.js (
//
// Test toString('base64url')
//
assert.strictEqual((Buffer.from('Man')).toString('base64url'), 'TWFu');
assert.strictEqual((Buffer.from('Woman')).toString('base64url'), 'V29tYW4');
This string encodes single '.' character in UTF-16
Current
test(
SUITE,
'[Node.js] This string encodes single \'.\' character in UTF-16',
() => {
const dot = new Uint8Array([0xff, 0xfe, 0x2e, 0x00]).buffer as ArrayBuffer;
expect(bufferToString(dot, 'base64')).to.equal('//4uAA==');
expect(bufferToString(dot, 'base64url')).to.equal('__4uAA');
},
);
Original Node.js (
{
// This string encodes single '.' character in UTF-16
const dot = Buffer.from('//4uAA==', 'base64');
assert.strictEqual(dot[0], 0xff);
assert.strictEqual(dot[1], 0xfe);
assert.strictEqual(dot[2], 0x2e);
assert.strictEqual(dot[3], 0x00);
assert.strictEqual(dot.toString('base64'), '//4uAA==');
}
{
// This string encodes single '.' character in UTF-16
const dot = Buffer.from('//4uAA', 'base64url');
assert.strictEqual(dot[0], 0xff);
assert.strictEqual(dot[1], 0xfe);
assert.strictEqual(dot[2], 0x2e);
assert.strictEqual(dot[3], 0x00);
assert.strictEqual(dot.toString('base64url'), '__4uAA');
}
Test for proper UTF-8 Encoding
Current
test(SUITE, '[Node.js] Test for proper UTF-8 Encoding', () => {
expect(toU8(stringToBuffer('\u00fcber', 'utf8'))).to.deep.equal(
new Uint8Array([195, 188, 98, 101, 114]),
);
});
Original Node.js (
{
// Test for proper UTF-8 Encoding
const e = Buffer.from('über');
assert.deepStrictEqual(e, Buffer.from([195, 188, 98, 101, 114]));
}
Test UTF-8 string includes null character
Current
test(SUITE, '[Node.js] Test UTF-8 string includes null character', () => {
expect(toU8(stringToBuffer('\0', 'utf8'))).to.deep.equal(
new Uint8Array([0x00]),
);
expect(toU8(stringToBuffer('\0\0', 'utf8'))).to.deep.equal(
new Uint8Array([0x00, 0x00]),
);
});
Original Node.js (
{
// https://github.com/nodejs/node-v0.x-archive/pull/1210
// Test UTF-8 string includes null character
let buf = Buffer.from('\0');
assert.strictEqual(buf.length, 1);
buf = Buffer.from('\0\0');
assert.strictEqual(buf.length, 2);
}
Test unmatched surrogates not producing invalid utf8 output
Current
test(
SUITE,
'[Node.js] Test unmatched surrogates not producing invalid utf8 output',
() => {
expect(toU8(stringToBuffer('ab\ud800cd', 'utf8'))).to.deep.equal(
new Uint8Array([0x61, 0x62, 0xef, 0xbf, 0xbd, 0x63, 0x64]),
);
},
);
Original Node.js (
{
// Test unmatched surrogates not producing invalid utf8 output
// ef bf bd = utf-8 representation of unicode replacement character
// see https://codereview.chromium.org/121173009/
const buf = Buffer.from('ab\ud800cd', 'utf8');
assert.strictEqual(buf[0], 0x61);
assert.strictEqual(buf[1], 0x62);
assert.strictEqual(buf[2], 0xef);
assert.strictEqual(buf[3], 0xbf);
assert.strictEqual(buf[4], 0xbd);
assert.strictEqual(buf[5], 0x63);
assert.strictEqual(buf[6], 0x64);
}
latin1 encoding should write only one byte per character.
Current
test(
SUITE,
'[Node.js] latin1 encoding should write only one byte per character.',
() => {
expect(
toU8(stringToBuffer(String.fromCharCode(0xffff), 'latin1')),
).to.deep.equal(new Uint8Array([0xff]));
expect(
toU8(stringToBuffer(String.fromCharCode(0xaaee), 'latin1')),
).to.deep.equal(new Uint8Array([0xee]));
},
);
Original Node.js (
{
// latin1 encoding should write only one byte per character.
const b = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
let s = String.fromCharCode(0xffff);
b.write(s, 0, 'latin1');
assert.strictEqual(b[0], 0xff);
assert.strictEqual(b[1], 0xad);
assert.strictEqual(b[2], 0xbe);
assert.strictEqual(b[3], 0xef);
s = String.fromCharCode(0xaaee);
b.write(s, 0, 'latin1');
assert.strictEqual(b[0], 0xee);
assert.strictEqual(b[1], 0xad);
assert.strictEqual(b[2], 0xbe);
assert.strictEqual(b[3], 0xef);
}
Binary encoding should write only one byte per character.
Current
test(
SUITE,
'[Node.js] Binary encoding should write only one byte per character.',
() => {
expect(
toU8(stringToBuffer(String.fromCharCode(0xffff), 'binary')),
).to.deep.equal(new Uint8Array([0xff]));
expect(
toU8(stringToBuffer(String.fromCharCode(0xaaee), 'binary')),
).to.deep.equal(new Uint8Array([0xee]));
},
);
Original Node.js (
{
// Binary encoding should write only one byte per character.
const b = Buffer.from([0xde, 0xad, 0xbe, 0xef]);
let s = String.fromCharCode(0xffff);
b.write(s, 0, 'latin1');
assert.strictEqual(b[0], 0xff);
assert.strictEqual(b[1], 0xad);
assert.strictEqual(b[2], 0xbe);
assert.strictEqual(b[3], 0xef);
s = String.fromCharCode(0xaaee);
b.write(s, 0, 'latin1');
assert.strictEqual(b[0], 0xee);
assert.strictEqual(b[1], 0xad);
assert.strictEqual(b[2], 0xbe);
assert.strictEqual(b[3], 0xef);
}
ASCII conversion in node.js simply masks off the high bits, it doesn't do transliteration.
Current
test(
SUITE,
'[Node.js] ASCII conversion in node.js simply masks off the high bits, it doesn\'t do transliteration.',
() => {
expect(
bufferToString(stringToBuffer('h\u00e9rit\u00e9', 'utf8'), 'ascii'),
).to.equal('hC)ritC)');
},
);
Original Node.js (
// ASCII conversion in node.js simply masks off the high bits,
// it doesn't do transliteration.
assert.strictEqual(Buffer.from('hérité').toString('ascii'), 'hC)ritC)');
Test ASCII decoding of UTF-8 multibyte characters at every byte offset.
Current
test(
SUITE,
'[Node.js] Test ASCII decoding of UTF-8 multibyte characters at every byte offset.',
() => {
const input =
'C\u2019est, graphiquement, la r\u00e9union d\u2019un accent aigu ' +
'et d\u2019un accent grave.';
const expected =
'Cb\u0000\u0019est, graphiquement, la rC)union ' +
'db\u0000\u0019un accent aigu et db\u0000\u0019un ' +
'accent grave.';
const bytes = toU8(stringToBuffer(input, 'utf8'));
for (let i = 0; i < expected.length; ++i) {
const slice = bytes.slice(i);
expect(bufferToString(slice.buffer as ArrayBuffer, 'ascii')).to.equal(
expected.slice(i),
);
}
},
);
Original Node.js (
// 71 characters, 78 bytes. The ’ character is a triple-byte sequence.
const input = 'C’est, graphiquement, la réunion d’un accent aigu ' +
'et d’un accent grave.';
const expected = 'Cb\u0000\u0019est, graphiquement, la rC)union ' +
'db\u0000\u0019un accent aigu et db\u0000\u0019un ' +
'accent grave.';
const buf = Buffer.from(input);
for (let i = 0; i < expected.length; ++i) {
assert.strictEqual(buf.slice(i).toString('ascii'), expected.slice(i));
// Skip remainder of multi-byte sequence.
if (input.charCodeAt(i) > 65535) ++i;
if (input.charCodeAt(i) > 127) ++i;
} |
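The latin1 and ASCII cases above come down to two bit masks, which the following sketch spells out. It is an illustration under that assumption, not the native implementation; `encodeLatin1Sketch` and `decodeAsciiSketch` are hypothetical names.

```typescript
// Illustration only (hypothetical helper names, not RNQC's native code).

// latin1 encode: one byte per UTF-16 code unit, keeping only the low byte.
function encodeLatin1Sketch(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0xff;
  }
  return out;
}

// ascii decode: clear the high bit of every byte, with no transliteration.
function decodeAsciiSketch(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode((bytes[i] ?? 0) & 0x7f);
  }
  return out;
}
```

For example, `é` is the UTF-8 byte pair `0xC3 0xA9`; masking each byte with `0x7f` gives `0x43` (`C`) and `0x29` (`)`), which is why `'hérité'` reads back as `'hC)ritC)'` in the ASCII test.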
|
Four Node.js test cases failed. I changed the code and removed the incompatible test cases in 73e75c1 so that these pass. You can revert them if you really wish.
Test single hex character is discarded.
Current
test(SUITE, '[Node.js] Test single hex character is discarded.', () => {
expect(toU8(stringToBuffer('A', 'hex'))).to.deep.equal(new Uint8Array([]));
});
Original Node.js (
// Test single hex character is discarded.
assert.strictEqual(Buffer.from('A', 'hex').length, 0);
Test that if a trailing character is discarded, rest of string is processed.
Current
test(
SUITE,
'[Node.js] Test that if a trailing character is discarded, rest of string is processed.',
() => {
expect(toU8(stringToBuffer('Abx', 'hex'))).to.deep.equal(
new Uint8Array([0xab]),
);
expect(toU8(stringToBuffer('abc', 'hex'))).to.deep.equal(
new Uint8Array([0xab]),
);
},
);
Original Node.js (
// Test that if a trailing character is discarded, rest of string is processed.
assert.deepStrictEqual(Buffer.from('Abx', 'hex'), Buffer.from('Ab', 'hex'));
Test hex strings and bad hex strings
Current
test(SUITE, '[Node.js] Test hex strings and bad hex strings', () => {
expect(toU8(stringToBuffer('abcdxx', 'hex'))).to.deep.equal(
new Uint8Array([0xab, 0xcd]),
);
expect(toU8(stringToBuffer('xxabcd', 'hex'))).to.deep.equal(
new Uint8Array([]),
);
expect(toU8(stringToBuffer('cdxxab', 'hex'))).to.deep.equal(
new Uint8Array([0xcd]),
);
const bytes = new Uint8Array(256);
for (let i = 0; i < 256; i++) {
bytes[i] = i;
}
const hex = bufferToString(bytes.buffer as ArrayBuffer, 'hex');
const badHex = `${hex.slice(0, 256)}xx${hex.slice(256, 510)}`;
expect(toU8(stringToBuffer(badHex, 'hex'))).to.deep.equal(
bytes.slice(0, 128),
);
});
Original Node.js (
// Test hex strings and bad hex strings
{
const buf = Buffer.alloc(4);
assert.strictEqual(buf.length, 4);
assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
assert.strictEqual(buf.write('abcdxx', 0, 'hex'), 2);
assert.deepStrictEqual(buf, Buffer.from([0xab, 0xcd, 0x00, 0x00]));
assert.strictEqual(buf.toString('hex'), 'abcd0000');
assert.strictEqual(buf.write('abcdef01', 0, 'hex'), 4);
assert.deepStrictEqual(buf, Buffer.from([0xab, 0xcd, 0xef, 0x01]));
assert.strictEqual(buf.toString('hex'), 'abcdef01');
const copy = Buffer.from(buf.toString('hex'), 'hex');
assert.strictEqual(buf.toString('hex'), copy.toString('hex'));
}
{
const buf = Buffer.alloc(5);
assert.strictEqual(buf.write('abcdxx', 1, 'hex'), 2);
assert.strictEqual(buf.toString('hex'), '00abcd0000');
}
{
const buf = Buffer.alloc(4);
assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
assert.strictEqual(buf.write('xxabcd', 0, 'hex'), 0);
assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
assert.strictEqual(buf.write('xxab', 1, 'hex'), 0);
assert.deepStrictEqual(buf, Buffer.from([0, 0, 0, 0]));
assert.strictEqual(buf.write('cdxxab', 0, 'hex'), 1);
assert.deepStrictEqual(buf, Buffer.from([0xcd, 0, 0, 0]));
}
{
const buf = Buffer.alloc(256);
for (let i = 0; i < 256; i++)
buf[i] = i;
const hex = buf.toString('hex');
assert.deepStrictEqual(Buffer.from(hex, 'hex'), buf);
const badHex = `${hex.slice(0, 256)}xx${hex.slice(256, 510)}`;
assert.deepStrictEqual(Buffer.from(badHex, 'hex'), buf.slice(0, 128));
}
Test for proper ascii Encoding, length should be 4
Current
test(SUITE, '[Node.js] Test for proper ascii Encoding, length should be 4', () => {
expect(toU8(stringToBuffer('\u00fcber', 'ascii'))).to.deep.equal(
new Uint8Array([252, 98, 101, 114]),
);
});
Original Node.js (
{
// Test for proper ascii Encoding, length should be 4
const f = Buffer.from('über', 'ascii');
assert.deepStrictEqual(f, Buffer.from([252, 98, 101, 114]));
}
In summary, Node.js seems to accept more "invalid" hex strings than we expected and tries to produce a meaningful result from them. |
hex roundtrip all byte values -> [Node.js] Test hex strings and bad hex strings
latin1 decode truncates code points above 0xFF to low byte -> [Node.js] latin1 encoding should write only one byte per character.
ascii encode strips high bit -> [Node.js] ASCII conversion in node.js simply masks off the high bits, it doesn't do transliteration. + [Node.js] Test ASCII decoding of UTF-8 multibyte characters at every byte offset.
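The hex behavior that the failing cases expect can be summarized as "decode complete pairs from the left and stop at the first pair that is not valid hex". A minimal sketch of that rule follows, assuming this reading of the Node.js tests is right; `decodeHexLenient` and `hexNibble` are hypothetical names and this is not the native converter from 73e75c1.

```typescript
// Illustration only: Node-style lenient hex decoding (hypothetical names).
function hexNibble(code: number): number {
  if (code >= 48 && code <= 57) return code - 48; // '0'..'9'
  if (code >= 65 && code <= 70) return code - 55; // 'A'..'F'
  if (code >= 97 && code <= 102) return code - 87; // 'a'..'f'
  return -1; // not a hex digit
}

function decodeHexLenient(input: string): Uint8Array {
  const out: number[] = [];
  // Only complete pairs are considered, so an odd trailing character is dropped.
  for (let i = 0; i + 1 < input.length; i += 2) {
    const hi = hexNibble(input.charCodeAt(i));
    const lo = hexNibble(input.charCodeAt(i + 1));
    if (hi < 0 || lo < 0) break; // stop at the first invalid pair, keep what we have
    out.push((hi << 4) | lo);
  }
  return Uint8Array.from(out);
}
```

Under this rule `'abcdxx'` yields `ab cd`, `'cdxxab'` yields `cd`, and `'xxabcd'` and `'A'` both yield an empty result, matching the expectations listed in the tests above.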
|
@boorad Hi. Could you please help me review these commits? I think the other commits should be fine.
73e75c1: While this makes our native converter match the behavior of Node.js, it is a breaking change compared with RNQC v1.0.17.
If they are all fine, we only need to bump nitro to 0.31.2. I prefer to do that in a separate PR. |
|
@wh201906 These commits look good to me. I'm thinking this breaking change needs documenting, and probably leads to a minor version bump to Compared to
Anyone relying on throws from hex validation as input validation will now silently get partial/empty results. Anyone who expected ASCII to sanitize high-bit chars will now get latin1-like output. I'm good with the Nitro bump in another PR 👍 |
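For callers who used the old throw as input validation, one possible migration is an explicit check before decoding. The helper below is a hypothetical sketch, not part of RNQC; it assumes "valid" means an even-length string of hex digits only.

```typescript
// Hypothetical caller-side guard; not an RNQC API.
function isStrictHex(input: string): boolean {
  // Even length and hex digits only; the empty string counts as valid.
  return input.length % 2 === 0 && /^[0-9a-fA-F]*$/.test(input);
}

function assertStrictHex(input: string): void {
  if (!isStrictHex(input)) {
    throw new TypeError('Invalid hex string');
  }
}
```

Calling `assertStrictHex(value)` before the hex decode restores a throw on malformed input, so partial or empty results cannot slip through silently.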
This PR migrates some string encoding/decoding functions in HybridUtils from OpenSSL/manual implementations to simdutf:
Add simdutf v8.2.0 as a submodule (this replaces the earlier plan to add the full source of simdutf v8.2.0)
Use simdutf for base64/base64url encoding and decoding
Use simdutf for encodeLatin1()
Align with Buffer.from() in Node.js
These changes significantly improve performance when encoding/decoding large base64 payloads. I ran the benchmark on an old Android device and here are the results:
(I added some test cases with 1kB data, but this PR doesn't include them)
base64 1MB encode throughput compared with CraftzdogBuffer: From 4.92x to 9.16x
base64 1MB decode throughput compared with CraftzdogBuffer: From 169.07x to 398.76x