Commit 0dad065

docs(converters): add URL ↔ PURL conversion walkthrough
Covers the four static methods on UrlConverter (fromUrl, toDownloadUrl, toRepositoryUrl, getAllUrls), the 25+ hostnames registered for fromUrl dispatch, worked examples per-ecosystem, the RepositoryUrl / DownloadUrl shapes, and how to add a new ecosystem's URL parser with a complete template. Junior-dev level: hostname support matrix up front so readers can tell at a glance whether their ecosystem is covered, hazards section names the gotchas (exact hostname match, unversioned PURLs, hostile URL input).
1 parent dbe8800 commit 0dad065

2 files changed

Lines changed: 301 additions & 0 deletions

docs/converters.md

Lines changed: 295 additions & 0 deletions
@@ -0,0 +1,295 @@
# Converters

The `UrlConverter` class converts between URLs (repository,
download, registry) and PURLs. Read this when you are turning a
human-copyable URL into a PURL, or going the other way to hand a
user a clickable download link.

## Who this is for

Contributors extending the URL ↔ PURL support to a new ecosystem,
or callers integrating Socket's data with tools that speak URL,
not PURL.

## The three directions

```
┌────────────┐     fromUrl()      ┌────────────┐
│    URL     │ ──────────────────▶│ PackageURL │
│            │                    │            │
│            │◀──────────────────┐│            │
│            │ toRepositoryUrl() ├│            │
│            │ toDownloadUrl()   ││            │
│            │                   ││            │
└────────────┘                   │└────────────┘

           getAllUrls() returns both
```

- **`UrlConverter.fromUrl(str)`**: URL string → PackageURL (or
  `undefined` if the URL is not recognized).
- **`UrlConverter.toDownloadUrl(purl)`**: PackageURL → artifact
  download URL (tarball, jar, wheel, …). Returns `undefined` if the
  type doesn't support downloads.
- **`UrlConverter.toRepositoryUrl(purl)`**: PackageURL → source
  repository URL (GitHub/GitLab/Bitbucket page or clone URL).
  Returns `undefined` if the type doesn't know its repository.
- **`UrlConverter.getAllUrls(purl)`**: convenience wrapper
  returning both download and repository URLs in one call.

All four methods are **static** on `UrlConverter`. Instances are
not needed or exposed.

## Supported hostnames for `fromUrl`

When you call `UrlConverter.fromUrl('https://github.com/lodash/lodash')`,
the library dispatches on the URL's hostname. These hostnames are
registered:

| Hostname | Dispatches to |
|---|---|
| `registry.npmjs.org` | npm registry API parser |
| `www.npmjs.com` | npm website parser (human-facing URLs) |
| `pypi.org` | pypi |
| `repo1.maven.org`, `central.maven.org` | maven |
| `rubygems.org` | gem |
| `crates.io` | cargo |
| `www.nuget.org`, `api.nuget.org` | nuget |
| `pkg.go.dev` | golang |
| `hex.pm` | hex (Elixir/Erlang) |
| `pub.dev` | pub (Dart/Flutter) |
| `packagist.org` | composer (PHP) |
| `hub.docker.com` | docker |
| `cocoapods.org` | cocoapods |
| `hackage.haskell.org` | hackage |
| `cran.r-project.org` | cran |
| `anaconda.org` | conda |
| `metacpan.org` | cpan |
| `luarocks.org` | luarocks |
| `swiftpackageindex.com` | swift |
| `huggingface.co` | huggingface |
| `marketplace.visualstudio.com` | vscode-extension |
| `open-vsx.org` | vscode-extension |
| `github.com` | github (repo PURL) |
| `gitlab.com` | gitlab |
| `bitbucket.org` | bitbucket |

`UrlConverter.supportsFromUrl(str)` answers "is this URL
recognized?" without parsing.
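
To make the dispatch idea concrete, here is a minimal sketch of a hostname-keyed registry. It is not the library's code: `PurlParts`, `parsers`, `supports`, and `dispatch` are hypothetical names, and the single GitHub parser assumes a `/<owner>/<repo>` path shape.

```typescript
// Hypothetical stand-in for PackageURL: only the fields this sketch needs.
interface PurlParts {
  type: string
  namespace?: string
  name: string
  version?: string
}

type UrlParser = (url: URL) => PurlParts | undefined

// Illustrative registry: keys are exact hostnames, one parser per host.
const parsers = new Map<string, UrlParser>([
  [
    'github.com',
    (url: URL) => {
      // Assumed path shape: /<owner>/<repo>
      const [owner, repo] = url.pathname.split('/').filter(Boolean)
      return owner && repo
        ? { type: 'github', namespace: owner, name: repo }
        : undefined
    },
  ],
])

// "Is this URL recognized?" is a hostname lookup, nothing more.
function supports(input: string): boolean {
  try {
    return parsers.has(new URL(input).hostname)
  } catch {
    return false // not a parseable URL at all
  }
}

// Look up the parser by exact hostname and hand it the parsed URL.
function dispatch(input: string): PurlParts | undefined {
  let url: URL
  try {
    url = new URL(input)
  } catch {
    return undefined
  }
  return parsers.get(url.hostname)?.(url)
}
```

Because the lookup key is the full hostname, a URL on `subdomain.github.com` finds no parser in this sketch and yields `undefined`.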

## Worked examples: `fromUrl`

### npm: both registry and website

```typescript
UrlConverter.fromUrl('https://www.npmjs.com/package/lodash')
// → PackageURL('npm', undefined, 'lodash')

UrlConverter.fromUrl('https://www.npmjs.com/package/@scope/pkg')
// → PackageURL('npm', '@scope', 'pkg')

UrlConverter.fromUrl('https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz')
// → PackageURL('npm', undefined, 'lodash', '4.17.21')
```

### GitHub / GitLab / Bitbucket: VCS-style

```typescript
UrlConverter.fromUrl('https://github.com/lodash/lodash')
// → PackageURL('github', 'lodash', 'lodash')

UrlConverter.fromUrl('https://github.com/lodash/lodash/tree/4.17.21')
// → PackageURL('github', 'lodash', 'lodash', '4.17.21')

UrlConverter.fromUrl('https://gitlab.com/gitlab-org/gitlab')
// → PackageURL('gitlab', 'gitlab-org', 'gitlab')
```
107+
108+
### Pypi
109+
110+
```typescript
111+
UrlConverter.fromUrl('https://pypi.org/project/requests/')
112+
// → PackageURL('pypi', undefined, 'requests')
113+
114+
UrlConverter.fromUrl('https://pypi.org/project/requests/2.31.0/')
115+
// → PackageURL('pypi', undefined, 'requests', '2.31.0')
116+
```
117+
118+
### Unrecognized host
119+
120+
```typescript
121+
UrlConverter.fromUrl('https://example.com/foo/bar')
122+
// → undefined
123+
```
124+
125+
`fromUrl` never throws on unrecognized input. A caller that needs
126+
"throw on unknown" can wrap:
127+
128+
```typescript
129+
function parseOrThrow(url: string): PackageURL {
130+
const purl = UrlConverter.fromUrl(url)
131+
if (!purl) {
132+
throw new Error(`Unrecognized URL: ${url}`)
133+
}
134+
return purl
135+
}
136+
```
137+
138+
## Worked examples — `toDownloadUrl`
139+
140+
```typescript
141+
const purl = new PackageURL('npm', undefined, 'lodash', '4.17.21')
142+
UrlConverter.toDownloadUrl(purl)
143+
// → { url: 'https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz' }
144+
145+
const pypi = new PackageURL('pypi', undefined, 'requests', '2.31.0')
146+
UrlConverter.toDownloadUrl(pypi)
147+
// → { url: 'https://files.pythonhosted.org/…/requests-2.31.0.tar.gz' }
148+
```
149+
150+
`toDownloadUrl` requires the PURL have a `version` — you cannot
151+
download "some version of lodash." If version is missing, returns
152+
`undefined`.
153+
154+
For ecosystems whose artifacts live at a predictable URL given
155+
`(name, version)`, the converter returns that URL. For ecosystems
156+
where the download requires API metadata lookup (e.g. resolving a
157+
sha digest), the converter returns `undefined` and you will need
158+
to use the ecosystem's own API.
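
As an illustration of the "predictable URL" case, the sketch below builds an npm tarball URL from a name and version. It is a standalone approximation, not the converter's implementation: `PurlParts` and `npmTarballUrl` are hypothetical names, and only the public npm registry layout is assumed.

```typescript
// Hypothetical stand-in for the PURL fields used below.
interface PurlParts {
  namespace?: string // e.g. '@scope' for scoped npm packages
  name: string
  version?: string
}

// Builds the npm registry tarball URL, or undefined when no version is set.
function npmTarballUrl(purl: PurlParts): { url: string } | undefined {
  if (!purl.version) {
    return undefined // an unversioned PURL has no single artifact
  }
  // Scoped packages keep the scope in the path but not in the file name.
  const fullName = purl.namespace ? `${purl.namespace}/${purl.name}` : purl.name
  return {
    url: `https://registry.npmjs.org/${fullName}/-/${purl.name}-${purl.version}.tgz`,
  }
}
```

For `{ name: 'lodash', version: '4.17.21' }` this produces the same URL as the npm example above; without a version it returns `undefined`.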

## Worked examples: `toRepositoryUrl`

```typescript
const github = new PackageURL('github', 'lodash', 'lodash')
UrlConverter.toRepositoryUrl(github)
// → { type: 'git', url: 'https://github.com/lodash/lodash.git' }

const pypi = new PackageURL('pypi', undefined, 'requests')
UrlConverter.toRepositoryUrl(pypi)
// → undefined (pypi itself doesn't expose a canonical repo URL)
```

For some ecosystems, the repository URL depends on qualifiers set
on the PURL:

```typescript
const pypiWithRepo = new PackageURL(
  'pypi', undefined, 'requests', '2.31.0',
  { repository_url: 'https://github.com/psf/requests' }
)
UrlConverter.toRepositoryUrl(pypiWithRepo)
// → { type: 'git', url: 'https://github.com/psf/requests.git' }
```

When a PURL carries a `repository_url` qualifier, the converter
prefers that over any built-in inference. The qualifier wins because
it is authoritative: the PURL author said "this is where the source
lives."
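
The precedence rule can be sketched as a two-step lookup. This is an assumption-laden illustration rather than the real dispatch: `PurlParts` and `resolveRepository` are hypothetical, the only built-in inference shown is for `github`, and the qualifier is treated as a git remote to match the example above.

```typescript
interface RepositoryUrl {
  type: 'git' | 'hg' | 'svn' | 'web'
  url: string
}

// Hypothetical stand-in for the PURL fields used below.
interface PurlParts {
  type: string
  namespace?: string
  name: string
  qualifiers?: Record<string, string>
}

function resolveRepository(purl: PurlParts): RepositoryUrl | undefined {
  // 1. An explicit repository_url qualifier wins over any inference.
  const explicit = purl.qualifiers?.['repository_url']
  if (explicit) {
    // Assumption: the qualifier names a git remote; append .git if missing.
    const url = explicit.endsWith('.git') ? explicit : `${explicit}.git`
    return { type: 'git', url }
  }
  // 2. Otherwise fall back to what the type itself implies.
  if (purl.type === 'github' && purl.namespace) {
    return {
      type: 'git',
      url: `https://github.com/${purl.namespace}/${purl.name}.git`,
    }
  }
  return undefined
}
```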

## `RepositoryUrl` and `DownloadUrl` shapes

Both converters return an object, not a bare string, so callers can
tell the kind of URL at a glance:

```typescript
interface RepositoryUrl {
  type: 'git' | 'hg' | 'svn' | 'web'
  url: string
}

interface DownloadUrl {
  url: string
  // (Some types also carry a sha/checksum field if known.)
}
```

The `type` on `RepositoryUrl` matters because `git clone <url>` is
the right command for `type: 'git'` but **not** for `type: 'svn'`
or `type: 'web'` (the latter is a browsable page, not a clone
target).
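
A caller can branch on `type` to pick the right tool. The helper below is a hypothetical sketch (`checkoutCommand` is not part of the library); it copies the `RepositoryUrl` shape from above and maps each type to the usual command-line client.

```typescript
interface RepositoryUrl {
  type: 'git' | 'hg' | 'svn' | 'web'
  url: string
}

// Returns the argv for fetching the source, or undefined for browse-only URLs.
function checkoutCommand(repo: RepositoryUrl): string[] | undefined {
  switch (repo.type) {
    case 'git':
      return ['git', 'clone', repo.url]
    case 'hg':
      return ['hg', 'clone', repo.url]
    case 'svn':
      return ['svn', 'checkout', repo.url]
    case 'web':
      return undefined // a browsable page, not a clone target
  }
}
```

Returning an argument array instead of a single string keeps the URL out of shell interpolation if the command is later spawned.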

## `getAllUrls`: both in one call

```typescript
const urls = UrlConverter.getAllUrls(purl)
// → { download: DownloadUrl | undefined, repository: RepositoryUrl | undefined }
```

Use this when you are building a display (e.g. a package
information panel) and want both URLs computed together.
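
For a display use case, the result can be turned into lines of text while handling either field being `undefined`. This is only a sketch: `AllUrls` and `panelLines` are hypothetical names, and the shapes are copied from the section above.

```typescript
interface RepositoryUrl {
  type: 'git' | 'hg' | 'svn' | 'web'
  url: string
}

interface DownloadUrl {
  url: string
}

// Assumed shape of the getAllUrls result, per the comment above.
interface AllUrls {
  download: DownloadUrl | undefined
  repository: RepositoryUrl | undefined
}

// Renders whichever URLs are present; either field may be undefined.
function panelLines(urls: AllUrls): string[] {
  const lines: string[] = []
  if (urls.download) {
    lines.push(`Download: ${urls.download.url}`)
  }
  if (urls.repository) {
    lines.push(`Source (${urls.repository.type}): ${urls.repository.url}`)
  }
  return lines.length > 0 ? lines : ['No URLs available for this package']
}
```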

## Adding a new ecosystem's URL parser

The support matrix above grows when you:

1. **Add a hostname parser.** Implement a `UrlParser` function
   that takes a parsed URL and returns a `PackageURL | undefined`.
   Register it in the `FROM_URL_PARSERS` map near the top of
   `src/url-converter.ts`.
2. **Add `toDownloadUrl` support.** Add a case to the
   `toDownloadUrl` dispatch that builds the artifact URL from
   `(name, version, qualifiers)`. Add the type to
   `DOWNLOAD_URL_TYPES`.
3. **Add `toRepositoryUrl` support.** Add a case to the
   `toRepositoryUrl` dispatch. Add the type to
   `REPOSITORY_URL_TYPES`.
4. **Write tests.** Each parser needs round-trip coverage:
   `fromUrl(known)` → PURL → `toDownloadUrl(PURL)` → matches the
   input (or a canonical sibling).
5. **Run `pnpm test` and `pnpm cover`**; both must stay green with
   100% coverage.
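
The round-trip check from step 4 can be sketched without the library. Everything here is invented for illustration: the `myeco` type, the `packages.example.com` host, and the helpers `parseMyEco`, `myEcoDownloadUrl`, and `roundTrips`. A real test would call `UrlConverter.fromUrl` and `UrlConverter.toDownloadUrl` instead.

```typescript
// Hypothetical stand-in for PackageURL: only the fields this sketch needs.
interface PurlParts {
  type: string
  name: string
  version?: string
}

// Parser half: URL → PURL fields (assumed path: /packages/<name>/<version>/download).
function parseMyEco(url: URL): PurlParts | undefined {
  const match = /^\/packages\/([^/]+)\/([^/]+)\/download$/.exec(url.pathname)
  if (!match) {
    return undefined
  }
  const [, name, version] = match
  if (!name || !version) {
    return undefined
  }
  return { type: 'myeco', name, version }
}

// Builder half: PURL fields → download URL.
function myEcoDownloadUrl(purl: PurlParts): string | undefined {
  if (!purl.version) {
    return undefined
  }
  return `https://packages.example.com/packages/${purl.name}/${purl.version}/download`
}

// Round trip: parsing a known URL and rebuilding it should reproduce the input.
function roundTrips(input: string): boolean {
  const purl = parseMyEco(new URL(input))
  return purl !== undefined && myEcoDownloadUrl(purl) === input
}
```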

A typical `UrlParser` looks like:

```typescript
function parseMyEcosystem(url: URL): PackageURL | undefined {
  // Extract (name, version, extras) from url.pathname / url.searchParams
  const match = /^\/packages\/([^/]+)(?:\/([^/]+))?/.exec(url.pathname)
  if (!match) {
    return undefined
  }
  try {
    // decodeURIComponent throws URIError on malformed escapes, so it
    // belongs inside the try block along with the constructor.
    const name = decodeURIComponent(match[1]!)
    const version = match[2] ? decodeURIComponent(match[2]) : undefined
    return new PackageURL('myeco', undefined, name, version)
  } catch {
    // Decoding or the constructor threw: invalid shape or injection. Don't surface.
    return undefined
  }
}
```

The **try/catch** is important: a URL parser converts unrecognized
input to `undefined`, not a thrown error. It wraps both the
percent-decoding and `new PackageURL(...)`, since either can throw
on hostile input. Callers therefore see a single failure signal,
`undefined`, for both "unknown URL" and "malformed PURL"; a caller
that needs to tell them apart can check
`UrlConverter.supportsFromUrl` first.
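
One detail worth keeping in mind when writing a parser: `decodeURIComponent` itself throws a `URIError` on a malformed escape sequence, so decoding needs the same protection as the constructor call. A small hypothetical helper (`safeDecode` is not part of the library) shows the pattern:

```typescript
// Hypothetical helper: percent-decode a path segment without throwing.
function safeDecode(segment: string): string | undefined {
  try {
    return decodeURIComponent(segment)
  } catch {
    // URIError: malformed escape sequence such as "%E0%A4%A"
    return undefined
  }
}
```

A parser can then treat an `undefined` segment the same way as a failed regex match and return `undefined`.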

## Hazards and caveats

- **Hostname matching is exact.** `https://subdomain.github.com/x/y`
  is not recognized; only `github.com`. If you need
  subdomain-tolerant matching, add the variant to the registry.
- **http vs https is ignored.** Both schemes dispatch to the same
  parser.
- **URL canonicalization.** `fromUrl('https://github.com/x/y/')` and
  `fromUrl('https://github.com/x/y')` produce the same PURL because
  trailing slashes are stripped. Query strings and fragments are
  parser-dependent; check the individual parser before relying on
  them.
- **`toDownloadUrl` + unversioned PURLs.** If your PURL has no
  `version`, the download URL is `undefined`. Don't default to
  "latest": the PURL spec treats an unversioned PURL as ambiguous,
  not as "latest."
- **Don't feed untrusted URLs without pre-validation.** `fromUrl`
  does not throw on garbage, but a very long string or an unusual
  `url.pathname` still gets run through a parser's path-splitting
  logic. If your callers are hostile, size-limit the input first.
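
A pre-validation wrapper might look like the sketch below. The names `guardedFromUrl` and `MAX_URL_LENGTH`, the 2048-character limit, and the scheme allow-list are all assumptions for illustration; the converter is passed in as a parameter so the example stays self-contained, and in real code that argument would be `UrlConverter.fromUrl`.

```typescript
// Any converter with the fromUrl shape: string in, result or undefined out.
type FromUrl<T> = (url: string) => T | undefined

const MAX_URL_LENGTH = 2048 // assumed limit; tune for your inputs

function guardedFromUrl<T>(input: string, fromUrl: FromUrl<T>): T | undefined {
  // Reject oversized input before any parsing happens.
  if (input.length > MAX_URL_LENGTH) {
    return undefined
  }
  let parsed: URL
  try {
    parsed = new URL(input)
  } catch {
    return undefined // not a URL at all
  }
  // Assumption: only web URLs are worth converting.
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return undefined
  }
  return fromUrl(input)
}
```

Usage would be along the lines of `guardedFromUrl(userInput, UrlConverter.fromUrl)`, with the same `undefined`-on-failure contract as `fromUrl` itself.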

## Further reading

- [`docs/architecture.md`](./architecture.md): module map.
- [`docs/builders.md`](./builders.md): the fluent API.
- [`docs/hardening.md`](./hardening.md): injection / freeze /
  error shape, including url-converter's try/catch pattern.
- [`docs/api.md`](./api.md): full API reference.
- [`src/url-converter.ts`](../src/url-converter.ts): the
  implementation (~1300 lines, the biggest source file by far).

tour.json

Lines changed: 6 additions & 0 deletions
@@ -179,6 +179,12 @@
     "source": "docs/builders.md",
     "summary": "The PurlBuilder fluent API — construct a PackageURL step by step with per-field setters and per-ecosystem factories."
   },
+  {
+    "filename": "converters",
+    "title": "Converters",
+    "source": "docs/converters.md",
+    "summary": "URL ↔ PURL conversion across ~25 ecosystems — fromUrl, toDownloadUrl, toRepositoryUrl, getAllUrls."
+  },
   {
     "filename": "hardening",
     "title": "Hardening",
