| 1 | +# Converters |
| 2 | + |
| 3 | +The `UrlConverter` class converts between URLs (repository,
| 4 | +download, registry) and PURLs. Read this when you are turning a |
| 5 | +human-copyable URL into a PURL, or going the other way to hand a |
| 6 | +user a clickable download link. |
| 7 | + |
| 8 | +## Who this is for |
| 9 | + |
| 10 | +Contributors extending the URL ↔ PURL support to a new ecosystem, |
| 11 | +or callers integrating Socket's data with tools that speak URL, |
| 12 | +not PURL. |
| 13 | + |
| 14 | +## The three directions |
| 15 | + |
| 16 | +``` |
| 17 | + ┌────────────┐ fromUrl() ┌────────────┐ |
| 18 | + │ URL │ ──────────────────▶│ PackageURL │ |
| 19 | + │ │ │ │ |
| 20 | + │ │◀──────────────────┐│ │ |
| 21 | + │ │ toRepositoryUrl()├│ │ |
| 22 | + │ │ toDownloadUrl() ││ │ |
| 23 | + │ │ ││ │ |
| 24 | + └────────────┘ │└────────────┘ |
| 25 | + │ |
| 26 | + getAllUrls() returns both |
| 27 | +``` |
| 28 | + |
| 29 | +- **`UrlConverter.fromUrl(str)`** — URL string → PackageURL (or |
| 30 | + `undefined` if the URL is not recognized). |
| 31 | +- **`UrlConverter.toDownloadUrl(purl)`** — PackageURL → artifact |
| 32 | + download URL (tarball, jar, wheel, …). Returns `undefined` if the |
| 33 | + type doesn't support downloads. |
| 34 | +- **`UrlConverter.toRepositoryUrl(purl)`** — PackageURL → source |
| 35 | + repository URL (GitHub/GitLab/Bitbucket page or clone URL). |
| 36 | + Returns `undefined` if the type doesn't know its repository. |
| 37 | +- **`UrlConverter.getAllUrls(purl)`** — convenience wrapper |
| 38 | + returning both download and repository URLs in one call. |
| 39 | + |
| 40 | +All four methods are **static** on `UrlConverter`. Instances are |
| 41 | +not needed or exposed. |
| 42 | + |
| 43 | +## Supported hostnames for `fromUrl` |
| 44 | + |
| 45 | +When you call `UrlConverter.fromUrl('https://github.com/lodash/lodash')` |
| 46 | +the library dispatches on the URL's hostname. These hostnames are |
| 47 | +registered: |
| 48 | + |
| 49 | +| Hostname | Dispatches to | |
| 50 | +|---|---| |
| 51 | +| `registry.npmjs.org` | npm registry API parser | |
| 52 | +| `www.npmjs.com` | npm website parser (human-facing URLs) | |
| 53 | +| `pypi.org` | pypi | |
| 54 | +| `repo1.maven.org`, `central.maven.org` | maven | |
| 55 | +| `rubygems.org` | gem | |
| 56 | +| `crates.io` | cargo | |
| 57 | +| `www.nuget.org`, `api.nuget.org` | nuget | |
| 58 | +| `pkg.go.dev` | golang | |
| 59 | +| `hex.pm` | hex (Elixir/Erlang) | |
| 60 | +| `pub.dev` | pub (Dart/Flutter) | |
| 61 | +| `packagist.org` | composer (PHP) | |
| 62 | +| `hub.docker.com` | docker | |
| 63 | +| `cocoapods.org` | cocoapods | |
| 64 | +| `hackage.haskell.org` | hackage | |
| 65 | +| `cran.r-project.org` | cran | |
| 66 | +| `anaconda.org` | conda | |
| 67 | +| `metacpan.org` | cpan | |
| 68 | +| `luarocks.org` | luarocks | |
| 69 | +| `swiftpackageindex.com` | swift | |
| 70 | +| `huggingface.co` | huggingface | |
| 71 | +| `marketplace.visualstudio.com` | vscode-extension | |
| 72 | +| `open-vsx.org` | vscode-extension | |
| 73 | +| `github.com` | github (repo PURL) | |
| 74 | +| `gitlab.com` | gitlab | |
| 75 | +| `bitbucket.org` | bitbucket | |
| 76 | + |
| 77 | +`UrlConverter.supportsFromUrl(str)` answers "is this URL |
| 78 | +recognized?" without parsing. |
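
The hostname dispatch described above can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: `PurlParts`, `parsers`, `supportsHost`, and `dispatch` are hypothetical names chosen for illustration, and the sketch returns a plain object instead of a real `PackageURL` so that it stays self-contained.

```typescript
// Hypothetical sketch of exact-hostname dispatch; none of these names are
// taken from src/url-converter.ts.
type PurlParts = { type: string; namespace?: string; name: string; version?: string }
type UrlParser = (url: URL) => PurlParts | undefined

// One parser per registered hostname. Lookup is an exact string match.
const parsers = new Map<string, UrlParser>([
  [
    'github.com',
    (url) => {
      const [owner, repo] = url.pathname.split('/').filter(Boolean)
      return owner && repo ? { type: 'github', namespace: owner, name: repo } : undefined
    },
  ],
])

// Cheap "is this host registered?" check that never runs a parser.
function supportsHost(input: string): boolean {
  try {
    return parsers.has(new URL(input).hostname)
  } catch {
    return false
  }
}

// Parse the string as a URL, then hand it to the parser for its hostname.
function dispatch(input: string): PurlParts | undefined {
  let url: URL
  try {
    url = new URL(input)
  } catch {
    return undefined
  }
  return parsers.get(url.hostname)?.(url)
}
```

Because the lookup key is the full hostname, a subdomain such as `subdomain.github.com` misses the map and falls through to `undefined`, which is the same behavior the caveats section describes for `fromUrl`.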
| 79 | + |
| 80 | +## Worked examples — `fromUrl` |
| 81 | + |
| 82 | +### npm — both registry and website |
| 83 | + |
| 84 | +```typescript |
| 85 | +UrlConverter.fromUrl('https://www.npmjs.com/package/lodash') |
| 86 | +// → PackageURL('npm', undefined, 'lodash') |
| 87 | + |
| 88 | +UrlConverter.fromUrl('https://www.npmjs.com/package/@scope/pkg') |
| 89 | +// → PackageURL('npm', '@scope', 'pkg') |
| 90 | + |
| 91 | +UrlConverter.fromUrl('https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz') |
| 92 | +// → PackageURL('npm', undefined, 'lodash', '4.17.21') |
| 93 | +``` |
| 94 | + |
| 95 | +### GitHub / GitLab / Bitbucket — VCS-style |
| 96 | + |
| 97 | +```typescript |
| 98 | +UrlConverter.fromUrl('https://github.com/lodash/lodash') |
| 99 | +// → PackageURL('github', 'lodash', 'lodash') |
| 100 | + |
| 101 | +UrlConverter.fromUrl('https://github.com/lodash/lodash/tree/4.17.21') |
| 102 | +// → PackageURL('github', 'lodash', 'lodash', '4.17.21') |
| 103 | + |
| 104 | +UrlConverter.fromUrl('https://gitlab.com/gitlab-org/gitlab') |
| 105 | +// → PackageURL('gitlab', 'gitlab-org', 'gitlab') |
| 106 | +``` |
| 107 | + |
| 108 | +### PyPI
| 109 | + |
| 110 | +```typescript |
| 111 | +UrlConverter.fromUrl('https://pypi.org/project/requests/') |
| 112 | +// → PackageURL('pypi', undefined, 'requests') |
| 113 | + |
| 114 | +UrlConverter.fromUrl('https://pypi.org/project/requests/2.31.0/') |
| 115 | +// → PackageURL('pypi', undefined, 'requests', '2.31.0') |
| 116 | +``` |
| 117 | + |
| 118 | +### Unrecognized host |
| 119 | + |
| 120 | +```typescript |
| 121 | +UrlConverter.fromUrl('https://example.com/foo/bar') |
| 122 | +// → undefined |
| 123 | +``` |
| 124 | + |
| 125 | +`fromUrl` never throws on unrecognized input. A caller that needs |
| 126 | +"throw on unknown" can wrap: |
| 127 | + |
| 128 | +```typescript |
| 129 | +function parseOrThrow(url: string): PackageURL { |
| 130 | + const purl = UrlConverter.fromUrl(url) |
| 131 | + if (!purl) { |
| 132 | + throw new Error(`Unrecognized URL: ${url}`) |
| 133 | + } |
| 134 | + return purl |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +## Worked examples — `toDownloadUrl` |
| 139 | + |
| 140 | +```typescript |
| 141 | +const purl = new PackageURL('npm', undefined, 'lodash', '4.17.21') |
| 142 | +UrlConverter.toDownloadUrl(purl) |
| 143 | +// → { url: 'https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz' } |
| 144 | + |
| 145 | +const pypi = new PackageURL('pypi', undefined, 'requests', '2.31.0') |
| 146 | +UrlConverter.toDownloadUrl(pypi) |
| 147 | +// → { url: 'https://files.pythonhosted.org/…/requests-2.31.0.tar.gz' } |
| 148 | +``` |
| 149 | + |
| 150 | +`toDownloadUrl` requires that the PURL have a `version`: you cannot
| 151 | +download "some version of lodash." If the version is missing, it
| 152 | +returns `undefined`.
| 153 | + |
| 154 | +For ecosystems whose artifacts live at a predictable URL given |
| 155 | +`(name, version)`, the converter returns that URL. For ecosystems |
| 156 | +where the download requires API metadata lookup (e.g. resolving a |
| 157 | +sha digest), the converter returns `undefined` and you will need |
| 158 | +to use the ecosystem's own API. |
| 159 | + |
| 160 | +## Worked examples — `toRepositoryUrl` |
| 161 | + |
| 162 | +```typescript |
| 163 | +const github = new PackageURL('github', 'lodash', 'lodash') |
| 164 | +UrlConverter.toRepositoryUrl(github) |
| 165 | +// → { type: 'git', url: 'https://github.com/lodash/lodash.git' } |
| 166 | + |
| 167 | +const pypi = new PackageURL('pypi', undefined, 'requests') |
| 168 | +UrlConverter.toRepositoryUrl(pypi) |
| 169 | +// → undefined (pypi itself doesn't expose a canonical repo URL) |
| 170 | +``` |
| 171 | + |
| 172 | +For some ecosystems, the repository URL depends on qualifiers set |
| 173 | +on the PURL: |
| 174 | + |
| 175 | +```typescript |
| 176 | +const pypiWithRepo = new PackageURL( |
| 177 | + 'pypi', undefined, 'requests', '2.31.0', |
| 178 | + { repository_url: 'https://github.com/psf/requests' } |
| 179 | +) |
| 180 | +UrlConverter.toRepositoryUrl(pypiWithRepo) |
| 181 | +// → { type: 'git', url: 'https://github.com/psf/requests.git' } |
| 182 | +``` |
| 183 | + |
| 184 | +When a PURL carries a `repository_url` qualifier, the converter |
| 185 | +prefers that over any built-in inference. The qualifier wins because |
| 186 | +it is authoritative: the PURL author said "this is where the source |
| 187 | +lives." |
| 188 | + |
| 189 | +## `RepositoryUrl` and `DownloadUrl` shapes |
| 190 | + |
| 191 | +Both converters return an object, not a bare string, so callers can |
| 192 | +tell the kind of URL at a glance: |
| 193 | + |
| 194 | +```typescript |
| 195 | +interface RepositoryUrl { |
| 196 | + type: 'git' | 'hg' | 'svn' | 'web' |
| 197 | + url: string |
| 198 | +} |
| 199 | + |
| 200 | +interface DownloadUrl { |
| 201 | + url: string |
| 202 | + // (Some types also carry a sha/checksum field if known.) |
| 203 | +} |
| 204 | +``` |
| 205 | + |
| 206 | +The `type` on `RepositoryUrl` matters because `git clone <url>` is |
| 207 | +the right command for `type: 'git'` but **not** for `type: 'svn'` |
| 208 | +or `type: 'web'` (the latter is a browsable page, not a clone |
| 209 | +target). |
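
As an illustration of why the `type` field matters, a caller might map a `RepositoryUrl` to a checkout command along these lines. This is a sketch under assumptions: `checkoutCommand` is a hypothetical helper, not part of the library, and the `RepositoryUrl` interface is repeated from above only to keep the block self-contained.

```typescript
interface RepositoryUrl {
  type: 'git' | 'hg' | 'svn' | 'web'
  url: string
}

// Hypothetical helper: returns an argv array (not a shell string) so the URL
// is never interpolated into a shell command line.
function checkoutCommand(repo: RepositoryUrl): string[] | undefined {
  switch (repo.type) {
    case 'git':
      return ['git', 'clone', repo.url]
    case 'hg':
      return ['hg', 'clone', repo.url]
    case 'svn':
      return ['svn', 'checkout', repo.url]
    case 'web':
      // A browsable page, not a clone target.
      return undefined
  }
}
```

Returning `undefined` for `type: 'web'` forces the caller to handle the "nothing to clone" case explicitly instead of running `git clone` against an HTML page.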
| 210 | + |
| 211 | +## `getAllUrls` — both in one call |
| 212 | + |
| 213 | +```typescript |
| 214 | +const urls = UrlConverter.getAllUrls(purl) |
| 215 | +// → { download: DownloadUrl | undefined, repository: RepositoryUrl | undefined } |
| 216 | +``` |
| 217 | + |
| 218 | +Use this when you are building a display (e.g. a package |
| 219 | +information panel) and want both URLs computed together. |
| 220 | + |
| 221 | +## Adding a new ecosystem's URL parser |
| 222 | + |
| 223 | +The support matrix above grows when you: |
| 224 | + |
| 225 | +1. **Add a hostname parser.** Implement a `UrlParser` function |
| 226 | + that takes a parsed URL and returns a `PackageURL | undefined`. |
| 227 | + Register it in the `FROM_URL_PARSERS` map near the top of |
| 228 | + `src/url-converter.ts`. |
| 229 | +2. **Add `toDownloadUrl` support.** Add a case to the |
| 230 | + `toDownloadUrl` dispatch that builds the artifact URL from |
| 231 | + `(name, version, qualifiers)`. Add the type to |
| 232 | + `DOWNLOAD_URL_TYPES`. |
| 233 | +3. **Add `toRepositoryUrl` support.** Add a case to the |
| 234 | + `toRepositoryUrl` dispatch. Add the type to |
| 235 | + `REPOSITORY_URL_TYPES`. |
| 236 | +4. **Write tests.** Each parser needs round-trip coverage: |
| 237 | + `fromUrl(known)` → PURL → `toDownloadUrl(PURL)` → matches the |
| 238 | + input (or a canonical sibling). |
| 239 | +5. **Run `pnpm test` and `pnpm cover`**; both must stay green with |
| 240 | + 100% coverage. |
| 241 | + |
| 242 | +A typical `UrlParser` looks like: |
| 243 | + |
| 244 | +```typescript |
| 245 | +function parseMyEcosystem(url: URL): PackageURL | undefined { |
| 246 | + // Extract (name, version, extras) from url.pathname / url.searchParams |
| 247 | + const match = /^\/packages\/([^/]+)(?:\/([^/]+))?/.exec(url.pathname) |
| 248 | + if (!match) { |
| 249 | + return undefined |
| 250 | + } |
| 251 | + const name = decodeURIComponent(match[1]!) |
| 252 | + const version = match[2] ? decodeURIComponent(match[2]) : undefined |
| 253 | + try { |
| 254 | + return new PackageURL('myeco', undefined, name, version) |
| 255 | + } catch { |
| 256 | + // Constructor threw — invalid shape or injection. Don't surface. |
| 257 | + return undefined |
| 258 | + } |
| 259 | +} |
| 260 | +``` |
| 261 | + |
| 262 | +The **try/catch** around `new PackageURL(...)` is important: a URL |
| 263 | +parser converts unrecognized input to `undefined`, not a thrown |
| 264 | +error. Callers distinguish "unknown URL" from "malformed PURL" by |
| 265 | +the return type. |
| 266 | + |
| 267 | +## Hazards and caveats |
| 268 | + |
| 269 | +- **Hostname matching is exact.** `https://subdomain.github.com/x/y` |
| 270 | + is not recognized; only `github.com`. If you need |
| 271 | + subdomain-tolerant matching, add the variant to the registry. |
| 272 | +- **The URL scheme is ignored.** `http` and `https` URLs dispatch
| 273 | + to the same parser.
| 274 | +- **URL canonicalization.** `fromUrl('https://github.com/X/')` and |
| 275 | + `fromUrl('https://github.com/X')` produce the same PURL — trailing |
| 276 | + slashes are stripped. Query strings and fragments are parser- |
| 277 | + dependent; check the individual parser before relying on them. |
| 278 | +- **`toDownloadUrl` + unversioned PURLs.** If your PURL has no
| 279 | + `version`, the download URL is `undefined`. Don't default to
| 280 | + "latest" — the PURL spec treats an unversioned PURL as ambiguous, |
| 281 | + not as "latest." |
| 282 | +- **Don't feed untrusted URLs without pre-validation.** `fromUrl` |
| 283 | + does not throw on garbage, but a very long string or a weird |
| 284 | + `url.pathname` can still walk a parser's path-split logic. If |
| 285 | + your callers are hostile, size-limit the input first. |
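
The size-limiting advice in the last caveat could look something like the sketch below. `preValidateUrl` and `MAX_URL_LENGTH` are hypothetical names, and the 2048-character cap is an arbitrary assumption, not a value taken from the library. The scheme check is an extra precaution added here for illustration.

```typescript
// Hypothetical pre-validation helper; tune the limit to your own threat model.
const MAX_URL_LENGTH = 2048

function preValidateUrl(input: string, maxLength = MAX_URL_LENGTH): string | undefined {
  // Reject empty or oversized input before any parsing happens.
  if (input.length === 0 || input.length > maxLength) {
    return undefined
  }
  let url: URL
  try {
    url = new URL(input)
  } catch {
    // Not a syntactically valid absolute URL.
    return undefined
  }
  // Only web URLs are meaningful for registry and repository hosts.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return undefined
  }
  return url.href
}
```

A caller would pass only a defined result on to `UrlConverter.fromUrl`, treating `undefined` from this guard the same way as an unrecognized URL.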
| 286 | + |
| 287 | +## Further reading |
| 288 | + |
| 289 | +- [`docs/architecture.md`](./architecture.md) — module map. |
| 290 | +- [`docs/builders.md`](./builders.md) — the fluent API. |
| 291 | +- [`docs/hardening.md`](./hardening.md) — injection / freeze / |
| 292 | + error shape, including url-converter's try/catch pattern. |
| 293 | +- [`docs/api.md`](./api.md) — full API reference. |
| 294 | +- [`src/url-converter.ts`](../src/url-converter.ts) — the |
| 295 | + implementation (~1300 lines — the biggest source file by far). |