docs: add LLM-friendly content export (llms.txt / llms-full.txt)#20993
bloxster wants to merge 2 commits into release/3.4
Adds docusaurus-plugin-llms-txt@0.1.3, which generates two files at build time:
- /llms.txt: page index with short descriptions (for LLM routing)
- /llms-full.txt: full text of all docs (for long-context LLMs)
Matches the LLM content export already live at cocoon.erigon.tech.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yperbasis
left a comment
Concerns worth addressing before merge
- Supply-chain attribution doesn't match what's on npm
The PR description links to https://github.com/PaloAltoNetworks/docusaurus-plugin-llms-txt. That URL does not exist — gh api repos/PaloAltoNetworks/docusaurus-plugin-llms-txt returns
404. PaloAltoNetworks publishes docusaurus-openapi-docs, not this one.
What you'd actually be installing:
- npm metadata: no repository, no homepage, no bugs field
- Sole maintainer: jverre jverre@gmail.com (personal Gmail, single account)
- Last published: 2024-12-30 (≈16 months stale as of today)
- Versions ever published: 0.1.0 → 0.1.3 (4 patch-level releases, no minor/major)
Compare with two healthier alternatives that exist on npm under the same name pattern:
┌───────────────────────────────────────────┬─────────┬──────────────────────────────────────────┬───────────────────────────────┐
│ Package │ Version │ Repo │ Maintenance │
├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤
│ docusaurus-plugin-llms-txt (this PR) │ 0.1.3 │ none declared │ last release 2024-12-30 │
├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤
│ @signalwire/docusaurus-plugin-llms-txt │ 1.2.2 │ github.com/signalwire/docusaurus-plugins │ active, scoped, repo declared │
├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤
│ din0s/docusaurus-plugin-llms-txt (GitHub) │ — │ github.com/din0s/... (32★) │ last commit 2026-04-23 │
└───────────────────────────────────────────┴─────────┴──────────────────────────────────────────┴───────────────────────────────┘
Recommendation: Either switch to @signalwire/docusaurus-plugin-llms-txt (mature, scoped, declared repo, 1.x semver), or — if there's a specific reason to keep the unscoped one — fix
the PR description to point at the actual upstream and call out that it's a single-maintainer 0.x package with no repo metadata. As written, the description tells reviewers it's a
Palo Alto Networks plugin, which it isn't.
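The provenance checks in this item (missing repository / homepage / bugs fields, a single maintainer, a stale last release, a 0.x version) can be scripted. The following is a minimal sketch, not the tooling used for this review. It assumes the package document has already been fetched from the npm registry and parsed into a dict with the usual top-level keys (`maintainers`, `time`, `versions`, `dist-tags`). The function name `provenance_flags` and the 365-day staleness threshold are hypothetical.

```python
from datetime import datetime, timezone


def provenance_flags(packument: dict, now: datetime, stale_days: int = 365) -> list:
    """Return human-readable warnings for a parsed npm package document."""
    flags = []

    # Missing links back to the source make the package hard to audit.
    for field in ("repository", "homepage", "bugs"):
        if not packument.get(field):
            flags.append(f"no {field} field")

    # One account controlling all publishes is a supply-chain risk.
    if len(packument.get("maintainers") or []) <= 1:
        flags.append("single maintainer")

    # "time" maps version strings (plus "created"/"modified") to ISO timestamps;
    # keep only the entries that correspond to published versions.
    versions = packument.get("versions", {})
    published = [
        datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        for version, stamp in packument.get("time", {}).items()
        if version in versions
    ]
    if published and (now - max(published)).days > stale_days:
        flags.append(f"last release {max(published).date()}")

    # A 0.x "latest" tag means no semver stability promise yet.
    if packument.get("dist-tags", {}).get("latest", "").startswith("0."):
        flags.append("0.x version")

    return flags
```

Passing `now` explicitly (as a timezone-aware `datetime`) keeps the check reproducible, so the same input always yields the same flags.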
- The docs site build isn't in CI
I grepped .github/workflows/ for any docusaurus / docs-site / cd docs/site invocation. None found. That means:
- No PR check verifies that npm run build still succeeds with this plugin added.
- The four-item Test plan is entirely manual and won't be run in CI.
- A future Docusaurus or React bump could silently break the docs build, and you'd discover it on deploy.
This isn't a blocker for this PR (the change is too small to break much), but adding a docs-site build job is cheap insurance and would have been the natural place to gate this
change.
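The workflow search described in this item can be reproduced with a short script. This is a sketch under stated assumptions: the regular expression only mirrors the three terms named above (docusaurus, docs-site, cd docs/site), and the function name `workflows_building_docs` is hypothetical.

```python
import re
from pathlib import Path

# Assumed patterns, taken from the terms the review says it grepped for.
DOCS_BUILD_PATTERN = re.compile(r"docusaurus|docs-site|cd\s+docs/site", re.IGNORECASE)


def workflows_building_docs(repo_root: str) -> list:
    """Return the workflow file names that appear to invoke the docs-site build."""
    workflow_dir = Path(repo_root) / ".github" / "workflows"
    hits = []
    # "*.y*ml" covers both the .yml and .yaml extensions.
    for path in sorted(workflow_dir.glob("*.y*ml")):
        if DOCS_BUILD_PATTERN.search(path.read_text(encoding="utf-8")):
            hits.append(path.name)
    return hits
```

An empty result corresponds to the "None found" outcome above: no workflow gates the docs build.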
- Plugin is configured with defaults — verify it indexes both docs plugins
docs/site/docusaurus.config.ts registers two docs plugins:
- the default docs plugin via preset-classic (root /)
- a second @docusaurus/plugin-content-docs instance with id: 'help-center', routeBasePath: 'help-center'
docusaurus-plugin-llms-txt is added as a bare string with no config. Worth confirming locally that both plugin instances end up in llms-full.txt — the test plan currently only checks
the file exists, not that help-center pages are included. If they aren't, you'll need an explicit { include: [...] } or similar option.
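One way to run the help-center check locally is to compare page titles from the source docs against the generated llms-full.txt. The sketch below rests on an assumption the plugin does not guarantee: that each exported page's title appears verbatim in llms-full.txt. The helper names and the `docs_dir` argument are hypothetical; point it at wherever the help-center Markdown sources live.

```python
import re
from pathlib import Path
from typing import Optional


def page_title(markdown: str) -> Optional[str]:
    """Title from front matter `title:` if present, else the first `# ` heading."""
    front = re.match(r"^---\n(.*?)\n---\n", markdown, re.DOTALL)
    if front:
        found = re.search(r"^title:\s*(.+)$", front.group(1), re.MULTILINE)
        if found:
            return found.group(1).strip().strip("\"'")
    heading = re.search(r"^# +(.+)$", markdown, re.MULTILINE)
    return heading.group(1).strip() if heading else None


def titles_missing_from_export(docs_dir: str, export_text: str) -> list:
    """Titles of .md/.mdx pages under docs_dir that do not occur in export_text."""
    missing = []
    for path in sorted(Path(docs_dir).rglob("*.md*")):
        title = page_title(path.read_text(encoding="utf-8"))
        if title and title not in export_text:
            missing.append(title)
    return missing
```

A non-empty result for the help-center sources would suggest the second docs plugin instance is not being indexed under the default configuration.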
- Drive-by observations
- The PR description is marked "🤖 Generated with Claude Code." The PaloAltoNetworks attribution looks like a model hallucination of the upstream; a quick npm-page check would have caught it. Worth adding a manual review step for PR descriptions that name external orgs.
- package-lock.json adds only the plugin and no transitive deps beyond fs-extra / gray-matter, both of which were already resolved in the tree (no new install footprint). Good.
- engines.node: ">=16.14" from the plugin is satisfied by your >=20.0 floor; no action needed.
Verdict
Request changes — not for the code itself (the diff is fine), but to:
1. Correct the upstream attribution in the PR description, or swap to @signalwire/docusaurus-plugin-llms-txt, which has declared provenance and active maintenance.
2. Manually verify that the help-center docs make it into llms-full.txt before merging (the current test-plan checkboxes don't cover this).
3. Optional follow-up: add a small docs-site build job to CI so future plugin/dep bumps are gated.
If (1) is "we deliberately picked the jverre package, here's why," that should be stated in the description so reviewers don't have to dig into npm metadata to discover the gap
between the PR text and what's actually being installed.
Closing in favour of #21000. After review, we decided against
PR #21000 replaces this with a ~130-line pure Python stdlib script ( |
Pull request was closed
Summary
- Adds docusaurus-plugin-llms-txt v0.1.3 to docs/site
- https://docs.erigon.tech/llms.txt: page index with short descriptions (LLM routing)
- https://docs.erigon.tech/llms-full.txt: full text of all docs (long-context LLMs)
- Matches the LLM content export already live at cocoon.erigon.tech/llms-full.txt
Test plan
- cd docs/site && npm run build: clean build
- out/llms.txt and out/llms-full.txt are generated
- https://docs.erigon.tech/llms.txt is accessible after deploy
- https://docs.erigon.tech/llms-full.txt is accessible after deploy
🤖 Generated with Claude Code
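The second test-plan item (both files present after the build) is easy to script so it does not depend on a manual look. This is a minimal sketch: the function name `missing_exports` is hypothetical, and the output directory is passed in rather than assumed. It also treats an empty file as missing, which goes slightly beyond what the test plan states.

```python
from pathlib import Path


def missing_exports(out_dir: str, names=("llms.txt", "llms-full.txt")) -> list:
    """Return the expected export files that are absent or empty in out_dir."""
    root = Path(out_dir)
    return [
        name
        for name in names
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]
```

Run after `npm run build`, an empty list means both exports were written; anything else names the file to investigate.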