docs: add LLM-friendly content export (llms.txt / llms-full.txt) #20993

Closed
bloxster wants to merge 2 commits into release/3.4 from docs/llms-txt

bloxster (Collaborator) commented May 5, 2026

Summary

  • Adds docusaurus-plugin-llms-txt v0.1.3 to docs/site
  • At build time the plugin generates two static files served at the site root:
    • https://docs.erigon.tech/llms.txt — page index with short descriptions (LLM routing)
    • https://docs.erigon.tech/llms-full.txt — full text of all docs (long-context LLMs)
  • Matches the LLM content export already live at cocoon.erigon.tech/llms-full.txt

Test plan

  • cd docs/site && npm run build — clean build
  • Verify out/llms.txt and out/llms-full.txt are generated
  • Check https://docs.erigon.tech/llms.txt is accessible after deploy
  • Check https://docs.erigon.tech/llms-full.txt is accessible after deploy

🤖 Generated with Claude Code

Adds docusaurus-plugin-llms-txt@0.1.3 which generates two files at
build time:
- /llms.txt     — page index with short descriptions (for LLM routing)
- /llms-full.txt — full text of all docs (for long-context LLMs)

Matches the LLM content export already live at cocoon.erigon.tech.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bloxster enabled auto-merge (squash) May 5, 2026 09:23

yperbasis (Member) left a comment

Concerns worth addressing before merge

  1. Supply-chain attribution doesn't match what's on npm

The PR description links to https://github.com/PaloAltoNetworks/docusaurus-plugin-llms-txt. That URL does not exist — gh api repos/PaloAltoNetworks/docusaurus-plugin-llms-txt returns
404. PaloAltoNetworks publishes docusaurus-openapi-docs, not this one.

What you'd actually be installing:

  • npm metadata: no repository, no homepage, no bugs field
  • Sole maintainer: jverre <jverre@gmail.com> (personal Gmail, single account)
  • Last published: 2024-12-30 (≈16 months stale as of today)
  • Versions ever published: 0.1.0 → 0.1.3 (4 patch-level releases, no minor/major)
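
A check like the one above can be scripted against the JSON document the npm registry serves for a package (the "packument"), which carries `repository`, `homepage`, `bugs`, `maintainers`, and a `time` map of publish timestamps. The sketch below is illustrative only: the function name `provenance_flags`, the one-year staleness threshold, and the hand-built sample (taken from the figures in this review, with the publish time of day assumed to be midnight UTC) are assumptions, and no network call is made.

```python
from datetime import datetime, timezone


def provenance_flags(packument: dict, now: datetime, stale_after_days: int = 365) -> list[str]:
    """Return human-readable warnings for an npm registry packument."""
    flags = []

    # Missing provenance metadata makes it hard to audit the source.
    for field in ("repository", "homepage", "bugs"):
        if not packument.get(field):
            flags.append(f"no {field} field")

    if len(packument.get("maintainers", [])) <= 1:
        flags.append("single maintainer")

    # "time" maps each version to an ISO 8601 publish timestamp, plus
    # "created" and "modified" bookkeeping keys that are not versions.
    published = [
        datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        for version, stamp in packument.get("time", {}).items()
        if version not in ("created", "modified")
    ]
    if published:
        age_days = (now - max(published)).days
        if age_days > stale_after_days:
            flags.append(f"last publish {age_days} days ago")

    return flags


# Hypothetical sample mirroring the findings listed in this review.
sample = {
    "name": "docusaurus-plugin-llms-txt",
    "maintainers": [{"name": "jverre"}],
    "time": {"0.1.3": "2024-12-30T00:00:00.000Z"},  # time of day assumed
}
flags = provenance_flags(sample, now=datetime(2026, 5, 5, tzinfo=timezone.utc))
print(flags)
```

Passing `now` explicitly keeps the result reproducible; a real check would fetch the packument and use the current time.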

Compare with two healthier alternatives that exist on npm under the same name pattern:

 ┌───────────────────────────────────────────┬─────────┬──────────────────────────────────────────┬───────────────────────────────┐                                                     
 │                  Package                  │ Version │                   Repo                   │          Maintenance          │                                                     
 ├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤                                                     
 │ docusaurus-plugin-llms-txt (this PR)      │ 0.1.3   │ none declared                            │ last release 2024-12-30       │
 ├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤
 │ @signalwire/docusaurus-plugin-llms-txt    │ 1.2.2   │ github.com/signalwire/docusaurus-plugins │ active, scoped, repo declared │                                                     
 ├───────────────────────────────────────────┼─────────┼──────────────────────────────────────────┼───────────────────────────────┤                                                     
 │ din0s/docusaurus-plugin-llms-txt (GitHub) │ —       │ github.com/din0s/... (32★)               │ last commit 2026-04-23        │                                                     
 └───────────────────────────────────────────┴─────────┴──────────────────────────────────────────┴───────────────────────────────┘                                                     

Recommendation: Either switch to @signalwire/docusaurus-plugin-llms-txt (mature, scoped, declared repo, 1.x semver), or — if there's a specific reason to keep the unscoped one — fix
the PR description to point at the actual upstream and call out that it's a single-maintainer 0.x package with no repo metadata. As written, the description tells reviewers it's a
Palo Alto Networks plugin, which it isn't.

  2. The docs site build isn't in CI

I grepped .github/workflows/ for any docusaurus / docs-site / cd docs/site invocation. None found. That means:

  • No PR check verifies that npm run build still succeeds with this plugin added.
  • The four-item Test plan is entirely manual and won't be run in CI.
  • A future Docusaurus or React bump could silently break the docs build, and you'd discover it on deploy.

This isn't a blocker for this PR (the change is too small to break much), but adding a docs-site build job is cheap insurance and would have been the natural place to gate this
change.
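
For anyone who wants to repeat the workflow search, here is a rough Python equivalent of the grep described above. The function name and the exact marker strings are assumptions taken from the three terms mentioned in this comment; it only reports files that mention a docs build and does not parse the YAML.

```python
from pathlib import Path

# Substrings that would suggest a workflow touches the docs site build.
DOCS_BUILD_MARKERS = ("docusaurus", "docs-site", "cd docs/site")


def workflows_mentioning_docs_build(repo_root: Path) -> list[Path]:
    """List workflow files under .github/workflows that mention the docs build."""
    workflow_dir = repo_root / ".github" / "workflows"
    if not workflow_dir.is_dir():
        return []

    hits = []
    for path in sorted(workflow_dir.iterdir()):
        if path.suffix not in (".yml", ".yaml"):
            continue
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        if any(marker in text for marker in DOCS_BUILD_MARKERS):
            hits.append(path)
    return hits
```

An empty result for the repository root corresponds to the "None found" outcome reported here.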

  3. Plugin is configured with defaults — verify it indexes both docs plugins

docs/site/docusaurus.config.ts registers two docs plugins:

  • the default docs plugin via preset-classic (root /)
  • a second @docusaurus/plugin-content-docs instance with id: 'help-center', routeBasePath: 'help-center'

docusaurus-plugin-llms-txt is added as a bare string with no config. Worth confirming locally that both plugin instances end up in llms-full.txt — the test plan currently only checks
the file exists, not that help-center pages are included. If they aren't, you'll need an explicit { include: [...] } or similar option.
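
One way to make that verification repeatable is to compare page titles from the source files against the generated export. This is a minimal sketch under a loud assumption: that the export reproduces each page's title verbatim, which this thread does not confirm for the plugin. The helper names and the example paths are hypothetical.

```python
import re
from typing import Optional

FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def page_title(source: str) -> Optional[str]:
    """Best-effort title: front-matter `title:` first, else the first `# ` heading."""
    front = FRONT_MATTER.match(source)
    if front:
        match = re.search(r"^title:[ \t]*(.+)$", front.group(1), re.MULTILINE)
        if match:
            return match.group(1).strip().strip("\"'")
    match = re.search(r"^#[ \t]+(.+)$", source, re.MULTILINE)
    return match.group(1).strip() if match else None


def pages_missing_from_export(export_text: str, sources: dict[str, str]) -> list[str]:
    """Return the source paths whose title does not appear in the export text."""
    haystack = export_text.casefold()
    missing = []
    for path, source in sources.items():
        title = page_title(source)
        # A page with no detectable title is reported too, since it cannot be checked.
        if title is None or title.casefold() not in haystack:
            missing.append(path)
    return missing
```

In practice `sources` would be built by reading the files of the help-center docs instance and `export_text` by reading the generated llms-full.txt; a non-empty result would indicate that the second docs instance is not being indexed.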

  4. Drive-by observations
  • The PR description is marked "🤖 Generated with Claude Code." The PaloAltoNetworks attribution looks like a model hallucination of the upstream — a quick npm-page check would have
    caught it. Worth adding a manual review step for PR descriptions that name external orgs.
  • package-lock.json adds only the plugin and no transitive deps beyond fs-extra / gray-matter, both of which were already resolved in the tree (no new install footprint). Good.
  • engines.node: ">=16.14" from the plugin is satisfied by your >=20.0 floor; no action needed.

Verdict

Request changes — not for the code itself (the diff is fine), but to:

  1. Correct the upstream attribution in the PR description, or swap to @signalwire/docusaurus-plugin-llms-txt which has declared provenance and active maintenance.
  2. Manually verify the help-center docs make it into llms-full.txt before merging (the current test-plan checkboxes don't cover this).
  3. Optional follow-up: add a tiny docs-site build job to CI so future plugin/dep bumps are gated.

If (1) is "we deliberately picked the jverre package, here's why," that should be stated in the description so reviewers don't have to dig into npm metadata to discover the gap
between the PR text and what's actually being installed.

bloxster (Collaborator, Author) commented May 5, 2026

Closing in favour of #21000.

After review, we decided against docusaurus-plugin-llms-txt:

  1. Wrong conversion direction — the plugin compiles MDX → HTML then converts it back to markdown. Our source is already clean markdown; the HTML round-trip is lossy and unnecessary.
  2. Supply chain concerns — no declared source repo, personal Gmail maintainer, 16 months without updates.
  3. Build coupling — requires a full Docusaurus build to generate the files, adding CI weight for a task that doesn't need it.

PR #21000 replaces this with a ~130-line pure Python stdlib script (docs/site/scripts/generate-llms.py) that reads the .mdx source files directly, strips MDX syntax, and produces cleaner output with zero npm dependencies.
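
The script itself is not shown in this thread, so the following is only a rough sketch of what "reads the .mdx source files directly, strips MDX syntax" could look like with the Python standard library. The function name `strip_mdx` and every regular expression here are illustrative assumptions, not the contents of docs/site/scripts/generate-llms.py.

```python
import re

FENCE = "`" * 3  # code-fence marker, built this way to keep this example fence-safe
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
MDX_COMMENT = re.compile(r"\{/\*.*?\*/\}")
# Capitalised tags only, so ordinary HTML and <https://...> autolinks survive.
JSX_TAG = re.compile(r"</?[A-Z][A-Za-z0-9.]*(?:\s[^<>]*)?/?>")
ESM_LINE = re.compile(
    r"^(?:import\s+.*?from\s+['\"]|import\s+['\"]|export\s+(?:default|const|let|var|function)\b)"
)


def strip_mdx(source: str) -> str:
    """Reduce MDX to plain markdown: drop front matter, ESM lines, JSX tags, comments."""
    text = FRONT_MATTER.sub("", source, count=1)
    out, in_fence = [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
            continue
        if in_fence:
            out.append(line)  # code samples are kept verbatim
            continue
        if ESM_LINE.match(line):
            continue  # single-line import/export statements
        line = MDX_COMMENT.sub("", line)  # single-line {/* ... */} comments
        line = JSX_TAG.sub("", line)  # keep the text between component tags
        out.append(line)
    # Removing lines leaves gaps; collapse them to one blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(out)).strip() + "\n"
```

Known gaps in this sketch: multi-line imports, exports, and comments are not handled, and a JSX attribute containing `>` (for example an arrow function) would defeat the tag pattern. Working from source this way avoids the HTML round-trip described in point 1.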

bloxster closed this May 5, 2026
auto-merge was automatically disabled May 5, 2026 13:29

Pull request was closed
