audit: BreadcrumbList, robots policy, og:title, sitemap priorities #832
Commits: 0ba5264, 4b143f0, 758cff5, 3ffe847, 56cae9e, 5de8526, 0af8f3c, 3c8ac01, b56c813, 69e5eee, 3622c50
```diff
@@ -1,12 +1,108 @@
-# Block specific bot
+# Keploy docs robots.txt
+# Policy: allow AI search/answer engines, block training-only crawlers,
+# block Bytespider. Search bots drive visibility in ChatGPT, Claude,
+# Perplexity, Copilot, Gemini answers. Training bots feed future model
+# weights and provide nothing back.
+# Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit)
+
+# =============================================================================
+# ALLOW — AI search / answer engines
+# =============================================================================
+
+User-agent: OAI-SearchBot
+Allow: /
+
+User-agent: ChatGPT-User
+Allow: /
+
+User-agent: Claude-SearchBot
+Allow: /
+
+User-agent: Claude-User
+Allow: /
+
+User-agent: PerplexityBot
+Allow: /
+
+User-agent: Perplexity-User
+Allow: /
+
+User-agent: Gemini-Deep-Research
+Allow: /
+
+User-agent: GoogleOther
+Allow: /
+
+User-agent: Applebot
+Allow: /
+
+User-agent: DuckAssistBot
+Allow: /
+
+User-agent: Amazonbot
+Allow: /
```
```
Allow: /
Crawl-delay: 5
Disallow: /cgi-bin/
```
Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so that the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same reason: a bot that matches a named group does not inherit rules from User-agent: *. This commit extends that pattern to the two global rules you flagged.
Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.
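The "no inheritance from User-agent: *" point can be checked locally. Below is a minimal sketch using Python's standard-library urllib.robotparser, assuming two hypothetical files: BEFORE, where Crawl-delay and Disallow: /cgi-bin/ exist only under User-agent: *, and AFTER, where they are mirrored into a named group. The two-bot group, the SomeOtherBot name, and the effective() helper are illustrative, not taken from the PR. Note that Crawl-delay is a non-standard extension that RFC 9309 does not define, so whether a real crawler honors it varies by bot; also, urllib.robotparser applies the first matching rule in a group, so the sketch lists Disallow before Allow: /.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "before" state: the global rules live only under User-agent: *.
BEFORE = """\
User-agent: PerplexityBot
User-agent: Applebot
Allow: /

User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
"""

# Hypothetical "after" state: the same two rules are mirrored into the named group.
# Disallow comes before Allow: / because urllib.robotparser stops at the first match.
AFTER = """\
User-agent: PerplexityBot
User-agent: Applebot
Crawl-delay: 5
Disallow: /cgi-bin/
Allow: /

User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
"""


def effective(robots_txt: str, agent: str, path: str):
    """Return (crawl delay, may fetch path) as urllib.robotparser sees them."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() needs no network access
    return rp.crawl_delay(agent), rp.can_fetch(agent, path)


if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        for agent in ("PerplexityBot", "SomeOtherBot"):
            print(label, agent, effective(text, agent, "/cgi-bin/run"))
```

In BEFORE, PerplexityBot matches its own group, so it sees no delay and may fetch /cgi-bin/ even though the fall-through bot is limited; in AFTER both bots see the same delay and block.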
Copilot (AI), Apr 14, 2026
The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).
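The group-selection behavior behind this comment can be sketched in a few lines of Python. This is a simplified, hypothetical matcher modeled on RFC 9309: a crawler obeys only the group(s) naming its product token, falls back to User-agent: * only when no group names it, and resolves Allow/Disallow conflicts by the longest matching path, with Allow winning ties. It ignores the * and $ path wildcards and all non-standard records. The SPLIT and MIRRORED files and all function names are illustrative, trimmed to one named bot.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Group:
    agents: List[str] = field(default_factory=list)              # lower-cased tokens
    rules: List[Tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_groups(robots_txt: str) -> List[Group]:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups: List[Group] = []
    current: Optional[Group] = None
    in_agent_run = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank lines and comment-only lines are skipped
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not in_agent_run:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            in_agent_run = True
        elif key in ("allow", "disallow") and current is not None:
            if value:  # an empty value matches no path
                current.rules.append((key == "allow", value))
            in_agent_run = False
        # Other records (Crawl-delay, Sitemap, ...) are ignored in this sketch.
    return groups


def is_allowed(robots_txt: str, product_token: str, path: str) -> bool:
    groups = parse_groups(robots_txt)
    token = product_token.lower()
    matching = [g for g in groups if token in g.agents]
    if not matching:  # fall back to '*' only when no group names the bot
        matching = [g for g in groups if "*" in g.agents]
    hits = [(len(prefix), allow)
            for g in matching for allow, prefix in g.rules
            if path.startswith(prefix)]
    if not hits:
        return True  # no applicable rule means the path is allowed
    # Longest prefix wins; on equal length, True (Allow) sorts above False.
    return max(hits)[1]


SPLIT = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /docs/1.0.0/
"""

MIRRORED = """\
User-agent: PerplexityBot
Allow: /
Disallow: /docs/1.0.0/

User-agent: *
Disallow: /docs/1.0.0/
"""

if __name__ == "__main__":
    for name, text in (("split", SPLIT), ("mirrored", MIRRORED)):
        print(name, is_allowed(text, "PerplexityBot", "/docs/1.0.0/intro"))
```

Under SPLIT the named group shadows User-agent: *, so PerplexityBot can still reach /docs/1.0.0/. MIRRORED closes that gap, and longest-match lets Disallow: /docs/1.0.0/ override Allow: / regardless of line order.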
Fixed in 758cff5. Went with the first option, but consolidated: the 11 AI-search-bot allow groups are now a single block with multiple User-agent lines sharing one rule set, and the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) sit directly inside it. The intent is unchanged ("allow these AI search bots everywhere except legacy versions"), but it is now actually enforced, with only 8 lines of net change instead of 33.
Addressed in 758cff5, the earlier commit that moved the legacy disallows inside the named allow group. The /docs/1.0.0/, /docs/2.0.0/, and /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block, so every allowed AI bot is kept out of the legacy versions, not just crawlers that fall through to User-agent: *. Commit 56cae9e has just extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment, so both global rules are now duplicated inside the named group as well.
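A minimal sketch of the consolidated shape, checked with Python's urllib.robotparser. The bot list and legacy paths come from the comments in this thread; the exact line order of the merged file is not shown here, so build_robots() is an illustrative reconstruction, and the helper names are hypothetical. Consecutive User-agent lines form one group that shares every rule below them; the sketch keeps them free of blank lines, since some parsers (including urllib.robotparser) treat a blank line as the end of a record.

```python
from urllib.robotparser import RobotFileParser

# Bot list and legacy paths as named in this review thread.
AI_SEARCH_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User", "Gemini-Deep-Research",
    "GoogleOther", "Applebot", "DuckAssistBot", "Amazonbot",
]
LEGACY_PATHS = ["/docs/1.0.0/", "/docs/2.0.0/", "/docs/3.0.0/"]


def build_robots() -> str:
    """Hypothetical consolidated file: 11 stacked User-agent lines, one rule set."""
    lines = [f"User-agent: {bot}" for bot in AI_SEARCH_BOTS]
    # Disallow first: first-match parsers then agree with RFC 9309 longest match.
    lines += [f"Disallow: {path}" for path in LEGACY_PATHS]
    lines += ["Allow: /", ""]  # blank line ends the named group
    lines += ["User-agent: *"] + [f"Disallow: {path}" for path in LEGACY_PATHS]
    return "\n".join(lines) + "\n"


def parser_for(robots_txt: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # no network fetch needed
    return rp


if __name__ == "__main__":
    rp = parser_for(build_robots())
    for bot in AI_SEARCH_BOTS:
        print(bot,
              rp.can_fetch(bot, "/docs/2.0.0/quickstart"),
              rp.can_fetch(bot, "/docs/quickstart"))
```

Listing the Disallow lines before Allow: / is a deliberate choice: RFC 9309 crawlers pick the longest match regardless of order, but urllib.robotparser stops at the first matching rule, so this order gives the same verdict under both.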
PR description mentions adding a new “Keploy vs Alternatives” doc page and updating the v4 sidebar, but those artifacts don’t appear in this change set (no keploy-vs-alternatives doc found, and no sidebar entry references it). Either the description needs updating to reflect the actual changes in this PR, or the missing doc and sidebar changes need to be included so the PR matches its stated scope.
Fixed by updating the PR title and body via the GitHub REST API. The title is now 'audit: BreadcrumbList, robots policy, og:title, sitemap priorities', and the Task 33 section describing the Keploy vs Alternatives page has been removed from the body. Added a trailing Note explaining that the file and sidebar entry were created earlier in the branch and then removed in commit b56c813 per @nehagup's review feedback: product comparison framing belongs on the landing site, not in the docs subtree. The current PR scope is BreadcrumbList, robots.txt, og:title, and sitemap priorities only.