
Commit b1a65ae

slayerjain, nehagup, and claude authored
audit: BreadcrumbList, robots policy, og:title, sitemap priorities (#832)
* fix(robots): nuanced AI bot policy — allow search, block training

Adopt the Speedscale / Katalon / Testsigma split:
- Allow AI SEARCH bots (drive answer visibility): OAI-SearchBot,
  ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot,
  Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot,
  DuckAssistBot, Amazonbot.
- Block TRAINING-ONLY bots: GPTBot, ClaudeBot, anthropic-ai, CCBot,
  Google-Extended, Applebot-Extended, Meta-ExternalAgent, FacebookBot,
  cohere-ai, Diffbot, Omgilibot, ImagesiftBot.
- Keep Bytespider blocked.

Also add belt-and-braces Disallow for unmaintained legacy doc versions
(/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) to reinforce existing
noindex+canonical signals for crawlers that ignore them.

Reopens Task 52 per user direction 2026-04-14. Mirrors the corresponding
policy change on landing and blog-website robots.txt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* fix(docs): emit BreadcrumbList schema even when sidebar trail is null

LIVE-20. Live audit of /docs/concepts/reference/glossary/idempotency/ on
2026-04-14 showed @type schema blocks Article, ImageObject, Organization,
WebPage — and no BreadcrumbList. The /docs/ root, by contrast, has
BreadcrumbList. This is the specific glossary-page regression that Task 13
was meant to catch but didn't.

Root cause: the DocBreadcrumbs component had an early `return null` when
useSidebarBreadcrumbs() returned null/undefined, which suppresses both the
visual breadcrumb UI AND the JSON-LD schema emission. For deep glossary
pages whose sidebar context resolves to null, this meant zero
BreadcrumbList — the regression.

Changes:
- Replace `if (!breadcrumbs) return null` with a safe fallback to an empty
  sidebarTrail array. Schema emission + Home/Docs items run unconditionally.
- Only render the visual <nav> when sidebarTrail has entries (avoids
  showing an empty breadcrumb UI on schema-only pages).
- Propagate the sidebarTrail rename through the visual render path.

Verify after deploy:

    curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \
      grep -c '"@type":"BreadcrumbList"'
    # expected: 1 (was 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* fix(robots): apply legacy-version disallows inside AI-search allow group

Per the robots.txt spec, a bot that matches a named User-agent group reads
rules only from that group — it does not fall through to User-agent: *. So
the Disallow: /docs/{1,2,3}.0.0/ lines under User-agent: * were silently
inapplicable to PerplexityBot/Applebot/OAI-SearchBot/etc., meaning those
bots were still crawling the unmaintained legacy versions despite the
noindex/canonical/global block combo.

Consolidate the 11 AI search bot allow groups into a single block using
multiple User-agent headers, and add the three legacy-version Disallow
lines inside it so the intent — "allow AI search bots everywhere except
legacy versions" — is actually enforced.

No semantic change to training bots, Bytespider, or the * fallback group.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>

* fix(docs): LIVE-12 per-page og:title, LIVE-13 hub Article suppression, Task 35 sitemap priorities

Three docs-side fixes bundled.

LIVE-12 — per-page og:title / twitter:title

Previously every docs page rendered with og:title "Keploy Documentation"
because the title inherited from docusaurus.config.js's site-level `title`
field. Social share cards on LinkedIn / Slack / X therefore all showed the
same generic headline regardless of which glossary / concept / quickstart
page was shared.

Fix: emit <meta property="og:title" content={title}> and
<meta name="twitter:title" content={title}> in the swizzled DocItem Head
component, pulling from the per-page metadata.title that the <title> tag
already uses. Also adds og:description / twitter:description so preview
cards carry the page-specific description. No site-level config change
required.

LIVE-13 — suppress Article schema on /docs/ landing and category indexes

The /docs/ root was shipping Article JSON-LD even though it is a hub page
with no single author, no single publication date, and no single headline —
a type mismatch that AI models may flag as noise.

Fix: compute `suppressArticleSchema` from permalink / frontmatter and
short-circuit the articleSchema construction when the page is the /docs/
root or a category index. The DocBreadcrumbs JSON-LD continues to emit
normally so hub pages still have navigation signal.

Task 35 — differentiate docs sitemap priorities

Original priority buckets only covered quickstart (0.8), concepts /
keploy-explained (0.7), and keploy-cloud (0.6). Default was 0.5 for
everything else including the high-value /docs/ root and running-keploy
sections.

New bucket structure in createSitemapItems:

    1.0  /docs/ root (primary entry point)
    0.9  /docs/quickstart/* (highest-intent user flow)
    0.8  /docs/running-keploy/* (primary product docs)
    0.7  /docs/concepts/*, /docs/keploy-explained/*
    0.6  /docs/keploy-cloud/*, /docs/ci-cd/*, /docs/faq, /docs/troubleshooting
    0.5  /docs/concepts/reference/glossary/* (long-tail, many pages)

Added an explanatory comment inline so the next editor understands the
bucket rationale.

Verify after deploy:

    curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \
      grep -oE 'og:title"[^>]*content="[^"]+"'
    # expected: "What is Idempotency in REST APIs? Complete Guide"

    curl -s https://keploy.io/docs/ | grep -c '"@type":"Article"'
    # expected: 0

    curl -s https://keploy.io/docs/sitemap.xml | \
      python3 -c "import sys,re; \
      priorities = re.findall(r'<priority>([0-9.]+)</priority>', sys.stdin.read()); \
      print('unique priorities:', sorted(set(priorities)))"
    # expected: ['0.5', '0.6', '0.7', '0.8', '0.9', '1.0']

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* fix(robots): mirror Crawl-delay + /cgi-bin disallow into AI search group

Copilot review caught that named User-agent groups in robots.txt do not
inherit rules from User-agent: *. The AI-search allow group (OAI-SearchBot,
ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User,
Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) was
therefore ignoring both the global Crawl-delay: 5 limit AND the
Disallow: /cgi-bin/ in the fallback User-agent: * block.

Duplicated both lines into the named group so the same policy applies:
search bots are rate-limited to 5s per request, and they cannot crawl
/cgi-bin/. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/,
/docs/3.0.0/) were already duplicated in this block for the same
inheritance reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* feat(docs): Task 33 — Keploy vs Alternatives comparison page

Live audit + competitor analysis (llms-full.txt for competitors already has
comparison tables, but docs had none). Adds a dedicated comparison page
under /docs/keploy-explained/keploy-vs-alternatives with:
- Feature comparison matrix: Keploy vs Postman, Katalon, WireMock,
  Testcontainers across 9 capabilities (test generation model, SDK
  requirement, mock generation, non-determinism, secret masking, CI/CD,
  license, kernel version).
- Approach differences: plain-language description of each tool's core
  abstraction so readers can self-sort.
- When to pick each: decision tree by team profile / API shape.
- Migration paths: concrete steps for moving from Postman or Katalon to
  Keploy without throwing away existing work.
- Related reading cross-links to how-keploy-works, integration-testing-faq,
  api-testing-faq.

Added to version-4.0.0 sidebar in the Integration Testing →
keploy-vs-alternatives slot, placed between Troubleshooting Guide and FAQs
so it appears in the decision phase of the reader journey.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* fix(copilot-review): FAQ + troubleshooting sitemap priority must match actual v4 routes

Addresses Copilot review comment 3080865321 on docs PR #832.

The previous sitemap createSitemapItems callback checked
`url.includes("/faq")` and `url.includes("/troubleshooting")` — neither of
those substrings appears in the actual v4 docs URLs. The FAQ pages live at:

    /docs/keploy-explained/integration-testing-faq/
    /docs/keploy-explained/api-testing-faq/
    /docs/keploy-explained/unit-testing-faq/

and the troubleshooting guide lives at:

    /docs/keploy-explained/common-errors/

(sidebar label: "Troubleshooting Guide").

Because `/faq` never matched, all three FAQ pages and the troubleshooting
guide fell through to the `/keploy-explained/` rule immediately below and
got priority 0.7, not the intended 0.6.

Fix:
1. Changed the match patterns to `-faq/`, `-faq`, and `/common-errors` so
   they match the real URL fragments.
2. Moved the FAQ/troubleshooting check ABOVE the `/keploy-explained/` check
   so it takes precedence when a page satisfies both.
3. Updated the header comment block to name the actual pages covered by the
   0.6 bucket.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* style: prettier format on audit-touched files

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>

* fix(review): drop keploy-vs-alternatives page + scrub internal ticket refs

Addresses 3 review comments from @nehagup on PR #832.

1. Delete versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md
   and remove its sidebar entry in version-4.0.0-sidebars.json. The page was
   added by the audit batch but the team decided not to ship a public
   alternatives comparison inside the docs; keeping it here would publish
   product-marketing framing under the docs subtree, which belongs on the
   landing site instead if anywhere.
2. Strip internal ticket identifiers from comments across
   docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and
   src/theme/DocItem/index.js. Internal audit references like "Task 35",
   "LIVE-12", "LIVE-13", "LIVE-20" rot as the task tracker evolves and leak
   private process detail into public source. Kept the explanatory comments
   that describe *why* each piece of logic exists, just without the ticket
   numbers.

DocItem/index.js suppressArticleSchema logic and DocBreadcrumbs sidebarTrail
fallback are left as-is — they are functional fixes that prevent invalid
Article schema on hub pages and missing BreadcrumbList schema on glossary
pages. Reply to @nehagup's "why are we suppressing" question posted
separately on the comment thread.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>

* fix(copilot-review): suppress Article schema on versioned docs roots too

Addresses Copilot comment 3081151415 on PR #832.

The previous suppressArticleSchema check only matched /docs/ as the docs
root, but this site serves versioned hub pages too — /docs/4.0.0/,
/docs/3.0.0/, /docs/2.0.0/, /docs/1.0.0/ — via onlyIncludeVersions and
includeCurrentVersion in docusaurus.config.js. Each versioned root is also
an index of content with no single author/date/headline, so emitting
Article/BlogPosting/APIReference schema on those pages had the same
type-mismatch problem the base case fix was addressing.

Added a regex check for /docs/<digit-starting-version>/ so any current or
archived versioned root is caught by the same suppression path. Current
content pages inside versioned trees (e.g.
/docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema
as normal since they have real authors, dates, and headlines — only the
bare versioned roots are suppressed.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>

---------

Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Co-authored-by: Neha Gupta <gneha21@yahoo.in>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: nehagup <15074229+nehagup@users.noreply.github.com>
1 parent 12aa3c3 commit b1a65ae

4 files changed

Lines changed: 220 additions & 43 deletions


docusaurus.config.js

Lines changed: 50 additions & 4 deletions
@@ -449,20 +449,66 @@ module.exports = {
           changefreq: "weekly",
           priority: 0.5,
           filename: "sitemap.xml",
+          // Differentiate docs sitemap priorities by content type so
+          // search engines spend crawl budget proportional to how
+          // canonical each page is. Priority buckets:
+          // 1.0 → /docs/ root (highest — primary entry point)
+          // 0.9 → /docs/quickstart/* (highest-intent user flow)
+          // 0.8 → /docs/running-keploy/* (primary product docs)
+          // 0.7 → /docs/concepts/*, /docs/keploy-explained/*
+          // 0.6 → /docs/keploy-cloud/*, /docs/ci-cd/*
+          // 0.6 → /docs/keploy-explained/*-faq/ (3 FAQ pages) and
+          // /docs/keploy-explained/common-errors/ (troubleshooting)
+          // — reference-style, lower crawl priority than core docs
+          // 0.5 → /docs/concepts/reference/glossary/* (long-tail
+          // glossary; noindexed legacy versions excluded via
+          // netlify headers + robots.txt)
           createSitemapItems: async (params) => {
             const {defaultCreateSitemapItems, ...rest} = params;
             const items = await defaultCreateSitemapItems(rest);
             return items.map((item) => {
-              if (item.url.includes("/quickstart/")) {
+              const url = item.url;
+              // The /docs/ home page is the highest-priority entry point
+              // for the whole docs subtree.
+              if (url.endsWith("/docs/") || url.endsWith("/docs")) {
+                return {...item, priority: 1.0, changefreq: "weekly"};
+              }
+              if (url.includes("/quickstart/")) {
+                return {...item, priority: 0.9, changefreq: "weekly"};
+              }
+              if (url.includes("/running-keploy/")) {
                 return {...item, priority: 0.8, changefreq: "weekly"};
               }
+              if (url.includes("/concepts/reference/glossary/")) {
+                // Glossary entries are numerous, long-tail, and often
+                // off-topic for core product queries. Keep them in the
+                // sitemap but mark them low priority.
+                return {...item, priority: 0.5, changefreq: "monthly"};
+              }
+              // FAQ + troubleshooting match FIRST, because these pages live
+              // under /keploy-explained/ in the v4 docs (e.g.
+              // /docs/keploy-explained/integration-testing-faq/,
+              // /docs/keploy-explained/api-testing-faq/,
+              // /docs/keploy-explained/unit-testing-faq/,
+              // /docs/keploy-explained/common-errors/ — "common-errors" is
+              // the troubleshooting guide, labelled "Troubleshooting Guide"
+              // in the sidebar). Without matching first, they would be
+              // captured by the /keploy-explained/ rule below and get
+              // priority 0.7 instead of the intended 0.6.
+              if (
+                url.includes("-faq/") ||
+                url.includes("-faq") ||
+                url.includes("/common-errors")
+              ) {
+                return {...item, priority: 0.6, changefreq: "monthly"};
+              }
               if (
-                item.url.includes("/concepts/") ||
-                item.url.includes("/keploy-explained/")
+                url.includes("/concepts/") ||
+                url.includes("/keploy-explained/")
               ) {
                 return {...item, priority: 0.7, changefreq: "weekly"};
               }
-              if (item.url.includes("/keploy-cloud/")) {
+              if (url.includes("/keploy-cloud/") || url.includes("/ci-cd/")) {
                 return {...item, priority: 0.6, changefreq: "monthly"};
               }
               return item;
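
Review note: the callback above is a first-match-wins rule chain, so the narrower patterns (glossary, FAQ, common-errors) have to be tested before the broader `/concepts/` and `/keploy-explained/` patterns. A minimal sketch of the same ordering as a standalone function follows. `SITEMAP_RULES` and `bucketFor` are hypothetical names used only for illustration, and the example URLs assume the `https://keploy.io` host from the commit message.

```javascript
// Hypothetical standalone version of the bucket logic (not part of the commit).
// Rules are checked top to bottom and the first match wins, so narrow
// patterns must appear before the broad ones that would also match.
const SITEMAP_RULES = [
  {
    test: (url) => url.endsWith("/docs/") || url.endsWith("/docs"),
    priority: 1.0,
    changefreq: "weekly",
  },
  {test: (url) => url.includes("/quickstart/"), priority: 0.9, changefreq: "weekly"},
  {test: (url) => url.includes("/running-keploy/"), priority: 0.8, changefreq: "weekly"},
  {
    test: (url) => url.includes("/concepts/reference/glossary/"),
    priority: 0.5,
    changefreq: "monthly",
  },
  {
    // "-faq" is a prefix of "-faq/", so one substring test covers both.
    test: (url) => url.includes("-faq") || url.includes("/common-errors"),
    priority: 0.6,
    changefreq: "monthly",
  },
  {
    test: (url) => url.includes("/concepts/") || url.includes("/keploy-explained/"),
    priority: 0.7,
    changefreq: "weekly",
  },
  {
    test: (url) => url.includes("/keploy-cloud/") || url.includes("/ci-cd/"),
    priority: 0.6,
    changefreq: "monthly",
  },
];

// Returns the matching bucket, or null when the plugin default should apply.
function bucketFor(url) {
  const rule = SITEMAP_RULES.find((candidate) => candidate.test(url));
  return rule ? {priority: rule.priority, changefreq: rule.changefreq} : null;
}

console.log(bucketFor("https://keploy.io/docs/keploy-explained/api-testing-faq/"));
console.log(bucketFor("https://keploy.io/docs/keploy-explained/how-keploy-works/"));
```

Keeping the rules in an ordered table makes the precedence visible in one place. It also shows that the `-faq/` test in the diff is redundant next to `-faq`, although it is harmless.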

src/theme/DocBreadcrumbs/index.js

Lines changed: 39 additions & 34 deletions
@@ -51,9 +51,12 @@ export default function DocBreadcrumbs() {
   const {siteConfig} = useDocusaurusContext();
   const {pathname} = useLocation();
 
-  if (!breadcrumbs) {
-    return null;
-  }
+  // Previously this component early-returned when useSidebarBreadcrumbs()
+  // returned null/undefined, which caused glossary and reference pages
+  // not in the sidebar config to ship with zero BreadcrumbList schema.
+  // Treat null/undefined as "no sidebar trail, emit Home + Docs schema
+  // anyway" so AI crawlers always get a hierarchy signal.
+  const sidebarTrail = Array.isArray(breadcrumbs) ? breadcrumbs : [];
 
   const toAbsoluteUrl = (baseUrl, url) => {
     if (!url) {
@@ -89,9 +92,9 @@ export default function DocBreadcrumbs() {
     }
   }
 
-  if (breadcrumbs.length > 0) {
-    breadcrumbs.forEach((crumb, index) => {
-      const isLast = index === breadcrumbs.length - 1;
+  if (sidebarTrail.length > 0) {
+    sidebarTrail.forEach((crumb, index) => {
+      const isLast = index === sidebarTrail.length - 1;
       const href =
         crumb.type === "category" && crumb.linkUnlisted
           ? undefined
@@ -130,35 +133,37 @@ export default function DocBreadcrumbs() {
           </script>
         </Head>
       )}
-      <nav
-        className={clsx(
-          ThemeClassNames.docs.docBreadcrumbs,
-          styles.breadcrumbsContainer
-        )}
-        aria-label={translate({
-          id: "theme.docs.breadcrumbs.navAriaLabel",
-          message: "Breadcrumbs",
-          description: "The ARIA label for the breadcrumbs",
-        })}
-      >
-        <ul className="breadcrumbs">
-          {homePageRoute && <HomeBreadcrumbItem />}
-          {breadcrumbs.map((item, idx) => {
-            const isLast = idx === breadcrumbs.length - 1;
-            const href =
-              item.type === "category" && item.linkUnlisted
-                ? undefined
-                : item.href;
-            return (
-              <BreadcrumbsItem key={idx} active={isLast}>
-                <BreadcrumbsItemLink href={href} isLast={isLast}>
-                  {item.label}
-                </BreadcrumbsItemLink>
-              </BreadcrumbsItem>
-            );
+      {sidebarTrail.length > 0 && (
+        <nav
+          className={clsx(
+            ThemeClassNames.docs.docBreadcrumbs,
+            styles.breadcrumbsContainer
+          )}
+          aria-label={translate({
+            id: "theme.docs.breadcrumbs.navAriaLabel",
+            message: "Breadcrumbs",
+            description: "The ARIA label for the breadcrumbs",
           })}
-        </ul>
-      </nav>
+        >
+          <ul className="breadcrumbs">
+            {homePageRoute && <HomeBreadcrumbItem />}
+            {sidebarTrail.map((item, idx) => {
+              const isLast = idx === sidebarTrail.length - 1;
+              const href =
+                item.type === "category" && item.linkUnlisted
+                  ? undefined
+                  : item.href;
+              return (
+                <BreadcrumbsItem key={idx} active={isLast}>
+                  <BreadcrumbsItemLink href={href} isLast={isLast}>
+                    {item.label}
+                  </BreadcrumbsItemLink>
+                </BreadcrumbsItem>
+              );
+            })}
+          </ul>
+        </nav>
+      )}
     </>
   );
 }
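
Review note: the key change in this file is that schema emission no longer depends on the sidebar trail being present. The sketch below shows that idea in isolation, assuming a hypothetical `buildBreadcrumbList` helper. The "Home" and "Docs" names, their URLs, and the exact JSON-LD shape are assumptions made for illustration, since the diff does not show the part of the component that builds the schema.

```javascript
// Hypothetical helper (not the component's actual code): builds a schema.org
// BreadcrumbList in which Home and Docs are always present and sidebar crumbs
// are appended only when a trail exists.
function buildBreadcrumbList(siteUrl, breadcrumbs) {
  // Same fallback as the diff: null/undefined becomes an empty trail.
  const sidebarTrail = Array.isArray(breadcrumbs) ? breadcrumbs : [];
  const base = siteUrl.replace(/\/+$/, "");

  // Assumed fixed entries; the real labels and URLs may differ.
  const entries = [
    {name: "Home", item: `${base}/`},
    {name: "Docs", item: `${base}/docs/`},
  ];

  for (const crumb of sidebarTrail) {
    // Unlisted category links are emitted without a URL, as in the diff.
    const href =
      crumb.type === "category" && crumb.linkUnlisted ? undefined : crumb.href;
    entries.push({name: crumb.label, item: href ? `${base}${href}` : undefined});
  }

  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: entries.map((entry, index) => ({
      "@type": "ListItem",
      position: index + 1,
      name: entry.name,
      ...(entry.item ? {item: entry.item} : {}),
    })),
  };
}

// A page with no sidebar context still yields a two-item list.
console.log(JSON.stringify(buildBreadcrumbList("https://keploy.io", null), null, 2));
```

Separating the schema builder from the render path also makes the "schema always, visual nav only when there is a trail" split straightforward to unit test.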

src/theme/DocItem/index.js

Lines changed: 43 additions & 3 deletions
@@ -143,12 +143,36 @@ export default function DocItem(props) {
   const currentYear = new Date().getFullYear();
   const image = assets?.image ?? frontMatter?.image;
   const imageWithBaseUrl = useBaseUrl(image || "");
-  const socialImage = image ? toAbsoluteUrl(siteConfig?.url, imageWithBaseUrl) : null;
+  const socialImage = image
+    ? toAbsoluteUrl(siteConfig?.url, imageWithBaseUrl)
+    : null;
   const normalizedMetaKeywords = Array.isArray(metaKeywords)
     ? metaKeywords.join(", ")
     : metaKeywords;
+  // Suppress Article / BlogPosting / APIReference schema on the /docs/
+  // root, versioned docs roots like /docs/4.0.0/, and any category
+  // index pages. Article schema on a hub page is a type mismatch
+  // because a hub does not have a single author, a single publication
+  // date, or a single headline — it is an index of content. Hub pages
+  // emit only the normal DocBreadcrumbs JSON-LD.
+  const permalink = metadata?.permalink || "";
+  // Versioned root pattern: /docs/<version>/ or /docs/<version> where
+  // <version> starts with a digit. Covers current and archived
+  // versions listed in docusaurus.config.js onlyIncludeVersions.
+  const isVersionedDocsRoot =
+    /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink);
+  const isDocsRoot =
+    permalink === "/docs/" ||
+    permalink === "/docs" ||
+    permalink.endsWith("/docs/index") ||
+    permalink.endsWith("/docs/") ||
+    isVersionedDocsRoot;
+  const isCategoryIndex =
+    frontMatter?.slug === "index" || /\/category\/|\/index\/?$/.test(permalink);
+  const suppressArticleSchema = isDocsRoot || isCategoryIndex;
+
   const articleSchema =
-    pageUrl && title
+    pageUrl && title && !suppressArticleSchema
       ? {
           "@context": "https://schema.org",
           "@type": schemaType,
@@ -187,6 +211,19 @@ export default function DocItem(props) {
         {normalizedMetaKeywords && (
           <meta name="keywords" content={normalizedMetaKeywords} />
         )}
+        {/* Per-page og:title and og:description override the
+            docusaurus.config.js site-level defaults, which would
+            otherwise emit the same og:title on every docs page
+            regardless of content. Social card previews now reflect
+            the actual page title. */}
+        <meta property="og:title" content={title} />
+        {description && (
+          <meta property="og:description" content={description} />
+        )}
+        <meta name="twitter:title" content={title} />
+        {description && (
+          <meta name="twitter:description" content={description} />
+        )}
         {socialImage && <meta property="og:image" content={socialImage} />}
         {socialImage && <meta name="twitter:image" content={socialImage} />}
         {socialImage && (
@@ -288,7 +325,10 @@ export default function DocItem(props) {
                 href="https://join.slack.com/t/keploy/shared_invite/zt-357qqm9b5-PbZRVu3Yt2rJIa6ofrwWNg"
                 aria-label="Slack"
               >
-                <span className="docs-inline-footer__slack" aria-hidden="true" />
+                <span
+                  className="docs-inline-footer__slack"
+                  aria-hidden="true"
+                />
               </a>
             </div>
             <div className="docs-inline-footer__usecase">
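
Review note: the suppression logic in the first hunk is a pure function of the permalink and the front matter, so it can be checked without rendering the component. Below is a minimal sketch that wraps the same checks in a hypothetical `shouldSuppressArticleSchema` function. The function name and the `/docs/category/...` example path are assumptions; the regular expressions are copied from the diff.

```javascript
// Hypothetical wrapper around the checks from the diff (name is illustrative).
function shouldSuppressArticleSchema(permalink, frontMatter) {
  // /docs/<version>/ or /docs/<version>, where <version> starts with a digit.
  const isVersionedDocsRoot =
    /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink);

  // endsWith("/docs/") already covers the exact "/docs/" case.
  const isDocsRoot =
    permalink === "/docs" ||
    permalink.endsWith("/docs/index") ||
    permalink.endsWith("/docs/") ||
    isVersionedDocsRoot;

  const isCategoryIndex =
    frontMatter?.slug === "index" || /\/category\/|\/index\/?$/.test(permalink);

  return isDocsRoot || isCategoryIndex;
}

const samples = [
  "/docs/",
  "/docs/4.0.0/",
  "/docs/4.0.0/keploy-explained/how-keploy-works/",
  "/docs/category/example-section", // assumed path shape for a category index
];
for (const permalink of samples) {
  console.log(permalink, shouldSuppressArticleSchema(permalink, {}));
}
```

Because the version pattern is anchored with `^` and `$`, content pages nested under a versioned root do not match it; only the bare root does.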

static/robots.txt

Lines changed: 88 additions & 2 deletions
@@ -1,12 +1,98 @@
-# Block specific bot
+# Keploy docs robots.txt
+# Policy: allow AI search/answer engines, block training-only crawlers,
+# block Bytespider. Search bots drive visibility in ChatGPT, Claude,
+# Perplexity, Copilot, Gemini answers. Training bots feed future model
+# weights and provide nothing back.
+# Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit)
+
+# =============================================================================
+# ALLOW — AI search / answer engines
+# Legacy-version disallows are repeated inside this group because a bot that
+# matches a named User-agent group only reads rules from THAT group; it does
+# not fall through to `User-agent: *`. Without these lines, Perplexity/
+# Applebot/OAI-SearchBot/etc. would still crawl /docs/{1,2,3}.0.0/ despite
+# the global block further below.
+# =============================================================================
+
+User-agent: OAI-SearchBot
+User-agent: ChatGPT-User
+User-agent: Claude-SearchBot
+User-agent: Claude-User
+User-agent: PerplexityBot
+User-agent: Perplexity-User
+User-agent: Gemini-Deep-Research
+User-agent: GoogleOther
+User-agent: Applebot
+User-agent: DuckAssistBot
+User-agent: Amazonbot
+Allow: /
+Crawl-delay: 5
+Disallow: /cgi-bin/
+Disallow: /docs/1.0.0/
+Disallow: /docs/2.0.0/
+Disallow: /docs/3.0.0/
+
+# =============================================================================
+# DISALLOW — Training-only crawlers
+# =============================================================================
+
+User-agent: GPTBot
+Disallow: /
+
+User-agent: ClaudeBot
+Disallow: /
+
+User-agent: anthropic-ai
+Disallow: /
+
+User-agent: CCBot
+Disallow: /
+
+User-agent: Google-Extended
+Disallow: /
+
+User-agent: Applebot-Extended
+Disallow: /
+
+User-agent: Meta-ExternalAgent
+Disallow: /
+
+User-agent: FacebookBot
+Disallow: /
+
+User-agent: cohere-ai
+Disallow: /
+
+User-agent: Diffbot
+Disallow: /
+
+User-agent: Omgilibot
+Disallow: /
+
+User-agent: ImagesiftBot
+Disallow: /
+
+# Always-block scraper
 User-agent: Bytespider
 Disallow: /
 
-# Default rules — apply to all crawlers including AI bots
+# =============================================================================
+# DEFAULT — Googlebot, Bingbot, and all other crawlers
+# =============================================================================
+
 User-agent: *
 Allow: /
 Crawl-delay: 5
 Disallow: /cgi-bin/
 
+# Block unmaintained legacy doc versions (already set via noindex + canonical,
+# belt-and-braces for crawlers that ignore those signals).
+Disallow: /docs/1.0.0/
+Disallow: /docs/2.0.0/
+Disallow: /docs/3.0.0/
+
+# =============================================================================
 # Sitemap
+# =============================================================================
+
 Sitemap: https://keploy.io/docs/sitemap.xml
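
Review note: the reason the legacy-version lines are repeated in the named group is that a crawler uses only the group that names it, and falls back to `User-agent: *` only when no named group matches. The sketch below illustrates that selection rule with a deliberately simplified evaluator. `parseGroups`, `rulesFor`, and `isAllowed` are hypothetical names, and the evaluator makes several simplifying assumptions: exact, case-insensitive agent matching, plain prefix matching with no `*` or `$` wildcards, and no handling of `Crawl-delay`. It is not a substitute for a real robots.txt parser.

```javascript
// Simplified sketch of robots.txt group selection (not a full parser).
function parseGroups(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const raw of text.split("\n")) {
    const line = raw.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = {agents: [], rules: []};
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (current && (field === "allow" || field === "disallow")) {
        current.rules.push({type: field, path: value});
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// A named group, when present, is used instead of the "*" group.
function rulesFor(groups, agent) {
  const name = agent.toLowerCase();
  const named = groups.filter((group) => group.agents.includes(name));
  const chosen =
    named.length > 0 ? named : groups.filter((group) => group.agents.includes("*"));
  return chosen.flatMap((group) => group.rules);
}

// Longest matching prefix wins; Allow wins a tie; no match means allowed.
function isAllowed(groups, agent, path) {
  let best = {type: "allow", path: ""};
  for (const rule of rulesFor(groups, agent)) {
    if (rule.path === "" || !path.startsWith(rule.path)) continue;
    const longer = rule.path.length > best.path.length;
    const tie = rule.path.length === best.path.length && rule.type === "allow";
    if (longer || tie) best = rule;
  }
  return best.type === "allow";
}

const withoutRepeat = parseGroups(
  "User-agent: PerplexityBot\nAllow: /\n\nUser-agent: *\nAllow: /\nDisallow: /docs/1.0.0/\n"
);
const withRepeat = parseGroups(
  "User-agent: PerplexityBot\nAllow: /\nDisallow: /docs/1.0.0/\n\nUser-agent: *\nAllow: /\nDisallow: /docs/1.0.0/\n"
);

console.log(isAllowed(withoutRepeat, "PerplexityBot", "/docs/1.0.0/intro")); // true
console.log(isAllowed(withRepeat, "PerplexityBot", "/docs/1.0.0/intro")); // false
console.log(isAllowed(withoutRepeat, "SomeOtherBot", "/docs/1.0.0/intro")); // false
```

The first call shows the gap the commit closes: with the disallow only under `*`, a bot that has its own group never sees it.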
