Merged
Changes from 2 commits
75 changes: 41 additions & 34 deletions src/theme/DocBreadcrumbs/index.js
@@ -51,9 +51,14 @@ export default function DocBreadcrumbs() {
const {siteConfig} = useDocusaurusContext();
const {pathname} = useLocation();

if (!breadcrumbs) {
return null;
}
// LIVE-20 fix. Previously this component early-returned when
// useSidebarBreadcrumbs() returned null/undefined, which caused
// glossary and reference pages not in the sidebar config to ship
// with zero BreadcrumbList schema (audited 2026-04-14 on
// /docs/concepts/reference/glossary/idempotency/).
// Now we treat null/undefined as "no sidebar trail, emit Home + Docs
// schema anyway" so AI crawlers always get a hierarchy signal.
const sidebarTrail = Array.isArray(breadcrumbs) ? breadcrumbs : [];

const toAbsoluteUrl = (baseUrl, url) => {
if (!url) {
@@ -89,9 +94,9 @@ export default function DocBreadcrumbs() {
}
}

if (breadcrumbs.length > 0) {
breadcrumbs.forEach((crumb, index) => {
const isLast = index === breadcrumbs.length - 1;
if (sidebarTrail.length > 0) {
sidebarTrail.forEach((crumb, index) => {
const isLast = index === sidebarTrail.length - 1;
const href =
crumb.type === "category" && crumb.linkUnlisted
? undefined
@@ -130,35 +135,37 @@ export default function DocBreadcrumbs() {
</script>
</Head>
)}
      <nav
        className={clsx(
          ThemeClassNames.docs.docBreadcrumbs,
          styles.breadcrumbsContainer
        )}
        aria-label={translate({
          id: "theme.docs.breadcrumbs.navAriaLabel",
          message: "Breadcrumbs",
          description: "The ARIA label for the breadcrumbs",
        })}
      >
        <ul className="breadcrumbs">
          {homePageRoute && <HomeBreadcrumbItem />}
          {breadcrumbs.map((item, idx) => {
            const isLast = idx === breadcrumbs.length - 1;
            const href =
              item.type === "category" && item.linkUnlisted
                ? undefined
                : item.href;
            return (
              <BreadcrumbsItem key={idx} active={isLast}>
                <BreadcrumbsItemLink href={href} isLast={isLast}>
                  {item.label}
                </BreadcrumbsItemLink>
              </BreadcrumbsItem>
            );
          })}
        </ul>
      </nav>
      {sidebarTrail.length > 0 && (
        <nav
          className={clsx(
            ThemeClassNames.docs.docBreadcrumbs,
            styles.breadcrumbsContainer
          )}
          aria-label={translate({
            id: "theme.docs.breadcrumbs.navAriaLabel",
            message: "Breadcrumbs",
            description: "The ARIA label for the breadcrumbs",
          })}
        >
          <ul className="breadcrumbs">
            {homePageRoute && <HomeBreadcrumbItem />}
            {sidebarTrail.map((item, idx) => {
              const isLast = idx === sidebarTrail.length - 1;
              const href =
                item.type === "category" && item.linkUnlisted
                  ? undefined
                  : item.href;
              return (
                <BreadcrumbsItem key={idx} active={isLast}>
                  <BreadcrumbsItemLink href={href} isLast={isLast}>
                    {item.label}
                  </BreadcrumbsItemLink>
                </BreadcrumbsItem>
              );
            })}
          </ul>
        </nav>
      )}
</>
);
}
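For reference, the structured data this component injects follows the schema.org BreadcrumbList shape. A minimal sketch of the payload for a glossary page (URLs and labels are hypothetical; the actual values come from the sidebar config and `toAbsoluteUrl`):

```
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://keploy.io/docs/" },
    { "@type": "ListItem", "position": 2, "name": "Glossary", "item": "https://keploy.io/docs/concepts/reference/glossary/" },
    { "@type": "ListItem", "position": 3, "name": "Idempotency" }
  ]
}
```

The final ListItem may omit `item`, since it refers to the current page.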
100 changes: 98 additions & 2 deletions static/robots.txt
@@ -1,12 +1,108 @@
# Block specific bot
# Keploy docs robots.txt
# Policy: allow AI search/answer engines, block training-only crawlers,
# block Bytespider. Search bots drive visibility in ChatGPT, Claude,
# Perplexity, Copilot, Gemini answers. Training bots feed future model
# weights and provide nothing back.
# Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit)
Comment on lines +1 to +6

Copilot AI Apr 14, 2026
PR description mentions adding a new “Keploy vs Alternatives” doc page and updating the v4 sidebar, but those artifacts don’t appear to be present in this change set (no keploy-vs-alternatives doc found and no sidebar entry references it). Either the description needs updating to reflect the actual changes in this PR, or the missing doc/sidebar changes need to be included so the PR matches its stated scope.

Member Author
Fixed by updating the PR title and body via REST API. The title is now 'audit: BreadcrumbList, robots policy, og:title, sitemap priorities' and the Task 33 section describing the Keploy vs Alternatives page has been removed from the body. Added a trailing Note that explains the file and sidebar entry were created earlier in the branch and then removed in commit b56c813 per @nehagup's review feedback — product comparison framing belongs on the landing site, not the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.


# =============================================================================
# ALLOW — AI search / answer engines
# =============================================================================

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Gemini-Deep-Research
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Applebot
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: Amazonbot
Allow: /
Copilot AI Apr 14, 2026
The explicit AI-search User-agent group won’t inherit rules from User-agent: *, so those bots will ignore Crawl-delay: 5 and Disallow: /cgi-bin/. If the intent is to keep the same crawl-rate limit and global disallows for all allowed crawlers, duplicate those rules inside this named allow group as well (alongside the legacy-version disallows).

Suggested change
Allow: /
Allow: /
Crawl-delay: 5
Disallow: /cgi-bin/

Member Author
Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate-limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same inheritance reason — this extends that pattern to the two global rules you flagged.
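Reconstructed from the rules named in this thread, the resulting allow group would read something like the sketch below (exact ordering in the commit may differ):

```
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
User-agent: Claude-SearchBot
User-agent: Claude-User
User-agent: PerplexityBot
User-agent: Perplexity-User
User-agent: Gemini-Deep-Research
User-agent: GoogleOther
User-agent: Applebot
User-agent: DuckAssistBot
User-agent: Amazonbot
Allow: /
Crawl-delay: 5
Disallow: /cgi-bin/
Disallow: /docs/1.0.0/
Disallow: /docs/2.0.0/
Disallow: /docs/3.0.0/
```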

Member Author
Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.


# =============================================================================
# DISALLOW — Training-only crawlers
# =============================================================================

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# Always-block scraper
User-agent: Bytespider
Disallow: /

# Default rules — apply to all crawlers including AI bots
# =============================================================================
# DEFAULT — Googlebot, Bingbot, and all other crawlers
# =============================================================================

User-agent: *
Allow: /
Crawl-delay: 5
Disallow: /cgi-bin/

# Block unmaintained legacy doc versions (already set via noindex + canonical,
# belt-and-braces for crawlers that ignore those signals).
Disallow: /docs/1.0.0/
Disallow: /docs/2.0.0/
Disallow: /docs/3.0.0/
Comment on lines +88 to +92

Copilot AI Apr 14, 2026
The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).

Member Author
Fixed in 758cff5. Went with option (a) but consolidated: the 11 AI-search-bot allow groups are now a single block that uses multiple User-agent: headers sharing one rule set, with the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) applied directly inside it. Same intent ("allow these AI search bots everywhere except legacy versions") but now actually enforced, and only 8 lines of net change instead of 33.

Member Author
Addressed in 758cff5 (the earlier commit that moved the legacy disallows inside the named allow group). The /docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block so every allowed AI bot gets the legacy-version block, not just crawlers that fall through to User-agent: *. 56cae9e just now extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment — both global rules are now duplicated inside the named group as well.
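The group-inheritance behavior discussed in this thread can be checked locally with Python's stdlib `urllib.robotparser`. The robots.txt below is an abbreviated, illustrative version of the policy, not the full file. One caveat: `robotparser` applies rules in file order (first match wins), unlike Google's longest-match semantics, so the disallows are listed before the blanket allow here.

```python
from urllib import robotparser

# Abbreviated policy: one named allow group for AI search bots,
# a training-bot block, and the fall-through defaults.
ROBOTS = """\
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: Applebot
Disallow: /cgi-bin/
Disallow: /docs/1.0.0/
Disallow: /docs/2.0.0/
Disallow: /docs/3.0.0/
Allow: /
Crawl-delay: 5

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Allow: /
Crawl-delay: 5
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# A bot that matches a named group reads ONLY that group's rules --
# nothing is inherited from "User-agent: *". That is why the legacy
# disallows and crawl-delay must be duplicated inside the group.
print(rp.can_fetch("PerplexityBot", "https://keploy.io/docs/concepts/"))    # True
print(rp.can_fetch("PerplexityBot", "https://keploy.io/docs/1.0.0/intro"))  # False
print(rp.can_fetch("GPTBot", "https://keploy.io/docs/concepts/"))           # False
print(rp.crawl_delay("PerplexityBot"))                                      # 5
# Bots with no named group fall through to "User-agent: *":
print(rp.can_fetch("Bingbot", "https://keploy.io/cgi-bin/env"))             # False
```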


# =============================================================================
# Sitemap
# =============================================================================

Sitemap: https://keploy.io/docs/sitemap.xml