audit: BreadcrumbList, robots policy, og:title, sitemap priorities#832

Merged
nehagup merged 11 commits into main from neha/seo-audit-batch-2026-04-14
Apr 15, 2026

Conversation

Member

@slayerjain slayerjain commented Apr 14, 2026

Summary

Batched fixes from the 2026-04-14 live-site + Copilot audit. Originally scoped to BreadcrumbList + robots, expanded to cover the full audit pass — each concern is a separate commit so history stays reviewable.

BreadcrumbList schema missing on docs glossary pages

Glossary pages like /docs/concepts/reference/glossary/idempotency/ were emitting Article/ImageObject/Organization/WebPage schemas but no BreadcrumbList. Root cause: DocBreadcrumbs theme component had an early return when the sidebar trail was null (which glossary pages have). Fix: emit a fallback BreadcrumbList even when the sidebar-derived trail is unavailable.
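
The fallback described above could look roughly like the sketch below. It is not the actual DocBreadcrumbs code: `buildBreadcrumbSchema`, `shouldRenderVisualBreadcrumbs`, the `SITE_URL` constant, and the `{label, href}` shape of each trail entry are assumptions made for illustration.

```javascript
// Hypothetical sketch: build BreadcrumbList JSON-LD even without a sidebar trail.
// `sidebarTrail` stands in for the value returned by useSidebarBreadcrumbs(),
// which may be null or undefined on pages with no sidebar context.
const SITE_URL = "https://keploy.io"; // assumed origin

function buildBreadcrumbSchema(sidebarTrail) {
  // Fall back to an empty trail instead of returning early.
  const trail = Array.isArray(sidebarTrail) ? sidebarTrail : [];

  // Home and Docs are always emitted, with or without a sidebar trail.
  const items = [
    { name: "Home", url: `${SITE_URL}/` },
    { name: "Docs", url: `${SITE_URL}/docs/` },
    ...trail
      .filter((crumb) => crumb && crumb.label && crumb.href)
      .map((crumb) => ({ name: crumb.label, url: `${SITE_URL}${crumb.href}` })),
  ];

  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: items.map((item, index) => ({
      "@type": "ListItem",
      position: index + 1,
      name: item.name,
      item: item.url,
    })),
  };
}

// The visual <nav> renders only when the trail actually has entries.
function shouldRenderVisualBreadcrumbs(sidebarTrail) {
  return Array.isArray(sidebarTrail) && sidebarTrail.length > 0;
}
```

With a null trail this still yields a two-item list (Home, Docs), while the visual breadcrumb stays hidden.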

Nuanced AI bot robots.txt policy

Allow AI search / answer engines (Perplexity, ChatGPT-User, Claude-SearchBot, Gemini-Deep-Research, Applebot, etc.) to crawl everywhere, block training-only bots (GPTBot, ClaudeBot, CCBot, Google-Extended, etc.), keep Bytespider blocked. The legacy-version Disallow: /docs/{1,2,3}.0.0/ lines are applied inside the AI search group as well (robots.txt named groups don't inherit from User-agent: *), and Crawl-delay + /cgi-bin are mirrored too so the group is a proper superset of the defaults.
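
To make the "named groups don't inherit" point concrete, here is a small sketch of how a crawler picks its rule set. It is a simplification of the grouping rules in the robots.txt standard (exact, case-insensitive name match only), and `parseGroups` / `rulesFor` are hypothetical names, not part of any library.

```javascript
// Minimal robots.txt group parser (simplified; illustration only).
function parseGroups(robotsTxt) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split("\n")) {
    const line = rawLine.split("#")[0].trim(); // drop comments
    if (!line) continue;
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one rule set.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (current) {
      current.rules.push({ field, value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// A bot uses its named group(s) if any exist; only otherwise the * group.
function rulesFor(groups, botName) {
  const name = botName.toLowerCase();
  const named = groups.filter((g) => g.agents.includes(name));
  const chosen = named.length > 0 ? named : groups.filter((g) => g.agents.includes("*"));
  return chosen.flatMap((g) => g.rules);
}

const example = [
  "User-agent: PerplexityBot",
  "User-agent: Applebot",
  "Allow: /",
  "Disallow: /docs/1.0.0/",
  "",
  "User-agent: *",
  "Crawl-delay: 5",
  "Disallow: /cgi-bin/",
].join("\n");

const groups = parseGroups(example);
console.log(rulesFor(groups, "PerplexityBot")); // only the named group's rules
console.log(rulesFor(groups, "SomeOtherBot")); // falls through to the * group
```

In this sample PerplexityBot never sees `Crawl-delay` or `/cgi-bin/`, which is why those rules have to be repeated inside the named group.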

Per-page og:title + hub Article schema suppression + sitemap priorities

  • og:title now reflects the current page title instead of always falling back to the site name
  • Article schema suppressed on /docs/ root, versioned roots like /docs/4.0.0/, and category index pages where it was incorrectly applied (hub pages have no single author/date/headline)
  • Explicit sitemap priority buckets per page type: 1.0 for root, 0.9 for quickstart, 0.8 for running-keploy, 0.7 for concepts/keploy-explained, 0.6 for keploy-cloud/ci-cd and FAQ/troubleshooting pages (matched via -faq/ and /common-errors/ to reflect actual v4 routes)

Test plan

  • Build docs locally, verify no sidebar or schema regressions
  • Confirm robots.txt applies legacy-version block to named AI search bot groups
  • Manually test BreadcrumbList on /docs/concepts/reference/glossary/idempotency/ after deploy
  • Verify sitemap priority values on /docs/sitemap.xml

Note: An earlier iteration of this PR added a "Keploy vs Alternatives" comparison doc at /docs/keploy-explained/keploy-vs-alternatives/ plus a v4 sidebar entry for it. That file and sidebar entry were removed in commit b56c813 per reviewer feedback — product comparison framing belongs on the landing site, not under the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.

🤖 Generated with Claude Code

nehagup and others added 2 commits April 14, 2026 18:56
Adopt the Speedscale / Katalon / Testsigma split:
- Allow AI SEARCH bots (drive answer visibility): OAI-SearchBot,
  ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot,
  Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot,
  DuckAssistBot, Amazonbot.
- Block TRAINING-ONLY bots: GPTBot, ClaudeBot, anthropic-ai, CCBot,
  Google-Extended, Applebot-Extended, Meta-ExternalAgent, FacebookBot,
  cohere-ai, Diffbot, Omgilibot, ImagesiftBot.
- Keep Bytespider blocked.

Also add belt-and-braces Disallow for unmaintained legacy doc versions
(/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) to reinforce existing
noindex+canonical signals for crawlers that ignore them.

Reopens Task 52 per user direction 2026-04-14. Mirrors the corresponding
policy change on landing and blog-website robots.txt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
LIVE-20. Live audit of /docs/concepts/reference/glossary/idempotency/
on 2026-04-14 showed @type schema blocks Article, ImageObject,
Organization, WebPage — and no BreadcrumbList. The /docs/ root,
by contrast, has BreadcrumbList. This is the specific glossary-page
regression that Task 13 was meant to catch but didn't.

Root cause: the DocBreadcrumbs component had an early `return null`
when useSidebarBreadcrumbs() returned null/undefined, which suppresses
both the visual breadcrumb UI AND the JSON-LD schema emission. For
deep glossary pages whose sidebar context resolves to null, this
meant zero BreadcrumbList — the regression.

Changes:
- Replace `if (!breadcrumbs) return null` with a safe fallback to
  an empty sidebarTrail array. Schema emission + Home/Docs items
  run unconditionally.
- Only render the visual <nav> when sidebarTrail has entries
  (avoids showing an empty breadcrumb UI on schema-only pages).
- Propagate the sidebarTrail rename through the visual render path.

Verify after deploy:
  curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \
    grep -c '"@type":"BreadcrumbList"'
  # expected: 1 (was 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Copilot AI review requested due to automatic review settings April 14, 2026 13:42
Contributor

Copilot AI left a comment

Pull request overview

Addresses SEO audit findings by ensuring glossary doc pages always emit BreadcrumbList JSON-LD (even when sidebar breadcrumbs are unavailable) and by updating robots.txt to distinguish between AI “search/answer” bots vs training-only crawlers, plus blocking legacy doc versions.

Changes:

  • Emit BreadcrumbList schema with a safe fallback when useSidebarBreadcrumbs() returns null/undefined.
  • Update static/robots.txt with allow/disallow blocks for various AI bots and add legacy doc-version disallows.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| static/robots.txt | Introduces nuanced bot rules and attempts to block legacy doc versions from crawling. |
| src/theme/DocBreadcrumbs/index.js | Avoids early-return so JSON-LD breadcrumbs can be emitted even without sidebar breadcrumbs; limits visual breadcrumb UI to when a sidebar trail exists. |

Comment thread static/robots.txt
Comment on lines +98 to +102
# Block unmaintained legacy doc versions (already set via noindex + canonical,
# belt-and-braces for crawlers that ignore those signals).
Disallow: /docs/1.0.0/
Disallow: /docs/2.0.0/
Disallow: /docs/3.0.0/

Copilot AI Apr 14, 2026

The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).

Member Author

Fixed in 758cff5. Went with option (a) but consolidated: the 11 AI-search-bot allow groups are now a single block that uses multiple User-agent: headers sharing one rule set, with the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) applied directly inside it. Same intent ("allow these AI search bots everywhere except legacy versions") but now actually enforced, and only 8 lines of net change instead of 33.

Member Author

Addressed in 758cff5 (the earlier commit that moved the legacy disallows inside the named allow group). The /docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block so every allowed AI bot gets the legacy-version block, not just crawlers that fall through to User-agent: *. 56cae9e just now extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment — both global rules are now duplicated inside the named group as well.

Per the robots.txt spec, a bot that matches a named User-agent group reads
rules only from that group — it does not fall through to User-agent: *.
So the Disallow: /docs/{1,2,3}.0.0/ lines under User-agent: * were silently
inapplicable to PerplexityBot/Applebot/OAI-SearchBot/etc., meaning those
bots were still crawling the unmaintained legacy versions despite the
noindex/canonical/global block combo.

Consolidate the 11 AI search bot allow groups into a single block using
multiple User-agent headers, and add the three legacy-version Disallow
lines inside it so the intent — "allow AI search bots everywhere except
legacy versions" — is actually enforced. No semantic change to training
bots, Bytespider, or the * fallback group.

Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
…, Task 35 sitemap priorities

Three docs-side fixes bundled.

LIVE-12 — per-page og:title / twitter:title

Previously every docs page rendered with og:title "Keploy Documentation"
because the title inherited from docusaurus.config.js's site-level
`title` field. Social share cards on LinkedIn / Slack / X therefore
all showed the same generic headline regardless of which glossary /
concept / quickstart page was shared.

Fix: emit <meta property="og:title" content={title}> and
<meta name="twitter:title" content={title}> in the swizzled DocItem
Head component, pulling from the per-page metadata.title that the
<title> tag already uses. Also adds og:description / twitter:description
so preview cards carry the page-specific description. No site-level
config change required.
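
A rough sketch of the idea, written as a plain function rather than the real swizzled component: `buildSocialMeta` is a hypothetical helper, and the `{title, description}` shape of `metadata` is assumed from the description above.

```javascript
// Hypothetical helper: derive social meta tags from per-page doc metadata.
function buildSocialMeta(metadata, siteTitle) {
  // Prefer the page title; use the site title only when it is missing.
  const title = (metadata && metadata.title) || siteTitle;
  const description = metadata && metadata.description;

  const tags = [
    { property: "og:title", content: title },
    { name: "twitter:title", content: title },
  ];

  // Description tags are added only when the page provides one.
  if (description) {
    tags.push(
      { property: "og:description", content: description },
      { name: "twitter:description", content: description }
    );
  }
  return tags;
}

console.log(
  buildSocialMeta(
    { title: "What is Idempotency in REST APIs? Complete Guide" },
    "Keploy Documentation"
  )
);
```

Each entry maps onto one `<meta>` element, so a page with its own title no longer inherits the generic site headline.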

LIVE-13 — suppress Article schema on /docs/ landing and category indexes

The /docs/ root was shipping Article JSON-LD even though it is a hub
page with no single author, no single publication date, and no single
headline — a type mismatch that AI models may flag as noise.

Fix: compute `suppressArticleSchema` from permalink / frontmatter and
short-circuit the articleSchema construction when the page is the /docs/
root or a category index. The DocBreadcrumbs JSON-LD continues to emit
normally so hub pages still have navigation signal.
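
As an illustration of the short-circuit, the sketch below returns no Article schema for hub pages. `isHubPage` and `buildArticleSchema` are hypothetical names, and treating any permalink containing `/category/` as a category index is an assumption about the route shape, not the shipped logic.

```javascript
// Hypothetical hub detection based on the permalink alone.
function isHubPage(permalink) {
  const path = permalink || "";
  const isDocsRoot = path === "/docs" || path === "/docs/";
  const isCategoryIndex = path.includes("/category/"); // assumed route shape
  return isDocsRoot || isCategoryIndex;
}

// Returns null for hubs, so the caller emits BreadcrumbList JSON-LD only.
function buildArticleSchema(metadata) {
  if (isHubPage(metadata.permalink)) return null;
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: metadata.title,
    url: metadata.permalink,
  };
}

console.log(buildArticleSchema({ permalink: "/docs/", title: "Docs" }));
console.log(
  buildArticleSchema({
    permalink: "/docs/keploy-explained/how-keploy-works/",
    title: "How Keploy Works",
  })
);
```

Leaf pages keep their Article schema; only the root and category indexes are skipped.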

Task 35 — differentiate docs sitemap priorities

Original priority buckets only covered quickstart (0.8), concepts /
keploy-explained (0.7), and keploy-cloud (0.6). Default was 0.5 for
everything else including the high-value /docs/ root and running-keploy
sections.

New bucket structure in createSitemapItems:
  1.0  /docs/ root (primary entry point)
  0.9  /docs/quickstart/* (highest-intent user flow)
  0.8  /docs/running-keploy/* (primary product docs)
  0.7  /docs/concepts/*, /docs/keploy-explained/*
  0.6  /docs/keploy-cloud/*, /docs/ci-cd/*, /docs/faq, /docs/troubleshooting
  0.5  /docs/concepts/reference/glossary/* (long-tail, many pages)

Added an explanatory comment inline so the next editor understands
the bucket rationale.

Verify after deploy:
  curl -s https://keploy.io/docs/concepts/reference/glossary/idempotency/ | \
    grep -oE 'og:title"[^>]*content="[^"]+"'
  # expected: "What is Idempotency in REST APIs? Complete Guide"
  curl -s https://keploy.io/docs/ | grep -c '"@type":"Article"'
  # expected: 0
  curl -s https://keploy.io/docs/sitemap.xml | \
    python3 -c "import sys,re; \
    priorities = re.findall(r'<priority>([0-9.]+)</priority>', sys.stdin.read()); \
    print('unique priorities:', sorted(set(priorities)))"
  # expected: ['0.5', '0.6', '0.7', '0.8', '0.9', '1.0']

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Comment thread static/robots.txt
User-agent: Applebot
User-agent: DuckAssistBot
User-agent: Amazonbot
Allow: /

Copilot AI Apr 14, 2026

The explicit AI-search User-agent group won’t inherit rules from User-agent: *, so those bots will ignore Crawl-delay: 5 and Disallow: /cgi-bin/. If the intent is to keep the same crawl-rate limit and global disallows for all allowed crawlers, duplicate those rules inside this named allow group as well (alongside the legacy-version disallows).

Suggested change
- Allow: /
+ Allow: /
+ Crawl-delay: 5
+ Disallow: /cgi-bin/

Member Author

Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate-limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same inheritance reason — this extends that pattern to the two global rules you flagged.

Member Author

Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.

nehagup and others added 2 commits April 14, 2026 20:20
Copilot review caught that named User-agent groups in robots.txt do
not inherit rules from User-agent: *. The AI-search allow group
(OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User,
PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther,
Applebot, DuckAssistBot, Amazonbot) was therefore ignoring both the
global Crawl-delay: 5 limit AND the Disallow: /cgi-bin/ in the
fallback User-agent: * block.

Duplicated both lines into the named group so the same policy
applies: search bots are rate-limited to 5s per request, and they
cannot crawl /cgi-bin/. The legacy-version disallows
(/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated
in this block for the same inheritance reason.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Live audit + competitor analysis (llms-full.txt for competitors already
has comparison tables, but docs had none). Adds a dedicated comparison
page under /docs/keploy-explained/keploy-vs-alternatives with:

- Feature comparison matrix: Keploy vs Postman, Katalon, WireMock,
  Testcontainers across 9 capabilities (test generation model, SDK
  requirement, mock generation, non-determinism, secret masking,
  CI/CD, license, kernel version).
- Approach differences: plain-language description of each tool's
  core abstraction so readers can self-sort.
- When to pick each: decision tree by team profile / API shape.
- Migration paths: concrete steps for moving from Postman or Katalon
  to Keploy without throwing away existing work.
- Related reading cross-links to how-keploy-works, integration-testing-faq,
  api-testing-faq.

Added to version-4.0.0 sidebar in the Integration Testing → keploy-vs-
alternatives slot, placed between Troubleshooting Guide and FAQs so it
appears in the decision phase of the reader journey.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


Comment on lines +23 to +26
| Capability | Keploy | Postman | Katalon | WireMock | Testcontainers |
|---|---|---|---|---|---|
| Test generation model | Auto from real traffic (eBPF capture) | Manual scripts | Manual + low-code | Manual + record/playback | Manual + real containers |
| SDK / code changes required | None (kernel-level eBPF) | Newman CLI integration | Groovy scripts or record | Java SDK or standalone proxy | Java / Go / Node / Python SDK |

Copilot AI Apr 14, 2026

The Markdown table uses a double leading pipe (||) on the header and separator rows, which will render as an empty first column (and can break consistent styling). Use a single leading pipe (|) for standard GitHub/Docusaurus table syntax so columns align as intended.

Member Author

False positive — the file has single leading pipes on every row. Verified with grep -n '||' versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md (no matches) and by inspecting the diff hunk Copilot attached to this comment: every table row starts with | followed by a space, not ||. The first column ("Capability") renders correctly in GitHub preview. No change required.

Comment on lines +169 to +173
{
  "type": "doc",
  "label": "Keploy vs Alternatives",
  "id": "keploy-explained/keploy-vs-alternatives"
},

Copilot AI Apr 14, 2026

The PR title/description focuses on BreadcrumbList + robots, but this sidebar change introduces a new doc page in the v4 sidebar. Please either update the PR description/title to include this new documentation addition (and its intent), or split it into a separate PR to keep the audit fixes isolated.

Member Author

Updated the PR title and description to match the actual scope. The PR is now titled "audit: BreadcrumbList, robots policy, og:title, sidebar + Keploy vs Alternatives" and the description has a dedicated Task 33 — Keploy vs Alternatives comparison page section explaining the new doc (feature matrix vs Postman/Katalon/WireMock/Testcontainers, approach differences, when-to-pick-each) and the sidebar entry under keploy-explained, with a link to commit 5de8526.

Went with "update description" rather than "split PR" because each audit concern is already in its own commit, so review granularity is preserved without the overhead of rebasing out one commit onto a new branch. If you'd still prefer a split, happy to do that — just let me know.

@slayerjain slayerjain changed the title from "fix(docs): BreadcrumbList fallback for glossary pages + nuanced robots" to "audit: BreadcrumbList, robots policy, og:title, sidebar + Keploy vs Alternatives" Apr 14, 2026
@nehagup nehagup requested a review from Copilot April 14, 2026 16:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


Comment thread docusaurus.config.js
Comment on lines +500 to 505
if (
  url.includes("/faq") ||
  url.includes("/troubleshooting")
) {
  return {...item, priority: 0.6, changefreq: "monthly"};
}

Copilot AI Apr 14, 2026

The sitemap priority bucket for “/docs/faq” and “/docs/troubleshooting” won’t apply to the current v4 docs routes. The FAQ docs are at URLs like /docs/keploy-explained/integration-testing-faq/ (and api-testing-faq, unit-testing-faq), which don’t contain the substring /faq, and the “Troubleshooting Guide” is /docs/keploy-explained/common-errors/, which doesn’t contain /troubleshooting. As a result, these pages will fall into the /keploy-explained/ bucket (0.7) instead of the intended 0.6. Update the matching logic to reflect actual routes (e.g., match faq anywhere in the slug and common-errors, or base this on doc ids/tags), or update the comment/bucket list so it matches the implemented behavior.

Member Author

Fixed in 0af8f3c. Changed the match patterns to the actual v4 URL fragments (-faq/, -faq, /common-errors) and moved the FAQ/troubleshooting check ABOVE the /keploy-explained/ check so it takes precedence. Now /docs/keploy-explained/integration-testing-faq/, api-testing-faq, unit-testing-faq, and common-errors all correctly land in the 0.6 reference-style bucket instead of the 0.7 concepts bucket. Header comment updated to name the actual pages covered.

Member Author

Fixed in 0af8f3c: the FAQ + troubleshooting bucket now matches the actual v4 routes. The createSitemapItems handler matches url.includes("-faq/") || url.includes("-faq") || url.includes("/common-errors") before the broader /keploy-explained/ 0.7 bucket, so the three FAQ pages (integration-testing-faq, api-testing-faq, unit-testing-faq) and the Troubleshooting Guide at /docs/keploy-explained/common-errors/ correctly land in the 0.6 bucket. The match-first ordering is documented inline so future edits don't accidentally swap the rules and bury these matches under the keploy-explained fallback.
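
For readers following the ordering argument, here is a sketch of a first-match-wins bucket table. It is not the code in docusaurus.config.js: `PRIORITY_RULES`, `priorityFor`, and `applyPriorities` are hypothetical names, absolute item URLs are assumed, and the position of the glossary rule is a guess made so the 0.5 bucket is not shadowed by `/concepts/`.

```javascript
// Ordered rules: the first match wins, so narrower patterns come first.
const PRIORITY_RULES = [
  { test: (path) => path === "/docs/" || path === "/docs", priority: 1.0 },
  { test: (path) => path.includes("/quickstart/"), priority: 0.9 },
  { test: (path) => path.includes("/running-keploy/"), priority: 0.8 },
  // FAQ and troubleshooting must be checked before /keploy-explained/.
  { test: (path) => path.includes("-faq") || path.includes("/common-errors"), priority: 0.6 },
  // Assumed placement: glossary before the broader /concepts/ rule.
  { test: (path) => path.includes("/glossary/"), priority: 0.5 },
  { test: (path) => path.includes("/concepts/") || path.includes("/keploy-explained/"), priority: 0.7 },
  { test: (path) => path.includes("/keploy-cloud/") || path.includes("/ci-cd/"), priority: 0.6 },
];

function priorityFor(url) {
  const path = new URL(url).pathname;
  const rule = PRIORITY_RULES.find((r) => r.test(path));
  return rule ? rule.priority : 0.5; // default bucket
}

// Applies the buckets to sitemap items that carry an absolute `url`.
function applyPriorities(items) {
  return items.map((item) => ({ ...item, priority: priorityFor(item.url) }));
}

console.log(priorityFor("https://keploy.io/docs/keploy-explained/common-errors/"));
console.log(priorityFor("https://keploy.io/docs/keploy-explained/how-keploy-works/"));
```

Note that a `-faq` substring check already covers `-faq/`, so a single pattern is enough for the FAQ pages in this sketch.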

…h actual v4 routes

Addresses Copilot review comment 3080865321 on docs PR #832.

The previous sitemap createSitemapItems callback checked
`url.includes("/faq")` and `url.includes("/troubleshooting")` — neither
of those substrings appears in the actual v4 docs URLs. The FAQ pages
live at:

  /docs/keploy-explained/integration-testing-faq/
  /docs/keploy-explained/api-testing-faq/
  /docs/keploy-explained/unit-testing-faq/

and the troubleshooting guide lives at:

  /docs/keploy-explained/common-errors/

(sidebar label: "Troubleshooting Guide"). Because `/faq` never
matched, all three FAQ pages and the troubleshooting guide fell
through to the `/keploy-explained/` rule immediately below and got
priority 0.7, not the intended 0.6.

Fix:
1. Changed the match patterns to `-faq/`, `-faq`, and `/common-errors`
   so they match the real URL fragments.
2. Moved the FAQ/troubleshooting check ABOVE the `/keploy-explained/`
   check so it takes precedence when a page satisfies both.
3. Updated the header comment block to name the actual pages covered
   by the 0.6 bucket.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


Signed-off-by: nehagup <15074229+nehagup@users.noreply.github.com>
Member

Delete this file: versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md

Member Author

Done in b56c813 — deleted the file.

Member

remove

Member Author

Done in b56c813 — removed the sidebar entry for keploy-explained/keploy-vs-alternatives.

Comment thread docusaurus.config.js
Member

Remove the comments in all files that reference internal task ticket numbers.

Member Author

Done in b56c813 — stripped Task 35, LIVE-12, LIVE-13, and LIVE-20 ticket references from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Kept the explanatory comments that describe why each piece of logic exists, just without the internal ticket numbers.

Comment thread src/theme/DocItem/index.js Outdated
const normalizedMetaKeywords = Array.isArray(metaKeywords)
  ? metaKeywords.join(", ")
  : metaKeywords;
// LIVE-13: suppress Article / BlogPosting / APIReference schema on the
Member

Why are we suppressing this, and how does it help?

Member Author

Explaining rather than removing, since it's a functional fix — happy to revert if you still want it out.

What it does: On /docs/ and any /category/... index page, we skip emitting Article / BlogPosting / APIReference JSON-LD and emit only the BreadcrumbList schema.

Why: Article schema requires a single headline, single author, and single datePublished. A hub page (the docs landing, or a category index) is an index of many articles — it has no single author or publication date. Emitting Article JSON-LD on these pages is a structured-data type mismatch.

What breaks without it:

  1. Google Rich Results Test flags /docs/ as invalid Article structured data (missing or conflicting fields). Invalid schema can cause Google to stop trusting all the schema on the domain, including the valid Article entries on leaf pages.
  2. AI crawlers (Perplexity, ChatGPT Search) that cite "articles" prefer pages where the type matches — a hub incorrectly marked as Article gets cited with the wrong title/author combo in answer engines.

Scope: It only affects the docs root and category index pages. Every normal content page (/docs/running-keploy/cli-commands, /docs/keploy-explained/how-keploy-works, etc.) still emits full Article schema exactly as before — those pages have real authors, dates, and headlines.

If you want it removed anyway: say the word and I'll strip it. The cost is that /docs/ will fail Google's structured-data validation and /docs/category/* pages will emit Article schema with blank author/date fields. Up to you.

… refs

Addresses 3 review comments from @nehagup on PR #832.

1. Delete versioned_docs/version-4.0.0/keploy-explained/keploy-vs-alternatives.md
   and remove its sidebar entry in version-4.0.0-sidebars.json. The page
   was added by the audit batch but the team decided not to ship a public
   alternatives comparison inside the docs; keeping it here would publish
   product-marketing framing under the docs subtree, which belongs on the
   landing site instead if anywhere.

2. Strip internal ticket identifiers from comments across
   docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and
   src/theme/DocItem/index.js. Internal audit references like "Task 35",
   "LIVE-12", "LIVE-13", "LIVE-20" rot as the task tracker evolves and
   leak private process detail into public source. Kept the explanatory
   comments that describe *why* each piece of logic exists, just without
   the ticket numbers.

DocItem/index.js suppressArticleSchema logic and DocBreadcrumbs
sidebarTrail fallback are left as-is — they are functional fixes that
prevent invalid Article schema on hub pages and missing BreadcrumbList
schema on glossary pages. Reply to @nehagup's "why are we suppressing"
question posted separately on the comment thread.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


Comment thread src/theme/DocItem/index.js Outdated
Comment on lines +153 to +162
// root and any category index pages. Article schema on a hub page is
// a type mismatch because a hub does not have a single author, a
// single publication date, or a single headline — it is an index of
// content. Hub pages emit only the normal DocBreadcrumbs JSON-LD.
const permalink = metadata?.permalink || "";
const isDocsRoot =
  permalink === "/docs/" ||
  permalink === "/docs" ||
  permalink.endsWith("/docs/index") ||
  permalink.endsWith("/docs/");

Copilot AI Apr 14, 2026

suppressArticleSchema only treats /docs/ as the docs root. This repo also serves versioned roots like /docs/4.0.0/ (since onlyIncludeVersions includes 4.0.0 and includeCurrentVersion is true), and those hub pages can still emit Article/BlogPosting/APIReference JSON-LD. Consider extending isDocsRoot (or suppressArticleSchema) to also match versioned docs roots like /docs/<version>/ so hub pages across all served versions avoid type-mismatched Article schema.

Suggested change
- // root and any category index pages. Article schema on a hub page is
- // a type mismatch because a hub does not have a single author, a
- // single publication date, or a single headline — it is an index of
- // content. Hub pages emit only the normal DocBreadcrumbs JSON-LD.
- const permalink = metadata?.permalink || "";
- const isDocsRoot =
-   permalink === "/docs/" ||
-   permalink === "/docs" ||
-   permalink.endsWith("/docs/index") ||
-   permalink.endsWith("/docs/");
+ // root, versioned docs roots, and any category index pages. Article
+ // schema on a hub page is a type mismatch because a hub does not have
+ // a single author, a single publication date, or a single headline —
+ // it is an index of content. Hub pages emit only the normal
+ // DocBreadcrumbs JSON-LD.
+ const permalink = metadata?.permalink || "";
+ const isVersionedDocsRoot =
+   /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink);
+ const isDocsRoot =
+   permalink === "/docs/" ||
+   permalink === "/docs" ||
+   permalink.endsWith("/docs/index") ||
+   permalink.endsWith("/docs/") ||
+   isVersionedDocsRoot;

Member Author

Fixed in 3622c50. Added isVersionedDocsRoot = /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/.test(permalink) and folded it into isDocsRoot so /docs/4.0.0/, /docs/3.0.0/, /docs/2.0.0/, and /docs/1.0.0/ all get Article/BlogPosting/APIReference suppression. Content pages inside versioned trees (for example /docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article schema normally since they have real authors, dates, and headlines; only bare versioned roots are caught.
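
To make the matching behaviour concrete, here is a minimal sketch that runs the same pattern against a few permalinks. The helper wrapper and the sample list are illustrative only and are not taken from the commit; only the regex itself comes from the suggested change above.

```javascript
// Same pattern as the suggested change; the wrapper function and the
// sample permalinks below are illustrative assumptions.
const VERSIONED_DOCS_ROOT = /^\/docs\/\d[\w.-]*(?:\/index)?\/?$/;

function isVersionedDocsRoot(permalink) {
  // Treat a missing permalink as "not a versioned root".
  return VERSIONED_DOCS_ROOT.test(permalink || "");
}

const samples = [
  "/docs/4.0.0/",
  "/docs/4.0.0",
  "/docs/1.0.0/index",
  "/docs/",
  "/docs/4.0.0/keploy-explained/how-keploy-works/",
  "/docs/concepts/reference/glossary/idempotency/",
];

for (const permalink of samples) {
  console.log(permalink, isVersionedDocsRoot(permalink));
}
```

One caveat of this pattern: it keys on a leading digit rather than on the configured version list, so a hypothetical top-level doc whose slug starts with a digit (say /docs/2fa/) would also be treated as a versioned root. Matching against the names in onlyIncludeVersions would be stricter, at the cost of coupling the theme component to the site config.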

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/theme/DocBreadcrumbs/index.js:102

  • When useSidebarBreadcrumbs() returns null/undefined, sidebarTrail becomes [], so the JSON-LD BreadcrumbList only contains the injected "Home" and "Docs" items and never includes the current page URL. This produces a BreadcrumbList that doesn’t actually represent the page’s breadcrumb trail (especially for glossary pages), and may still fail the intended SEO/structured-data signal. Consider always appending a final ListItem for the current pathname (and deriving a reasonable name, e.g., from doc metadata/title if available, otherwise from the last path segment).
  if (sidebarTrail.length > 0) {
    sidebarTrail.forEach((crumb, index) => {
      const isLast = index === sidebarTrail.length - 1;
      const href =
        crumb.type === "category" && crumb.linkUnlisted
          ? undefined
          : crumb.href || (isLast ? pathname : null);
      const absoluteUrl = toAbsoluteUrl(siteConfig?.url, href);
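
The suggestion in the suppressed comment (always append a final ListItem for the current page) could look roughly like the sketch below. This is not the code in src/theme/DocBreadcrumbs/index.js: buildBreadcrumbList, nameFromPath, the stand-in toAbsoluteUrl, and the "/" and "/docs/" hrefs for the injected Home and Docs items are all assumptions made for illustration.

```javascript
// Stand-in for the theme's toAbsoluteUrl helper (real implementation not shown in the PR).
function toAbsoluteUrl(siteUrl, href) {
  if (!href) return undefined;
  return new URL(href, siteUrl).toString();
}

// Hypothetical fallback name: last path segment with hyphens turned into spaces.
function nameFromPath(pathname) {
  const segments = pathname.split("/").filter(Boolean);
  const last = segments[segments.length - 1] || "Docs";
  return last.replace(/-/g, " ");
}

// Hypothetical builder: Home + Docs + sidebar trail, then the current page
// if the trail does not already end on it.
function buildBreadcrumbList({ siteUrl, pathname, title, sidebarTrail }) {
  const trail = sidebarTrail || [];
  const crumbs = [
    { name: "Home", href: "/" }, // assumed href
    { name: "Docs", href: "/docs/" }, // assumed href
    ...trail.map((crumb, index) => ({
      name: crumb.label,
      // Mirror the existing rule: the last crumb falls back to the pathname.
      href: crumb.href || (index === trail.length - 1 ? pathname : undefined),
    })),
  ];

  const currentUrl = toAbsoluteUrl(siteUrl, pathname);
  const lastUrl = toAbsoluteUrl(siteUrl, crumbs[crumbs.length - 1].href);
  if (lastUrl !== currentUrl) {
    crumbs.push({ name: title || nameFromPath(pathname), href: pathname });
  }

  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => {
      const item = toAbsoluteUrl(siteUrl, crumb.href);
      return {
        "@type": "ListItem",
        position: index + 1,
        name: crumb.name,
        ...(item ? { item } : {}), // omit item when a crumb has no link
      };
    }),
  };
}

// Glossary-style page with no sidebar trail (example.com is a placeholder).
console.log(
  JSON.stringify(
    buildBreadcrumbList({
      siteUrl: "https://example.com",
      pathname: "/docs/concepts/reference/glossary/idempotency/",
      sidebarTrail: null,
    }),
    null,
    2
  )
);
```

With a null trail, the list ends on the page itself instead of stopping at "Docs", which is the gap the comment describes for glossary pages.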

Comment thread static/robots.txt
Comment on lines +1 to +6
# Keploy docs robots.txt
# Policy: allow AI search/answer engines, block training-only crawlers,
# block Bytespider. Search bots drive visibility in ChatGPT, Claude,
# Perplexity, Copilot, Gemini answers. Training bots feed future model
# weights and provide nothing back.
# Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit)

Copilot AI Apr 14, 2026

PR description mentions adding a new “Keploy vs Alternatives” doc page and updating the v4 sidebar, but those artifacts don’t appear to be present in this change set (no keploy-vs-alternatives doc found and no sidebar entry references it). Either the description needs updating to reflect the actual changes in this PR, or the missing doc/sidebar changes need to be included so the PR matches its stated scope.

Member Author

Fixed by updating the PR title and body via REST API. The title is now 'audit: BreadcrumbList, robots policy, og:title, sitemap priorities' and the Task 33 section describing the Keploy vs Alternatives page has been removed from the body. Added a trailing Note that explains the file and sidebar entry were created earlier in the branch and then removed in commit b56c813 per @nehagup's review feedback — product comparison framing belongs on the landing site, not the docs subtree. The current PR scope is BreadcrumbList + robots.txt + og:title + sitemap priorities only.

@slayerjain changed the title from "audit: BreadcrumbList, robots policy, og:title, sidebar + Keploy vs Alternatives" to "audit: BreadcrumbList, robots policy, og:title, sitemap priorities" on Apr 15, 2026
Addresses Copilot comment 3081151415 on PR #832.

The previous suppressArticleSchema check only matched /docs/ as the
docs root, but this site serves versioned hub pages too — /docs/4.0.0/,
/docs/3.0.0/, /docs/2.0.0/, /docs/1.0.0/ — via onlyIncludeVersions and
includeCurrentVersion in docusaurus.config.js. Each versioned root is
also an index of content with no single author/date/headline, so
emitting Article/BlogPosting/APIReference schema on those pages had
the same type-mismatch problem the base case fix was addressing.

Added a regex check for /docs/<digit-starting-version>/ so any current
or archived versioned root is caught by the same suppression path.
Current content pages inside versioned trees (e.g.
/docs/4.0.0/keploy-explained/how-keploy-works/) still emit Article
schema as normal since they have real authors, dates, and headlines —
only the bare versioned roots are suppressed.

Signed-off-by: Neha Gupta <gneha21@yahoo.in>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


@nehagup nehagup merged commit b1a65ae into main Apr 15, 2026
9 of 11 checks passed
@nehagup nehagup deleted the neha/seo-audit-batch-2026-04-14 branch April 15, 2026 12:34
3 participants