Kahlain/geo-status-extension
Inocta GEO Status (Chrome/Edge Side Panel Extension)

This extension analyzes the page you are currently viewing and shows a GEO (Generative Engine Optimization) dashboard focused on AI agent visibility (how likely generative engines are to retrieve and cite your content).

It also generates downloadable helper files:

  • llms.txt (structured summary)
  • markdown.md (best-effort page-to-markdown export)
  • humans.txt (generated from the form in the side panel)

How the extension works (simple overview)

1) Side panel UI (sidepanel.html, sidepanel.js, sidepanel.css)

  • Opens when you click the extension icon.
  • Asks the active tab for an audit using chrome.tabs.sendMessage(...).
  • Renders the score, metrics, file status, and suggestions.
  • Lets you export a JSON audit and download generated files.
  • Includes an in-app Guides popover with three built-in GEO documents (rendered from Markdown).

Header information hierarchy (UI rule)

To keep the side panel easy to scan, the header is organized into three levels:

  • Level 1 — Menu: the top navigation tabs (Dashboard / llms.txt / markdown.md / humans.txt).
  • Level 2 — Actions: text-only buttons (Guides / Open report / Export / Refresh).
  • Level 3 — Signals: the audit context controls (Citation vs Compliance mode + detected intent badge).

2) Page analyzer (content.js)

Runs inside the website tab and returns:

  • geo: scores + detailed metrics + suggestions
  • fileStatus: whether key site files exist (robots/sitemap/llms/humans)
  • robotsInsights: extra context used by the UI (see below)
  • files: generated llms.txt, markdown.md, and humans.txt
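
The payload content.js returns might look like the sketch below. The four top-level keys are the ones listed above; the nested field names and values are illustrative assumptions, not the extension's exact schema:

```javascript
// Illustrative shape of the audit payload returned by content.js.
// Top-level keys come from the list above; nested fields are assumptions.
const audit = {
  geo: {
    citationScore: 72,        // intent-weighted citation likelihood
    complianceScore: 85,      // Machine & AI Readiness audit
    metrics: {},              // detailed per-metric results
    suggestions: []           // actionable improvement hints
  },
  fileStatus: {
    robots: "ok",
    sitemap: "ok",
    llms: "missing",
    humans: "soft-404"
  },
  robotsInsights: {
    aiBotsBlocked: [],        // e.g. ["GPTBot"]
    baseOrigin: "https://example.com"
  },
  files: {
    "llms.txt": "(generated content)",
    "markdown.md": "(generated content)",
    "humans.txt": "(generated content)"
  }
};
```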

Intent model (classic user-goal) + commerce context

The tool separates two concepts:

  • Intent (user goal): what the user is trying to do on this page
    • INFORMATIONAL: learn/understand
    • NAVIGATIONAL: find a destination/hub (homepage, store locator, support)
    • TRANSACTIONAL: complete an action (buy, checkout, signup, booking)
    • COMMERCIAL: evaluate options before buying (comparisons, “best”, reviews)
  • Commerce context (page type): whether the page looks like ecommerce, and what kind
    • page.content_type: ecommerce_homepage, category_page, product_page, search_results_page, cart, checkout, store_locator, unknown
    • page.commerce_intent: evidence signals + a confidence score

This prevents “ecommerce” from becoming a confusing competing intent. Instead, ecommerce is captured as page context while intent stays about user behavior.
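
The separation can be sketched as two independent classifiers. The keyword patterns, DOM signal names, and thresholds below are illustrative assumptions, not the extension's actual rules:

```javascript
// Hedged sketch: user intent and commerce context are classified separately.
// Keyword lists and signal names are illustrative assumptions.
function classifyIntent(text) {
  const t = text.toLowerCase();
  if (/\b(buy|checkout|sign ?up|book now|add to cart)\b/.test(t)) return "TRANSACTIONAL";
  if (/\b(best|vs|review|compare|comparison)\b/.test(t)) return "COMMERCIAL";
  if (/\b(store locator|homepage|support|contact)\b/.test(t)) return "NAVIGATIONAL";
  return "INFORMATIONAL";
}

function detectCommerceContext(signals) {
  // signals: booleans gathered from the DOM (price markup, cart link, etc.)
  const evidence = ["hasPrice", "hasCartLink", "hasCheckoutForm"].filter((k) => signals[k]);
  const confidence = evidence.length / 3;
  let content_type = "unknown";
  if (signals.hasCheckoutForm) content_type = "checkout";
  else if (signals.hasPrice) content_type = "product_page";
  return { content_type, commerce_intent: { evidence, confidence } };
}
```

Because the two functions never feed into each other, a product page can be tagged ecommerce context while the intent stays, say, COMMERCIAL for a "best of" comparison.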

How intent affects scoring

  • Citation Score uses intent-aware weights: each intent weights the underlying metrics differently.
  • Compliance Score does not change by intent; it is a Machine & AI Readiness audit (access + identity + basic extractability). It does not measure content quality or citation likelihood.
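
Intent-aware weighting can be sketched as a per-intent weight table applied to metric scores. The weight values and metric names here are invented for illustration; only the mechanism (weights vary by intent, Compliance does not) comes from the text above:

```javascript
// Hedged sketch of intent-aware Citation Score weighting.
// Weight values and metric names are illustrative assumptions.
const WEIGHTS = {
  INFORMATIONAL: { structure: 0.4, citations: 0.4, freshness: 0.2 },
  TRANSACTIONAL: { structure: 0.6, citations: 0.1, freshness: 0.3 },
};

function citationScore(metrics, intent) {
  const weights = WEIGHTS[intent] || WEIGHTS.INFORMATIONAL;
  let score = 0;
  for (const [metric, weight] of Object.entries(weights)) {
    score += (metrics[metric] ?? 0) * weight; // each metric scored 0..100
  }
  return Math.round(score);
}
```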

Sitemap detection (robots-first)

Some sites do not host their sitemap at /sitemap.xml (they might use sitemapindex.xml, .gz files, or multiple sitemap URLs).

To avoid false “missing sitemap” results, the extension now:

  1. Fetches /robots.txt
  2. Extracts all lines like:
    • Sitemap: https://example.com/sitemapindex.xml
  3. Verifies those URLs look like a sitemap (non-HTML response and sitemap/XML markers).
  4. If none are found, it falls back to common paths such as:
    • /sitemap.xml
    • /sitemapindex.xml
    • /sitemap.xml.gz
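
The robots-first discovery in steps 1–4 can be sketched as two small helpers; the line parsing is simplified relative to whatever the extension actually does:

```javascript
// Sketch of robots-first sitemap discovery (steps 1-4 above).
const FALLBACK_SITEMAP_PATHS = ["/sitemap.xml", "/sitemapindex.xml", "/sitemap.xml.gz"];

// Extract every "Sitemap: <url>" line from robots.txt (case-insensitive).
function extractSitemapUrls(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap\s*:\s*(\S+)/i))
    .filter(Boolean)
    .map((m) => m[1]);
}

// Prefer robots.txt declarations; otherwise fall back to common paths.
function candidateSitemaps(robotsTxt, origin) {
  const fromRobots = extractSitemapUrls(robotsTxt);
  return fromRobots.length > 0
    ? fromRobots
    : FALLBACK_SITEMAP_PATHS.map((p) => origin + p);
}
```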

Verification strategy (fast but robust)

  • If robots.txt lists many sitemaps, the extension verifies them with bounded concurrency and stops once it finds the first valid sitemap (to keep audits fast).
  • If there are only a few sitemap URLs, it verifies all of them.
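
A minimal sketch of the bounded-concurrency strategy, with the verifier injected as a parameter so the scheduling logic stays self-contained (the limit and stop-early behavior are as described above; everything else is an assumption):

```javascript
// Sketch: verify candidate URLs a few at a time, resolving with the first
// URL the injected verifier accepts, and letting remaining workers wind down.
async function verifyFirstValid(urls, verify, limit = 4) {
  let index = 0;
  let found = null;
  async function worker() {
    while (index < urls.length && found === null) {
      const url = urls[index++];
      if (await verify(url)) {
        found = found ?? url; // keep the first hit, stop other workers
      }
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, urls.length) }, worker));
  return found; // null when no candidate verified
}
```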

In the UI, when the sitemap is verified, the sitemap tile links to the real verified sitemap URL, not a forced https://host/sitemap.xml.

AI crawler blocking warning (warning-only)

Your site can score well on-page but still be hard for AI systems to access if robots.txt blocks AI crawlers.

The side panel shows a warning banner when robots.txt appears to block common AI bots, such as:

  • GPTBot
  • Google-Extended
  • CCBot
  • ClaudeBot
  • PerplexityBot
  • Amazonbot

Important:

  • This is informational only (it does not change your GEO score).
  • robots.txt policy is a business/legal decision. The extension warns you so you can make an informed choice.

How “blocked” is decided (more accurate)

The extension parses robots.txt into user-agent groups and rules, then evaluates access using common precedence rules:

  • Most specific path wins (longest matching rule path)
  • If there’s a tie, Allow: beats Disallow:

For the warning banner, we focus on whether each listed AI bot is effectively blocked at the site root (/).
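
The precedence rules can be sketched as follows. The input is assumed to be the rule group already matched to a given bot (group selection and wildcard handling are omitted):

```javascript
// Sketch of the precedence rules above: the longest matching rule path wins,
// and on a tie Allow beats Disallow. rules is the group for one user-agent:
// [{ type: "allow" | "disallow", path: "/..." }]
function isPathBlocked(rules, path) {
  let best = null;
  for (const rule of rules) {
    if (rule.path === "" || !path.startsWith(rule.path)) continue; // empty Disallow = allow all
    if (
      best === null ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.type === "allow")
    ) {
      best = rule;
    }
  }
  return best !== null && best.type === "disallow";
}
```

For the banner, the question reduces to `isPathBlocked(groupForBot, "/")` for each listed AI bot.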

Soft 404 (HTML) vs real files

Some sites return an HTML “404 page” when you request a missing file (example pattern: /404?url=/llms.txt). This is called a soft 404.

The extension labels a file as Soft 404 (HTML) when the response looks like HTML, even if the HTTP status is 200.

To avoid false soft-404s, the extension will not mark a sitemap as soft-404 if the content looks like real XML (e.g., contains <?xml, <urlset, or <sitemapindex) even if the server sends the wrong Content-Type header.
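
The heuristic can be sketched as a single predicate over the Content-Type header and the start of the body (the exact markers checked come from the text above; everything else is simplified):

```javascript
// Sketch of the soft-404 heuristic: an HTML-looking response is flagged even
// with HTTP 200, but sitemap-like XML is trusted even under a wrong
// Content-Type header.
function isSoft404(contentType, bodyStart) {
  const body = bodyStart.trimStart().toLowerCase();
  // Real XML markers override everything, per the rule above.
  if (["<?xml", "<urlset", "<sitemapindex"].some((m) => body.includes(m))) return false;
  return (
    /text\/html/i.test(contentType || "") ||
    body.startsWith("<!doctype html") ||
    body.startsWith("<html")
  );
}
```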

Notes / limitations

  • Some websites block extension content scripts on certain pages or require a refresh after installing/updating the extension.
  • robots.txt rules can still be complex (wildcards, unusual formatting, vendor-specific behaviors). The warning is designed to be helpful, not a legal guarantee.

Export JSON (two types)

The side panel Export menu currently provides:

1) Raw JSON (tool export)

  • A direct export of the audit results and supporting context.
  • Includes (when available) robots_insights, such as:
    • AI crawler blocking (aiBotsBlocked, aiBotsBlockedDetails)
    • sitemap discovery counts
    • selected baseOrigin (helps with www vs non-www sites)
  • Includes page.content_type, page.commerce_intent, and weights to make intent decisions transparent.

2) Report JSON (includes AI prompt)

  • Wraps the Raw JSON export inside a single JSON file that also contains a report_prompt string.
  • Purpose: you can paste the JSON into an AI assistant and ask it to generate a client-friendly GEO report without recalculating any scores.
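
The wrapping step might look like the sketch below. The `report_prompt` wording and the `audit` key are assumptions; only the structure (one JSON file containing the raw export plus a prompt string) comes from the description above:

```javascript
// Sketch of the Report JSON wrapper: raw audit export plus a report_prompt
// string in a single JSON file. Prompt text and key names are assumptions.
function buildReportJson(rawExport) {
  return JSON.stringify(
    {
      report_prompt:
        "Generate a client-friendly GEO report from the audit below. " +
        "Do not recalculate any scores; explain the existing ones.",
      audit: rawExport,
    },
    null,
    2
  );
}
```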

About

Chrome Extension that scores web pages for AI citation likelihood and technical AI readiness. Built by inocta.io.
