Inconsistent Google Referrer Spoof #171

MooseJammer42 · 2026-03-06T16:46:17Z


I noticed two inconsistencies in the Google referrer that Scrapling spoofs: one in the `Referer` header itself and one in `Sec-Fetch-Site`.

Issue 1 — Incorrect `Referrer` Header (Easy Fix)

Google doesn't include the search query in the `Referer` header it sends to other sites. The value should simply be the origin:

https://www.google.com/
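As a rough illustration of the fix, the spoofed value can be reduced to its origin before it is sent. This is only a sketch: `origin_only_referer` is a hypothetical helper name, not something from Scrapling's codebase.

```python
from urllib.parse import urlsplit


def origin_only_referer(url: str) -> str:
    """Return scheme://host[:port]/ for url, dropping path, query, and fragment."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}/"


# Hypothetical search URL, used only to show the trimming.
print(origin_only_referer("https://www.google.com/search?q=scrapling"))
# → https://www.google.com/
```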

Issue 2 — Incorrect `Sec-Fetch-Site` Header (Complex)

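Sec-Fetch-Site should be cross-site if the navigation actually originates from Google. The current implementation sends none, which is the value a browser uses for a user-initiated navigation such as a typed URL or a bookmark.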

This one is much harder to fix because Sec-Fetch-Site is a forbidden request header: browsers set it automatically and block scripts from modifying it. I was unable to override it via:

page.set_extra_http_headers()
page.route()
A CDP implementation (see below)

Wanted to get your thoughts on this and whether it's even worth addressing.
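For reference, here is a simplified sketch of how the expected Sec-Fetch-Site value relates to where a navigation comes from. It is an approximation under stated assumptions: a real browser uses the Public Suffix List to decide what counts as the same site, while this sketch just compares the last two host labels, and it ignores redirect chains. The function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str):
    """(scheme, host, port) with the default port filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname or "", port


def _naive_site(url: str):
    """(scheme, last two host labels): an assumption, not the Public Suffix List."""
    scheme, host, _ = _origin(url)
    return scheme, ".".join(host.split(".")[-2:])


def expected_sec_fetch_site(initiator: Optional[str], target: str) -> str:
    """Approximate the Sec-Fetch-Site value for a navigation to target."""
    if initiator is None:  # typed URL, bookmark, or a plain page.goto()
        return "none"
    if _origin(initiator) == _origin(target):
        return "same-origin"
    if _naive_site(initiator) == _naive_site(target):
        return "same-site"
    return "cross-site"


print(expected_sec_fetch_site(None, "https://httpbin.org/headers"))
# → none
print(expected_sec_fetch_site("https://www.google.com/", "https://httpbin.org/headers"))
# → cross-site
```

This is why a bare navigation with only a spoofed Referer looks inconsistent: the Referer claims an initiator, while Sec-Fetch-Site says there was none.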

What I Tried (All Three Approaches in the Same Script)

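import asyncio

# Assumed import: the original post omits its imports. patchright.async_api
# (used later in this thread) exposes the same async_playwright name.
from playwright.async_api import async_playwright


async def main_no_context():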
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome",
            proxy=None,
            headless=False,
        )
        context = await browser.new_context(no_viewport=True)
        page = await context.new_page()
        state = set(['https://manytools.org/http-html-text/http-request-headers/', 'https://httpbin.org/headers'])

        async def handle_route(route):
            if route.request.url in state:
                await route.continue_(headers={
                    **route.request.headers,
                    "Sec-Fetch-Site": "cross-site",
                    "Referer": "https://www.google.com/"
                })
            else:
                await route.continue_()

        await page.route("**/*", handle_route)

        client = await context.new_cdp_session(page)
        await client.send("Fetch.enable", {"patterns": [{"urlPattern": "*", "requestStage": "Request"}]})

        async def handle_pause(event):
            headers = event.get("request", {}).get("headers", {})

            if (event["request"]["url"] in state and
                event.get("resourceType") == "Document"):  # only the initial GET
                headers["Sec-Fetch-Site"] = "cross-site"
                headers["Referer"] = "https://www.google.com/"

            await client.send("Fetch.continueRequest", {
                "requestId": event["requestId"],
                "headers": [{"name": k, "value": v} for k, v in headers.items()]
            })

        client.on("Fetch.requestPaused", handle_pause)
        # set_extra_http_headers is a coroutine in the async API, so it must be awaited
        await page.set_extra_http_headers({"Sec-Fetch-Site": "cross-site"})

        await page.goto('https://manytools.org/http-html-text/http-request-headers/', referer="https://www.google.com/")
        await page.goto('https://httpbin.org/headers')
        await asyncio.to_thread(input, "Press Enter to continue: ")
        await browser.close()

asyncio.run(main_no_context(), debug=True)
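One detail worth noting about the route handler above: Playwright documents that `request.headers` returns lower-cased header names, so spreading it and then adding `"Sec-Fetch-Site"` or `"Referer"` can leave two keys that differ only by case. Whether that affected the results in this thread is not established. A minimal sketch of a case-insensitive merge, where `merge_headers` is a hypothetical helper:

```python
def merge_headers(original: dict, overrides: dict) -> dict:
    """Merge overrides into original, treating header names case-insensitively.

    All names in the result are lower-cased, so an override always replaces
    an existing header instead of sitting next to it under a different case.
    """
    merged = {name.lower(): value for name, value in original.items()}
    for name, value in overrides.items():
        merged[name.lower()] = value
    return merged


existing = {"accept": "text/html", "referer": "https://example.com/"}
print(merge_headers(existing, {"Referer": "https://www.google.com/"}))
# → {'accept': 'text/html', 'referer': 'https://www.google.com/'}
```

Inside a handler, the result would be passed as `headers=` to `route.continue_()` in place of the spread dictionary.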

Results: no spoof, current spoof (with the updated URL), and a real Google link

page.goto('https://manytools.org/...') — no referer set

page.goto('https://manytools.org/...', referer="https://www.google.com/")

Navigating directly from Google (expected behavior)

D4Vinci · 2026-03-06T22:51:03Z

Maintainer

This is interesting behaviour. I have opened an issue for this discussion to remind me to look into it later.
That said, I haven't found a website that checks for this in referers, so I think this needs some research.


aniruddhaadak80 · 2026-03-10T06:42:12Z


The testing here is thorough, and the conclusion makes sense. Fixing the visible Referer header is straightforward, but once a site looks at headers like Sec-Fetch-Site, the browser security model becomes the real constraint rather than your implementation.

I would probably document this as a known limit instead of treating it as a bug still waiting to be solved. That makes expectations clearer for users and avoids suggesting that a full spoof is possible when the browser does not actually allow it.


MooseJammer42 · 2026-03-10T14:30:31Z

Author

Spoofing Sec-Fetch-Site Header

I was able to do it by inlining HTML with page.set_content() and then clicking the link. You can use either page.route or page.set_extra_http_headers, but setting extra HTTP headers will affect all subsequent requests from the page, so a route implementation is the better choice. (I have not looked at other CDP implementations for this.)

Code

import asyncio
from patchright.async_api import async_playwright


async def spoof_cross_site():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome",
            proxy=None,
            headless=False,
        )
        context = await browser.new_context(no_viewport=True)
        page = await context.new_page()
        state = set(['https://manytools.org/http-html-text/http-request-headers/','https://httpbin.org/headers'])

        async def handle_route(route):
            if route.request.url in state:
                await route.continue_(headers={
                    **route.request.headers,
                    "Sec-Fetch-Site": "cross-site",
                    "Referer": "https://www.google.com/"
                })
            else:
                await route.continue_()

        await page.route("**/*", handle_route)
        url = "https://httpbin.org/headers"
        await page.set_content(f'<a href="{url}" id="link">Go</a>')
        await page.click("#link")

        await asyncio.to_thread(input, "Press Enter to continue: ")
        await browser.close()


asyncio.run(spoof_cross_site())

Result

The resulting request headers captured by httpbin:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Accept-Language": "en-US,en;q=0.9",
    "Host": "httpbin.org",
    "Priority": "u=0, i",
    "Referer": "https://www.google.com/",
    "Sec-Ch-Ua": "\"Not:A-Brand\";v=\"99\", \"Google Chrome\";v=\"145\", \"Chromium\";v=\"145\"",
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": "\"Windows\"",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "cross-site",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36"
  }
}

I believe everything matches here except for the Google cookies. I have heard that some people visit Google just to grab those and make their scraper look more legitimate.
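The comparison can also be scripted rather than checked by eye. The sketch below parses an httpbin-style response body and reports any navigation header that differs from the values in the result above. `EXPECTED` and `find_mismatches` are hypothetical names, and the sample body is made up to show a failing case.

```python
import json

# Values taken from the captured result in this thread.
EXPECTED = {
    "referer": "https://www.google.com/",
    "sec-fetch-site": "cross-site",
    "sec-fetch-mode": "navigate",
    "sec-fetch-dest": "document",
    "sec-fetch-user": "?1",
}


def find_mismatches(httpbin_body: str) -> dict:
    """Return {header: actual value} for every expected header that differs.

    A header that is missing entirely is reported with the value None.
    """
    headers = json.loads(httpbin_body)["headers"]
    seen = {name.lower(): value for name, value in headers.items()}
    return {
        name: seen.get(name)
        for name, wanted in EXPECTED.items()
        if seen.get(name) != wanted
    }


# Made-up sample: everything matches except Sec-Fetch-Site.
sample = json.dumps({"headers": {
    "Referer": "https://www.google.com/",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-User": "?1",
}})
print(find_mismatches(sample))
# → {'sec-fetch-site': 'none'}
```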


MooseJammer42 · 2026-03-11T17:25:27Z

Author

Enhancement

I could not stand the route("**/*") implementation, so I found out I could use route.fulfill on "google.com/spoof" instead. All we have to do then is change the URL.

Code

import asyncio
# patchright here!
from patchright.async_api import async_playwright

async def spoof_cross_site():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            channel="chrome",
            proxy=None,
            headless=False,
        )
        context = await browser.new_context(no_viewport=True)
        page = await context.new_page()

        async def handle_route(route):
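            # url is read from the enclosing scope when the route fires;
            # it is assigned below, before page.goto() triggers this handler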
            await route.fulfill(
                status=200,
                content_type="text/html",
                body=f"""
                    <html>
                        <body>
                            <a id="link" href="{url}" referrerpolicy="origin">Go</a>
                        </body>
                    </html>
                """
            )

        await page.route("https://www.google.com/spoof", handle_route)

        url = 'https://httpbin.org/headers'
        await page.goto("https://www.google.com/spoof")
        await page.click('#link')
        await asyncio.to_thread(input, "Press Enter to continue: ")
        await browser.close()

asyncio.run(spoof_cross_site())

Notice that referrerpolicy="origin" shortens "https://www.google.com/spoof" to "https://www.google.com/" in the Referer header, which is how Google actually does it.
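Since the stub page is built by string interpolation, a target URL containing characters such as `&` or `"` would ideally be escaped before it lands in the href attribute. A minimal sketch of building the body that way, where `build_stub_page` is a hypothetical helper and the example URL is made up:

```python
from html import escape


def build_stub_page(target_url: str) -> str:
    """Build the fulfilled page body: one link that sends an origin-only referrer."""
    href = escape(target_url, quote=True)  # escapes &, <, >, and both quote characters
    return (
        "<html><body>"
        f'<a id="link" href="{href}" referrerpolicy="origin">Go</a>'
        "</body></html>"
    )


print(build_stub_page("https://httpbin.org/get?a=1&b=2"))
```

The returned string could be passed as `body=` to `route.fulfill()` in place of the inline f-string.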

