Skip to content

custom domains: middleware and verification#2897

Open
Soxasora wants to merge 160 commits intostackernews:masterfrom
Soxasora:feat/custom-domains-base
Open

custom domains: middleware and verification#2897
Soxasora wants to merge 160 commits intostackernews:masterfrom
Soxasora:feat/custom-domains-base

Conversation

@Soxasora
Copy link
Copy Markdown
Member

@Soxasora Soxasora commented Mar 31, 2026

Description

Based on #2904, updated to Next.js 16

Part of #1942, revives #2145.
Introduces custom domains via Next.js proxy. This PR focuses on domain creation and verification.

The proxy detects custom domains and applies redirects and rewrites to generate a territory-centric SN version. Territory paths lose the ~subName prefix, and attempts to go outside of the domain's territory are corrected via redirect.1

Verification is a multi-step repeating process that involves DNS verification, AWS certificates creation/verification and attachment to the AWS load balancer listener.
AWS is simulated with Localstack on sndev.

More infos are available inside docs/dev/custom-domains.md (todo: not finished)

Screenshots

Screenshot 2026-04-08 at 17 49 36

Domain verification starts right when a custom domain is saved

Screenshot 2026-04-08 at 17 33 21

sndev: DNS is simulated with dnsmasq

Screenshot 2026-04-08 at 17 34 01

sndev: AWS certificates (ACM) are simulated with Localstack and are automatically verified on the next call

Screenshot 2026-04-08 at 17 37 53

custom domain is verified

Screenshot 2026-04-08 at 17 38 30

custom domain overview

Screenshot 2026-04-08 at 17 40 13

Concerns and known issues

  • s3 cors is not dynamic
  • cannot signup or login from scratch
  • polling time of domain verification (territory domain form) can be faster than 30 seconds
  • SN_TERRITORY_PATHS is hardcoded and may drift
    • If new territory-scoped routes are added, this list must be manually updated.
  • Redundant index on domainName
    • *claude thought
    • @@index([domainName])
    • There's already a unique constraint on domainName (line 918: @unique), which implicitly creates a B-tree index. The explicit @@index([domainName]) is redundant.
  • seo thumbnails not working for custom domains but we could open capture.stacker.news to other custom domains.
    • dynamic?
  • there are 5 db triggers and probably we don't need that much obscure behavior
    • triggers like the deletion of domain certificates on territory STOPPED status can be handled via JS, but at the same time these triggers help us handle a custom domain lifecycle in the safest way possible.
    • needs to be checked.

Checklist

Are your changes backwards compatible? Please answer below:
Yes, everything is an addition. An exception is the way we handle S3 via localstack, now the container is called AWS and also allows ACM simulations other than S3.

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, QA OK, domain verification handles errors correctly and follows its step, middleware correctly detects a custom domain and obtains its connected sub.

For frontend changes: Tested on mobile, light and dark mode? Please answer below:
Yes, it uses standard SN colors and styles that fits with theme changes.

Did you introduce any new environment variables? If so, call them out explicitly here:
An endpoint for localstack that can be contacted by our worker container
LOCALSTACK_ENDPOINT=http://aws:4566

local DNS server via dnsmasq, with fallback to 1.1.1.1 in both dnsmasq and code.
DOMAINS_DNS_SERVER=172.20.0.2:53

tells Next.js what domains (sox.pizza) or wildcards (**.pizza) can be trusted for HMR in dev.
ALLOWED_DEV_ORIGINS=**.sndev

custom domains debug logs are gated by env var, by default they're disabled (0)
NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=0

Did you use AI?
review


Note

High Risk
Touches request routing middleware and introduces new background jobs + DB triggers that manage domain state and AWS certificate attachment, which can break navigation or incorrectly attach/detach certs if misconfigured. Also adds new persistence and lifecycle automation in production-critical infrastructure (routing, cert management).

Overview
Enables beta territory custom domains by adding a Next.js proxy middleware that detects mapped domains, injects x-stacker-news-subname, and rewrites/redirects requests so territory routes work without the ~subName prefix while preserving existing referral/security header behavior.

Adds a full domain lifecycle: new Prisma models/migration + triggers for domains/verification records/attempts/certificates, GraphQL domain/setDomain APIs (owner + beta-gated) that schedule pgboss verification jobs, and a worker-driven multi-step verifier that checks DNS CNAME, requests/polls ACM certs, and attaches/detaches certs to an ALB listener (with Localstack + mocked ELBv2 in dev) plus periodic cleanup of long-held domains.

Updates UI/navigation and URL handling to be domain-aware (new DomainProvider, prefix/nav-key helpers, external territory link handling), adds a custom domain management form under territory edit, and extends dev/docker env to run a Localstack aws service and a new domains compose profile (plus new debug/HMR env vars).

Reviewed by Cursor Bugbot for commit 9eab96c. Bugbot is set up for automated code reviews on this repo. Configure here.

Footnotes

  1. I'm not sure if this should be the behavior, maybe it's better to redirect/open a new tab to stacker.news

Soxasora and others added 30 commits May 1, 2025 18:38
- ACM support
- custom domains crud, resolvers, fragments
- custom domains form, guidelines
- custom domains context
- domain verification every 5 minutes via pgboss
- domain validation schema
- basic custom domains middleware, to be completed
- TODOs tracings
- CustomDomain -> Domain
- DomainVerification table
- CNAME, TXT, SSL verification types
- WIP DomainVerification upsert
…ange status of a Record from its Attempt, multi-purpose dns verification
- use DomainVerificationStatus enum for domains and records
- adapt Territory Form UI to new schema
- return 'records' as an object with its types
- wip: prepare for attempts and certificate usage for prisma
fix:
- fix setDomain mutation transaction
- fix schema typedefs

enhance:
- DNS records guidelines with flex-wrap for longer records

cleanup:
- add comments to worker
- remove console.log on validation values
… HOLD

handle territory changes via triggers
- on territory stop, HOLD the domain
- on territory takeover from another user, delete the domain and its associated records

handle ACM certificates via trigger
- on domain/domainCertificate deletion, ask ACM to delete the certificate via a pgboss job; removes the need to ask ACM in multiple places

clear domains that have been on HOLD for more than 30 days, check every midnight via pgboss schedule

use 'domains' profile for worker jobs
Comment thread lib/validate.js Outdated
Comment thread worker/domainVerification.js Outdated
Comment thread lib/validate.js
Comment thread components/territory-domains.js Outdated
Comment thread proxy.js
Comment thread api/resolvers/domain.js
Comment thread proxy.js
Comment thread worker/domainVerification.js Outdated
Comment thread worker/domainVerification.js Outdated
Comment thread api/resolvers/domain.js
Comment thread components/form.js
{append}
</span>
)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopyButton append mode never shows copied visual feedback

Low Severity

The new append rendering path in CopyButton renders a static {append} element and never reflects the copied state visually. The default button mode swaps its content to a Thumb icon on success, but the append mode always renders the same ClipboardLine icon regardless of whether the copy succeeded. Users relying on the clipboard icon (used in DomainGuidelines) get no inline feedback beyond the toast notification.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 616ce1f. Configure here.

Copy link
Copy Markdown
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks sane and well organized to me. Hard to tell exactly how it'll interact with all the reverse proxies, but that's what the admin gating is for.

Not blocking, but I had a bit of trouble navigating the state machine.

IMO there a few areas where my confidence is not super high:

  1. state machine related issues
  2. interactions with aws (out of our control mostly until we're in prod)
  3. header handling in the proxy (you made it dead simple but I need to walk through it another time or two)

I'll continue reviewing tonight to gain more confidence in those areas, but with admin gating, this is good to go afaict.

You mentioned you had some minor fixes you wanted to make before merging so I'll wait for your green light.

Comment thread .env.development
# reachable by containers on 172.30.0.2(:53), outside of docker with 0.0.0.0:5353
DOMAINS_DNS_SERVER=172.30.0.2
# custom domains debugging
NEXT_PUBLIC_CUSTOM_DOMAINS_DEBUG=1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add this to the service worker?

Copy link
Copy Markdown
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a few non-blocking things in the proxy

Comment thread proxy.js
// NextJS recommends to not add the CSP header to prefetches and static assets
// See https://nextjs.org/docs/app/building-your-application/configuring/content-security-policy
{
source: '/((?!api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)',
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we exclude /api, calls to /api/graphql do not strip x-stackernews-subname. This isn't a problem now because we don't read the header outside of getserversideprops, but if that changes we'll be reading a spoofable x-stackernews-subname. I'm not sure why that might change though.

It's at least worth commenting on. We could also go further running a little header stripping middleware when source: '/((?api|_next/static|_error|404|500|offline|_next/image|_next/webpack-hmr|favicon.ico).*)'.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided to use the cached domain mappings where we have an asynchronous context, such as getGetServerSideProps. It feels more appropriate.

Comment thread proxy.js
// TODO: handle auth sync

// clean up the pathname from any subname
if (pathname.startsWith('/~')) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.pizza.com/~bitcoin is redirected to https://www.pizza.com/bitcoin. It's kind of low severity given it'd have to be manually input, but it might be worth redirecting to https://stacker.news/~bitcoin or 404ing.

If we redirect https://www.pizza.com/~bitcoin to https://stacker.news/~bitcoin in the proxy, then we can probably avoid doing badge rewrites in components.

Not blocking either. Just a suggestion.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.pizza.com/~bitcoin redirects to https://www.pizza.com since that part removes the territory name.

It would be much easier to redirect to stacker.news via proxy indeed. I'll implement this!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to implement this but in dev this can't work due to a Next.js bug: vercel/next.js#44482

It seems like redirecting from www.pizza.com/~bitcoin to localhost:3000/~bitcoin won't work because localhost will be ignored and it will be treated as a relative redirect (www.pizza.com/~bitcoin) causing a redirect loop.
If I instead redirect to, idk, stacker.news it will work.

In production it should work, but the fact that it's untestable and will actually break the middleware in dev... makes this a bit weird to work with.

I think we can postpone this to the UI/UX PR, I like the idea of being able to redirect to stacker news in one place, probably we need to be creative.

@huumn
Copy link
Copy Markdown
Member

huumn commented Apr 20, 2026

I found these mermaid diagrams helpful for understanding how things progress through domain verification, aws calls, and the proxy.

stateDiagram-v2
  [*] --> PENDING: setDomain creates row
  PENDING --> ACTIVE: verifyDomain returns VERIFIED
  PENDING --> HOLD: 48h elapsed OR 3 AWS retries exhausted OR Sub STOPPED
  ACTIVE --> HOLD: Sub STOPPED trigger
  HOLD --> PENDING: setDomain re-verify on same domain
  HOLD --> [*]: clearLongHeldDomains after 30 days
  ACTIVE --> [*]: Sub owner changes
  HOLD --> [*]: Sub owner changes
Loading
sequenceDiagram
  participant W as worker
  participant A as ACM
  participant E as ELBv2
  W->>A: "requestCertificate(domain, IdempotencyToken=domainId)"
  A-->>W: certificateArn
  W->>A: describeCertificate(arn)
  A-->>W: "{ Status: PENDING_VALIDATION, ResourceRecord }"
  Note right of W: store arn and SSL record in DB
  loop every 30s to 5m while PENDING
    W->>A: describeCertificate(arn)
    A-->>W: Status
  end
  W->>E: "addListenerCertificates(listener, arn)"
  E-->>W: "ok (no-op if already attached)"
  Note over W,E: Domain ACTIVE
  Note over W: Later, domain deleted or HOLD fires cleanup
  W->>E: "removeListenerCertificates(listener, arn)"
  E-->>W: "ok (no-op if not attached)"
  W->>A: deleteCertificate(arn)
  A-->>W: "ok (errors if cert in use)"
Loading
flowchart TB
  Req[Incoming request] --> Strip[proxy.js strips x-stacker-news-subname]
  Strip --> Ref[referrerMiddleware]
  Ref --> HostCheck{host equals NEXT_PUBLIC_URL.host?}
  HostCheck -- yes --> Sec[applySecurityHeaders]
  HostCheck -- no --> Map{getDomainMapping returns ACTIVE row?}
  Map -- no --> Sec
  Map -- yes --> Custom[customDomainMiddleware]
  Custom --> Rewrite{path starts with slash-tilde?}
  Rewrite -- yes --> Redirect[redirect to cleaned path]
  Rewrite -- no --> SubParam{has ?sub and mismatches?}
  SubParam -- yes --> RedirectFix[redirect with sub coerced]
  SubParam -- no --> IsTerr{path is / or territory path?}
  IsTerr -- yes --> RW[rewrite to /tilde-sub/path and set header]
  IsTerr -- no --> Pass[NextResponse.next and set header]
  Redirect --> Cookies[applyReferrerCookies]
  RedirectFix --> Cookies
  RW --> Cookies
  Pass --> Cookies
  Cookies --> Sec
Loading

Comment thread proxy.js Outdated
@Soxasora
Copy link
Copy Markdown
Member Author

Cleaned the code just a bit and implemented DOMAIN_BETA_IDS for private beta access beyond just SN admins.

@Soxasora Soxasora requested a review from huumn April 22, 2026 15:49
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 9eab96c. Configure here.

Comment thread lib/domains.js

return domains.reduce((acc, domain) => {
acc[domain.domainName.toLowerCase()] = { subName: domain.subName }
return acc
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Domain mapping cache omits domainName from stored values

High Severity

The domainsMappingsCache stores each entry as { subName: domain.subName } without including domainName. Both proxy.js and ssrApollo.js then access mapping.domainName, which is always undefined. In the proxy, this passes undefined as the domain argument to customDomainMiddleware, causing the x-stacker-news-domain header to be set to the string "undefined" and debug logging to break. In SSR, the domain prop propagated to DomainProvider will have domainName: undefined.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9eab96c. Configure here.

Comment thread lib/domains.js
}

export async function getDomainMappingFromRequest (req) {
const host = req.headers.get('host')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

req.headers.get() crashes on plain object headers in SSR

High Severity

getDomainMappingFromRequest calls req.headers.get('host'), which uses the Web Headers API. This works in proxy.js where req is a NextRequest, but in ssrApollo.js the req comes from getServerSideProps and is a Node.js IncomingMessage with a plain object for headers. Calling .get() on a plain object throws TypeError, crashing every SSR page load.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 9eab96c. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature new product features that weren't there before territories

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants