feat(security): SSO/JWT authentication migration (Phase 1) #1569
jsell-rh wants to merge 41 commits into
Define desired state for migrating from OpenShift OAuth proxy to direct SSO/JWT authentication.
Key decisions:
- BFF pattern: Next.js as OIDC confidential client, browser gets session cookie
- K8s impersonation: backend SA + Impersonate-User/Group preserves RBAC
- Dual-path auth: JWT first, TokenReview fallback for API keys
- Feature-flagged migration for incremental rollout
- Supersedes ADR-0002 (raw token passthrough → impersonation)
Includes migration workflow with consumer impact map and implementation notes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reference the IAM consolidation proposal (PR #1466) as the long-term direction. This spec is Phase 1; future phases cover API keys → SSO service accounts, runner → OIDC token exchange, DB RBAC reconciler, and credential consolidation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… requirements
- OIDC callback must coexist with existing integration auth routes
- SSO client configuration requirements (one per environment, audience isolation)
- Post-logout redirect URI and web origins specified
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kind/local-dev environments include Keycloak with pre-configured realm
- Replaces static JWKS ConfigMap, DISABLE_AUTH mock mode, and OC_TOKEN
- Same JWT validation code path as production (no dev-only auth logic)
- Realm config version-controlled as JSON export
- E2E tests use local Keycloak in Kind environments
- Design decision: Keycloak Identity Brokering for deployed environments
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Keycloak Deployment, realm JSON, and env var config for Kind overlay
- Maps what it replaces (static JWKS, DISABLE_AUTH, test-user SA)
- Identity Brokering section for deployed environments
- Updated manifest changes to include Kind overlay additions/removals
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d impersonation RBAC
Slice 1 of SSO authentication migration (Phase 1):
- Deploy Keycloak to Kind cluster with pre-configured realm (ambient-code)
including confidential frontend client, public CLI client, and E2E
client_credentials client. Dev users: developer/developer, admin/admin.
- Add jwtauth package with JWKS-based JWT validation using lestrrat-go/jwx/v2.
Validates signature, expiration, issuer, and audience. Extracts OIDC claims
(sub, email, preferred_username, groups).
- Add impersonate verb on users, groups, and serviceaccounts to backend-api
ClusterRole for K8s impersonation under SSO auth.
- Fix Kind overlay: relax runAsNonRoot for ambient-api-server, make
control-plane OIDC env vars optional.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
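The claim checks named in this commit (expiration, issuer, audience) can be sketched as follows. This is a minimal TypeScript illustration, not the Go jwtauth package: the `AccessTokenClaims` shape and the `checkClaims` name are hypothetical, and signature verification against the JWKS is deliberately left out.

```typescript
// Hypothetical claim shape; only the fields checked below are modelled.
interface AccessTokenClaims {
  iss: string;
  aud: string | string[];
  exp: number; // seconds since the Unix epoch
}

// Returns a list of problems; an empty list means the claims passed.
// Signature verification is out of scope for this sketch.
function checkClaims(
  claims: AccessTokenClaims,
  expectedIssuer: string,
  expectedAudience: string,
  nowSeconds: number,
): string[] {
  const problems: string[] = [];
  if (claims.exp <= nowSeconds) problems.push("token expired");
  if (claims.iss !== expectedIssuer) problems.push("unexpected issuer");
  // The aud claim may be a single string or an array of strings.
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (audiences.indexOf(expectedAudience) === -1) problems.push("audience mismatch");
  return problems;
}
```

A real validator must verify the signature before trusting any of these claims.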
…mpersonation
Slice 2 of SSO authentication migration (Phase 1):
- Wire JWT validation into backend middleware: forwardedIdentityMiddleware
validates JWT against Keycloak JWKS, extracts identity from OIDC claims
(sub, email, preferred_username, groups), and stores validated claims in
Gin context for reuse by handlers.
- Add dual-path auth in getK8sClientsDefault: JWT validation first, then
TokenReview fallback for API keys (K8s ServiceAccount tokens).
- Use K8s impersonation (Impersonate-User/Group) instead of raw bearer token
when SSO is enabled. Backend SA token + impersonation preserves all
existing RBAC enforcement.
- Fix SSAR cache key to include impersonated identity instead of shared SA
token, preventing cross-user authorization cache leaks.
- Gate SSO path behind "sso-authentication" Unleash feature flag.
- Add SSO env vars (SSO_ISSUER_URL, SSO_AUDIENCE) to backend Kind overlay.
- Fix Keycloak realm: add audience mapper and protocol mappers for sub,
email, preferred_username claims in access token.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
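The SSAR cache-key fix can be illustrated with a small sketch. Under impersonation every request shares the backend SA token, so a key derived from the token alone would let one user's cached decision answer for another. The `AccessCheck` shape and `accessCacheKey` name below are hypothetical; the backend itself is Go, and this TypeScript version only shows the idea.

```typescript
// Hypothetical description of one authorization check.
interface AccessCheck {
  user: string;      // impersonated user, e.g. the email claim
  groups: string[];  // impersonated groups
  namespace: string;
  verb: string;
  resource: string;
}

// Build a cache key from the impersonated identity plus the request,
// never from the shared service-account token.
function accessCacheKey(check: AccessCheck): string {
  // Sort a copy so group order does not produce distinct keys.
  const groups = check.groups.slice().sort();
  return JSON.stringify([check.user, groups, check.namespace, check.verb, check.resource]);
}
```

Serializing the parts as a JSON array avoids ambiguity that plain string concatenation could introduce when a field contains the separator.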
Slice 3 of SSO authentication migration (Phase 1):
- Add openid-client v6, iron-session v8, and jose as dependencies
- Create OIDC client layer (src/lib/oidc.ts): discovery, authorization URL
construction with PKCE, code exchange, token refresh, end-session URL
- Create encrypted session cookie management (src/lib/session.ts):
iron-session with httpOnly/secure/sameSite cookies, transparent token
refresh when access token is within 60s of expiry
- Add SSO API routes:
- /api/auth/sso/login: generates PKCE, stores verifier/state in cookies,
redirects to Keycloak authorization endpoint
- /api/auth/sso/callback: exchanges code for tokens, stores in session
- /api/auth/sso/logout: destroys session, redirects to Keycloak logout
- Add Next.js middleware: redirects unauthenticated page requests to SSO
login when SSO_ENABLED=true
- Modify buildForwardHeadersAsync: SSO path extracts JWT from session,
sets Authorization: Bearer and X-Forwarded-* headers from JWT claims.
All 97+ consumers are unaffected.
- Update navigation logout to use SSO logout route when enabled
- Update /api/me to accept Authorization header for auth check
- Add SSO env vars to Kind frontend deployment patch
- Support SSO_PUBLIC_ISSUER_URL for Kind dev (browser vs cluster URLs)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
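The "refresh when the access token is within 60s of expiry" rule from this commit can be sketched as a pure function. The function name and the choice to treat exactly 60 seconds as "within" are assumptions, not taken from src/lib/session.ts.

```typescript
// Refresh window described in the commit message above.
const REFRESH_SKEW_SECONDS = 60;

// True when the token has expired or will expire within the skew window.
// Both arguments are seconds since the Unix epoch.
function needsRefresh(expiresAtSeconds: number, nowSeconds: number): boolean {
  return expiresAtSeconds - nowSeconds <= REFRESH_SKEW_SECONDS;
}
```

Passing the current time in as an argument keeps the check deterministic and easy to unit test.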
Keycloak supports any port for localhost redirect URIs per RFC 8252 section 7.3. Registering http://localhost/* (without port) accepts callbacks on any ephemeral port, eliminating port-forward mismatches. Also set webOrigins to "+" (all valid redirect origins) for CORS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
openid-client v6 requires a standard URL instance (not NextURL). Construct callback URL from SSO_REDIRECT_URI base to match the redirect_uri sent during authorization, since request.url inside the container resolves to 0.0.0.0:3000 rather than localhost:11646. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, Keycloak's iss response parameter uses the public URL (localhost:30090) while openid-client validates against the internal URL (keycloak-service:8080). Remap the iss param before passing to authorizationCodeGrant so RFC 9207 issuer validation passes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set KC_HOSTNAME to the internal service URL so Keycloak uses a consistent issuer in all tokens and OIDC responses, regardless of whether the browser reaches it via localhost:30090 or the server reaches it via keycloak-service:8080. This eliminates issuer mismatches in ID token validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In Kind, the browser reaches Keycloak via localhost:30090 while backend and frontend servers use keycloak-service:8080. Keycloak sets the token issuer based on the authorization session URL, causing mismatches.
Fixes:
- Add alt issuer support to JWT validator (AddAltIssuer) so the backend
accepts tokens from both internal and public Keycloak URLs. Production
environments use a single URL and don't need alt issuers.
- Use standard openid-client authorizationCodeGrant in production (full ID
token validation). Fall back to manual token exchange in dev when
SSO_PUBLIC_ISSUER_URL differs from SSO_ISSUER_URL.
- Set cookies directly on redirect response in login route (cookies() API
mutations don't transfer to NextResponse.redirect).
- Derive post-login redirect origin from SSO_REDIRECT_URI to avoid
container-internal 0.0.0.0:3000 address.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ints The OIDC discovery config was cached as a module-level singleton with no expiry. If Keycloak restarted and got a new ClusterIP, token refresh calls would fail silently (ECONNREFUSED) and the session would be destroyed, logging the user out. Add a 5-minute TTL so the config is re-discovered periodically. This matches the Keycloak JWKS cache interval and ensures endpoint URLs stay current after dependency restarts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
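A time-bounded cache of the kind this commit describes might look like the sketch below. The `TtlCache` class is hypothetical and not the code in src/lib/oidc.ts; the loader is generic, so it could return the discovery promise itself, with the caveat that a rejected promise would then stay cached until the TTL elapses.

```typescript
// Caches one value and reloads it once the TTL has elapsed.
class TtlCache<T> {
  private entry: { value: T; fetchedAt: number } | undefined;
  private readonly ttlMs: number;
  private readonly load: () => T;
  private readonly now: () => number;

  constructor(ttlMs: number, load: () => T, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.load = load;
    this.now = now;
  }

  get(): T {
    const t = this.now();
    if (this.entry === undefined || t - this.entry.fetchedAt >= this.ttlMs) {
      // First use, or the cached value is stale: load a fresh one.
      this.entry = { value: this.load(), fetchedAt: t };
    }
    return this.entry.value;
  }
}

// 5-minute TTL, matching the interval given in the commit message.
const DISCOVERY_TTL_MS = 5 * 60 * 1000;
```

The injectable clock exists only so the expiry behaviour can be tested without waiting.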
getUserSubjectFromContext now prefers userEmail (matching the Impersonate-User header) when creating RoleBindings. Previously it used userName (preferred_username), causing a mismatch: the RoleBinding subject would be "developer" but impersonation would use "developer@local.dev", so RBAC checks would fail. This ensures lazy RoleBinding creation in CreateProject works correctly with SSO impersonation — no manual RoleBindings needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Callback route: redirect to /api/auth/sso/login instead of showing JSON error when OIDC state cookies are missing or exchange fails. Handles stale Keycloak sessions that skip the login page. - Logout route: derive post-logout redirect URI from SSO_REDIRECT_URI to avoid 0.0.0.0:3000 container address. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NEXT_PUBLIC_* env vars are inlined at build time in Next.js client components, so they're unavailable when the image is built without them. Instead, expose ssoEnabled from the /api/me server route and read it in the navigation component via useCurrentUser(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The project layout had its own handleLogout hardcoded to /oauth/sign_out, separate from the main navigation. Unified both to use the runtime ssoEnabled flag from useCurrentUser(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Slice 4 of SSO authentication migration (Phase 1):
- Update extract-token.sh to obtain JWT from Keycloak via client_credentials
grant (ambient-e2e client). Falls back to K8s SA token when Keycloak is
not available.
- Add audience and sub protocol mappers to ambient-e2e Keycloak client so
tokens have proper aud claim for backend validation.
- Add ClusterRoleBinding for e2e service account identity
(service-account-ambient-e2e) so E2E tests can access projects.
- No developer RoleBindings: JIT provisioning via CreateProject handles
first-time access correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the OIDC session expires and token refresh fails, the user now sees a blocking dialog instead of silent 401 errors:
- Global 401 detection via QueryCache/MutationCache onError handlers
- Skip retries on 401 to prevent request storms against the IdP
- Non-dismissable AlertDialog with "Log in" button that preserves returnTo
path so users land back on the same page
- No "expiring soon" warning: server-side refresh handles access token
renewal transparently; only surfaces when refresh token dies
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
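The "skip retries on 401" rule can be sketched as a retry predicate of the `(failureCount, error) => boolean` form that query libraries commonly accept. The `statusOf` helper, the assumption that errors carry a numeric `status` field, and the retry limit of 3 are all illustrative, not taken from src/lib/query-client.ts.

```typescript
// Illustrative retry budget for non-401 failures.
const MAX_RETRIES = 3;

// Read a numeric `status` field from an unknown error value, if present.
function statusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// Never retry a 401: the session is gone and retrying only adds load.
function shouldRetry(failureCount: number, error: unknown): boolean {
  if (statusOf(error) === 401) return false;
  return failureCount < MAX_RETRIES;
}
```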
- Add logging to getAccessToken so token refresh attempts and failures are
visible in pod logs (was silently swallowing errors).
- Fix middleware to return 401 JSON for RSC/fetch requests instead of
redirecting to Keycloak. Cross-origin redirects fail as XHR and cause CORS
errors. Full page navigations still redirect to login.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
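One way to separate full page navigations from RSC/fetch requests is sketched below. The header heuristics are assumptions rather than the middleware's actual logic: an `RSC: 1` header for React Server Component fetches, the standard `Sec-Fetch-Mode: navigate` value for navigations, and an `Accept` fallback for clients that send neither. Header names are expected in lower case.

```typescript
type UnauthenticatedAction = "redirect-to-login" | "json-401";

// Decide how to answer an unauthenticated request from its headers.
function unauthenticatedAction(
  headers: Record<string, string | undefined>,
): UnauthenticatedAction {
  // Assumed marker for React Server Component fetches.
  if (headers["rsc"] === "1") return "json-401";
  // Browsers that send Sec-Fetch-Mode mark top-level navigations as "navigate".
  const mode = headers["sec-fetch-mode"];
  if (mode !== undefined) {
    return mode === "navigate" ? "redirect-to-login" : "json-401";
  }
  // Fallback: treat requests that accept HTML as page loads.
  const accept = headers["accept"] ?? "";
  return accept.indexOf("text/html") !== -1 ? "redirect-to-login" : "json-401";
}
```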
openid-client's refreshTokenGrant validates the ID token iss claim in the refresh response, which fails when the token was issued by localhost:30090 but the refresh goes through keycloak-service:8080. Use manual fetch to the token endpoint in split-URL mode (same approach as code exchange). Production uses the library's standard refreshTokenGrant with full validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split-URL problem (browser→localhost:30090, server→keycloak-service:8080) caused token issuer mismatches that broke refresh tokens and ID token validation. Every workaround added complexity.
Root fix: proxy Keycloak through the frontend at /sso/* so browser and server both reach Keycloak through the same origin. Combined with KC_HOSTNAME=http://keycloak-service:8080, all tokens now have a consistent issuer that matches the discovery endpoint.
Changes:
- Add /sso/[...path] catch-all route that proxies to Keycloak, rewriting
Location headers on redirects
- Set KC_HOSTNAME to internal service URL for consistent token issuer
- Update SSO_PUBLIC_ISSUER_URL to use the proxy path
- Exclude /sso from auth middleware matcher
- Remove unused next.config.js rewrites (build-time, not runtime)
This eliminates: alt issuers on the backend, manual token exchange fallbacks, iss parameter remapping in callbacks, and CORS errors on session expiry redirects. Production deployments use a single URL and don't need the proxy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
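The Location rewriting done by the proxy route can be sketched as a string transformation. The `rewriteLocation` name and its parameters are hypothetical; the URLs in the usage note are the Kind values mentioned in this conversation.

```typescript
// Map a redirect target on the internal Keycloak origin to the public
// proxy base; leave every other Location value untouched.
function rewriteLocation(
  location: string,
  internalOrigin: string, // e.g. "http://keycloak-service:8080"
  publicBase: string,     // e.g. "http://localhost:11646/sso"
): string {
  if (location === internalOrigin || location.startsWith(internalOrigin + "/")) {
    return publicBase + location.slice(internalOrigin.length);
  }
  return location;
}
```

For example, `rewriteLocation("http://keycloak-service:8080/realms/ambient-code/account", "http://keycloak-service:8080", "http://localhost:11646/sso")` yields a URL under `http://localhost:11646/sso/`, so the browser never follows a redirect to the cluster-internal hostname.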
Eliminates the split-URL issuer mismatch by properly configuring Keycloak's hostname-backchannel-dynamic feature:
- KC_HOSTNAME=http://localhost:11646/sso: all tokens use the public URL as
issuer, login pages render with proxy URLs
- KC_HOSTNAME_BACKCHANNEL_DYNAMIC=true: internal services get backchannel
URLs (token_endpoint, jwks_uri) via keycloak-service:8080
Frontend changes:
- Manual OIDC discovery to bypass openid-client v6's issuer validation
(known issue: github.com/panva/openid-client/issues/737)
- Remove all split-URL workarounds (manual token exchange, iss remapping,
URL rewriting in auth/logout/refresh)
- openid-client's standard authorizationCodeGrant and refreshTokenGrant now
work correctly for all flows
Backend changes:
- JWT validator uses discovered issuer from OIDC metadata (not the
discovery URL) so it accepts the public issuer in tokens
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Frontend README: replace OC_TOKEN/OAuth proxy header docs with SSO env
vars and OIDC session model description
- Frontend .env.example: add SSO_* vars, move OC_* to legacy section
- Backend README: replace DISABLE_AUTH migration guide with Keycloak dev
auth instructions (JWT and SA token examples)
- E2E README: update quick start to use extract-token.sh (Keycloak
client_credentials with K8s SA fallback)
- Kind dev guide: add Keycloak to bootstrap steps, document dev credentials
and session lifetimes
- CONTRIBUTING.md: add Keycloak to kind-up description, update access
instructions with login info
- OPENSHIFT_OAUTH.md: mark as legacy, link to SSO spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…facing The sso-authentication Unleash flag controls which auth path the backend uses. It is not visible in workspace settings and is not user-configurable — ops enables it per-environment during migration. Kind dev cluster creates and enables it automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…bient-code/platform into jsell/spec/sso-authentication
Toggles SSO on/off for both frontend (SSO_ENABLED env var) and backend (sso-authentication Unleash flag) in a single command. Legacy mode (SA token auth) is the default after kind-up; run kind-sso-toggle to enable Keycloak OIDC. Also updates Kind dev guide and backend README to document the toggle and clarify that legacy mode is the default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract fileTabs.updateTaskStatus to a const so the useEffect dependency array references the stable callback directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI note: 3 pre-existing flaky test failures
The following 3 tests fail intermittently due to shared mutable state (…
These tests pass in isolation but fail when run as part of the full suite due to test ordering. The SSO changes do not affect this path …
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
- models_test.go captured K8sClientMw/DynamicClient at module load time
(when nil), then restored them to nil in AfterEach, poisoning subsequent
tests. Move capture to BeforeEach so values are saved after
SetupHandlerDependencies runs.
- getUserSubjectFromContext now falls back to userID context value (set by
SetTestToken in tests) after checking userEmail, userIDOriginal, and
userName. This ensures tests that only set userID still work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
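The lookup order described for getUserSubjectFromContext (userEmail, then userIDOriginal, then userName, then userID) can be sketched as a first-non-empty search. The real function is Go and reads from the Gin context; the `IdentityContext` shape below is a hypothetical stand-in for it.

```typescript
// Hypothetical stand-in for the identity values stored in the request context.
interface IdentityContext {
  userEmail?: string;
  userIDOriginal?: string;
  userName?: string;
  userID?: string;
}

// Return the first non-empty identity value in the documented order.
function subjectFromContext(ctx: IdentityContext): string | undefined {
  const candidates = [ctx.userEmail, ctx.userIDOriginal, ctx.userName, ctx.userID];
  for (const candidate of candidates) {
    if (candidate !== undefined && candidate !== "") return candidate;
  }
  return undefined;
}
```

Preferring the email keeps the RoleBinding subject identical to the value sent in the Impersonate-User header.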
buildForwardHeadersSSO now falls back to Bearer token from request when no SSO session cookie exists. This enables:
- SSO users: session cookie → JWT forwarded
- E2E tests / API clients: Bearer token in request → forwarded directly
Also adds Keycloak to wait-for-ready.sh to prevent race conditions where frontend starts before Keycloak is ready.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
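The fallback order (session cookie first, then the request's Bearer token) can be sketched as below. `SessionLike` and `resolveBearerToken` are hypothetical names; the real buildForwardHeadersSSO also sets X-Forwarded-* headers, which this sketch omits.

```typescript
// Hypothetical view of the decrypted session; only the token matters here.
interface SessionLike {
  accessToken?: string;
}

// Prefer the JWT held in the session; otherwise reuse a Bearer token
// that the caller sent in the Authorization header.
function resolveBearerToken(
  session: SessionLike | undefined,
  authorizationHeader: string | undefined,
): string | undefined {
  if (session !== undefined && session.accessToken) return session.accessToken;
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader ?? "");
  return match ? match[1] : undefined;
}
```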
The ambient-e2e service account JWT was missing the preferred_username claim, causing the backend to fall back to the 'sub' claim (a UUID) for K8s impersonation. The RBAC expects 'service-account-ambient-e2e'. This adds a protocol mapper to include the service account's username in JWT tokens, enabling proper identity mapping for E2E tests in SSO mode. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The E2E tests failed because:
1. frontend-test-patch.yaml hardcoded SSO_ENABLED=true, but the backend SSO
flag was off, causing Keycloak JWTs to be sent to K8s API which rejected
them
2. extract-token.sh preferred Keycloak tokens, but the backend wasn't
configured to validate them
Fixes:
- Set SSO_ENABLED=false by default in the Kind overlay. Use
`make kind-sso-toggle` to enable SSO explicitly.
- extract-token.sh now defaults to K8s SA token (works in both modes). Set
E2E_USE_SSO=true to use Keycloak client_credentials instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a second E2E pass that enables SSO (frontend env + Unleash flag), re-extracts a Keycloak JWT via client_credentials, and runs the full Cypress suite again. This ensures both auth paths are exercised in CI. The SSO pass reuses the same Kind cluster — just toggles the auth mode, restarts affected deployments, and re-runs tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cy.visit() page navigations can't carry custom headers, so the SSO
middleware redirects to Keycloak which Cypress can't handle. Fix by
adding /api/auth/sso/e2e-login route that accepts a token and creates
a session cookie (non-production only).
Changes:
- New /api/auth/sso/e2e-login POST route: accepts {token}, creates
iron-session cookie. Returns 404 in production.
- Cypress beforeEach: calls e2e-login route when SSO_MODE is true
to create session cookie before page visits.
- cypress.config.ts: passes E2E_USE_SSO env var as SSO_MODE to tests.
- e2e.yml: adds frontend health check before SSO E2E pass.
- Revert middleware Bearer token check (doesn't help for navigations).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two root causes for SSO E2E failures:
1. The e2e-login route checked NODE_ENV !== "production", but the Docker
image sets NODE_ENV=production. Changed to check E2E_TEST_HELPERS env var
(opt-in, added to Kind overlay).
2. All SSO public URLs used port 11646 (port-forward for local dev), but CI
uses NodePort on port 80. Changed KC_HOSTNAME, SSO_REDIRECT_URI, and
SSO_PUBLIC_ISSUER_URL to use http://localhost (no port = port 80).
Also added Keycloak readiness check and backend JWT validator verification to the CI toggle step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Kind overlay defaults SSO URLs to http://localhost (port 80) for CI, but local dev uses dynamic port-forward ports (11000+offset). kind-sso-toggle now patches SSO_REDIRECT_URI, SSO_PUBLIC_ISSUER_URL, and KC_HOSTNAME with the correct KIND_FWD_FRONTEND_PORT when enabling SSO. This makes SSO work seamlessly in local dev without manual URL configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backend's InitJWTValidator needs Keycloak for OIDC discovery. When toggling SSO on, KC_HOSTNAME changes cause Keycloak to restart. Wait for Keycloak to be ready, then restart the backend so OIDC discovery succeeds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Phase 1 of SSO authentication migration — independently shippable. Replaces OpenShift OAuth proxy with direct OIDC authentication via Keycloak.
What's implemented
- … lestrrat-go/jwx/v2. Validates signature, expiration, issuer, and audience.
- Impersonate-User/Impersonate-Group headers preserve all existing RBAC enforcement without cluster OIDC federation.
- … sso-authentication Unleash flag (infrastructure, not user-facing).
- make kind-sso-toggle switches between SSO and legacy mode.
- … client_credentials grant from Keycloak with K8s SA fallback.
- /sso/* catch-all route proxies Keycloak through the frontend origin, combined with KC_HOSTNAME_BACKCHANNEL_DYNAMIC for consistent token issuers.
Key files
- specs/security/sso-authentication.spec.md (12 requirements, 30 scenarios)
- workflows/security/sso-migration.workflow.md
- components/backend/jwtauth/validator.go, validator_test.go
- components/backend/handlers/sso.go, middleware.go, server/server.go, server/k8s.go
- components/frontend/src/lib/oidc.ts, session.ts, auth.ts
- src/app/api/auth/sso/{login,callback,logout}/route.ts, src/app/sso/[...path]/route.ts
- src/components/session-expired-dialog.tsx, src/lib/query-client.ts
- overlays/kind/keycloak-*.yaml, sso-credentials.yaml, backend-sso-patch.yaml
- base/rbac/backend-clusterrole.yaml (impersonate verb added)
Default behavior
- make kind-up deploys with legacy auth (SA token, no Keycloak redirect)
- make kind-sso-toggle enables Keycloak OIDC for both frontend and backend
- … developer/developer
Production deployment prerequisites
To deploy SSO in a non-Kind environment, you need:
- … https://<frontend>/api/auth/sso/callback)
- … sso-credentials) with SSO_ISSUER_URL, SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_AUDIENCE, and SESSION_SECRET
- … SSO_ISSUER_URL and SSO_AUDIENCE (from the secret)
- … SSO_ENABLED=true, SSO_ISSUER_URL, SSO_CLIENT_ID, SSO_CLIENT_SECRET, SSO_REDIRECT_URI, SESSION_SECRET
- … sso-authentication enabled for the target environment
- … email claim used for impersonation
Identity Brokering (running your own Keycloak that federates login to RH SSO) is not required for Phase 1. It is a Phase 2+ convenience for environments that want client management autonomy without RH SSO realm admin access.
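The variable set listed above alongside SSO_ENABLED=true could be checked at startup with a small guard. This is a sketch under assumptions: the `missingSsoEnv` helper is hypothetical, and it takes the environment as a plain record (for example `process.env`) so it stays testable.

```typescript
// Variables listed alongside SSO_ENABLED=true in the prerequisites above.
const REQUIRED_WHEN_SSO_ENABLED = [
  "SSO_ISSUER_URL",
  "SSO_CLIENT_ID",
  "SSO_CLIENT_SECRET",
  "SSO_REDIRECT_URI",
  "SESSION_SECRET",
];

// Return the names of required variables that are unset or empty.
// When SSO is not enabled, nothing is required.
function missingSsoEnv(env: Record<string, string | undefined>): string[] {
  if (env["SSO_ENABLED"] !== "true") return [];
  return REQUIRED_WHEN_SSO_ENABLED.filter((name) => !env[name]);
}
```

Failing fast on a non-empty result surfaces a misconfigured deployment before the first login attempt.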
The /sso/* proxy route and KC_HOSTNAME_BACKCHANNEL_DYNAMIC config are only needed in Kind (where the browser and server reach Keycloak via different URLs). In production, the browser and server use the same URL, so standard openid-client discovery works without the proxy.
What's NOT in scope (by design)
Test plan
- make kind-sso-toggle switches between SSO and legacy mode
- … client_credentials
🤖 Generated with Claude Code