fix(flagd): validate FLAGD_SYNC_PORT before parse to avoid kubelet service-link collision#1798
Conversation
…rvice-link collision `FlagdOptions.prebuild()` reads `FLAGD_SYNC_PORT` first in `in-process` resolver mode (introduced in open-feature#1651), falling back to `FLAGD_PORT` only when unset. Kubernetes service-link environment variable injection populates `FLAGD_SYNC_PORT` with `tcp://<clusterIP>:<port>` whenever a Service named `flagd-sync` shares the pod's namespace — which is exactly the topology produced by the upstream flagd Helm chart and the OpenFeature Operator's in-process configuration pattern. The SDK then calls `Integer.parseInt("tcp://...")` and dies at startup with `NumberFormatException`, breaking JVM consumers using `FLAGD_RESOLVER=in-process`. Validate the env var parses as a port in `[1, 65535]` before using it. On invalid input, log at WARN with a pointer to the most likely root cause (service-link injection) and fall back to `FLAGD_PORT`. Behaviour for valid, user-set `FLAGD_SYNC_PORT` is unchanged. The Node.js and Go in-process providers don't read `FLAGD_SYNC_PORT` at all, so this collision is Java-specific; users on those SDKs aren't affected today. A future change may want to revisit whether the SDK should read this env var at all given it's already reserved by the flagd daemon's own `--sync-port` CLI flag, but validation is the minimal change that unblocks affected users without altering the fallback chain. Signed-off-by: Jason Benedicic <48251655+jabenedicic@users.noreply.github.com>
There was a problem hiding this comment.
Code Review
This pull request enhances the flagd provider's configuration logic by introducing validation for the FLAGD_SYNC_PORT environment variable. This change prevents the provider from crashing when encountering non-numeric values, such as those injected by Kubernetes service-links, by logging a warning and falling back to the standard port or default value. The feedback suggests adding a final safety check using the new validation method before the final integer parsing to ensure the provider remains stable even if the fallback port configuration is also invalid.
Per review feedback on open-feature#1798: the final `Integer.parseInt(portValue)` in `FlagdOptions.prebuild()` could still throw if `FLAGD_PORT` itself is invalid — same Kubernetes service-link injection pattern that hits the in-process branch can hit RPC-mode consumers when a Service named `flagd` exists in the pod's namespace, producing `FLAGD_PORT=tcp://...`. Validate `portValue` before the final parse and fall back to the resolver's default port on invalid input, with a WARN log to surface the misconfiguration rather than silently masking it. Adds two test cases: - RPC mode with invalid `FLAGD_PORT` → default RPC port. - In-process mode with both env vars invalid → default in-process port. Signed-off-by: Jason Benedicic <48251655+jabenedicic@users.noreply.github.com>
| String portValue = resolverType == Config.Resolver.IN_PROCESS | ||
| ? fallBackToEnvOrDefault(Config.SYNC_PORT_ENV_VAR_NAME, fromPortEnv) | ||
| : fromPortEnv; | ||
| String portValue = fromPortEnv; |
There was a problem hiding this comment.
Yes — portValue and fromPortEnv are intentionally distinct:
portValueis the mutable "chosen port" — it gets reassigned tosyncPortValueon the valid-FLAGD_SYNC_PORTpath (line 324) and todefaultPorton the final invalid-port fallback (line 355).fromPortEnvis the stable originalFLAGD_PORTvalue, still referenced in the WARN log on line 340 so the operator can see the exact fallback value being chosen.
Collapsing them would either lose that detail from the log message or require recomputing/getenv-ing it a second time. Happy to rename portValue → chosenPort if that makes the intent clearer, but I'd prefer to keep the two variables distinct.
Closes #1797
Summary
FlagdOptions.prebuild()parsesFLAGD_SYNC_PORTdirectly withInteger.parseIntwhenFLAGD_RESOLVER=in-process. When aServicenamedflagd-syncshares the pod's namespace (the exact topology produced by the upstream flagd Helm chart and the OpenFeature Operator'sInProcessConfigurationpattern), kubelet's service-link env injection populates the workload withFLAGD_SYNC_PORT=tcp://<clusterIP>:<port>, the SDK callsInteger.parseInt("tcp://..."), and the consumer crashes at startup withNumberFormatException.Full repro, affected versions, and root-cause analysis in #1797.
Change
Validate that
FLAGD_SYNC_PORTparses as an integer in[1, 65535]before using it. On invalid input, log at WARN with a pointer to the most likely cause (service-link injection) and fall back toFLAGD_PORT. Behaviour for valid, user-setFLAGD_SYNC_PORTis unchanged.Why this is the minimal change
The narrower fix (validation only) was preferred over alternatives:
FLAGD_SYNC_PORTentirely. Would match Node/Go SDK behaviour and avoid the env-var name reservation issue, but is a behaviour change for users who legitimately setFLAGD_SYNC_PORTafter feat: Add FLAGD_SYNC_PORT support for in-process providers with backwards compatibility #1651. Worth discussing separately.Validation is fully backwards-compatible: legitimately set
FLAGD_SYNC_PORT=NNNNcontinues to take precedence overFLAGD_PORTexactly as today.Tests
Three new cases in
FlagdOptionsTest, all using the existing@SetEnvironmentVariablepattern:testInProcessProvider_invalidSyncPortFallsBackToFlagdPort—FLAGD_SYNC_PORT=tcp://10.0.0.1:8015+FLAGD_PORT=8888→ port 8888 (the exact service-link injection scenario).testInProcessProvider_invalidSyncPortWithNoFlagdPortUsesDefault— invalidFLAGD_SYNC_PORT, noFLAGD_PORT→DEFAULT_IN_PROCESS_PORT.testInProcessProvider_outOfRangeSyncPortFallsBackToFlagdPort—FLAGD_SYNC_PORT=99999→ falls back (port out of valid range).Existing tests for
FLAGD_SYNC_PORTprecedence and fallback are unchanged and continue to pass. Full suite: 21 tests, 0 failures, 0 errors.mvn spotless:applyreports no formatting changes required.Out of scope
FLAGD_SYNC_PORTat all (cross-language alignment / env-var name reservation discussion) — flagged in flagd-provider-java reads FLAGD_SYNC_PORT in in-process mode, colliding with Kubernetes service-link env injection #1797 for a separate conversation.