Commit 5feaa82

feat: daemonset maxsurge to prevent unavailability on config changes (#819)
* feat: add maxSurge to DaemonSet rolling update strategy
* chore: add TODO to use PreferSameNode once k8s 1.35 is minimum
* test: assert DaemonSet rolling update strategy in smoke test
* chore: update changelog
* chore: lint fixes
1 parent a75fa14 commit 5feaa82

4 files changed

Lines changed: 25 additions & 1 deletion

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -8,7 +8,13 @@ All notable changes to this project will be documented in this file.
 
 - Support `configOverrides` for `config.json` (#818).
 
+### Changed
+
+- Set `maxSurge=1` and `maxUnavailable=0` on the OPA DaemonSet rolling update strategy to eliminate
+  availability gaps during rolling updates ([#819]).
+
 [#818]: https://github.com/stackabletech/opa-operator/pull/818
+[#819]: https://github.com/stackabletech/opa-operator/pull/819
 
 ## [26.3.0] - 2026-03-16

rust/operator-binary/src/controller.rs

Lines changed: 8 additions & 1 deletion
@@ -35,7 +35,7 @@ use stackable_operator::{
     k8s_openapi::{
         DeepMerge,
         api::{
-            apps::v1::{DaemonSet, DaemonSetSpec},
+            apps::v1::{DaemonSet, DaemonSetSpec, DaemonSetUpdateStrategy, RollingUpdateDaemonSet},
             core::v1::{
                 ConfigMap, EmptyDirVolumeSource, EnvVar, EnvVarSource, HTTPGetAction,
                 ObjectFieldSelector, Probe, SecretVolumeSource, ServiceAccount,
@@ -1183,6 +1183,13 @@ fn build_server_rolegroup_daemonset(
             ..LabelSelector::default()
         },
         template: pod_template,
+        update_strategy: Some(DaemonSetUpdateStrategy {
+            type_: Some("RollingUpdate".to_string()),
+            rolling_update: Some(RollingUpdateDaemonSet {
+                max_surge: Some(IntOrString::Int(1)),
+                max_unavailable: Some(IntOrString::Int(0)),
+            }),
+        }),
         ..DaemonSetSpec::default()
     };
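
To illustrate why `maxSurge=1` with `maxUnavailable=0` closes the availability gap described in the changelog entry, here is a minimal, std-only sketch of a rollout on a single node. It is a toy model, not the Kubernetes DaemonSet controller logic: the `Step` enum and the functions `rollout_plan` and `min_ready_during_rollout` are hypothetical names introduced only for this example, and the model assumes that without surge the old pod is deleted before its replacement starts.

```rust
/// One action in a toy, single-node rollout. Illustrative names only;
/// nothing here is part of k8s_openapi or the operator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    StartNewPod,
    NewPodReady,
    DeleteOldPod,
}

/// Order of actions on one node, derived from `max_surge` alone.
/// Assumption: with surge, the new pod is started and becomes ready before
/// the old pod is deleted; without surge, the old pod is deleted first.
fn rollout_plan(max_surge: u32) -> [Step; 3] {
    if max_surge >= 1 {
        [Step::StartNewPod, Step::NewPodReady, Step::DeleteOldPod]
    } else {
        [Step::DeleteOldPod, Step::StartNewPod, Step::NewPodReady]
    }
}

/// Lowest number of ready pods observed on the node during the rollout.
fn min_ready_during_rollout(max_surge: u32) -> u32 {
    let mut old_ready = true;
    let mut new_ready = false;
    // Before the rollout starts, only the old pod is ready.
    let mut min_ready: u32 = 1;

    for step in rollout_plan(max_surge) {
        match step {
            // A started pod does not count as ready yet.
            Step::StartNewPod => {}
            Step::NewPodReady => new_ready = true,
            Step::DeleteOldPod => old_ready = false,
        }
        let ready = u32::from(old_ready) + u32::from(new_ready);
        min_ready = min_ready.min(ready);
    }
    min_ready
}
```

In this model, `min_ready_during_rollout(1)` never drops below one ready pod per node, while `min_ready_during_rollout(0)` reaches zero between deleting the old pod and the new pod becoming ready. That zero-ready window is the gap the commit aims to remove.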

rust/operator-binary/src/service.rs

Lines changed: 6 additions & 0 deletions
@@ -63,6 +63,12 @@ pub(crate) fn build_server_role_service(
         type_: Some(opa.spec.cluster_config.listener_class.k8s_service_type()),
         ports: Some(data_service_ports(opa.spec.cluster_config.tls_enabled())),
         selector: Some(service_selector_labels.into()),
+        // This ensures that products (e.g. Trino) on a node always talk to the OPA pod on the
+        // same node, avoiding cross-node latency. The downside is that if the local OPA pod is
+        // unavailable, requests fail instead of falling back to another node.
+        // TODO: Once our minimum supported Kubernetes version is 1.35, use
+        // `trafficDistribution: PreferSameNode` instead, which prefers the local node but
+        // gracefully falls back to other nodes if the local pod is unavailable.
         internal_traffic_policy: Some("Local".to_string()),
         ..ServiceSpec::default()
     };
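
The trade-off described in the comment above can be sketched as a small routing model. This is a hedged illustration only: `Routing`, `Endpoint`, and `pick_endpoint` are hypothetical names invented for this example, and real kube-proxy endpoint selection is more involved than picking the first ready endpoint.

```rust
/// Toy stand-ins for the two behaviours discussed in the code comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Routing {
    /// Models `internalTrafficPolicy: Local`: only same-node endpoints are used.
    LocalOnly,
    /// Models `trafficDistribution: PreferSameNode`: same node first, then any ready endpoint.
    PreferSameNode,
}

/// A simplified Service endpoint: the node it runs on and whether it is ready.
struct Endpoint {
    node: &'static str,
    ready: bool,
}

/// Picks an endpoint for a client running on `client_node`.
/// Returns `None` when the request would fail in this model.
fn pick_endpoint<'a>(
    routing: Routing,
    client_node: &str,
    endpoints: &'a [Endpoint],
) -> Option<&'a Endpoint> {
    let local = endpoints
        .iter()
        .find(|e| e.ready && e.node == client_node);

    match routing {
        // No fallback: if the local pod is not ready, the request fails.
        Routing::LocalOnly => local,
        // Fallback: use any other ready endpoint when the local one is missing.
        Routing::PreferSameNode => local.or_else(|| endpoints.iter().find(|e| e.ready)),
    }
}
```

With an unready pod on `node-a` and a ready pod on `node-b`, a client on `node-a` gets `None` under `Routing::LocalOnly` but is routed to `node-b` under `Routing::PreferSameNode`. As long as `Local` is in place, the surge-based rollout is what keeps a ready pod on every node during updates.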

tests/templates/kuttl/smoke/10-assert.yaml.j2

Lines changed: 5 additions & 0 deletions
@@ -9,6 +9,11 @@ kind: DaemonSet
 metadata:
   name: test-opa-server-default
 spec:
+  updateStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1
+      maxUnavailable: 0
   template:
     spec:
       containers:
