A lightweight Kubernetes chaos engineering operator with transparent target selection and optional manual approval.
Omen lets you declaratively define chaos experiments against your workloads. Each run:
- Selects a fixed set of target pods (preview)
- Optionally waits for manual approval
- Executes the chaos action against those exact targets
- Records per-target results and a summary
Two CRDs are provided:
- Experiment — defines the schedule, target selector, action, safety limits, and approval policy
- ExperimentRun — a single execution instance created by the controller, holding the target preview, approval state, and results
Curious about what's coming next? Check out our Roadmap to see our plans for network chaos, advanced target filtering, ChatOps integrations, and more!
helm install omen oci://ghcr.io/k-krew/charts/omen \
  --namespace omen-system \
  --create-namespace \
  --version <version>

To customise the installation:
helm install omen oci://ghcr.io/k-krew/charts/omen \
  --namespace omen-system \
  --create-namespace \
  --version <version> \
  --set manager.leaderElect=true \
  --set resources.limits.memory=256Mi \
  --set manager.webhookTimeout=30s \
  --set manager.protectedNamespaces="{kube-system,omen-system,kube-public,my-critical-ns}"

| Flag | Default | Description |
|---|---|---|
| `--webhook-timeout` | `10s` | Timeout for outgoing approval webhook HTTP requests. Transient failures are retried by the controller with exponential backoff; the run only fails if still undelivered when the approval TTL expires. |
| `--leader-elect` | `false` | Enable leader election for HA deployments. |
| `--metrics-bind-address` | `0` | Address for the metrics endpoint (`0` disables it). |
| `--health-probe-bind-address` | `:8081` | Address for liveness/readiness probes. |
| `--protected-namespaces` | `kube-system,omen-system,kube-public` | Comma-separated list of namespaces that cannot be targeted by any experiment. Enforced at both the validating webhook and target selection time. |
Omen enforces a list of protected namespaces that can never be targeted, regardless of what an Experiment specifies. The defaults are kube-system, omen-system, and kube-public.
The list is configured via the --protected-namespaces flag (comma-separated) and exposed in the Helm chart as manager.protectedNamespaces:
manager:
  protectedNamespaces:
    - kube-system
    - omen-system
    - kube-public
    - my-critical-namespace

Protection is enforced in two places:
- Validating webhook: rejects `Experiment` objects whose `spec.selector.namespace` is in the protected list at admission time.
- Controller: filters out any pods in protected namespaces during target selection, even for cluster-scoped selectors.
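The controller-side check can be sketched roughly as follows. The `pod` struct and both function names are hypothetical stand-ins (the real controller works with Kubernetes Pod objects); the sketch only shows parsing the comma-separated flag value into a set and dropping pods whose namespace is in it.

```go
package main

import (
	"fmt"
	"strings"
)

// pod is a hypothetical stand-in for a Kubernetes Pod.
type pod struct {
	Namespace string
	Name      string
}

// parseProtected turns a comma-separated flag value into a set of namespaces,
// ignoring surrounding whitespace and empty entries.
func parseProtected(flagValue string) map[string]bool {
	set := make(map[string]bool)
	for _, ns := range strings.Split(flagValue, ",") {
		if ns = strings.TrimSpace(ns); ns != "" {
			set[ns] = true
		}
	}
	return set
}

// filterProtected keeps only pods that live outside the protected namespaces.
func filterProtected(pods []pod, protected map[string]bool) []pod {
	var eligible []pod
	for _, p := range pods {
		if !protected[p.Namespace] {
			eligible = append(eligible, p)
		}
	}
	return eligible
}

func main() {
	protected := parseProtected("kube-system,omen-system,kube-public")
	pods := []pod{
		{Namespace: "kube-system", Name: "coredns-abc"},
		{Namespace: "staging", Name: "api-server-1"},
	}
	for _, p := range filterProtected(pods, protected) {
		fmt.Println(p.Namespace + "/" + p.Name) // staging/api-server-1
	}
}
```

Filtering at selection time in addition to admission means a cluster-scoped selector cannot reach a protected namespace even if the Experiment itself passed validation.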
Individual pods can be excluded from all chaos experiments by adding the annotation chaos.kreicer.dev/ignore: "true". This is useful for pods running critical in-flight work (e.g., database migrations, stateful leaders) that must not be interrupted.
kubectl annotate pod <pod-name> chaos.kreicer.dev/ignore=true

Or in the pod template:
metadata:
  annotations:
    chaos.kreicer.dev/ignore: "true"

The annotated pod is removed from the eligible list before selection. If all matching pods carry the annotation, the run transitions to Skipped automatically.
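A minimal sketch of the opt-out rule, assuming the annotation value must be exactly `"true"`. The `pod` struct and the `eligiblePods` helper are hypothetical; only the annotation key and the Skipped outcome come from the description above.

```go
package main

import "fmt"

// ignoreAnnotation is the opt-out annotation key documented above.
const ignoreAnnotation = "chaos.kreicer.dev/ignore"

// pod is a hypothetical stand-in for a Kubernetes Pod.
type pod struct {
	Name        string
	Annotations map[string]string
}

// eligiblePods drops every pod that carries the ignore annotation set to "true".
func eligiblePods(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if p.Annotations[ignoreAnnotation] == "true" {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "db-migrate", Annotations: map[string]string{ignoreAnnotation: "true"}},
		{Name: "leader-0", Annotations: map[string]string{ignoreAnnotation: "true"}},
	}
	if len(eligiblePods(pods)) == 0 {
		// Every matching pod opted out, so there is nothing left to target.
		fmt.Println("Skipped")
	}
}
```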
- Go 1.26+
- `kubebuilder` v4
- `kubectl` pointing at a local cluster
# Install CRDs
GOTOOLCHAIN=local make install
# Run the controller locally (uses ~/.kube/config)
GOTOOLCHAIN=local make run

The controller reads POD_NAMESPACE to exclude its own pods from target selection. Set it when running locally:
POD_NAMESPACE=omen-system GOTOOLCHAIN=local make run

apiVersion: chaos.kreicer.dev/v1alpha1
kind: Experiment
metadata:
  name: kill-one-pod
  namespace: default
spec:
  runPolicy:
    type: Once
  selector:
    namespace: default
    labels:
      app: my-app
  mode:
    type: random
    count: 1
  action:
    type: delete_pod
  safety:
    maxTargets: 1

apiVersion: chaos.kreicer.dev/v1alpha1
kind: Experiment
metadata:
  name: kill-third-of-fleet
  namespace: default
spec:
  runPolicy:
    type: Once
  selector:
    namespace: default
    labels:
      app: my-app
  mode:
    type: random
    percent: 33 # kill ~33% of matching pods, minimum 1
  action:
    type: delete_pod
  safety:
    maxTargets: 5

`percent` is mutually exclusive with `count`. The calculated pod count is always rounded up and floored at 1, so the experiment always has an effect even against small replica sets. `safety.maxTargets` is applied as a hard cap after the percentage is resolved.
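The arithmetic can be illustrated with a short sketch. The function name `resolveTargetCount` is hypothetical, and two details are assumptions rather than documented behaviour: an empty match yields zero targets, and a non-positive `maxTargets` is treated as "no cap".

```go
package main

import "fmt"

// resolveTargetCount converts a percentage into a pod count:
// round up, floor at 1, never exceed the number of matching pods,
// then apply maxTargets as a hard cap.
func resolveTargetCount(matching, percent, maxTargets int) int {
	if matching <= 0 {
		return 0 // assumption: nothing matched, nothing to target
	}
	n := (matching*percent + 99) / 100 // integer ceiling of matching*percent/100
	if n < 1 {
		n = 1
	}
	if n > matching {
		n = matching
	}
	if maxTargets > 0 && n > maxTargets {
		n = maxTargets
	}
	return n
}

func main() {
	fmt.Println(resolveTargetCount(10, 33, 5))  // 4: ceil(3.3) = 4, under the cap
	fmt.Println(resolveTargetCount(2, 33, 5))   // 1: ceil(0.66) = 1
	fmt.Println(resolveTargetCount(100, 33, 5)) // 5: 33 resolved, capped by maxTargets
}
```

With the example above (`percent: 33`, `maxTargets: 5`), a 10-replica deployment would lose 4 pods, while a 100-replica deployment would be capped at 5.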
apiVersion: chaos.kreicer.dev/v1alpha1
kind: Experiment
metadata:
  name: weekly-chaos
  namespace: default
spec:
  runPolicy:
    type: Repeat
    schedule: "0 10 * * 1" # every Monday at 10:00
    cooldown: 24h
    concurrencyPolicy: Forbid
  selector:
    namespace: staging
    labels:
      app: api-server
  mode:
    type: random
    count: 2
  action:
    type: delete_pod
  approval:
    required: true
    ttl: 30m
    webhook:
      url: https://hooks.example.com/omen-approval
  safety:
    maxTargets: 2

To approve the run, patch the generated ExperimentRun:
kubectl patch experimentrun <run-name> \
  --type=merge \
  -p '{"spec":{"approved":true}}'

Set `dryRun: true` on the Experiment to preview target selection without executing the action. Targets are recorded in `ExperimentRun.status.previewTargets` and results are marked Success without any pods being deleted.
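For automation that approves runs programmatically rather than through `kubectl`, the same merge-patch body can be built in Go. This is only a sketch of constructing the payload; `approvalPatch` is a hypothetical helper, and sending the patch (for example through a Kubernetes client using the merge-patch type) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// approvalPatch builds the JSON merge-patch body that sets spec.approved.
func approvalPatch(approved bool) ([]byte, error) {
	return json.Marshal(map[string]any{
		"spec": map[string]any{"approved": approved},
	})
}

func main() {
	body, err := approvalPatch(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"spec":{"approved":true}}
}
```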
Every phase transition of an ExperimentRun emits a standard Kubernetes Event on the object. Use kubectl describe to follow the lifecycle:
kubectl describe experimentrun <run-name>

Events use Normal type for successful transitions (PreviewGenerated, Approved, Running, Completed) and Warning for failure states (Failed, Expired).
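The mapping from transition to event type is small enough to write down as a lookup. The `eventType` function is a hypothetical illustration; it covers only the reasons listed above and reports any other reason as unknown rather than guessing a type.

```go
package main

import "fmt"

// eventType returns the Kubernetes Event type for a documented transition
// reason. The second return value is false for reasons not listed above.
func eventType(reason string) (string, bool) {
	switch reason {
	case "PreviewGenerated", "Approved", "Running", "Completed":
		return "Normal", true
	case "Failed", "Expired":
		return "Warning", true
	default:
		return "", false
	}
}

func main() {
	for _, r := range []string{"Approved", "Expired"} {
		t, _ := eventType(r)
		fmt.Println(r, "->", t)
	}
}
```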
The TOTAL column in kubectl get expruns is populated as soon as targets are selected during the PreviewGenerated phase, so you can see how many pods will be affected before the run executes.
Experiment objects carry a finalizer (chaos.omen.com/finalizer). When an Experiment is deleted, the controller first deletes all owned ExperimentRuns and waits for them to be removed before releasing the finalizer. This prevents orphaned runs from executing chaos actions after the parent is gone.
# Regenerate CRDs and RBAC after editing types
GOTOOLCHAIN=local make manifests generate
# Build the binary
GOTOOLCHAIN=local make build
# Run tests (requires setup-envtest)
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest
export KUBEBUILDER_ASSETS=$(setup-envtest use --print path)
go test ./... -v