<!-- content/patterns/trilio-continuous-recovery/_index.md -->
---
title: Trilio Continuous Restore
date: 2026-04-08
tier: sandbox
summary: A demonstration of Trilio Continuous Restore for stateful applications
rh_products:
- Red{nbsp}Hat OpenShift Container Platform
- Red{nbsp}Hat OpenShift GitOps
- Red{nbsp}Hat Advanced Cluster Management
partners:
- Trilio
industries:
- General
aliases: /trilio-cr/
links:
github: https://github.com/trilio-demo/trilio-continuous-restore
install: getting-started
bugs: https://github.com/trilio-demo/trilio-continuous-restore/issues
feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
---

# Trilio Continuous Restore — Red{nbsp}Hat Validated Pattern

## Overview

This Validated Pattern delivers an automated, GitOps-driven Disaster Recovery (DR) solution for stateful applications running on Red{nbsp}Hat OpenShift. By integrating [Trilio for Kubernetes](https://trilio.io) with the [Red{nbsp}Hat Validated Patterns framework](https://validatedpatterns.io), the pattern delivers:

- **Automated backup** of stateful workloads on the primary (hub) cluster
- **Continuous Restore** — Trilio's accelerated Recovery Time Objective (RTO) DR path that continuously pre-stages backup data on the DR cluster so that recovery requires only metadata retrieval, not a full data transfer
- **Automated DR testing** — the full backup-to-restore lifecycle runs as a scheduled, self-healing GitOps workflow with no human intervention after initial setup
- **Multi-cluster lifecycle management** through Red{nbsp}Hat Advanced Cluster Management (ACM)

### Use case

The pattern targets organizations that need a documented, repeatable DR posture for Kubernetes-native workloads, particularly those that must demonstrate RTO and Recovery Point Objective (RPO) targets through regular, automated DR tests rather than annual manual exercises.

A WordPress + MySQL deployment is included as a representative stateful application. It serves as the reference workload for the full backup, restore, and URL-rewrite lifecycle.

---

## Architecture

```mermaid
graph TD
subgraph Git["Git (Source of Truth)"]
values["values-hub.yaml\nvalues-secondary.yaml\ncharts/"]
end

subgraph Hub["Hub Cluster (primary)"]
ACM["ACM"]
ArgoCD["ArgoCD"]
Vault["HashiCorp Vault + ESO"]
Trilio_Hub["Trilio Operator + TVM"]
CronJob["Imperative CronJob\n(DR lifecycle automation)"]
end

subgraph Spoke["DR Cluster (secondary)"]
Trilio_Spoke["Trilio Operator + TVM"]
EventTarget["EventTarget pod\n(pre-stages PVCs)"]
ConsistentSet["ConsistentSet\n(restore point)"]
end

S3["Shared S3 Bucket"]

Git -->|GitOps sync| ArgoCD
ArgoCD --> Trilio_Hub
Vault -->|S3 creds + license| Trilio_Hub
Trilio_Hub -->|backups| S3
ACM -->|provisions| Spoke
S3 -->|EventTarget polls| EventTarget
EventTarget --> ConsistentSet
CronJob -->|restore from ConsistentSet| ConsistentSet
```

### Component roles

| Component | Where | Role |
|-----------|-------|------|
| Trilio Operator | Hub + Spoke | Installed through Operator Lifecycle Manager (OLM) from the `certified-operators` catalog, channel `5.3.x` |
| TrilioVaultManager | Hub + Spoke | Trilio operand Custom Resource (CR); manages the Trilio data plane |
| Red{nbsp}Hat OpenShift | Hub + Spoke | Container orchestration platform; provides OLM, storage, networking, and the GitOps operator substrate |
| Red{nbsp}Hat OpenShift GitOps (ArgoCD) | Hub + Spoke | GitOps sync engine; all configuration is driven from Git |
| Red{nbsp}Hat Advanced Cluster Management (ACM) | Hub | Cluster lifecycle, policy enforcement, and spoke provisioning |
| Validated Patterns Imperative CronJob | Hub + Spoke | Runs the automated DR lifecycle on a 10-minute schedule |
| BackupTarget | Hub + Spoke | Points to the shared S3 bucket; the spoke BackupTarget has the EventTarget flag set |
| BackupPlan | Hub | Defines backup scope (the `wordpress` namespace), quiesce/unquiesce hooks, and retention |
| CR BackupPlan | Hub | Continuous Restore variant of BackupPlan; drives pre-staging on the spoke |
| EventTarget pod | Spoke | Watches the shared S3 bucket for new backups; pre-stages Persistent Volume Claims (PVCs) locally |
| ConsistentSet | Spoke | Cluster-scoped CR representing a fully pre-staged restore point |
| HashiCorp Vault and External Secrets Operator (ESO) | Hub | Secret management; S3 credentials and Trilio license are never stored in Git |
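
As a hedged illustration of the BackupTarget entry above, a `Target` CR might look like the following. Field names approximate the Trilio API and may differ from what the pattern's chart actually renders:

```yaml
# Illustrative sketch only; field names approximate the Trilio Target API
# and may differ from the values the pattern's chart renders.
apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  name: demo-s3-target        # hypothetical name
  namespace: trilio-system
spec:
  type: ObjectStore
  vendor: AWS
  objectStoreCredentials:
    url: https://s3.amazonaws.com
    bucketName: <your-bucket-name>
    region: us-east-1
    credentialSecrets:
      - name: trilio-s3       # delivered by ESO from Vault, never stored in Git
```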

### How Continuous Restore works

1. The hub creates a backup using the CR BackupPlan and writes it to the shared S3 storage.
2. The EventTarget pod on the spoke detects the new backup and begins copying volume data locally — ahead of any DR event.
3. When the spoke's imperative job detects an Available ConsistentSet, it submits a Restore CR. Because the data is already local, only backup metadata is fetched — resulting in significantly lower RTO than a standard on-demand restore.
4. The post-restore Hook CR rewrites WordPress database URLs to the DR cluster's ingress domain.
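
Step 3 can be pictured as a Restore CR like the sketch below. This is illustrative only; consult the Trilio documentation for the authoritative schema of ConsistentSet-driven restores:

```yaml
# Illustrative sketch; field names are assumptions, not the verified
# Trilio Restore schema for ConsistentSet-based recovery.
apiVersion: triliovault.trilio.io/v1
kind: Restore
metadata:
  name: wordpress-cr-restore      # hypothetical name
  namespace: trilio-system
spec:
  source:
    type: Backup                  # data is served from the pre-staged local copy
    backup:
      name: <latest-cr-backup>
  restoreNamespace: wordpress-restore
```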

## Links

- [Trilio for Kubernetes documentation](https://docs.trilio.io/kubernetes)
- [Red{nbsp}Hat Validated Patterns](https://validatedpatterns.io)
- [Validated Patterns imperative framework](https://validatedpatterns.io/learn/imperative-actions/)
- [Red{nbsp}Hat Advanced Cluster Management (ACM)](https://access.redhat.com/documentation/en-us/red_hat_advanced_cluster_management_for_kubernetes)
- [External Secrets Operator](https://external-secrets.io)

## Next steps

- [Prerequisites](prerequisites)
- [Getting started](getting-started)
- [CR operations](cr-operations)
- [Troubleshooting](troubleshooting)
<!-- content/patterns/trilio-continuous-recovery/cr-operations.md -->
---
title: CR operations
weight: 30
aliases: /trilio-cr/cr-operations/
---

## Operations

### Monitoring DR status

```bash
# Hub — all phases
make dr-status

# Spoke — ConsistentSet and restore status (run on spoke context)
oc get configmap trilio-cr-status -n imperative -o yaml
```

### Automated DR lifecycle

The imperative framework runs continuously on a 10-minute schedule with no manual intervention required. The full lifecycle from a standing start (hub up, spoke just joined) to a completed Continuous Restore typically completes within 30–45 minutes.

**Hub job sequence:**

| Job | What it does | Skips when |
|-----|-------------|------------|
| `trilio-enable-cr` | Creates CR BackupPlan + ContinuousRestore Policy | CR BackupPlan already Available |
| `trilio-cr-backup` | Creates a backup against the CR BackupPlan | Available CR backup exists |
| `trilio-backup` | Creates a standard backup | Available backup exists |
| `trilio-restore-standard` | Restores to `wordpress-restore` on hub | Completed restore exists |
| `trilio-e2e-status` | Writes status ConfigMap; fails until all phases pass | — (always runs) |

**Spoke job sequence (per DR cluster):**

| Job | What it does | Skips when |
|-----|-------------|------------|
| `trilio-cr-status` | Validates ConsistentSet available; writes status ConfigMap | — (always runs; fails until Available) |
| `trilio-cr-restore` | Restores from latest ConsistentSet to `wordpress-restore` | Completed restore exists |
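
The "Skips when" column is implemented as a simple idempotency guard at the start of each job. A minimal sketch of that pattern, with the resource count stubbed in (the shipped jobs query the cluster instead):

```shell
# Idempotency-guard sketch (illustrative; not the shipped job code).
# should_skip COUNT: skip when at least one completed object exists.
should_skip() {
  [ "${1:-0}" -gt 0 ]
}

completed_restores=0  # the real job would derive this from the cluster,
                      # e.g. by counting Completed restore CRs with oc
if should_skip "$completed_restores"; then
  echo "completed restore exists; skipping"
else
  echo "no completed restore; running job"
fi
```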

### Manual backup

To trigger a backup outside the automated schedule:

```bash
ansible-navigator run ansible/playbooks/dr-backup.yaml
```

### Manual DR restore

**Standard restore** (from a named backup):

```bash
ansible-navigator run ansible/playbooks/dr-restore.yaml \
-e restore_method=backup \
-e restore_namespace=<target-namespace>
```

**Continuous Restore** (from a pre-staged ConsistentSet on the DR cluster — accelerated RTO):

```bash
ansible-navigator run ansible/playbooks/dr-restore.yaml \
-e restore_method=consistentset \
-e restore_namespace=<target-namespace>
```

Both commands discover the cluster ingress domain automatically and apply the Route hostname transform.
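
The hostname transform can be illustrated offline. The domains below are hypothetical; the playbooks discover the real ingress domains from the clusters:

```shell
# Offline sketch of the Route hostname transform (hypothetical domains).
PRIMARY_DOMAIN="apps.hub.example.com"
DR_DOMAIN="apps.dr.example.com"

# Rewrite a URL from the primary ingress domain to the DR ingress domain.
transform_url() {
  printf '%s\n' "$1" | sed "s/${PRIMARY_DOMAIN}/${DR_DOMAIN}/"
}

transform_url "https://wordpress.apps.hub.example.com/wp-login.php"
# prints https://wordpress.apps.dr.example.com/wp-login.php
```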

### Offboarding a spoke

```bash
# Step 1 — on the hub context
make unlabel-spoke CLUSTER=<acm-cluster-name>

# Step 2 — on the spoke context
make offboard-spoke CLUSTER=<acm-cluster-name>
```

### Uninstalling the pattern

```bash
# On the hub context
make offboard-hub
```

> Save your HashiCorp Vault root token and unseal keys before running `offboard-hub`. They are stored in the `imperative` namespace, which is removed during offboarding.

---

## Ansible playbook reference

| Playbook | When to use | Key inputs |
|----------|-------------|------------|
| `dr-backup.yaml` | Trigger a manual backup on the hub | — |
| `dr-restore.yaml` | Manual restore (backup or ConsistentSet method) | `restore_method`, `restore_namespace`, `source_backup` (optional) |
| `validate-trilio.yaml` | Pre/post-change Trilio health validation | — |
| `offboard-spoke.yaml` | Remove spoke-side Trilio resources | `cluster_name` |
| `offboard-hub.yaml` | Full hub pattern teardown | — |

Run playbooks with `ansible-navigator`:

```bash
ansible-navigator run ansible/playbooks/<playbook>.yaml [-e key=value ...]
```
<!-- content/patterns/trilio-continuous-recovery/getting-started.md -->
---
title: Getting started
weight: 20
aliases: /trilio-cr/getting-started/
---

# Deploying the pattern

## Deployment

### 1. Clone the repository

```bash
git clone https://github.com/trilio-demo/trilio-continuous-restore
cd trilio-continuous-restore
```

### 2. Configure S3 bucket details

Edit `values-hub.yaml` and `values-secondary.yaml` to set your S3 bucket name and region:

```yaml
# In both values-hub.yaml and values-secondary.yaml, under the trilio-operand app overrides:
overrides:
- name: backupTarget.bucketName
value: <your-bucket-name>
- name: backupTarget.region
value: <your-bucket-region> # for example, us-east-1
```

### 3. Populate secrets

Create `values-secret.yaml` from the template:

```bash
cp values-secret.yaml.template ~/values-secret-trilio-continuous-restore.yaml
```

Edit `~/values-secret-trilio-continuous-restore.yaml` and fill in your credentials:

```yaml
secrets:
- name: trilio-license
vaultPrefixes:
- global
fields:
- name: key
value: <your-trilio-license-key> # single unbroken line, no escape characters

- name: trilio-s3
vaultPrefixes:
- global
fields:
- name: accessKey
value: <your-s3-access-key>
- name: secretKey
value: <your-s3-secret-key>
```

> Always edit the copy in your home directory, never the repo's `values-secret.yaml.template`, so that credentials are never committed to Git.

### 4. Install the pattern

```bash
./pattern.sh make install
```

This command:
1. Bootstraps HashiCorp Vault and loads secrets from `~/values-secret-trilio-continuous-restore.yaml`
2. Installs the Validated Patterns operator on the hub
3. Creates the `ValidatedPattern` CR which triggers ArgoCD to deploy all hub components

Monitor progress in the ArgoCD UI or by running:

```bash
oc get application -n openshift-gitops
```

All applications should reach `Synced / Healthy` within 10–15 minutes.
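
To poll from a script instead of the ArgoCD UI, one hedged approach is a helper that checks a sync/health report (the jsonpath query in the comment is an assumption about your ArgoCD layout):

```shell
# healthy_report LINES: true only when every "SYNC/HEALTH" line reads
# Synced/Healthy. Sketch only; adapt the query to your environment.
healthy_report() {
  ! printf '%s\n' "$1" | grep -qv '^Synced/Healthy$'
}

# In-cluster the report would come from something like:
#   oc get application -n openshift-gitops -o jsonpath=\
#     '{range .items[*]}{.status.sync.status}/{.status.health.status}{"\n"}{end}'
report="Synced/Healthy
Synced/Healthy"
healthy_report "$report" && echo "all applications ready"
```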

**Alternative: manual secret population by using `oc`**

To write or rotate secrets directly in HashiCorp Vault without re-running `./pattern.sh make install`:

```bash
# Extract Vault root token
VAULT_TOKEN=$(oc get secret vaultkeys -n imperative \
-o jsonpath='{.data.vault_data_json}' | \
base64 -d | python3 -c "import sys,json; print(json.load(sys.stdin)['root_token'])")

# Write Trilio license
oc exec -n vault vault-0 -- env VAULT_TOKEN=$VAULT_TOKEN \
vault kv put secret/global/trilio-license key="<your-license-key>"

# Write S3 credentials
oc exec -n vault vault-0 -- env VAULT_TOKEN=$VAULT_TOKEN \
vault kv put secret/global/trilio-s3 accessKey="<key>" secretKey="<secret>"
```
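
The token-extraction pipeline above can be exercised offline against sample data to confirm its shape (the sample JSON stands in for the real `vault_data_json` payload):

```shell
# Offline illustration of the vaultkeys decoding pipeline; the sample
# mimics the base64-encoded vault_data_json field of the real secret.
sample=$(printf '{"root_token": "hvs.sample"}' | base64)
printf '%s' "$sample" | base64 -d | \
  python3 -c "import sys,json; print(json.load(sys.stdin)['root_token'])"
# prints hvs.sample
```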

You can also reload secrets from `~/values-secret-trilio-continuous-restore.yaml` by running:

```bash
./pattern.sh make load-secrets
```

### 5. Verify hub deployment

Check that Trilio is healthy:

```bash
oc get triliovaultmanager -n trilio-system
# STATUS should be Deployed or Updated

oc get target -n trilio-system
# STATUS should be Available
```

Check the end-to-end DR status (updated automatically by the imperative framework):

```bash
make dr-status
```

On the initial run, `trilio-enable-cr` and `trilio-backup` complete within the first two CronJob cycles (~20 minutes), and the standard restore follows. When all phases report `PASS`, the hub is fully operational.

---

## Spoke (DR cluster) onboarding

### 1. Import the DR cluster into ACM

Import the DR cluster through the ACM console or the `oc` CLI. Note the cluster name assigned during import.

### 2. Label and onboard

```bash
make onboard-spoke CLUSTER=<acm-cluster-name>
```

This labels the cluster with `clusterGroup=secondary`, which triggers ACM to deploy the spoke configuration through ArgoCD.

After running `make onboard-spoke`, kick the spoke-side ArgoCD application to sync immediately (run on the spoke cluster context):

```bash
oc patch application.argoproj.io main-trilio-continuous-restore-secondary \
-n openshift-gitops --type merge \
-p '{"operation":{"sync":{}}}'
```

### 3. Monitor spoke onboarding

```bash
make spoke-status CLUSTER=<acm-cluster-name>
```

Expected progression:
1. Trilio operator installs (OLM subscription)
2. TrilioVaultManager deploys (ESO delivers S3 + license secrets)
3. BackupTarget becomes Available (EventTarget pod starts)
4. ConsistentSets begin appearing as hub backups are detected (~10–20 minutes after the hub's CR backup completes)
5. Spoke imperative restore runs automatically after the first ConsistentSet is Available

The full spoke onboarding sequence typically takes 15–25 minutes from label application to a running TrilioVaultManager. The imperative restore adds another 30–45 minutes on top of that for the first ConsistentSet to appear and the restore to complete.

### Known issue: `trilio-operand` OutOfSync on spoke after onboarding

ArgoCD may show `trilio-operand` as `OutOfSync / Missing` immediately after spoke onboarding. This is a CRD timing issue — ArgoCD attempts to sync the TrilioVaultManager CR before the Trilio operator has finished registering its Custom Resource Definitions (CRDs).

The `SkipDryRunOnMissingResource=true` sync option is set in `values-secondary.yaml` to handle this automatically. If the issue persists after 5–10 minutes, manually refresh the ArgoCD application:

```bash
oc patch application trilio-operand -n main-trilio-continuous-restore-secondary \
--type merge -p '{"operation":{"sync":{}}}'
```