hoc081098/PostgresPatroniHaproxyEfcoreDemo

PostgreSQL High Availability with Patroni, HAProxy & EF Core

Demo project to learn and validate PostgreSQL HA patterns end-to-end:

  • Patroni automatic failover and replica rejoin behavior
  • HAProxy write/read split (:5000 for primary, :5001 for replicas)
  • EF Core read/write DbContext split and read-your-writes edge case demo

Architecture Overview

                         ┌──────────────────────────────────────────┐
                         │         etcd cluster (3 nodes)           │
                         │  etcd1 : etcd2 : etcd3  (port 2379/2380) │
                         │      Raft consensus / leader election    │
                         └───────────┬──────────────────────────────┘
                                     │ ↕ heartbeat & lock
              ┌──────────────────────┼───────────────────────┐
              │                      │                       │
   ┌──────────▼──────────┐ ┌─────────▼───────────┐ ┌─────────▼───────────┐
   │  patroni1 (Spilo)   │ │  patroni2 (Spilo)   │ │  patroni3 (Spilo)   │
   │  PostgreSQL :5432   │ │  PostgreSQL :5432   │ │  PostgreSQL :5432   │
   │  REST API    :8008  │ │  REST API    :8008  │ │  REST API    :8008  │
   │  (primary or repl)  │ │  (replica)          │ │  (replica)          │
   └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
              │                      │                       │
              └──────────────────────┼───────────────────────┘
                                     │
                          ┌──────────▼──────────┐
                          │       HAProxy       │
                          │  :5000 → primary    │  GET /primary  → 200 on leader only
                          │  :5001 → replicas   │  GET /replica  → 200 on replicas only
                          │  :8404 → stats UI   │      round-robin across healthy replicas
                          └──────────┬──────────┘
                                     │
                          ┌──────────▼───────────┐
                          │    ASP.NET Core      │
                          │  WriteDb → :5000     │
                          │  ReadDb  → :5001     │
                          └──────────────────────┘

What Is Implemented

  • etcd cluster: 3-node etcd (v3.5) for Patroni leader election via Raft consensus
  • Patroni cluster: 3 Spilo nodes (Patroni + PostgreSQL 17 + WAL-G), automatic primary election & failover
  • post_init_wrapper.sh: runs Spilo's original post_init.sh first, then creates app_user + app_db (similar to POSTGRES_DB/POSTGRES_USER in the official postgres image)
  • HAProxy: single HAProxy instance; port 5000 → primary (write), port 5001 → replicas (read, round-robin), port 8404 → stats UI
  • Health checks: HAProxy uses the Patroni REST API (GET /primary, GET /replica) over HTTP on port 8008 while PostgreSQL traffic stays TCP
  • ASP.NET Core + EF Core: two DbContexts — ApplicationWriteDbContext (→ HAProxy :5000) and ApplicationReadDbContext (→ HAProxy :5001)
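The HAProxy side of that wiring typically looks like the fragment below. This is a sketch, not the repo's exact haproxy.cfg: server names, check intervals, and the rise value are assumptions (fall 3 matches the threshold referenced in the Observability section).

```
# Write entry point: only the node answering GET /primary with 200 gets traffic
listen primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server patroni1 patroni1:5432 check port 8008
    server patroni2 patroni2:5432 check port 8008
    server patroni3 patroni3:5432 check port 8008

# Read entry point: replicas answer GET /replica with 200, balanced round-robin
listen replicas
    bind *:5001
    balance roundrobin
    option httpchk GET /replica
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server patroni1 patroni1:5432 check port 8008
    server patroni2 patroni2:5432 check port 8008
    server patroni3 patroni3:5432 check port 8008
```

The key idea: the health check is HTTP against Patroni's REST API on 8008, while the proxied traffic itself is plain TCP to PostgreSQL on 5432.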

Default Local URLs

  • App API (HTTPS): https://localhost:7134
  • App API (HTTP): http://localhost:5014
  • HAProxy stats UI: http://localhost:8404

How to Run

docker compose up -d

# Patroni cluster status
# Note: no -c flag needed — Spilo pre-sets PATRONICTL_CONFIG_FILE env var inside the container
docker exec -it patroni1 patronictl list

# HAProxy stats (browser) — credentials: haproxy / haproxy
open http://localhost:8404

Demo Scenarios

1. 🔀 Manual Switchover ✅

A switchover is a graceful handover — the primary finishes in-flight transactions before stepping down. Use failover only when the primary is already dead.

Note: This repo sets stable Patroni member names via PATRONI_NAME (patroni1, patroni2, patroni3), so --leader should use the Member value from patronictl list (for example patroni1). --master was deprecated in Patroni v3.x.

# 1. Check current cluster state — note the Member name of the Leader
docker exec -it patroni1 patronictl list

# 2. Switchover — replace <leader-member-name> with the Member name from step 1
#    Omit --candidate to let Patroni pick the most up-to-date replica automatically
docker exec -it patroni1 patronictl switchover \
  --leader <leader-member-name> \
  --scheduled now \
  --force

# 3. Verify the new primary
docker exec -it patroni1 patronictl list

Terminal output:

hoc.nguyen@MBAM0187 % docker exec -it patroni1 patronictl list
+ Cluster: postgres-ha (7609605213415538750) -----+----+-----------+
| Member       | Host       | Role    | State     | TL | Lag in MB |
+--------------+------------+---------+-----------+----+-----------+
| 8366a0cc9c0b | 172.21.0.5 | Leader  | running   |  6 |           |
| a83e997af22c | 172.21.0.6 | Replica | streaming |  6 |         0 |
| da0b3dc3cce5 | 172.21.0.7 | Replica | streaming |  6 |         0 |
+--------------+------------+---------+-----------+----+-----------+

hoc.nguyen@MBAM0187 % docker exec -it patroni1 patronictl switchover \
  --leader 8366a0cc9c0b \
  --scheduled now \
  --force
Current cluster topology
+ Cluster: postgres-ha (7609605213415538750) -----+----+-----------+
| Member       | Host       | Role    | State     | TL | Lag in MB |
+--------------+------------+---------+-----------+----+-----------+
| 8366a0cc9c0b | 172.21.0.5 | Leader  | running   |  6 |           |
| a83e997af22c | 172.21.0.6 | Replica | streaming |  6 |         0 |
| da0b3dc3cce5 | 172.21.0.7 | Replica | streaming |  6 |         0 |
+--------------+------------+---------+-----------+----+-----------+
2026-02-28 07:41:29.48608 Successfully switched over to "da0b3dc3cce5"
+ Cluster: postgres-ha (7609605213415538750) ---+----+-----------+
| Member       | Host       | Role    | State   | TL | Lag in MB |
+--------------+------------+---------+---------+----+-----------+
| 8366a0cc9c0b | 172.21.0.5 | Replica | stopped |    |   unknown |
| a83e997af22c | 172.21.0.6 | Replica | running |  6 |         0 |
| da0b3dc3cce5 | 172.21.0.7 | Leader  | running |  7 |           |
+--------------+------------+---------+---------+----+-----------+

hoc.nguyen@MBAM0187 % docker exec -it patroni1 patronictl list
+ Cluster: postgres-ha (7609605213415538750) -----+----+-----------+
| Member       | Host       | Role    | State     | TL | Lag in MB |
+--------------+------------+---------+-----------+----+-----------+
| 8366a0cc9c0b | 172.21.0.5 | Replica | streaming |  7 |         0 |
| a83e997af22c | 172.21.0.6 | Replica | streaming |  7 |         0 |
| da0b3dc3cce5 | 172.21.0.7 | Leader  | running   |  7 |           |
+--------------+------------+---------+-----------+----+-----------+

HAProxy stats — before & after:

HAProxy stats before switchover

Before — Leader: patroni3 (8366a0cc9c0b) · patroni1 & patroni2 are replicas

HAProxy stats after switchover

After — Leader: patroni1 (da0b3dc3cce5) · patroni2 & patroni3 are replicas


2. 💥 Automatic Failover (kill the primary) ✅

chmod +x scripts/demo_auto_failover.sh scripts/check_replica_stuck.sh

# one-shot demo: kill leader -> auto failover -> write check -> rejoin check
bash scripts/demo_auto_failover.sh

# optional: skip HTTP write check when app API on :7134 is not running
SKIP_WRITE_CHECK=1 bash scripts/demo_auto_failover.sh

What to observe

  • Patroni elects a new leader within the TTL window (ttl: 30).
  • Writes through HAProxy :5000 continue to succeed after failover.
  • When it rejoins, the old node may briefly report State=running with Lag in MB around 1~2, then recover to streaming with lag 0.
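The TTL window behind that election can be sketched with a toy model (illustrative only; Patroni's real lock lives in etcd, and only the ttl: 30 value is taken from this repo):

```python
# Toy model of the Patroni leader lock: the leader must renew its DCS key
# before the TTL expires, or replicas become free to elect a new leader.
TTL = 30  # seconds; matches ttl: 30 in this repo

def lock_expired(last_renewal, now, ttl=TTL):
    """True once `ttl` seconds pass without a renewal (leader presumed dead)."""
    return now - last_renewal > ttl

# Healthy leader renewed 10 s ago: lock still held, no election happens.
assert not lock_expired(last_renewal=100.0, now=110.0)

# Crashed leader, 31 s without renewal: lock expires, failover may begin.
assert lock_expired(last_renewal=100.0, now=131.0)
```

This is why the failover demo observes a new leader "within the TTL window": until the key expires, replicas assume the old leader is still alive.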

Quick diagnosis script (when suspecting a stuck replica)

# diagnose one replica member (default: patroni1)
bash scripts/check_replica_stuck.sh patroni1

Recovery when truly stuck (>60s without entering streaming)

# Reinit only the stuck member (do not restart the entire cluster)
# run from any live Patroni node (e.g. patroni2)
docker exec <any-running-patroni-node> patronictl reinit postgres-ha <stuck-member-name> --force
docker exec <any-running-patroni-node> patronictl list

Read-path note

  • A 3-node cluster (1 primary + 2 replicas) still guarantees write HA.
  • Read HA may temporarily degrade during failover because the surviving replica must catch up on the timeline before serving traffic.

Self-healing options enabled in this repo

  • remove_data_directory_on_rewind_failure: true: if pg_rewind fails, Patroni deletes the local PGDATA and clones it fresh from the leader.
  • remove_data_directory_on_diverged_timelines: true: if a diverged timeline cannot be safely replayed, Patroni deletes PGDATA and takes a fresh base backup.
  • Trade-off: recovery takes more time/IO, but avoids prolonged stuck-lag states.
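The decision logic those two flags control can be sketched as follows (a simplification for intuition; Patroni's actual rejoin path has more branches):

```python
# Simplified model of what happens when a former primary rejoins the cluster.
def rejoin_action(rewind_ok, timelines_diverged,
                  remove_on_rewind_failure=True,
                  remove_on_diverged_timelines=True):
    if rewind_ok:
        return "start as replica"           # pg_rewind succeeded: fast path
    if timelines_diverged and remove_on_diverged_timelines:
        return "wipe PGDATA and re-clone"   # fresh base backup from the leader
    if remove_on_rewind_failure:
        return "wipe PGDATA and re-clone"
    return "stuck (manual reinit needed)"   # flags off: operator must intervene

assert rejoin_action(rewind_ok=True, timelines_diverged=False) == "start as replica"
assert rejoin_action(rewind_ok=False, timelines_diverged=True) == "wipe PGDATA and re-clone"
assert rejoin_action(rewind_ok=False, timelines_diverged=False,
                     remove_on_rewind_failure=False,
                     remove_on_diverged_timelines=False) == "stuck (manual reinit needed)"
```

With both flags off, the last branch is exactly the stuck-replica state that patronictl reinit resolves by hand.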

3. ⚖️ Read Load Balancing ✅

GET /products returns servedByNode — the IP of the PostgreSQL node that actually served the query (via inet_server_addr()).

After posting a few products and calling GET /products repeatedly, HAProxy round-robin distributed read queries across replicas:

  • First few requests → served by patroni3
  • Subsequent requests → served by patroni2

This confirms that:

  1. WAL streaming replication is working — replicas have up-to-date data written to the primary
  2. HAProxy round-robin across healthy replicas is working — servedByNode IP changes across requests
  3. ApplicationReadDbContext is correctly connecting through HAProxy :5001

# Write a product
curl -X POST https://localhost:7134/products \
  -H "Content-Type: application/json" \
  -d '{"name": "Apple", "price": 1.99}'

# Read products repeatedly — observe servedByNode changing between replica IPs
curl https://localhost:7134/products
curl https://localhost:7134/products
curl https://localhost:7134/products

Demonstrate the read-your-writes edge case via /products/read-your-writes-demo:

# Write to primary, then immediately read from replica and primary for the same product ID
curl -X POST https://localhost:7134/products/read-your-writes-demo \
  -H "Content-Type: application/json" \
  -d '{"name": "RYW demo", "price": 2.49}'
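The lag window that makes this endpoint interesting can be modeled with a tiny in-memory sketch (illustrative only; the real mechanism is asynchronous WAL streaming):

```python
# Toy model of the read-your-writes anomaly: the write lands on the primary,
# but an immediate read may hit a replica that has not replayed it yet.
class Node:
    def __init__(self):
        self.rows = {}

primary, replica = Node(), Node()

def write(pid, name):
    primary.rows[pid] = name           # app writes via HAProxy :5000

def replicate():
    replica.rows.update(primary.rows)  # WAL replay happens asynchronously

def read(node, pid):
    return node.rows.get(pid)          # app reads via HAProxy :5001

write(1, "RYW demo")
assert read(primary, 1) == "RYW demo"  # the primary always sees its own write
assert read(replica, 1) is None        # the replica can still be behind
replicate()
assert read(replica, 1) == "RYW demo"  # after replay, reads catch up
```

The demo endpoint does the same thing with real connections: one write through :5000, then back-to-back reads through :5001 and :5000 for the same ID.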

4. 🗄️ EF Core Migrations & CRUD Endpoints ✅

Entity: Product — uses a factory pattern (Product.Create(...)) with private setters to prevent invalid state. EF Core materializes it via a private parameterless constructor.

// Product.Create() — the only valid way to create a Product
Product.Create(name: "Apple", price: 1.99m, createdAtUtc: DateTimeOffset.UtcNow)
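The same factory idea in a language-neutral sketch (a Python stand-in; the validation rules shown are illustrative, not necessarily the repo's exact ones):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    name: str
    price: float

    @staticmethod
    def create(name, price):
        # Centralized validation: callers going through this factory cannot
        # obtain an invalid Product (mirrors Product.Create + private setters).
        if not name.strip():
            raise ValueError("name must be non-empty")
        if price <= 0:
            raise ValueError("price must be positive")
        return Product(name=name, price=price)

p = Product.create("Apple", 1.99)
assert p.name == "Apple" and p.price == 1.99
try:
    Product.create("", 1.99)
except ValueError:
    pass
else:
    raise AssertionError("invalid name should be rejected")
```

The frozen dataclass plays the role of private setters: once created, the entity cannot drift into an invalid state.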

Write vs Read IEntityTypeConfiguration:

  • ProductWriteConfiguration — full DDL constraints: HasMaxLength, HasColumnType, IsRequired, UseIdentityByDefaultColumn. These drive migration-generated SQL.
  • ProductReadConfiguration — minimal: only HasKey. No DDL constraints needed since the Read context never runs migrations.
  • ApplicationReadDbContext registers with UseQueryTrackingBehavior(NoTracking) — no change tracking overhead for read-only queries.

Migration is applied automatically on startup via writeDb.Database.MigrateAsync() — runs only against the write DB (:5000 → primary). Schema propagates to replicas automatically via WAL streaming replication.

# Add a new migration manually (if needed)
dotnet ef migrations add <Name> --context ApplicationWriteDbContext --output-dir Data/Migrations

Endpoints:

  • POST /products → ApplicationWriteDbContext → HAProxy :5000 → primary
  • POST /products/read-your-writes-demo → ApplicationWriteDbContext + ApplicationReadDbContext → write to primary (:5000), then immediate read from replica (:5001) and primary (:5000)
  • GET /products → ApplicationReadDbContext → HAProxy :5001 → replicas (round-robin)

5. 🔒 Security — Non-superuser App User

  • Verify the app never uses the postgres superuser — app_user should only have CONNECT + USAGE + DML on app_db (follows Patroni docs recommendation)
  • Add explicit GRANT statements in post_init_wrapper.sh scoped to specific schemas/tables

6. 🐛 Observability & Debugging

  • HAProxy stats page (:8404) — walk through important columns to understand backend health and load distribution:

    HAProxy stats — during switchover transition

    QueueCur / Max / Limit

    • Number of requests currently waiting in queue because all backend connections are busy.
    • In a healthy low-traffic system this is always 0. A non-zero Cur means the backend is saturated.

    SessionsCur / Max / Limit / Total / LbTot / Last

    • Cur — number of currently active sessions on this server.
    • Max — peak concurrent sessions since HAProxy started.
    • Total — cumulative sessions handled since HAProxy started.
    • LbTot — total times this server was selected by the load balancer (useful to verify round-robin is distributing evenly).
    • Last — time elapsed since the last session on this server (e.g. 55s, 13m15s). A very large value means the server hasn't received traffic recently.

    BytesIn / Out

    • Total bytes received from clients (In) and sent to clients (Out) for this server since HAProxy started.
    • If one node has significantly higher values than others, it may indicate uneven load distribution or a hotspot.

    Status

    • The most important column. Shows the current health state of each backend server:
      • UP (green) — healthy, receiving traffic.
      • DOWN (red) — failed health checks, no traffic routed to this server.
      • UP, going down / DOWN, going up — transitional states during fall/rise threshold counting.

    LastChk — result of the most recent health check

    • L7OK/200 in 3ms — Layer 7 (HTTP) check succeeded, Patroni returned HTTP 200, took 3 ms.
    • L7STS/503 in 5ms — Layer 7 check failed, Patroni returned HTTP 503 (node is not primary/replica), took 5 ms. HAProxy counts this toward the fall threshold.
    • * L7STS/0 in 3ms — the * prefix means this result caused a state change (server just went DOWN or came back UP).

    Wght — load balancing weight

    • All servers with equal weight receive equal share of traffic (pure round-robin).
    • If weights differ, traffic is distributed proportionally.

    Act / Bck

    • Act = Y — server is configured as an active server (receives traffic under normal conditions).
    • Bck = Y — server is a backup server, only used when all active servers are down.
    • - means not applicable for that role.

    Chk / Dwn / Dwntme

    • Chk — number of consecutive failed health checks since the last state change. Once this reaches the fall threshold (configured as fall 3), the server is marked DOWN.
    • Dwn — total number of times this server has been marked DOWN since HAProxy started. A high number indicates instability or frequent flapping.
    • Dwntme — total accumulated downtime. Useful for measuring overall availability.
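The fall/rise counting described above can be sketched as a small state machine (fall 3 matches this repo's config; rise 2 is an assumed value):

```python
# Sketch of HAProxy's health-check thresholds: consecutive failures flip a
# server UP -> DOWN once they reach `fall`; consecutive successes flip it
# DOWN -> UP once they reach `rise`.
FALL, RISE = 3, 2

def run_checks(results, state="UP"):
    fails = oks = 0
    for ok in results:
        if ok:
            oks, fails = oks + 1, 0
            if state == "DOWN" and oks >= RISE:
                state, oks = "UP", 0
        else:
            fails, oks = fails + 1, 0
            if state == "UP" and fails >= FALL:
                state, fails = "DOWN", 0
    return state

assert run_checks([False, False]) == "UP"           # 2 fails < fall 3: still UP
assert run_checks([False, False, False]) == "DOWN"  # 3rd consecutive fail: DOWN
assert run_checks([True], state="DOWN") == "DOWN"   # 1 ok < rise 2: still DOWN
assert run_checks([True, True], state="DOWN") == "UP"
```

The transitional labels on the stats page ("UP, going down" / "DOWN, going up") correspond to the counters mid-way toward a threshold.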
  • Patroni REST API — use the bundled script to query /primary, /replica, /health, /patroni, /cluster:

    chmod +x scripts/patroni_rest_api.sh
    
    # Run against each node (default node is patroni1)
    bash scripts/patroni_rest_api.sh patroni1
    bash scripts/patroni_rest_api.sh patroni2
    bash scripts/patroni_rest_api.sh patroni3
  • etcd inspection: use etcdctl get --prefix /service/postgres-ha to see the DCS keys Patroni writes (leader lock, member info, config)

7. 🌐 Connection Pooling

  • Add PgBouncer between the app and HAProxy; compare connection count with and without pooling under load
  • Show how transaction-mode pooling interacts with EF Core — session-level features (SET LOCAL, temp tables, advisory locks) won't work in this mode
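Why session-level features break under transaction-mode pooling can be shown with a toy model (illustrative only; real PgBouncer hands each transaction to any free backend connection):

```python
# Toy model of transaction-mode pooling: each transaction may run on a
# different physical connection, so session state set in one transaction
# (SET LOCAL-style settings, temp tables, advisory locks) is not visible
# from the next one.
class Pool:
    def __init__(self, size):
        self.conns = [{"session_state": {}} for _ in range(size)]
        self.next = 0

    def run_transaction(self, fn):
        conn = self.conns[self.next]           # pooler picks any free conn
        self.next = (self.next + 1) % len(self.conns)
        return fn(conn)

pool = Pool(size=2)
pool.run_transaction(lambda c: c["session_state"].update(app_name="efcore"))
# The next transaction lands on a different physical connection: state is gone.
state = pool.run_transaction(lambda c: c["session_state"].get("app_name"))
assert state is None
```

With session-mode pooling each client keeps one backend connection, so this problem disappears at the cost of far fewer reusable connections.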

8. 🏋️ Load Testing

  • k6 / pgbench — concurrent read + write load; observe HAProxy stats, pg_stat_activity, replication lag in real time
  • Chaos test — combine failover + load test; measure p99 latency and error rate during the election window

9. ☁️ Production Patterns (stretch goals)

  • WAL-G + MinIO — configure S3-compatible backup, demo PITR (Point-In-Time Recovery)
  • HA HAProxy — add a second HAProxy + Keepalived VIP to remove HAProxy as a single point of failure
  • Kubernetes — migrate to k8s using the Zalando Postgres Operator

About

Production-minded PostgreSQL HA demo with Patroni, HAProxy, and EF Core, showcasing automatic failover, read/write split, and read-after-write consistency edge cases.