MdOmerFarooq/Zero-Downtime-Rolling-Patch

🚀 Zero-Downtime Rolling Patch System (AWS ALB + Ansible)

The Problem: Patching a Linux kernel requires a reboot. If you reboot a cluster of production servers all at once, your website goes down (502 Bad Gateway).

The Solution: An automated orchestration workflow that surgically removes a server from the Load Balancer, drains active connections, patches the OS, reboots, and verifies health—one server at a time.



Workflow & Logic

This isn't just a script; it's a state machine. I set serial: 1 on the play, so Ansible runs the full task list against one host before starting the next, and we never take down more than one node at a time. The playbook calls the AWS API to handle traffic routing and manages the Linux OS layer for patching.

Playbook Workflow
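To make the sequence concrete, here is a minimal Python sketch of the same per-node loop. It is not the playbook itself, only an illustration of the order of operations. It assumes `elbv2` is a boto3 `elbv2` client (or anything with the same methods), that `patch_and_reboot` is a hypothetical callback standing in for the OS-level tasks, and that the node is re-registered and waited on until the ALB reports it healthy, a step the README implies but does not spell out.

```python
from typing import Callable, Iterable, List


def patch_one(elbv2, target_group_arn: str, instance_id: str,
              patch_and_reboot: Callable[[str], None]) -> None:
    """Take one instance out of rotation, patch it, and put it back."""
    target = [{"Id": instance_id}]

    # 1. Stop sending new requests; the ALB begins draining open connections.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=target)

    # 2. Block until the target has finished draining and is fully deregistered.
    elbv2.get_waiter("target_deregistered").wait(
        TargetGroupArn=target_group_arn, Targets=target)

    # 3. OS layer: hypothetical callback that upgrades packages and reboots.
    patch_and_reboot(instance_id)

    # 4. Assumed final step: re-register and wait for a passing health check.
    elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=target)
    elbv2.get_waiter("target_in_service").wait(
        TargetGroupArn=target_group_arn, Targets=target)


def rolling_patch(elbv2, target_group_arn: str, instance_ids: Iterable[str],
                  patch_and_reboot: Callable[[str], None]) -> List[str]:
    """Patch instances strictly one at a time, mirroring serial: 1.

    Any exception propagates and halts the rollout, so the remaining
    instances stay in service untouched.
    """
    done: List[str] = []
    for instance_id in instance_ids:
        patch_one(elbv2, target_group_arn, instance_id, patch_and_reboot)
        done.append(instance_id)
    return done
```

With real AWS credentials, `elbv2` would come from `boto3.client("elbv2")`; the target group ARN and instance IDs are placeholders you would supply. Passing the client in as an argument keeps the loop easy to exercise against a stub before pointing it at production.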

The "Why"

Why Ansible and not AWS Systems Manager (SSM)?

I chose Ansible because it is agentless: it only needs SSH access to the target hosts. While SSM is great, it requires the SSM Agent on every node and ties the workflow to the AWS ecosystem. By using Ansible, I can port this exact logic to on-premises servers, DigitalOcean, or Azure just by changing the load balancer module. It decouples configuration management from the platform.

Why not just use User Data scripts?

User Data scripts are excellent for Day 0 (bootstrapping), but they don't help with Day 2 (operations). By default, User Data runs only once, at first launch, so you can't use it to patch a running server. This playbook is designed to manage the server's entire lifecycle, not just its birth.

The Trade-off: Speed vs. Availability

  • The Trade-off: With serial: 1, hosts are patched strictly one after another, so total time grows linearly. Patching 10 servers takes roughly 10x as long as patching one.
  • The Justification: In a production environment, Availability > Speed. I would rather the deployment take 20 minutes with 100% uptime than 2 minutes with a 502 outage.
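The arithmetic behind that trade-off can be sketched in a few lines. This is a rough model, not a measurement: it assumes every host takes about the same time to drain, patch, and reboot, and the 10 hosts at 2 minutes each are illustrative numbers chosen to match the figures above. `batch_size` plays the role of the `serial` value.

```python
import math


def rollout_minutes(hosts: int, minutes_per_host: float, batch_size: int = 1) -> float:
    """Total wall-clock time when hosts are patched batch by batch."""
    batches = math.ceil(hosts / batch_size)
    return batches * minutes_per_host


def capacity_in_service(hosts: int, batch_size: int = 1) -> float:
    """Fraction of the fleet still serving traffic while one batch is out."""
    return (hosts - batch_size) / hosts


# One at a time: slow, but 90% of the fleet keeps serving.
print(rollout_minutes(10, 2), capacity_in_service(10))          # 20 0.9

# All at once: fast, but nothing is left to serve traffic.
print(rollout_minutes(10, 2, batch_size=10), capacity_in_service(10, 10))  # 2 0.0
```

Intermediate batch sizes sit between the two extremes, which is why the remaining capacity, not just the rollout time, should drive the choice.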

Prerequisites

To run this in your own AWS environment, you need:

  1. Infrastructure:
    • An Application Load Balancer (ALB).
    • 2+ Ubuntu/Linux Web Servers.
    • Note: I automated the infrastructure creation using Python (Boto3). You can find that code here:
    • 👉 My AWS Infra Automation Repo (Boto3)
  2. Control Node:
    • Ansible installed.
    • boto3 and botocore Python libraries installed.
    • Crucial: An IAM Role attached to the Control Node with elasticloadbalancing:* permissions.
  3. Connectivity:
    • SSH Access (Port 22) from Control Node to Web Servers.
    • Tip: If using private IPs across VPCs, ensure VPC Peering is active.
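The elasticloadbalancing:* wildcard listed above is the simplest way to get going, but it grants more than a register/deregister workflow needs. As a hedged sketch, the snippet below builds a narrower IAM policy document. The action list is an assumption about what the playbook calls (target registration plus read-only health and target group lookups); the module you use may need additional Describe actions, so treat it as a starting point and extend it if AWS returns AccessDenied.

```python
import json

# Assumption: the playbook only registers/deregisters targets and reads
# target group and target health information.
ACTIONS = [
    "elasticloadbalancing:RegisterTargets",
    "elasticloadbalancing:DeregisterTargets",
    "elasticloadbalancing:DescribeTargetHealth",
    "elasticloadbalancing:DescribeTargetGroups",
]


def build_policy(actions=ACTIONS, resource="*"):
    """Return an IAM policy document allowing only the listed actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # "*" keeps the sketch simple; the register/deregister
                # actions can be scoped to specific target group ARNs.
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM console or saved to a file.
    print(json.dumps(build_policy(), indent=2))
```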

How to Run

  1. Clone the Repo:

    git clone https://github.com/MdOmerFarooq/Zero-Downtime-Rolling-Patch.git
    cd Zero-Downtime-Rolling-Patch
  2. Update Inventory: Add your server IPs to inventory.ini.

  3. Run the Playbook:

    ansible-playbook -i inventory.ini rolling_patch.yml

👤 Author

Omer Farooq

About

Automated zero-downtime rolling patch orchestration for AWS EC2 clusters using Ansible. Features ALB target tracking, connection draining, and health-check verification to ensure 100% service availability during kernel updates.
