The Problem: Patching a Linux kernel requires a reboot. If you reboot a cluster of production servers all at once, your website goes down (502 Bad Gateway).
The Solution: An automated orchestration workflow that surgically removes a server from the Load Balancer, drains active connections, patches the OS, reboots, and verifies health—one server at a time.
This isn't just a script; it's a state machine. I used Ansible's `serial: 1` setting to ensure we never take down more than one node at a time. The playbook interacts with the AWS API to handle traffic routing while managing the Linux OS layer for patching.
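The drain, patch, reboot, verify, re-register cycle described above could look roughly like the sketch below. It is not the repository's actual `rolling_patch.yml`. The `webservers` group, the `target_group_arn` and `ec2_instance_id` variables, the health URL, and all timeouts are assumptions for illustration. It also assumes the `community.aws` collection is installed on the control node and that the servers are Debian/Ubuntu.

```yaml
# Hypothetical sketch: group name, variable names, URL, and timeouts are
# illustrative assumptions, not values taken from the repository.
- name: Rolling patch, one web server at a time
  hosts: webservers              # assumed inventory group
  become: true
  serial: 1                      # finish one host completely before starting the next
  tasks:
    - name: Deregister from the target group and wait until draining has finished
      community.aws.elb_target:
        target_group_arn: "{{ target_group_arn }}"   # assumed variable
        target_id: "{{ ec2_instance_id }}"           # assumed per-host variable
        state: absent
        target_status: unused
        target_status_timeout: 300
      delegate_to: localhost     # AWS API calls run on the control node
      become: false

    - name: Apply pending package upgrades
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Reboot and wait for the host to come back
      ansible.builtin.reboot:
        reboot_timeout: 600

    - name: Check that the web server answers locally
      ansible.builtin.uri:
        url: "http://localhost/"  # assumed health endpoint
        status_code: 200
      register: health
      until: health is succeeded
      retries: 10
      delay: 6

    - name: Register again and wait until the load balancer reports healthy
      community.aws.elb_target:
        target_group_arn: "{{ target_group_arn }}"
        target_id: "{{ ec2_instance_id }}"
        state: present
        target_status: healthy
        target_status_timeout: 300
      delegate_to: localhost
      become: false
```

Because any failed task stops the play for that host, a server that does not pass its health check is never put back behind the load balancer, and the remaining servers are left untouched.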
I chose Ansible because it is agentless. AWS Systems Manager (SSM) is great, but it locks you into the AWS ecosystem. With Ansible, I can port this exact logic to on-premises servers, DigitalOcean, or Azure just by changing the load balancer module. It decouples configuration management from the platform.
User Data scripts are excellent for Day 0 (Bootstrapping), but they are useless for Day 2 (Operations). You can't "patch" a running server with User Data. This playbook is designed to manage the server's entire lifecycle, not just its birth.
- The Trade-off: By using `serial: 1`, the deployment is slow. Patching 10 servers takes 10x the time of patching one.
- The Justification: In a production environment, Availability > Speed. I would rather the deployment take 20 minutes with 100% uptime than 2 minutes with a 502 outage.
To run this in your own AWS environment, you need:
- Infrastructure:
  - An Application Load Balancer (ALB).
  - 2+ Ubuntu/Linux web servers.
  - Note: I automated the infrastructure creation using Python (Boto3). You can find that code here:
  - 👉 My AWS Infra Automation Repo (Boto3)
- Control Node:
  - Ansible installed.
  - `boto3` and `botocore` Python libraries installed.
  - Crucial: An IAM Role attached to the Control Node with `elasticloadbalancing:*` permissions.
- Connectivity:
  - SSH access (port 22) from the Control Node to the web servers.
  - Tip: If using private IPs across VPCs, ensure VPC Peering is active.
- Clone the Repo: run `git clone https://github.com/MdOmerFarooq/Zero-Downtime-Rolling-Patch.git`, then `cd Zero-Downtime-Rolling-Patch` (the directory `git clone` creates by default).
- Update Inventory: Add your server IPs to `inventory.ini`.
- Run the Playbook: `ansible-playbook -i inventory.ini rolling_patch.yml`
Omer Farooq
