🚀 Zero-Downtime Rolling Patch System (AWS ALB + Ansible)

The Problem: Patching a Linux kernel requires a reboot. If you reboot a cluster of production servers all at once, your website goes down (502 Bad Gateway).

The Solution: An automated orchestration workflow that surgically removes a server from the Load Balancer, drains active connections, patches the OS, reboots, and verifies health—one server at a time.

📹 Watch the Project Demo (Video)

Workflow & Logic

This isn't just a script; it's a state machine. I used Ansible's serial: 1 mode to ensure we never take down more than one node at a time. The playbook interacts with the AWS API to handle traffic routing while managing the Linux OS layer for patching.
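The per-node sequence described above can be sketched in Python against Boto3's `elbv2` client. This is a minimal illustration, not the repository's playbook: it assumes targets are registered by instance ID, and `rolling_patch`, `patch_and_reboot`, and the target group ARN are hypothetical names. The client is passed in as an argument so the loop can be exercised without AWS credentials.

```python
from typing import Callable, Iterable, List


def rolling_patch(
    elbv2,
    target_group_arn: str,
    instance_ids: Iterable[str],
    patch_and_reboot: Callable[[str], None],
) -> List[str]:
    """Patch targets strictly one at a time (the effect of `serial: 1`).

    `elbv2` is expected to behave like boto3.client("elbv2").
    `patch_and_reboot` is a hypothetical callable that upgrades packages
    and reboots the host; it should raise an exception on failure.
    """
    patched: List[str] = []
    for instance_id in instance_ids:
        targets = [{"Id": instance_id}]

        # 1. Stop routing new requests to this node; the ALB begins draining.
        elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=targets)
        # 2. Block until the target is fully deregistered (draining finished).
        elbv2.get_waiter("target_deregistered").wait(
            TargetGroupArn=target_group_arn, Targets=targets
        )

        # 3. Patch the OS and reboot while the node receives no traffic.
        patch_and_reboot(instance_id)

        # 4. Put the node back and wait for the ALB health check to pass.
        elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=targets)
        elbv2.get_waiter("target_in_service").wait(
            TargetGroupArn=target_group_arn, Targets=targets
        )
        patched.append(instance_id)
    return patched
```

With real infrastructure the first argument would be `boto3.client("elbv2")`. Any exception stops the loop immediately, so a node that fails to patch or never becomes healthy stays out of rotation while the untouched nodes keep serving traffic.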

The "Why"

Why Ansible and not AWS Systems Manager (SSM)?

I chose Ansible because it is agentless: it needs only SSH access and Python on the target hosts. While SSM is great, it ties you to the AWS ecosystem. By using Ansible, I can port this exact logic to on-premises servers, DigitalOcean, or Azure just by changing the load balancer module. It decouples configuration management from the platform.

Why not just use `User Data` scripts?

User Data scripts are excellent for Day 0 (bootstrapping), but they do not help with Day 2 (operations): by default they run only once, at the instance's first boot, so you can't "patch" a running server with them. This playbook is designed to manage the server's entire lifecycle, not just its birth.

The Trade-off: Speed vs. Availability

The Trade-off: By using serial: 1, the deployment is slow. Patching 10 servers takes 10x the time of patching one.
The Justification: In a production environment, Availability > Speed. I would rather the deployment take 20 minutes with 100% uptime than 2 minutes with a 502 outage.
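The arithmetic behind that trade-off can be made explicit. The sketch below assumes every node takes the same fixed time to drain, patch, reboot, and pass health checks; the 2 minutes per node used here is what the 20-minute and 2-minute figures above imply if they refer to 10 servers. `rollout_estimate` and its parameters are illustrative names, not part of the playbook.

```python
import math
from typing import Tuple


def rollout_estimate(nodes: int, batch_size: int, minutes_per_node: float) -> Tuple[float, float]:
    """Return (total minutes, fraction of nodes still serving during a batch)."""
    batch_size = min(batch_size, nodes)
    batches = math.ceil(nodes / batch_size)      # batches run one after another
    total_minutes = batches * minutes_per_node   # nodes inside a batch run in parallel
    serving_fraction = (nodes - batch_size) / nodes
    return total_minutes, serving_fraction


print(rollout_estimate(10, 1, 2.0))   # one node at a time -> (20.0, 0.9)
print(rollout_estimate(10, 10, 2.0))  # all at once        -> (2.0, 0.0)
```

Under these assumptions, patching one node at a time keeps 90% of the fleet in service for the whole rollout, while patching everything at once is ten times faster but leaves no healthy target behind the load balancer.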

Prerequisites

To run this in your own AWS environment, you need:

Infrastructure:
- An Application Load Balancer (ALB).
- 2+ Ubuntu/Linux Web Servers.
- Note: I automated the infrastructure creation using Python (Boto3). You can find that code here:
- 👉 My AWS Infra Automation Repo (Boto3)
Control Node:
- Ansible installed.
- boto3 and botocore Python libraries installed.
- Crucial: An IAM Role attached to the Control Node with elasticloadbalancing:* permissions.
Connectivity:
- SSH Access (Port 22) from Control Node to Web Servers.
- Tip: If using private IPs across VPCs, ensure VPC Peering is active.
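Before running the playbook, the connectivity prerequisite can be checked from the Control Node with a small script. This is a hedged sketch using only the Python standard library; `can_reach` is a hypothetical helper, and a successful TCP connection only shows that port 22 is open, not that SSH authentication will succeed.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Calling `can_reach(ip)` for each address in inventory.ini gives a quick signal that security groups, routing, or VPC Peering still need attention before Ansible tries to connect.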

How to Run

Clone the Repo:

git clone https://github.com/MdOmerFarooq/Zero-Downtime-Rolling-Patch.git
cd Zero-Downtime-Rolling-Patch

Update Inventory: Add your server IPs to inventory.ini.

Run the Playbook:

ansible-playbook -i inventory.ini playbook.yml

👤 Author

Omer Farooq
