Apache Kafka is a distributed messaging system for building real-time data pipelines and streaming applications.
As a distributed streaming platform, it:
- Publishes and subscribes to streams of records
- Stores streams of records durably and in a fault-tolerant way
- Processes streams of records as they occur
Think of it as a high-throughput message queue that also retains records on disk, so consumers can re-read them later.
A topic is a category or feed name to which records are published.
Topic: "order-events"
├─ Order Created
├─ Payment Processed
├─ Inventory Updated
└─ Notification Sent
Topic: "user-events"
├─ User Registered
├─ User Updated
└─ User Deleted
Producers are applications that publish messages to topics.
Consumers are applications that subscribe to topics and process messages.
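The relationship between the two roles can be sketched without a real cluster. The example below is a minimal in-memory illustration, not the Kafka client API: `InMemoryBroker`, `Producer`, and `Consumer` are hypothetical names, and a topic is modeled as a plain Python list. It shows that a producer only appends, while a consumer tracks its own read position (offset).

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a Kafka cluster: each topic is an append-only list."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the topic and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic, offset):
        """Return all messages in the topic from the given offset onward."""
        return self._topics[topic][offset:]


class Producer:
    """Hypothetical producer: publishes messages to a topic."""

    def __init__(self, broker):
        self._broker = broker

    def send(self, topic, message):
        return self._broker.publish(topic, message)


class Consumer:
    """Hypothetical consumer: subscribes to one topic and remembers its offset."""

    def __init__(self, broker, topic):
        self._broker = broker
        self._topic = topic
        self._offset = 0  # position of the next message to read

    def poll(self):
        """Fetch messages not yet seen and advance the offset."""
        messages = self._broker.read(self._topic, self._offset)
        self._offset += len(messages)
        return messages


broker = InMemoryBroker()
producer = Producer(broker)
consumer = Consumer(broker, "user-events")

producer.send("user-events", "User Registered")
producer.send("user-events", "User Updated")
first_batch = consumer.poll()

producer.send("user-events", "User Deleted")
second_batch = consumer.poll()
print(first_batch, second_batch)
```

Because the consumer owns its offset, the second `poll()` returns only the message published after the first one; the log itself is never modified by reading.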
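Topics are split into partitions for parallel processing. Each partition is an ordered, append-only log, so ordering is guaranteed within a partition but not across the whole topic.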
Topic: "orders" (3 partitions)
Partition 0: Order#1, Order#4, Order#7, ...
Partition 1: Order#2, Order#5, Order#8, ...
Partition 2: Order#3, Order#6, Order#9, ...
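The layout above corresponds to spreading records across partitions in turn. The sketch below reproduces it with a simple round-robin assignment and also shows key-based assignment, where records with the same key always land in the same partition. Both functions are illustrative assumptions: real clients choose partitions with their own partitioner (the Java client, for example, hashes keys with murmur2), and CRC32 is used here only as a stable stand-in.

```python
import zlib
from itertools import count

NUM_PARTITIONS = 3


def round_robin_partitioner(num_partitions):
    """Return a function that assigns keyless records to partitions in turn."""
    counter = count()
    return lambda: next(counter) % num_partitions


def key_partitioner(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


next_partition = round_robin_partitioner(NUM_PARTITIONS)
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for n in range(1, 10):
    partitions[next_partition()].append(f"Order#{n}")

for p, records in partitions.items():
    print(f"Partition {p}: {', '.join(records)}")
```

Round-robin balances load but gives no per-key ordering. Key-based assignment is the usual choice when all events for one entity (for example one order) must be read in sequence.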
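A consumer group is a set of consumers that share the work of reading a topic. Different groups process the same topic independently, each tracking its own position (offset).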
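A small simulation may help show what "independently" means here. In the sketch below, `PartitionedTopic` and `ConsumerGroup` are hypothetical classes, the group and member names are made up, and the round-robin partition assignment is an assumption chosen for simplicity rather than Kafka's actual assignment strategy. Each group keeps its own offsets, and inside a group each partition belongs to exactly one member.

```python
class PartitionedTopic:
    """Toy topic: a list of partitions, each an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, record):
        self.partitions[partition].append(record)


class ConsumerGroup:
    """Hypothetical consumer group with per-partition offsets."""

    def __init__(self, name, topic, members):
        self.name = name
        self.topic = topic
        # Offsets are private to this group, one per partition.
        self.offsets = [0] * len(topic.partitions)
        # Assign each partition to exactly one member (round-robin, assumed).
        self.assignment = {member: [] for member in members}
        for p in range(len(topic.partitions)):
            self.assignment[members[p % len(members)]].append(p)

    def poll(self, member):
        """Return unread records from the partitions owned by this member."""
        records = []
        for p in self.assignment[member]:
            log = self.topic.partitions[p]
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records


topic = PartitionedTopic(3)
for n in range(1, 7):
    topic.append((n - 1) % 3, f"Order#{n}")

billing = ConsumerGroup("billing", topic, ["billing-1", "billing-2"])
analytics = ConsumerGroup("analytics", topic, ["analytics-1"])

billing_1 = billing.poll("billing-1")      # owns partitions 0 and 2
billing_2 = billing.poll("billing-2")      # owns partition 1
analytics_1 = analytics.poll("analytics-1")  # owns all three partitions
print(billing_1, billing_2, analytics_1)
```

The two billing members split the six orders between them, while the analytics group still receives all six, because reading by one group does not advance the offsets of another.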
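Example: an event-driven order flow with Kafka, in which each step (order created, payment processed, inventory updated, notification sent) is published as an event that downstream services react to.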
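To make the order flow concrete, here is a minimal in-memory sketch using the event names from the "order-events" topic. The three service functions and the mapping of which service emits which event are assumptions for illustration, and a plain list stands in for the topic; no Kafka client is involved.

```python
from collections import deque


# Hypothetical services: each reacts to one event type and emits the next.
def payment_service(event):
    return {"type": "Payment Processed", "order_id": event["order_id"]}


def inventory_service(event):
    return {"type": "Inventory Updated", "order_id": event["order_id"]}


def notification_service(event):
    return {"type": "Notification Sent", "order_id": event["order_id"]}


# Which service subscribes to which event type (assumed wiring).
HANDLERS = {
    "Order Created": payment_service,
    "Payment Processed": inventory_service,
    "Inventory Updated": notification_service,
}


def run_order_flow(order_id):
    """Drive one order through the flow and return the resulting event log."""
    order_events = []  # stands in for the "order-events" topic
    pending = deque([{"type": "Order Created", "order_id": order_id}])
    while pending:
        event = pending.popleft()
        order_events.append(event)
        handler = HANDLERS.get(event["type"])
        if handler is not None:
            pending.append(handler(event))
    return order_events


flow = run_order_flow(1)
print([event["type"] for event in flow])
```

No service calls another directly: each one only reads an event and publishes a new one, which is what lets services be added or replaced without changing the others.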
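| Feature | Kafka | RabbitMQ |
|---|---|---|
| Throughput | Very high | Medium |
| Scalability | Horizontal (add brokers and partitions) | Mostly vertical; clustering is supported |
| Persistence | Retained log on disk; records can be replayed | Durable queues; messages are typically removed once acknowledged |
| Stream processing | Native (Kafka Streams) | Not built in |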
✅ Event Streaming
✅ Real-time Analytics
✅ Microservices Communication
✅ Log Aggregation
✅ Stream Processing
Kafka = Distributed event streaming platform for microservices