An equity order matching engine simulating US stock exchange mechanics with an interactive dashboard, built in Python. Features price-time priority matching, automated trade execution, real-time market data processing, advanced analytics, and full REST API access via FastAPI.
- Order matching engine – price-time priority matching for limit and market orders
- Real-time market data via WebSocket streaming – server-push delta fan-out at 2.7M msg/s with per-client backpressure, sequence-based reconnect recovery, and 86% bandwidth reduction over snapshot polling
- UDP multicast market data feed – O(1) fan-out with binary wire format, sequence gap detection, and deterministic recovery via HTTP delta catchup or full snapshot fallback, mirroring the architecture used by real exchanges (e.g. CME MDP, Nasdaq ITCH) to ensure fair, simultaneous delivery to all participants
- Delta publishing – sequence-based incremental order book updates with bounded buffer and snapshot fallback
- Advanced analytics – VWAP calculations, trade metrics, market activity tracking, and historical data analysis
- Interactive web-based Streamlit dashboard – market monitoring, analysis, and order submission
- REST API via FastAPI – full programmatic access with comprehensive documentation
- Market simulation tool – automated order generation for demonstration purposes
- Highly performant, scalable architecture – Redis caching and Kafka messaging for real-time processing
- Containerised deployment – easy deployment and local development via Docker Compose
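Price-time priority means an incoming order matches against the best available price first, and against the oldest resting order within a price level. A minimal, illustrative sketch of the idea (not the engine's actual implementation; class and method names are made up for this example):

```python
from collections import deque

class ToyBook:
    """Minimal one-sided book: resting bids matched against incoming sells."""
    def __init__(self):
        self.bids = {}  # price -> FIFO deque of (order_id, qty)

    def add_bid(self, order_id, price, qty):
        self.bids.setdefault(price, deque()).append((order_id, qty))

    def match_sell(self, qty, limit_price):
        """Match an incoming sell: best (highest) bid first, FIFO within a level."""
        fills = []
        while qty > 0 and self.bids:
            best = max(self.bids)           # price priority: highest bid wins
            if best < limit_price:
                break                       # no remaining bid crosses the limit
            level = self.bids[best]
            oid, resting = level[0]         # time priority: oldest order first
            traded = min(qty, resting)
            fills.append((oid, best, traded))
            qty -= traded
            if traded == resting:
                level.popleft()             # fully filled: remove from queue
            else:
                level[0] = (oid, resting - traded)
            if not level:
                del self.bids[best]         # drop the now-empty price level
        return fills

book = ToyBook()
book.add_bid("A", 100, 5)   # A arrived first at 100
book.add_bid("B", 100, 5)   # B queues behind A at the same price
book.add_bid("C", 101, 3)   # C bids a better price, so it matches first
print(book.match_sell(10, limit_price=100))
# → [('C', 101, 3), ('A', 100, 5), ('B', 100, 2)]
```

Note how C fills first despite arriving last (better price), while A fills before B at the same price (earlier time).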
The order book services handle order processing, matching, and market data dissemination. A FastAPI gateway service exposes them to clients via REST API calls.
The system consists of five main services:
- Gateway Service: REST API and WebSocket service that handles incoming orders, market data requests, and real-time streaming
- Matching Engine: Processes orders and executes trades using price-time priority
- Market Data Service: Manages market data dissemination and analytics
- Database: Stores order and trade history
- Streamlit UI: Interactive web dashboard providing real-time market data visualisation, order book analysis, trade history, and order submission
The market simulator generates synthetic order flow to demonstrate how the order book services work. We use it to produce a steady stream of orders for the services to process.
You can find the `MarketSimulator` class in `src/order_book_simulator/simulator/market_simulator.py`. We've also provided an example script in `examples/market_simulator_usage.py` that shows how to use the `MarketSimulator` class to simulate market activity.
- FastAPI: REST API and WebSocket framework for the gateway service
- Polars: Data processing and analysis
- SQLAlchemy: ORM for the database
- Pydantic: Data modelling and validation
- Kafka: Message broker for order flow and market data
- PostgreSQL: Persistent storage for orders and trades
- Redis: Caching for real-time market data and Pub/Sub for WebSocket delta fan-out
- Streamlit: UI for the interactive web dashboard
- Docker: Containerisation and deployment
Run the following command from the project root directory:

```shell
uv sync --all-extras --dev
```

Use Docker Compose to build and run the services locally:

```shell
docker compose up --build
```

From there, you can interact with the services:
- Streamlit UI: http://localhost:8501 – Interactive dashboard for market monitoring, analysis, and user-friendly order submission
- FastAPI Documentation: http://localhost:8000/docs – REST API interface for programmatic, full-featured access to the order book services
To reset the order book services, stop the services and remove the containers, networks, and volumes:

```shell
docker compose down -v
```

To run the market simulator, use the following command:

```shell
uv run python examples/market_simulator_usage.py
```

The project includes two categories of benchmarks:
```shell
# Benchmark the core order book data structure (pure Python, synchronous).
uv run python benchmarks/unit/order_book_benchmark.py

# Benchmark the full matching engine with mocked I/O.
uv run python benchmarks/unit/matching_engine_benchmark.py

# Benchmark delta payload sizes vs full snapshots.
uv run python benchmarks/unit/delta_payload_benchmark.py

# Benchmark WebSocket fan-out throughput and push latency.
uv run python benchmarks/unit/websocket_benchmark.py

# Benchmark UDP multicast vs WebSocket fan-out scaling.
uv run python benchmarks/unit/multicast_benchmark.py

# Run all unit benchmarks together.
uv run python benchmarks/unit/run_all_benchmarks.py
```

The order book benchmark measures the raw performance of the matching logic and data structures, and so reports the highest throughput. The matching engine benchmark measures end-to-end throughput with mocked dependencies; its lower numbers expose the async orchestration overhead.
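All of the unit benchmarks follow the same micro-benchmark pattern: time a tight loop of operations and report throughput. A minimal harness in that spirit (illustrative only, not the project's actual benchmark code):

```python
import time

def benchmark(fn, n=100_000):
    """Call fn n times and return throughput in operations per second."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    elapsed = time.perf_counter() - start
    return n / elapsed

# Example: throughput of a trivial operation (numbers vary by machine).
print(f"{benchmark(lambda: sum(range(10))):,.0f} ops/s")
```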
Fan-out throughput and push latency from order book operation to broadcast completion, measured with mock WebSocket connections:
| Subscribers | Fan-Out (msg/s) | Push Latency p50 (μs) | Push Latency p99 (μs) |
|---|---|---|---|
| 1 | 2,103,421 | 10.1 | 31.5 |
| 10 | 2,689,678 | 13.2 | 26.5 |
| 50 | 2,775,645 | 27.9 | 52.3 |
| 100 | 2,753,206 | 45.9 | 61.6 |
| 500 | 2,694,437 | 191.3 | 246.1 |
| 1,000 | 2,714,265 | 374.0 | 516.0 |
Throughput stays flat at ~2.7M msg/s thanks to non-blocking `put_nowait` calls on per-client bounded queues: slow consumers have messages dropped rather than blocking the broadcast to other clients.
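The drop-on-full behaviour can be sketched with asyncio's bounded queues (a simplified model of the pattern; the class, queue size, and counters here are illustrative, not the service's actual code):

```python
import asyncio

class Broadcaster:
    """Fan messages out to per-client bounded queues without ever blocking."""
    def __init__(self, maxsize=256):
        self.maxsize = maxsize
        self.queues = {}    # client_id -> asyncio.Queue
        self.dropped = {}   # client_id -> count of messages dropped

    def subscribe(self, client_id):
        self.queues[client_id] = asyncio.Queue(maxsize=self.maxsize)
        self.dropped[client_id] = 0

    def broadcast(self, msg):
        for client_id, queue in self.queues.items():
            try:
                queue.put_nowait(msg)          # non-blocking enqueue
            except asyncio.QueueFull:
                self.dropped[client_id] += 1   # slow consumer: drop, don't block

async def main():
    b = Broadcaster(maxsize=2)
    b.subscribe("fast")
    b.subscribe("slow")
    for seq in range(5):
        b.broadcast({"seq": seq})
        # The fast client drains its queue immediately; the slow one never reads.
        while not b.queues["fast"].empty():
            b.queues["fast"].get_nowait()
    print(b.dropped)  # → {'fast': 0, 'slow': 3}

asyncio.run(main())
```

Because `broadcast` never awaits, one stalled client cannot add latency for the others; it only loses its own messages, which sequence-based recovery can later repair.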
Push latency vs polling comparison (single subscriber):
| Polling Interval | Avg Polling Latency | WebSocket Speedup |
|---|---|---|
| 1ms | 0.5ms | 51x |
| 10ms | 5.0ms | 513x |
| 50ms | 25.0ms | 2,564x |
| 100ms | 50.0ms | 5,128x |
| 500ms | 250.0ms | 25,641x |
| 1,000ms | 500.0ms | 51,282x |
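The speedup column follows directly from the expected wait under polling: on average a client polls half an interval after an update lands, so average polling latency is interval/2, divided by the measured push latency (about 9.75 μs, as implied by the 51x row). A quick check of the table's arithmetic:

```python
# Average polling latency is half the polling interval (the update lands at a
# uniformly random point within the interval). Speedup = avg poll / push latency.
push_latency_us = 9.75  # single-subscriber push latency implied by the 51x row

for interval_ms in (1, 10, 50, 100, 500, 1000):
    avg_poll_us = interval_ms * 1000 / 2
    speedup = avg_poll_us / push_latency_us
    print(f"{interval_ms}ms poll -> {avg_poll_us / 1000:.1f}ms avg, {speedup:,.0f}x")
```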
Delta streaming achieves an 86.1% bandwidth reduction over snapshot polling (269 bytes vs 1,938 bytes average per update).
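Deltas only save bandwidth if a client that misses some can recover. The usual pattern, sketched below (illustrative, not the service's actual code), is a bounded buffer of recent deltas keyed by sequence number, falling back to a full snapshot when the requested sequence has already been evicted:

```python
from collections import deque

class DeltaBuffer:
    """Bounded buffer of recent deltas with snapshot fallback for stale clients."""
    def __init__(self, maxlen=1000):
        self.deltas = deque(maxlen=maxlen)  # (seq, delta) pairs, oldest evicted
        self.seq = 0
        self.state = {}                     # full current state, for snapshots

    def publish(self, delta):
        self.seq += 1
        self.state.update(delta)
        self.deltas.append((self.seq, delta))

    def catch_up(self, last_seen_seq):
        """Return ('deltas', [...]) if we still hold them, else a full snapshot."""
        if self.deltas and self.deltas[0][0] <= last_seen_seq + 1:
            return ("deltas", [d for s, d in self.deltas if s > last_seen_seq])
        return ("snapshot", dict(self.state))

buf = DeltaBuffer(maxlen=3)
for i in range(5):
    buf.publish({f"level_{i}": i})
print(buf.catch_up(3))   # small gap: served incrementally from the buffer
print(buf.catch_up(0))   # seqs 1-2 already evicted: full snapshot fallback
```

The bounded buffer caps publisher memory; a client that falls too far behind pays the one-off cost of a snapshot rather than forcing the publisher to retain unbounded history.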
Publisher-side cost per broadcast, comparing UDP multicast (a single `sendto`) with WebSocket (N `put_nowait` enqueues):
| Subscribers | UDP Multicast (μs/msg) | WebSocket (μs/msg) | Ratio |
|---|---|---|---|
| 1 | 15.4 | 0.5 | 0.03x |
| 10 | 17.4 | 3.8 | 0.22x |
| 50 | 16.4 | 18.3 | 1.12x |
| 100 | 15.0 | 37.6 | 2.51x |
| 500 | 17.0 | 190.9 | 11.22x |
| 1,000 | 8.9 | 414.5 | 46.44x |
| 5,000 | 9.1 | 2,396.9 | 264.6x |
| 10,000 | 12.4 | 5,447.5 | 440.4x |
UDP multicast publisher cost stays flat at ~12-17 μs regardless of subscriber count, while WebSocket scales linearly. At 10,000 subscribers, UDP multicast is ~440x cheaper per broadcast.
End-to-end publish latency (order book operation to delivery) shows the same O(1) vs O(N) pattern:
| Subscribers | UDP Multicast p50 (μs) | WebSocket p50 (μs) | Ratio |
|---|---|---|---|
| 1 | 23.1 | 13.0 | 0.57x |
| 10 | 22.3 | 16.4 | 0.74x |
| 50 | 22.6 | 31.7 | 1.40x |
| 100 | 21.6 | 50.2 | 2.32x |
| 500 | 22.5 | 200.2 | 8.91x |
| 1,000 | 21.2 | 382.3 | 18.03x |
| 5,000 | 22.5 | 2,147.7 | 95.63x |
| 10,000 | 22.3 | 4,785.3 | 214.3x |
UDP multicast holds steady at ~22 μs p50 while WebSocket reaches 4,785 μs at 10,000 subscribers.
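A multicast feed needs a compact binary frame carrying a sequence number so receivers can detect loss and request catchup. A minimal sketch of such a wire format (the field layout here is illustrative, not the project's actual format):

```python
import struct

# Illustrative 17-byte frame: sequence (u64), price in ticks (u32),
# quantity (u32), side (u8), all big-endian ("network order").
FRAME = struct.Struct("!QIIB")

def encode(seq, price_ticks, qty, side):
    return FRAME.pack(seq, price_ticks, qty, side)

def decode(frame):
    return FRAME.unpack(frame)

class GapDetector:
    """Track the last sequence seen and report any gap, as a receiver would."""
    def __init__(self):
        self.last_seq = 0

    def on_frame(self, frame):
        seq, *_ = decode(frame)
        gap = seq - self.last_seq - 1   # frames missed since the previous one
        self.last_seq = seq
        return seq, gap

rx = GapDetector()
print(rx.on_frame(encode(1, 10050, 100, 1)))  # → (1, 0)
print(rx.on_frame(encode(2, 10051, 50, 0)))   # → (2, 0)
print(rx.on_frame(encode(5, 10049, 75, 1)))   # → (5, 2)  frames 3-4 lost
```

On detecting a gap, a receiver would request the missing deltas over HTTP or, if they have aged out, fall back to a full snapshot, which is the recovery path described above.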
```shell
# Integration benchmark with real Redis and Kafka.
docker compose up -d redis kafka
sleep 5
python benchmarks/integration/integration_benchmark.py
docker compose down -v
```

The integration benchmark runs against real Redis and Kafka, so it reflects the cost of actual I/O. It reports the lowest throughput of the benchmarks and is the most realistic measure of production performance.