Serverless Spark IoT Framework

Hierarchical, Edge-Aware Reinforcement Learning for Serverless IoT Resource Optimization

A novel framework that replaces traditional reactive auto-scaling with a proactive, RL-driven optimization engine for Apache Spark workloads processing high-velocity IoT data streams. The system jointly optimizes compute resources, storage tiering, and compression strategy using hierarchical reinforcement learning with edge-cloud state awareness.

Master's Thesis Project | Indian Institute of Information Technology Kottayam Author: Manvith M | Advisor: Dr. Shajulin Benedict

System Architecture

The framework implements a five-layer pipeline that transforms raw IoT telemetry into optimized Spark workloads through edge-aware reinforcement learning.

Data Flow: IoT sensors generate telemetry that passes through edge gateways (burst prediction), dual-lane ingestion (MQTT/Kafka), and into the RL optimization engine before reaching the Spark cluster. The RL engine continuously observes a 13-dimensional state vector and outputs resource allocation decisions in real time.

Key Contributions

1. Hierarchical Contextual RL (Meta-Controller)

Traditional RL agents struggle with conflicting optimization objectives. A single reward function with fixed weights cannot adapt to shifting workload priorities (e.g., cost-sensitive budget monitoring vs. latency-critical healthcare processing).

The framework introduces a two-level hierarchy:

Meta-Controller (TD3 Agent): Observes the current IoT scenario context and dynamically tunes the reward weights for the lower-level agent. In healthcare mode, it prioritizes latency minimization; in budget mode, it prioritizes cost reduction.
Resource Allocation Agent (PPO): Operates with the reward function defined by the Meta-Controller, optimizing three action dimensions: executor count, storage tier, and compression level.

This separation enables instant behavioral switching without retraining.

2. Adaptive Shuffle and Storage Tiering

Spark's shuffle operation is the primary performance bottleneck in distributed processing. The framework replaces the default single-tier storage with RL-driven dynamic tier selection based on data temperature analysis.

Tier	Medium	Latency	Cost	Use Case
Tier 0 (HOT)	Redis (In-Memory)	~50 us	Highest	Iterative ML, real-time bursts
Tier 1 (WARM)	NVMe SSD	~200 us	Moderate	Standard ETL workloads
Tier 2 (COLD)	S3 / HDD	~5 ms	Lowest	Log archival, batch processing

The RL agent classifies each data batch by reuse probability and assigns it to the optimal storage tier, achieving 2.8x throughput improvement while managing cost trade-offs.

3. Edge-Cloud State Awareness

Cloud auto-scaling suffers from a fundamental limitation: cold start latency. Booting new Spark executors takes 45-60 seconds, making reactive approaches inadequate for IoT burst patterns.

The framework solves this by incorporating edge signals into the RL state vector:

Edge gateways run lightweight burst prediction models on local traffic patterns
A burst_prediction signal is transmitted to the cloud via a low-latency control plane
The RL agent observes this signal and proactively pre-provisions executors before the data flood arrives
When the burst hits, executors are already warm and processing begins immediately

This eliminates the auto-scaling lag problem, reducing SLA violations from 54% (standard HPA) to 19%.

4. Cold Start Mitigation with Pattern-Aware Pre-Warming

Building on edge-cloud awareness, the system includes a dedicated cold start controller that uses IoT traffic pattern analysis to:

Extract temporal features (periodicity, trend, seasonality) from historical traffic
Feed pattern features into the RL agent's expanded state space
Execute proactive pre_warm actions to start executors ahead of predicted demand spikes
Coordinate with the serverless orchestration layer (OpenFaaS) for efficient resource lifecycle management

Evaluation Results

A head-to-head tournament was conducted over a stochastic 500-step IoT workload trace, comparing the proposed system against industry-standard baselines.

Benchmark Comparison

Policy	Description	SLA Violations	Cost	Verdict
Fixed	Static 15 Executors	90.0%	$2,251	Failed (overloaded under bursts)
Dexter (HPA)	Reactive Kubernetes Auto-Scaler	54.4%	$2,993	Too slow for IoT burst patterns
Seer	Linear Predictive Scaling	100.0%	$1,876	Under-provisioned shuffle storage
Edge-Aware RL	Proposed System	19.2%	$3,287	3x reliability improvement

Performance Attribution

Metric	Before (Legacy)	After (Proposed)	Innovation Responsible
CPU Utilization	~42% (idle)	~95% (saturated)	Adaptive Shuffle (Redis tier)
Data Throughput	850 MB/s	2,400 MB/s	Adaptive Shuffle (in-memory I/O)
SLA Violations	35% (reactive)	19% (proactive)	Edge-Cloud Awareness (pre-provisioning)
Cost	$2,993	$3,287 (+9%)	Both (Redis + extra VMs for reliability)

Conclusion: The system achieves a 3x improvement in service reliability at a 9% cost premium -- a trade-off justified in latency-critical IoT deployments such as healthcare monitoring and autonomous vehicle coordination.

Tech Stack

Component	Technology
Stream Processing	Apache Spark 3.5 (Structured Streaming)
RL Framework	Stable-Baselines3 (PPO, TD3), Gymnasium
Ingestion	Apache Kafka, Eclipse Mosquitto (MQTT)
Hot Storage	Redis (In-Memory Shuffle)
Warm Storage	NVMe SSD
Cold Storage	Amazon S3 / HDD
Serverless	OpenFaaS (Function Orchestration)
Infrastructure	Docker, Kubernetes
Monitoring	Flask API, Streamlit Dashboard
Language	Python 3.10+

Installation

Prerequisites

Python 3.10+
Java 11+ (for Apache Spark)
Docker (optional, for full-stack deployment)

Setup

git clone https://github.com/manvith2003/serverless-spark-iot-framework.git
cd serverless-spark-iot-framework
pip install -r requirements.txt

Environment (Conda)

conda env create -f environment.yml
conda activate serverless-spark-iot

Usage

Train the RL Agent

python optimization/resource_allocation/ppo_agent.py --train --timesteps 100000

Run the Two-RL Continuous Loop

python rl_core/continuous_two_rl_loop.py

Run Benchmark Tournament

python benchmarks/run_eval.py

Generates benchmarks/results_summary.csv and benchmark comparison plots.

Run End-to-End Demo

python demos/end_to_end_pipeline_demo.py

Run Cold Start Mitigation Demo

python demos/cold_start_demo.py

Launch Monitoring Dashboard

# Streamlit Dashboard
streamlit run dashboard/streamlit_app/app.py

# Or open the static HTML dashboard
open dashboard/two_rl_dashboard.html

Project Structure

serverless-spark-iot-framework/
|
|-- rl_core/                          # Core RL Engine
|   |-- continuous_two_rl_loop.py     # Two-RL continuous training loop
|   |-- continuous_scaler.py          # Dynamic executor scaling
|   |-- phase2_meta_controller.py     # TD3 Meta-Controller agent
|   |-- multi_objective_scheduler.py  # Multi-objective optimization
|   |-- pattern_feature_extractor.py  # IoT traffic pattern analysis
|
|-- optimization/                     # RL Optimization Modules
|   |-- resource_allocation/          # PPO resource agent
|   |-- meta_controller/              # TD3 meta-agent weights & config
|   |-- shuffle_optimization/         # Adaptive shuffle tier logic
|   |-- cost_models/                  # Cloud cost estimation
|   |-- workload_characterization/    # Workload profiling
|
|-- spark_core/                       # Apache Spark Integration
|   |-- streaming/                    # Structured streaming jobs
|   |-- batch/                        # Batch processing pipelines
|   |-- udfs/                         # Custom Spark UDFs
|   |-- deployment/                   # Spark deployment configs
|
|-- serverless/                       # Serverless Orchestration
|   |-- openfaas/                     # OpenFaaS function definitions
|   |   |-- rl-scaling-function/      # RL-driven scaling handler
|   |   |-- spark_driver_client.py    # Spark driver interface
|   |   |-- stack.yml                 # OpenFaaS stack config
|   |-- cold_start_controller.py      # Pre-warming controller
|
|-- ingestion/                        # Data Ingestion Layer
|   |-- kafka_ingestion/              # Kafka producers & consumers
|   |-- mqtt_ingestion/               # MQTT broker integration
|   |-- data_generators/              # Synthetic IoT data generators
|   |-- compression/                  # Data compression utilities
|
|-- edge_processing/                  # Edge Computing Layer
|   |-- edge_node/                    # Edge device simulation
|   |-- predictive_prefetch/          # Prefetch optimization
|   |-- federated_learning/           # Federated model updates
|   |-- state_sync/                   # Edge-cloud state sync
|
|-- cross_cloud/                      # Multi-Cloud Support
|   |-- abstraction_layer/            # Cloud-agnostic interface
|   |-- cost_aware_selection/         # Cross-cloud cost optimizer
|   |-- multi_cloud_orchestrator/     # Multi-cloud orchestration
|
|-- benchmarks/                       # Evaluation Framework
|   |-- policies.py                   # Baseline policy implementations
|   |-- run_eval.py                   # Tournament runner
|
|-- evaluation/                       # Metrics and Visualization
|   |-- benchmarks/                   # Extended benchmark suites
|   |-- metrics/                      # Performance metric collectors
|   |-- visualization/                # Result plotting utilities
|
|-- dashboard/                        # Monitoring and Visualization
|   |-- flask_api/                    # REST API for metrics
|   |-- streamlit_app/               # Interactive Streamlit dashboard
|   |-- index.html                    # Static monitoring dashboard
|   |-- two_rl_dashboard.html         # Two-RL system dashboard
|
|-- demos/                            # Demonstration Scripts
|   |-- end_to_end_pipeline_demo.py   # Full pipeline demonstration
|   |-- cold_start_demo.py            # Cold start mitigation demo
|
|-- configs/                          # Configuration Files
|-- data/                             # Training data and model weights
|-- tests/                            # Unit and integration tests
|-- docs/                             # Architecture diagrams
|-- docker-compose.yml                # Full-stack container setup
|-- environment.yml                   # Conda environment spec
|-- requirements.txt                  # Python dependencies

Research Information

This work is part of a Master's thesis at the Indian Institute of Information Technology Kottayam.


Author	Manvith M
Advisor	Dr. Shajulin Benedict
Institution	IIIT Kottayam
Contact	manvith131250@gmail.com

Related Reports

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
benchmarks		benchmarks
configs		configs
cross_cloud		cross_cloud
dashboard		dashboard
data/models		data/models
demos		demos
docs		docs
edge_processing		edge_processing
evaluation		evaluation
ingestion		ingestion
optimization		optimization
rl_core		rl_core
scripts		scripts
serverless		serverless
spark_core		spark_core
state_management		state_management
tests		tests
utils		utils
.gitignore		.gitignore
CODE_WALKTHROUGH.md		CODE_WALKTHROUGH.md
DEMO_GUIDE.md		DEMO_GUIDE.md
README.md		README.md
adaptive_shuffle_report.txt		adaptive_shuffle_report.txt
benchmark_comparison_real.png		benchmark_comparison_real.png
benchmarking_report.txt		benchmarking_report.txt
chapter_conclusion.tex		chapter_conclusion.tex
chapter_hierarchical_rl_openfaas.tex		chapter_hierarchical_rl_openfaas.tex
chapter_introduction.tex		chapter_introduction.tex
chapter_literature_review.tex		chapter_literature_review.tex
chapter_results_discussion.tex		chapter_results_discussion.tex
chapter_rl_optimizer.tex		chapter_rl_optimizer.tex
chapter_system_architecture.tex		chapter_system_architecture.tex
docker-compose.yml		docker-compose.yml
edge_cloud_report.txt		edge_cloud_report.txt
environment.yml		environment.yml
final_achievements_table.txt		final_achievements_table.txt
final_review_presentation.tex		final_review_presentation.tex
final_two_rl_architecture.png		final_two_rl_architecture.png
generate_real_graphs.py		generate_real_graphs.py
ieee_conference_paper.tex		ieee_conference_paper.tex
incremental_benchmark_report.txt		incremental_benchmark_report.txt
inspect_model.py		inspect_model.py
main_thesis_final.tex		main_thesis_final.tex
master_project_report.txt		master_project_report.txt
meta_controller_report.txt		meta_controller_report.txt
ppo_reward_curve.png		ppo_reward_curve.png
presentation_slides.tex		presentation_slides.tex
requirements.txt		requirements.txt
rl_learning_curves.png		rl_learning_curves.png
serverless_spark_iot_thesis_report.md		serverless_spark_iot_thesis_report.md
td3_weights_curve.png		td3_weights_curve.png
thesis_300_page_master.txt		thesis_300_page_master.txt
thesis_full.tex		thesis_full.tex
verify_end_to_end_scenarios.py		verify_end_to_end_scenarios.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Serverless Spark IoT Framework

Table of Contents

System Architecture

Key Contributions

1. Hierarchical Contextual RL (Meta-Controller)

2. Adaptive Shuffle and Storage Tiering

3. Edge-Cloud State Awareness

4. Cold Start Mitigation with Pattern-Aware Pre-Warming

Evaluation Results

Benchmark Comparison

Performance Attribution

Tech Stack

Installation

Prerequisites

Setup

Environment (Conda)

Usage

Train the RL Agent

Run the Two-RL Continuous Loop

Run Benchmark Tournament

Run End-to-End Demo

Run Cold Start Mitigation Demo

Launch Monitoring Dashboard

Project Structure

Research Information

Related Reports

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Serverless Spark IoT Framework

Table of Contents

System Architecture

Key Contributions

1. Hierarchical Contextual RL (Meta-Controller)

2. Adaptive Shuffle and Storage Tiering

3. Edge-Cloud State Awareness

4. Cold Start Mitigation with Pattern-Aware Pre-Warming

Evaluation Results

Benchmark Comparison

Performance Attribution

Tech Stack

Installation

Prerequisites

Setup

Environment (Conda)

Usage

Train the RL Agent

Run the Two-RL Continuous Loop

Run Benchmark Tournament

Run End-to-End Demo

Run Cold Start Mitigation Demo

Launch Monitoring Dashboard

Project Structure

Research Information

Related Reports

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages