krishnaura45/price-prophet

Price Prophet: An end-to-end MLOps Pipeline for House Price Prediction

Price Prophet is a production-grade machine learning pipeline built to predict house sale prices using a robust, scalable and reproducible MLOps framework.

Python ZenML MLflow Streamlit


FILES & STRUCTURE 📂

  • data/: Raw zipped data from Kaggle
  • extracted_data/: Ingested dataset
  • analysis/: Exploratory notebooks and analyzers
  • src/: Core modules for feature engineering, model building, evaluation
  • steps/: ZenML-defined step-wise modular pipeline stages
  • pipelines/: Training and deployment pipeline definitions
  • pipeline_runs/: Pipeline runs as DAG visualizations, exported from the ZenML dashboard
  • run_pipeline.py: Executes training pipeline
  • run_deployment.py: Executes deployment/inference pipeline
  • app.py: Streamlit interface for user-side predictions
  • sample_predict.py: Local REST inference - single sample
  • sample_batch_predict.py: Local REST inference - batch prediction
  • exported_model/: Artifacts of one of the best models manually saved via MLflow
  • requirements.txt: Python dependencies

IMPORTANT LINKS 🔗


INTRODUCTION

house-image
  • Accurate house price prediction is vital for real estate valuation, investment, and decision-making.
  • Traditional ML workflows often suffer from:
    • Poor reproducibility and pipeline modularity
    • Lack of production-readiness and deployment integration
    • Minimal tracking or model lifecycle management
  • Price Prophet addresses these gaps by building a clean, reproducible, and production-ready ML pipeline from ingestion to deployment.
  • Built with Python, ZenML, MLflow, and Streamlit, it ensures seamless orchestration, experiment tracking, deployment, and user-friendly inference.

PROBLEM DEFINITION

  • Manual Workflows: Traditional house price prediction lacks automation, requiring repetitive preprocessing, model training, and evaluation steps.
  • Pipeline Gaps: Most ML solutions stop at model accuracy, missing crucial components like deployment, tracking, and maintainability.
  • Lack of Production Readiness: Existing approaches don't support reproducible, scalable, or monitorable model deployment in real-world settings.
  • End-to-End MLOps: There is a clear need for a robust, automated pipeline integrating data handling, modeling, versioning, and serving with real-time inference.

OBJECTIVES 🧰

  • Ultimate Aim: Build an end-to-end MLOps pipeline.
  • Perform robust data processing and extensive feature engineering to maximize model performance.
  • Utilize and compare multiple regression strategies for price prediction.
  • Integrate MLOps tools like ZenML and MLflow.
  • Build a front-end application for user interaction and visualization.
  • Ensure production readiness by focusing on modularity, reproducibility, version control, and real-time prediction capability.

METHODOLOGY 🔧

Pipeline Workflow

image

Core ML Stages

| Stage | Description |
| --- | --- |
| Data Ingestion | Loads and extracts raw housing data from compressed archives (archive.zip). |
| Initial Preprocessing | Cleans missing values and duplicates; prepares the dataset for transformation. |
| Feature Engineering | Applies log-transformations and constructs domain-inspired features like Porch, Bath_total, and FinSF. |
| Outlier Handling | Identifies and removes extreme values from critical features such as SalePrice. |
| Data Splitting | Splits data into train/test sets using stratified sampling while preserving the target distribution. |
| Model Building | Trains a stacked ensemble using base models (XGBoost, LightGBM) with a meta-model (Linear Regression). |
| Model Evaluation | Computes RMSE, MSE, and R² metrics using MLflow logging and visualization. |
| Deployment Preparation | Logs the model artifacts and expected columns to MLflow for reproducible serving. |
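The Feature Engineering stage above can be sketched in plain pandas. The column names (FullBath, OpenPorchSF, etc.) follow the Kaggle house-prices dataset, and the exact formulas here are illustrative assumptions; the real logic lives in src/:

```python
import numpy as np
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative sketch of the feature-engineering stage.

    Assumes Kaggle house-prices column names; the formulas in the
    actual src/ modules may differ.
    """
    out = df.copy()
    # Log-transform the skewed target (log1p handles zeros safely).
    if "SalePrice" in out:
        out["SalePrice"] = np.log1p(out["SalePrice"])
    # Total bathrooms: half baths count as 0.5.
    out["Bath_total"] = (out["FullBath"] + 0.5 * out["HalfBath"]
                         + out["BsmtFullBath"] + 0.5 * out["BsmtHalfBath"])
    # Combined porch area across all porch types.
    out["Porch"] = (out["OpenPorchSF"] + out["EnclosedPorch"]
                    + out["3SsnPorch"] + out["ScreenPorch"])
    # Total finished square footage, below and above grade.
    out["FinSF"] = out["BsmtFinSF1"] + out["BsmtFinSF2"] + out["GrLivArea"]
    return out
```

In the repository this logic runs inside a ZenML step, so each transformation is tracked and reproducible.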

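The Model Building and Model Evaluation stages can be approximated with scikit-learn's StackingRegressor. This sketch substitutes two GradientBoostingRegressor instances for the XGBoost/LightGBM base learners (so it runs with scikit-learn alone) and keeps the Linear Regression meta-model and the RMSE/MSE/R² metrics; the synthetic data is only for demonstration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-ins for the XGBoost/LightGBM base learners used in the pipeline.
base_models = [
    ("gbm_a", GradientBoostingRegressor(n_estimators=100, random_state=0)),
    ("gbm_b", GradientBoostingRegressor(n_estimators=100, max_depth=2, random_state=1)),
]
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)

# Synthetic regression data in place of the housing features.
X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

# Same metrics the evaluation stage logs to MLflow.
mse = mean_squared_error(y_te, pred)
rmse = float(np.sqrt(mse))
r2 = r2_score(y_te, pred)
```

In the real pipeline these metrics would be recorded with `mlflow.log_metric` so runs are comparable in the MLflow UI.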
MLOps Stack

image

Model Deployment

  • Deployment via MLflow Model Deployer Service (Not suitable for Windows OS)
    image

  • Manual MLflow Model Serving via REST API (works on macOS/Windows)
    image

Inference

  • Batch Inference (Local REST API):

    • Once the model is served manually using MLflow, predictions can be made by sending input data (as JSON) via HTTP POST to the /invocations endpoint.
    • A sample_batch_predict.py script is used to load a .csv file, send data to the model server, and save predictions in predictions.csv.
  • Real-Time Inference (Streamlit Application):

    • A user-friendly UI built with Streamlit allows manual input or CSV uploads.
    • Sends the data to the same REST endpoint and displays predicted house prices instantly.
    • Supports downloading predictions and visualization inside the web app.
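The batch-inference flow above can be sketched as follows. The port matches the `mlflow models serve -p 1234` command from the setup steps; the helper names and file names are illustrative, not the actual sample_batch_predict.py:

```python
import json

import pandas as pd
import requests

# Port matches `mlflow models serve ... -p 1234` from the setup steps.
MLFLOW_URL = "http://127.0.0.1:1234/invocations"


def build_payload(df: pd.DataFrame) -> str:
    """Serialize a DataFrame in MLflow's 'dataframe_split' scoring format."""
    return json.dumps({
        "dataframe_split": {
            "columns": df.columns.tolist(),
            "data": df.values.tolist(),
        }
    })


def predict_batch(df: pd.DataFrame, url: str = MLFLOW_URL) -> list:
    """POST a batch of rows to the model server and return its predictions."""
    resp = requests.post(url, data=build_payload(df),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    return resp.json()["predictions"]

# Usage (with the server running):
#   batch = pd.read_csv("batch.csv")
#   batch["SalePrice_pred"] = predict_batch(batch)
#   batch.to_csv("predictions.csv", index=False)
```

The Streamlit app sends requests to the same endpoint, so both inference modes share one serving process.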

RESULTS 📊

  • High Level EDA image

  • Comparative Model Evaluation Metrics
    image

  • Streamlit Application (Interface) image

  • Making predictions in the app using the second input mode image


INSTALLATION 🤖

To set up the project on your local machine, follow these steps:

  1. Clone the repository:
    git clone https://github.com/krishnaura45/price-prophet.git
    cd price-prophet
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the training pipeline:
    python run_pipeline.py
  4. Serve the model manually (use the MLflow UI to fetch the run ID):
    mlflow models serve -m "runs:/<your_run_id>/model" -p 1234 --no-conda
  5. Run the deployment pipeline:
    python run_deployment.py
  6. Run the Streamlit app:
    streamlit run app.py

CONTRIBUTING

  • Fork the repository.
  • Create a new branch.
  • Commit changes with clear messages.
  • Submit a pull request.
  • Ensure new features are tested and documented.

TECH STACK

Pandas Scikit-Learn NumPy Matplotlib Stacking Ensemble


FEATURES 🚀

  • 🔄 Modular ZenML Steps (each in steps/)
  • 🧐 Advanced EDA and feature insights (analysis/)
  • 🪤 Model evaluation with proper metrics
  • 🚪 Manual model deployment (you control what gets served)
  • 🔗 Streamlit App for UI-based input, visualization and download

FUTURE SCOPE 🔮

  • Cloud-Native Deployment: Containerize the pipeline using Docker and orchestrate via Kubernetes to enable scalable, consistent, and production-ready deployments across cloud platforms.
  • Drift Detection & AutoML: Implement data drift monitoring (e.g., with Evidently/WhyLabs) and integrate AutoML frameworks for continual model retraining and optimization.
  • Model Explainability: Enhance interpretability using SHAP or LIME and display visual explanations in Streamlit for better decision trust and transparency.

REFERENCES

  1. ZenML Docs - https://docs.zenml.io/
  2. MLflow Docs - https://mlflow.org/docs/latest/index.html
  3. CatBoost Documentation - https://catboost.ai/en/docs/

Contributors 🧑‍💼

  • Krishna Dubey (Pipeline design, ML modeling, deployment, UI dev)