Price Prophet is a production-grade machine learning pipeline built to predict house sale prices using a robust, scalable, and reproducible MLOps framework.
- `data/`: Raw zipped data from Kaggle
- `extracted_data/`: Ingested dataset
- `analysis/`: Exploratory notebooks and analyzers
- `src/`: Core modules for feature engineering, model building, and evaluation
- `steps/`: ZenML-defined step-wise modular pipeline stages
- `pipelines/`: Training and deployment pipeline definitions
- `pipeline_runs/`: Pipeline runs as DAG visualizations from the ZenML Dashboard
- `run_pipeline.py`: Executes the training pipeline
- `run_deployment.py`: Executes the deployment/inference pipeline
- `app.py`: Streamlit interface for user-side predictions
- `sample_predict.py`: Local REST inference, single sample
- `sample_batch_predict.py`: Local REST inference, batch prediction
- `exported_model/`: Artifacts of one of the best models, manually saved via MLflow
- `requirements.txt`: Python dependencies
- 📂 Dataset Utilized (Kaggle): Ames Housing Dataset
- 📄 Original Dataset (Kaggle): House Prices - Advanced Regression Techniques
- 📓 Notebook (Kaggle): House Pricing EDA and Extensive Modeling
- Accurate house price prediction is vital for real estate valuation, investment, and decision-making.
- Traditional ML workflows often suffer from:
- Poor reproducibility and pipeline modularity
- Lack of production-readiness and deployment integration
- Minimal tracking or model lifecycle management
- Price Prophet addresses these gaps by building a clean, reproducible, and production-ready ML pipeline from ingestion to deployment.
- Built with Python, ZenML, MLflow, and Streamlit, it ensures seamless orchestration, experiment tracking, deployment, and user-friendly inference.
- Manual Workflows: Traditional house price prediction lacks automation, requiring repetitive preprocessing, model training, and evaluation steps.
- Pipeline Gaps: Most ML solutions stop at model accuracy, missing crucial components like deployment, tracking, and maintainability.
- Lack of Production Readiness: Existing approaches don't support reproducible, scalable, or monitorable model deployment in real-world settings.
- End-to-End MLOps: There is a clear need for a robust, automated pipeline integrating data handling, modeling, versioning, and serving with real-time inference.
- Ultimate Aim: Build an end-to-end MLOps pipeline.
- Perform robust data processing and extensive feature engineering to maximize model performance.
- Utilize and compare multiple regression strategies for price prediction.
- Integrate MLOps tools such as ZenML and MLflow.
- Build a front-end application for user interaction and visualization.
- Ensure production readiness by focusing on modularity, reproducibility, version control, and real-time prediction capability.
| Stage | Description |
|---|---|
| Data Ingestion | Loads and extracts raw housing data from compressed archives (archive.zip). |
| Initial Preprocessing | Cleans missing values and duplicates; prepares dataset for transformation. |
| Feature Engineering | Applies log-transformations and constructs domain-inspired features like Porch, Bath_total, and FinSF. |
| Outlier Handling | Identifies and removes extreme values from critical features such as SalePrice. |
| Data Splitting | Splits data into train/test using stratified sampling while preserving target distribution. |
| Model Building | Trains a stacked ensemble using base models (XGBoost, LightGBM) with meta-model (Linear Regression). |
| Model Evaluation | Computes RMSE, MSE, and R² metrics and logs them to MLflow, with visualizations. |
| Deployment Preparation | Logs the model artifacts and expected columns to MLflow for reproducible serving. |
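As a concrete illustration of the Feature Engineering stage, the domain-inspired features named above can be built with pandas roughly as follows (column names follow the Ames dataset; the exact formulas used in `src/` may differ):

```python
import numpy as np
import pandas as pd

# Toy rows using Ames-style column names (illustrative values only).
df = pd.DataFrame({
    "OpenPorchSF": [40, 0], "EnclosedPorch": [0, 20],
    "FullBath": [2, 1], "HalfBath": [1, 0],
    "BsmtFullBath": [1, 0], "BsmtHalfBath": [0, 1],
    "BsmtFinSF1": [500, 0], "BsmtFinSF2": [0, 100],
    "SalePrice": [200000, 120000],
})

# Combined porch area across porch types.
df["Porch"] = df["OpenPorchSF"] + df["EnclosedPorch"]

# Total bathrooms, counting half baths as 0.5.
df["Bath_total"] = (df["FullBath"] + 0.5 * df["HalfBath"]
                    + df["BsmtFullBath"] + 0.5 * df["BsmtHalfBath"])

# Total finished basement square footage.
df["FinSF"] = df["BsmtFinSF1"] + df["BsmtFinSF2"]

# Log-transform the skewed target (log1p handles zeros safely).
df["SalePrice_log"] = np.log1p(df["SalePrice"])
```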
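The stacked ensemble in the Model Building stage can be sketched with scikit-learn's `StackingRegressor`. This is a minimal stand-in, using scikit-learn boosters in place of the project's XGBoost/LightGBM base models to keep the snippet dependency-light:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the processed housing features.
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)

# Two boosted-tree base learners (the project uses XGBoost and LightGBM);
# a Linear Regression meta-model combines their out-of-fold predictions.
stack = StackingRegressor(
    estimators=[
        ("gbr1", GradientBoostingRegressor(n_estimators=50, random_state=0)),
        ("gbr2", GradientBoostingRegressor(n_estimators=50, max_depth=2, random_state=1)),
    ],
    final_estimator=LinearRegression(),
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))  # R² on the training data
```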
- Deployment via MLflow Model Deployer Service (not supported on Windows)
- Manual MLflow Model Serving via REST API (works on macOS and Windows)
- Batch Inference (Local REST API):
  - Once the model is served manually via MLflow, predictions can be made by sending input data as JSON via HTTP POST to the /invocations endpoint.
  - The sample_batch_predict.py script loads a .csv file, sends the data to the model server, and saves the predictions to predictions.csv.
- Real-Time Inference (Streamlit Application):
  - A user-friendly UI built with Streamlit allows manual input or CSV uploads.
  - It sends the data to the same REST endpoint and displays predicted house prices instantly.
  - Predictions can be downloaded and visualized inside the web app.
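A request to the local scoring server can be built as below. This is a sketch: the two columns shown are illustrative (a real request must include every column the model expects), and the payload uses the `dataframe_split` JSON format accepted by MLflow 2.x scoring servers:

```python
import json
import urllib.request
import pandas as pd

# Illustrative input rows with Ames-style column names.
df = pd.DataFrame({"GrLivArea": [1710, 1262], "OverallQual": [7, 6]})

# MLflow 2.x scoring servers accept the "dataframe_split" JSON format.
payload = json.dumps({"dataframe_split": df.to_dict(orient="split")})

req = urllib.request.Request(
    "http://127.0.0.1:1234/invocations",
    data=payload.encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the model server from the setup section is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```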
To set up the project on your local machine, follow these steps:
- Clone the repository:
  ```shell
  git clone https://github.com/krishnaura45/price-prophet.git
  cd price-prophet
  ```
- Install dependencies:
  ```shell
  pip install -r requirements.txt
  ```
- Run the training pipeline:
  ```shell
  python run_pipeline.py
  ```
- Serve the model manually (use the MLflow UI to fetch the run ID):
  ```shell
  mlflow models serve -m "runs:/<your_run_id>/model" -p 1234 --no-conda
  ```
- Run the deployment pipeline:
  ```shell
  python run_deployment.py
  ```
- Run the Streamlit app:
  ```shell
  streamlit run app.py
  ```
- Fork the repository.
- Create a new branch.
- Commit changes with clear messages.
- Submit a pull request.
- Ensure new features are tested and documented.
- 🔄 Modular ZenML Steps (each in steps/)
- 🧐 Advanced EDA and feature insights (analysis/)
- 🪤 Model evaluation with proper metrics
- 🚪 Manual model deployment (you control what gets served)
- 🔗 Streamlit App for UI-based input, visualization and download
- Cloud-Native Deployment: Containerize the pipeline using Docker and orchestrate via Kubernetes to enable scalable, consistent, and production-ready deployments across cloud platforms.
- Drift Detection & AutoML: Implement data drift monitoring (e.g., with Evidently/WhyLabs) and integrate AutoML frameworks for continual model retraining and optimization.
- Model Explainability: Enhance interpretability using SHAP or LIME and display visual explanations in Streamlit for better decision trust and transparency.
- ZenML Docs - https://docs.zenml.io/
- MLflow Docs - https://mlflow.org/docs/latest/index.html
- CatBoost Documentation - https://catboost.ai/en/docs/
- Krishna Dubey (Pipeline design, ML modeling, deployment, UI dev)



