"An MLOps pipeline is only as good as its organization."
Welcome to my end-to-end Machine Learning pipeline project! I built this repository to demonstrate how to implement production-grade MLOps best practices using ZenML and Scikit-Learn.
While many projects focus solely on hyperparameter tuning to squeeze out the best possible score, this project focuses heavily on Artifact Management, Metadata Tracking, and Code Maintainability.
It trains a regression model to predict customer review scores using the famous Olist E-commerce dataset, but the real star of the show is the architecture behind it.
As ML projects scale, pipelines often turn into a chaotic mess of untracked Jupyter notebooks. You end up with 50 models, 200 orphaned datasets, and no idea which model version is currently in production.
To solve this, I implemented the ZenML Tagging & Organization Framework from the ground up. If you explore the code, you'll find:
- Strict Artifact Tagging: Every piece of data, model, and metric is tagged at birth using a centralized Tag Registry (using Python Enums to prevent typos!).
- Dynamic Evaluation Tags: The pipeline automatically evaluates the model and tags the artifacts dynamically (e.g., [performance-high-r2] or [performance-low-r2]) based on the actual evaluation metrics.
- The Strategy Pattern: The codebase uses clean OOP design. Data cleaning and model training algorithms can be easily swapped without ever touching the core ZenML @step definitions.
- Model Control Plane: Models aren't just saved as .pkl files; they are tracked as first-class entities in the ZenML Model Registry.
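To make the first two bullets concrete, here is a minimal sketch of what an Enum-based tag registry plus a dynamic performance tag could look like. The tag strings come from the data-flow diagram in this README; the class name TagRegistry, the helper performance_tag, and the 0.5 R² cutoff are assumptions for illustration, not the repository's actual tag_registry.py.

```python
from enum import Enum


class TagRegistry(str, Enum):
    """Hypothetical central registry: every tag string is defined exactly once."""

    ARTIFACT_RAW = "artifact-raw"
    ARTIFACT_PROCESSED = "artifact-processed"
    ARTIFACT_MODEL = "artifact-model"
    ARTIFACT_METRIC = "artifact-metric"
    DOMAIN_ECOMMERCE = "domain-ecommerce"
    ALGORITHM_LINEAR_REGRESSION = "algorithm-linear-regression"
    PERFORMANCE_HIGH_R2 = "performance-high-r2"
    PERFORMANCE_LOW_R2 = "performance-low-r2"


def performance_tag(r2: float, threshold: float = 0.5) -> TagRegistry:
    """Pick a dynamic tag from the evaluation metric (the 0.5 cutoff is an assumption)."""
    if r2 >= threshold:
        return TagRegistry.PERFORMANCE_HIGH_R2
    return TagRegistry.PERFORMANCE_LOW_R2


# Misspelling a member (e.g. TagRegistry.ARTIFACT_RWA) raises AttributeError
# immediately, instead of silently creating a brand-new tag string.
tags = [TagRegistry.ARTIFACT_METRIC.value, performance_tag(0.12).value]
print(tags)  # ['artifact-metric', 'performance-low-r2']
```

Looking tags up by value also validates them: TagRegistry("artifact-raw") returns the member, while an unknown string raises ValueError.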
I've strictly separated the ML business logic (src/) from the pipeline orchestration (steps/).
Mlops-project-zenml/
├── pipeline/
│ └── training_pipeline.py ← Connects the steps + links to Model Registry
├── steps/
│ ├── ingest_data.py ← Loads raw CSV → tags as "artifact-raw"
│ ├── clean_data.py ← Preprocesses + dynamic data quality tags
│ ├── train_model.py ← Trains model → tags as "artifact-model"
│ └── evaluation.py ← Evaluates + dynamic performance tags
├── src/
│ ├── data_cleaning.py ← DataStrategy ABC (OOP logic)
│ ├── model_development.py ← Model ABC (sklearn Pipeline + Imputers)
│ └── evaluation.py ← Evaluation ABC (RMSE, R2, MSE)
├── utils/
│ └── tag_manager.py ← Custom CLI to query tagged/orphaned resources
├── configuration/
│ └── config.yaml
├── tag_registry.py ← Central Enum registry for all metadata tags
└── run_pipeline.py ← Main entry point
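The tree lists an Evaluation ABC in src/evaluation.py. As a hedged sketch of how the Strategy pattern applies there, each metric below implements one shared interface, so a step can swap or add metrics without changing its own code. The names Evaluation, calculate_score, and evaluate are hypothetical, and the metrics are written in plain Python to keep the example self-contained; scikit-learn's mean_squared_error and r2_score compute the same quantities.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Sequence


class Evaluation(ABC):
    """Strategy interface: every metric exposes the same method."""

    @abstractmethod
    def calculate_score(self, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
        ...


class MSE(Evaluation):
    def calculate_score(self, y_true, y_pred):
        # Mean of the squared residuals.
        return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class RMSE(Evaluation):
    def calculate_score(self, y_true, y_pred):
        # Square root of MSE, expressed in the target's own units.
        return math.sqrt(MSE().calculate_score(y_true, y_pred))


class R2(Evaluation):
    def calculate_score(self, y_true, y_pred):
        # 1 - (residual sum of squares / total sum of squares around the mean).
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1.0 - ss_res / ss_tot


def evaluate(strategies: Sequence[Evaluation], y_true, y_pred) -> Dict[str, float]:
    """The calling step depends only on the interface, never on a concrete metric."""
    return {type(s).__name__: s.calculate_score(y_true, y_pred) for s in strategies}


scores = evaluate([MSE(), RMSE(), R2()], y_true=[3.0, 5.0, 4.0, 1.0], y_pred=[2.5, 5.0, 4.0, 2.0])
print(scores)
```

Adding a new metric means writing one more subclass; the evaluate function (and any ZenML step that calls it) stays untouched.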
Here is how data moves through the system, getting tagged at every stage:
Load Data ──→ Clean & Split ──→ Train Model ──→ Evaluate
- Load Data: [artifact-raw] [domain-ecommerce]
- Clean & Split: [artifact-processed] + dynamic [quality-*]
- Train Model: [artifact-model] [algorithm-linear-regression]
- Evaluate: [artifact-metric] + dynamic tags such as [performance-high-r2]
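Because every artifact carries tags from the moment it is created, questions like "which models performed well?" or "what is orphaned?" reduce to simple filters over tags. The sketch below models that idea with in-memory records. ArtifactRecord, with_tag, and orphaned are hypothetical names, and treating an artifact without any artifact-* tag as orphaned is an assumed definition; the repository's utils/tag_manager.py queries ZenML itself, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for an artifact version's metadata."""

    name: str
    run_id: str
    tags: Set[str] = field(default_factory=set)


def with_tag(records: List[ArtifactRecord], tag: str) -> List[ArtifactRecord]:
    """Return every artifact carrying the given tag."""
    return [r for r in records if tag in r.tags]


def orphaned(records: List[ArtifactRecord]) -> List[ArtifactRecord]:
    """Assumed definition: an artifact with no 'artifact-*' lifecycle tag is orphaned."""
    return [r for r in records if not any(t.startswith("artifact-") for t in r.tags)]


records = [
    ArtifactRecord("raw_data", "run-1", {"artifact-raw", "domain-ecommerce"}),
    ArtifactRecord("model", "run-1", {"artifact-model", "algorithm-linear-regression"}),
    ArtifactRecord("r2_score", "run-1", {"artifact-metric", "performance-high-r2"}),
    ArtifactRecord("scratch_df", "run-0", set()),
]

# Runs whose metric artifact earned the high-R² tag, then the models from those runs.
good_runs = {r.run_id for r in with_tag(records, "performance-high-r2")}
best_models = [r.name for r in with_tag(records, "artifact-model") if r.run_id in good_runs]

print(best_models)  # ['model']
print([r.name for r in orphaned(records)])  # ['scratch_df']
```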
I rely on a modern, lightweight MLOps stack:
- Orchestration: ZenML
- Package Management: uv (lightning fast!)
- ML Framework: Scikit-Learn
- Data Processing: Pandas, NumPy
Want to run this on your own machine? It takes less than 2 minutes.
This project uses uv for incredibly fast dependency management.
# Clone the repository
git clone https://github.com/abhilashpanda04/Mlops-project-zenml.git
cd Mlops-project-zenml
# Install dependencies and create a virtual environment instantly
uv sync
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Initialize ZenML locally
zenml init

Execute the full training pipeline:
python run_pipeline.py

I built a custom utility to showcase the power of ZenML tagging. Run the tag manager to easily find your best-performing models and detect orphaned pipeline runs:
python -m utils.tag_manager

To see the visual DAG, your model artifacts, and metrics, start the local ZenML server:
zenml up

If you're interested in MLOps, system design, or have feedback on my implementation of the Strategy pattern, I'd love to connect! Feel free to open an issue, submit a PR, or reach out directly.
Abhilash Kumar Panda
- 📧 Email: abhilashk.isme1517@gmail.com
- 🔗 LinkedIn: Abhilash Kumar Panda
- 🌐 Portfolio: abhilashpanda04.github.io
- GitHub: @abhilashpanda04
If you found this architecture helpful or interesting, please consider giving the repo a star!