CardioRiskAI

Overview

CardioRiskAI is a machine learning–based clinical intelligence system designed to classify myocardial infarction (MI) outcomes using structured patient data.

The project focuses on responsibly applying supervised learning to high-dimensional, imbalanced medical data in order to support early risk identification and outcome awareness.
It demonstrates an end-to-end healthcare ML workflow, from domain-aware preprocessing through model deployment.

Clinical Problem Statement

Myocardial infarction (MI), commonly known as a heart attack, remains one of the leading causes of mortality worldwide, and the risk of death is particularly high in the first year after the event.

Accurate classification of MI outcomes can assist clinicians and healthcare systems in:

Understanding patient risk profiles
Identifying high-risk cases early
Supporting data-driven clinical decision making

Objective:
Build a robust classification model to predict MI outcome categories (LET_IS) using patient-level clinical and demographic attributes.

Dataset Overview

Records: 1,700 patients
Features: 124 clinical, demographic, and diagnostic variables
Target Variable: LET_IS (MI outcome class)

The dataset includes variables related to:

Age and gender
Blood pressure measurements
Cholesterol levels
Diabetes and hypertension indicators
Other clinically relevant attributes

Due to the medical nature of the data, domain awareness was applied throughout preprocessing and feature selection.

Methodology & System Design

1. Data Understanding & Quality Assessment

Inspected dataset shape and structure
Analyzed missing values using heatmaps
Removed columns with excessive missingness only after clinical relevance review
Verified absence of duplicate records
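
The checks above can be sketched with pandas roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the toy DataFrame, its column names, and the 50% missingness threshold are all made up for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the patient table; columns and values are illustrative only.
df = pd.DataFrame({
    "AGE": [63, 71, np.nan, 58, 66],
    "SEX": [1, 0, 1, 1, 1],
    "MOSTLY_MISSING": [np.nan, np.nan, np.nan, 4.1, np.nan],
})

print(df.shape)  # (rows, columns)

# Fraction of missing values per column, highest first.
# A visual version would be a heatmap of df.isna(),
# e.g. seaborn.heatmap(df.isna(), cbar=False).
missing_ratio = df.isna().mean().sort_values(ascending=False)

# Flag (do not drop) columns above an assumed threshold so they can be
# reviewed for clinical relevance before any removal.
THRESHOLD = 0.5  # assumption for this sketch
review_candidates = missing_ratio[missing_ratio > THRESHOLD].index.tolist()

# Count fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

print(review_candidates, n_duplicates)
```

Flagging rather than dropping keeps the clinical review step explicit: a sparse column may still carry signal that a clinician would want retained.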

2. Missing Value Handling

Imputation strategies were selected based on feature type:

Binary features: Mode
Categorical features: Mode
Numerical features: Mean

Post-imputation validation confirmed no remaining null values.
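
A minimal sketch of the type-based imputation described above, assuming pandas and hypothetical column names (AGE, SEX, PAIN_TYPE are placeholders, not necessarily the columns used in the notebooks):

```python
import numpy as np
import pandas as pd

# Toy data with one column of each feature type (illustrative values).
df = pd.DataFrame({
    "AGE": [63.0, 71.0, np.nan, 58.0],       # numerical
    "SEX": [1.0, np.nan, 1.0, 0.0],          # binary
    "PAIN_TYPE": ["a", "b", None, "b"],      # categorical
})

numeric_cols = ["AGE"]
mode_cols = ["SEX", "PAIN_TYPE"]  # binary and categorical share the mode rule

# Numerical features: fill with the column mean.
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].mean())

# Binary and categorical features: fill with the most frequent value.
for col in mode_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Post-imputation validation: total count of remaining nulls.
remaining_nulls = int(df.isna().sum().sum())
print(remaining_nulls)
```

When a held-out test set is used, a common precaution is to compute the mean and mode on the training split only and reuse them on the test split; the source does not state which approach was taken.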

3. Exploratory Data Analysis (EDA)

Key analyses included:

Age distribution (majority between 50–70 years)
Gender distribution (male-dominant cohort)
Blood pressure and cholesterol trends
MI occurrence by:
- Age
- Gender
- Diabetes status
- Hypertension status

Blood pressure categories (normal, elevated, stage 1–5 hypertension) were analyzed to understand clinical severity distribution.
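
The outcome breakdowns listed above can be produced with pandas cross-tabulations. The sketch below uses invented rows and assumes the columns are named AGE and SEX; only LET_IS is taken from the dataset description, and the age bands are arbitrary.

```python
import pandas as pd

# Invented rows for illustration; not real patient data.
df = pd.DataFrame({
    "AGE": [45, 52, 61, 67, 73, 58],
    "SEX": [1, 1, 0, 1, 0, 1],
    "LET_IS": [0, 0, 1, 0, 3, 0],
})

# Bucket age into bands (right-inclusive intervals; band edges are assumptions).
df["AGE_BAND"] = pd.cut(
    df["AGE"],
    bins=[0, 50, 60, 70, 120],
    labels=["<=50", "51-60", "61-70", ">70"],
)

# Share of each outcome class within each age band (rows sum to 1).
outcome_by_age = pd.crosstab(df["AGE_BAND"], df["LET_IS"], normalize="index")

# Raw counts of each outcome class by sex.
outcome_by_sex = pd.crosstab(df["SEX"], df["LET_IS"])

print(outcome_by_age)
print(outcome_by_sex)
```

The same pattern extends to diabetes or hypertension indicators by swapping the first argument of `pd.crosstab`.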

4. Class Imbalance Handling

The target variable exhibited significant class imbalance, with the majority class dominating the outcome distribution.

To address this:

Applied SMOTE (Synthetic Minority Over-sampling Technique)
Balanced the dataset prior to model training
Ensured minority outcome classes were adequately represented

5. Model Development & Evaluation

Multiple classification models were trained and evaluated:

Decision Tree Classifier
Random Forest Classifier
XGBoost Classifier

Models were compared on clinically relevant metrics rather than accuracy alone:

Precision
Recall
F1-score
Support per class
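
The per-class comparison described above can be sketched with scikit-learn's `classification_report`. The example uses synthetic data and only the two scikit-learn models to stay dependency-free; an `xgboost.XGBClassifier` could be added to the dictionary the same way, since it follows the scikit-learn estimator interface.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data; a stand-in for the real feature matrix.
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Dict form keeps precision, recall, F1 and support per class for comparison.
    reports[name] = classification_report(
        y_test, y_pred, output_dict=True, zero_division=0
    )
    print(name)
    print(classification_report(y_test, y_pred, zero_division=0))
```

Reading recall per class, rather than overall accuracy, shows whether a model is simply predicting the majority outcome.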

6. Model Selection & Optimization

XGBoost consistently outperformed other models across key metrics
Hyperparameter tuning performed using GridSearchCV
Final model achieved ~92% accuracy with strong precision-recall balance
Cross-validation used to ensure generalization stability
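
A sketch of grid search with stratified cross-validation. To avoid an extra dependency, scikit-learn's `GradientBoostingClassifier` is used here as a stand-in for XGBoost; `XGBClassifier` accepts the same `n_estimators` and `max_depth` parameter names and can be dropped into `GridSearchCV` in its place. The grid values, the fold count, and the macro-F1 scoring choice are assumptions, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Small synthetic multi-class problem so the search runs quickly.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=1,
)

# Illustrative grid; real searches usually cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
}

# Stratified folds preserve class proportions in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    estimator=GradientBoostingClassifier(random_state=1),
    param_grid=param_grid,
    scoring="f1_macro",  # weights all classes equally, unlike accuracy
    cv=cv,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

If SMOTE is combined with cross-validation, wrapping both steps in an `imblearn.pipeline.Pipeline` keeps the oversampling inside each training fold.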

XGBoost was selected due to its ability to:

Handle high-dimensional data
Capture non-linear relationships
Remain robust on imbalanced datasets

Results & Insights

Class imbalance significantly affected baseline model performance
SMOTE improved recall for minority outcome classes
XGBoost provided the best trade-off between performance and interpretability
Feature importance analysis highlighted clinically meaningful predictors

The final model demonstrates strong potential for risk stratification support, not automated diagnosis.
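
Feature importance rankings of the kind mentioned above can be read from a fitted tree ensemble. The sketch below uses a random forest on synthetic data with placeholder feature names; `XGBClassifier` exposes the same `feature_importances_` attribute after fitting.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with placeholder feature names.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3,
    n_redundant=0, random_state=7,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=200, random_state=7).fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances.head(5))
```

Impurity-based importances describe what the model relied on, not causal clinical effect, so they are best reviewed alongside domain knowledge.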

Deployment

The trained model was deployed using Streamlit, enabling:

Interactive input of patient attributes
Real-time MI outcome classification
Easy demonstration of model behavior

Deployment logic is implemented in: MCI_Web_App.py
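
For orientation, a Streamlit front end for a classifier like this might look roughly like the sketch below. It is not the contents of MCI_Web_App.py: the three input fields, the 0/1 coding for sex, the `build_feature_row` helper, and the `model.joblib` path are all hypothetical.

```python
import pandas as pd

# Hypothetical subset of inputs; a real app would use the exact
# feature list the model was trained on.
FEATURES = ["AGE", "SEX", "SYS_BP"]


def build_feature_row(values: dict) -> pd.DataFrame:
    """Arrange user inputs into a one-row frame in training column order."""
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return pd.DataFrame([[values[name] for name in FEATURES]], columns=FEATURES)


def main() -> None:
    # Imported here so the helper above can be tested without Streamlit.
    import joblib
    import streamlit as st

    st.title("MI outcome classification (demo)")
    values = {
        "AGE": st.number_input("Age", min_value=18, max_value=100, value=60),
        "SEX": st.selectbox("Sex (0 = female, 1 = male)", [0, 1]),  # assumed coding
        "SYS_BP": st.number_input(
            "Systolic blood pressure", min_value=60, max_value=260, value=120
        ),
    }
    if st.button("Classify"):
        model = joblib.load("model.joblib")  # hypothetical path
        prediction = model.predict(build_feature_row(values))[0]
        st.write(f"Predicted LET_IS class: {prediction}")
        st.caption("For research and education only; not a diagnostic tool.")


if __name__ == "__main__":
    main()
```

Saved under a name such as `app_sketch.py`, it would be started with `streamlit run app_sketch.py`. Building the input row in one helper keeps the column order aligned with what the model saw during training.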

This completes the EDA → modeling → evaluation → deployment lifecycle.

Tech Stack

Language: Python
Data Processing: Pandas, NumPy
Visualization: Matplotlib, Seaborn
Modeling: Scikit-learn, XGBoost
Imbalance Handling: imbalanced-learn (SMOTE)
Evaluation Metrics: Precision, Recall, F1-score
Deployment: Streamlit

Repository Structure

cardio-risk-ai/
│
├── MCI_EDA.ipynb
├── MCI_Feature_Selection_Model_Building_Evaluation_techniques.ipynb
├── MCI_Web_App.py
├── Myocardial_attribute.txt
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md

Key Engineering Decisions

Applied domain awareness during feature removal and imputation
Addressed class imbalance before model training
Prioritized recall and F1-score over raw accuracy
Selected XGBoost for performance stability on complex medical data

Learnings

Importance of metric selection in healthcare ML
Risks of ignoring class imbalance in clinical datasets
Translating clinical intuition into data-driven features
Deploying ML responsibly for decision support use cases

Limitations & Ethical Considerations

This model is not a diagnostic tool
Intended strictly for educational and research purposes
Predictions should not replace professional medical judgment

Future Improvements

Incorporate SHAP-based explainability
Explore cost-sensitive learning approaches
Validate model on external clinical datasets
Add uncertainty estimation for predictions

License

This project is licensed under the MIT License.

Author

Kumaran Elumalai
AI / ML Engineer | Data Scientist

🔗 GitHub: https://github.com/Kumaran-Elumalai
🔗 LinkedIn: https://linkedin.com/in/kumaran-elumalai
