An interactive, browser-based machine learning education tool for healthcare professionals.
SENG 430 - Software Quality Assurance | Cankaya University | Spring 2025-2026 | Instructor: Dr. Sevgi Koyuncu Tunç
HealthWithSevgi guides clinicians through a complete ML pipeline in 7 steps — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with zero coding required.
Live Demo | Jira Board | Figma Designs | Setup Guide
- Overview
- The 7-Step Pipeline
- Supported Specialties
- ML Models
- Tech Stack
- Architecture
- Project Structure
- Getting Started
- API Reference
- Testing
- Deployment
- Branch Strategy
- Team
- License
Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.
Key capabilities:
- 20 medical specialties with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
- 8 ML classifiers with interactive hyperparameter tuning via sliders
- SHAP-based explainability — global feature importance and single-patient waterfall explanations
- Fairness auditing — subgroup performance analysis across demographics with bias detection
- EU AI Act compliance checklist with downloadable PDF certificate
- No server-side data storage — all session data is held in-memory and evicted automatically
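The in-memory, automatically evicted session storage mentioned in the last bullet can be pictured with a small least-recently-used (LRU) store. This is a minimal sketch, not the project's implementation: the class name `LRUSessionStore` and its methods are hypothetical, and the default capacity of 50 is taken from the "LRU 50" label in the architecture diagram.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUSessionStore:
    """Hypothetical in-memory store: keeps at most `capacity` sessions."""

    def __init__(self, capacity: int = 50) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._sessions: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, session_id: str, payload: Any) -> None:
        # Writing a session makes it the most recently used entry.
        self._sessions[session_id] = payload
        self._sessions.move_to_end(session_id)
        # Evict from the oldest end until we are back within capacity.
        while len(self._sessions) > self._capacity:
            self._sessions.popitem(last=False)

    def get(self, session_id: str) -> Optional[Any]:
        if session_id not in self._sessions:
            return None
        # Reading a session also marks it as recently used.
        self._sessions.move_to_end(session_id)
        return self._sessions[session_id]

    def __len__(self) -> int:
        return len(self._sessions)


store = LRUSessionStore(capacity=2)
store.put("a", {"rows": 100})
store.put("b", {"rows": 200})
store.get("a")                 # "a" becomes the most recently used
store.put("c", {"rows": 300})  # capacity exceeded, so "b" is evicted
```

Nothing is written to disk in this sketch, so restarting the process discards every session.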
| Step | Name | What Happens |
|---|---|---|
| 1 | Clinical Context | Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap. |
| 2 | Data Exploration | Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable. |
| 3 | Data Preparation | Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping). |
| 4 | Model & Parameters | Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest). |
| 5 | Results & Evaluation | View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison. |
| 6 | Explainability | Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., "High glucose increases diabetes risk by 0.23"). |
| 7 | Ethics & Bias | Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate. |
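The Step 5 metrics can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard formulas in plain Python; the function name `binary_metrics` and the example counts are illustrative and not taken from the project, and AUC-ROC is omitted because it needs predicted probabilities rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of dividing by zero for empty classes.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall / true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC uses all four cells, which makes it a useful companion to accuracy on the imbalanced datasets that are common in clinical prediction tasks.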
| # | Specialty | Prediction Task | Dataset | Samples |
|---|---|---|---|---|
| 1 | Cardiology | 30-day heart failure mortality | Heart Failure Clinical Records | ~300 |
| 2 | Radiology | Pneumonia detection (chest X-ray metadata) | NIH Chest X-ray | 100K+ |
| 3 | Nephrology | Chronic kidney disease detection | UCI CKD | 400 |
| 4 | Oncology - Breast | Malignant vs. benign biopsy | Wisconsin Breast Cancer | 569 |
| 5 | Neurology - Parkinson's | Parkinson's from voice biomarkers | UCI Parkinson's | 195 |
| 6 | Endocrinology - Diabetes | Diabetes onset within 5 years | Pima Indians | 768 |
| 7 | Hepatology - Liver | Liver disease detection | Indian Liver Patient | 583 |
| 8 | Cardiology - Stroke | Stroke risk prediction | Kaggle Stroke Prediction | 5,110 |
| 9 | Mental Health | Depression severity (PHQ-9) | Kaggle Depression | ~1,000 |
| 10 | Pulmonology - COPD | COPD exacerbation risk | PhysioNet + Kaggle | ~1,000 |
| 11 | Haematology - Anaemia | Anaemia type classification | Kaggle Anaemia | ~400 |
| 12 | Dermatology | Benign vs. malignant skin lesion | HAM10000 metadata | ~10K |
| 13 | Ophthalmology | Diabetic retinopathy detection | UCI Diabetic Retinopathy | 1,151 |
| 14 | Orthopaedics - Spine | Disc herniation / spondylolisthesis | UCI Vertebral Column | 310 |
| 15 | ICU / Sepsis | Sepsis onset within 6 hours | PhysioNet Sepsis | ~40K |
| 16 | Obstetrics - Fetal Health | Fetal health classification (CTG) | UCI Fetal Health | 2,126 |
| 17 | Cardiology - Arrhythmia | Arrhythmia detection (ECG) | UCI Arrhythmia | 452 |
| 18 | Oncology - Cervical | Cervical cancer risk | UCI Cervical Cancer | 858 |
| 19 | Thyroid / Endocrinology | Thyroid function classification | UCI Thyroid | 9,172 |
| 20 | Pharmacy - Readmission | Hospital readmission risk | UCI Diabetes 130-US | 101,766 |
| Model | Category | Key Hyperparameters |
|---|---|---|
| K-Nearest Neighbors | Instance-based | k (1-25), distance metric |
| Support Vector Machine | Boundary-based | C (0.01-100), kernel (linear/rbf/poly) |
| Decision Tree | Tree-based | max_depth (1-20), criterion (gini/entropy) |
| Random Forest | Ensemble | n_estimators (10-500), max_depth |
| Logistic Regression | Linear | C (0.001-100), solver (lbfgs/saga) |
| Naive Bayes | Probabilistic | var_smoothing (1e-12 to 1e-3) |
| XGBoost | Gradient Boosting | n_estimators, max_depth, learning_rate |
| LightGBM | Gradient Boosting | n_estimators, max_depth, learning_rate |
All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).
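The combination described above (balanced class weights, VarianceThreshold plus SelectKBest with mutual information, and a 20-iteration, 3-fold RandomizedSearchCV) might be wired together roughly as follows. This is a sketch under assumptions: the synthetic dataset, the choice of logistic regression, `k=6`, the ROC AUC scoring, and the step names are all illustrative, and the project's own pipeline may differ.

```python
from functools import partial

from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for a clinical dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Drop constant features, then keep the k most informative ones.
    ("variance", VarianceThreshold()),
    ("kbest", SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=6)),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", solver="lbfgs", max_iter=1000)),
])

search = RandomizedSearchCV(
    pipeline,
    param_distributions={"clf__C": loguniform(1e-3, 1e2)},  # C range from the table
    n_iter=20,   # 20 sampled configurations
    cv=3,        # 3-fold cross-validation
    scoring="roc_auc",
    random_state=0,
)
search.fit(X_train, y_train)
test_auc = search.score(X_test, y_test)
```

Placing feature selection inside the pipeline means it is refit on each training fold, so the held-out fold never influences which features are kept.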
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18, TypeScript, Vite | Single-page wizard application |
| UI Components | Recharts, Lucide Icons, react-dropzone | Charts, icons, file uploads |
| State Management | TanStack React Query | Server state caching and synchronization |
| Backend | FastAPI, Python 3.12 | REST API with auto-generated OpenAPI docs |
| ML Engine | scikit-learn, XGBoost, LightGBM | Model training, evaluation, cross-validation |
| Explainability | SHAP | TreeExplainer (tree models), KernelExplainer (linear), permutation importance |
| Data Processing | pandas, numpy, imbalanced-learn | Data cleaning, normalization, SMOTE |
| PDF Generation | ReportLab | Compliance certificate export |
| Containerization | Docker (multi-stage) | Production deployment |
| Hosting | HuggingFace Spaces | Live demo environment |
| Package Manager | pnpm (frontend), pip (backend) | Dependency management |
📐 Full Architecture Diagrams (Google Drive) — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.
                         +---------------------+
                         |   Browser (React)   |
                         |   Wizard UI (SPA)   |
                         +----------+----------+
                                    |
                            HTTP/REST (JSON)
                                    |
                         +----------v----------+
                         |   FastAPI Backend   |
                         +----------+----------+
                                    |
        +------------------+--------+---------+------------------+
        |                  |                  |                  |
+-------v-------+  +-------v-------+  +-------v-------+  +-------v-------+
| DataService   |  | MLService     |  | ExplainSvc    |  | EthicsService |
|               |  |               |  |               |  |               |
| - Explore     |  | - Train       |  | - SHAP        |  | - Subgroup    |
| - Prepare     |  | - Evaluate    |  | - Waterfall   |  | - Bias detect |
| - SMOTE       |  | - Compare     |  | - Clinical    |  | - EU AI Act   |
+-------+-------+  +-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |                  |
        v                  v                  v                  v
+---------------+  +---------------+  +---------------+  +---------------+
| In-Memory     |  | In-Memory     |  | SHAP          |  | ReportLab     |
| Sessions      |  | Models        |  | Library       |  | PDF Gen       |
| (LRU 50)      |  | (LRU 100+)    |  |               |  |               |
+---------------+  +---------------+  +---------------+  +---------------+
Data flow: Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate
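The order of the "split, normalize" steps in this flow matters: normalization statistics should come from the training split only, otherwise information from the test set leaks into training. The following plain-Python sketch illustrates the idea for z-score scaling; the helper names and the sample glucose values are made up for illustration and do not come from the project.

```python
import random
import statistics
from typing import List, Tuple


def split_rows(values: List[float], test_ratio: float = 0.2, seed: int = 42) -> Tuple[List[float], List[float]]:
    """Shuffle deterministically and return (train, test)."""
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_zscore(train: List[float]) -> Tuple[float, float]:
    """Learn mean and standard deviation from the training split only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    return mean, std or 1.0  # avoid dividing by zero for constant columns


def apply_zscore(values: List[float], mean: float, std: float) -> List[float]:
    return [(v - mean) / std for v in values]


glucose = [85.0, 90.0, 110.0, 140.0, 155.0, 99.0, 120.0, 132.0, 101.0, 178.0]
train, test = split_rows(glucose)
mean, std = fit_zscore(train)                  # statistics come from train only
train_scaled = apply_zscore(train, mean, std)
test_scaled = apply_zscore(test, mean, std)    # test reuses the train statistics
```

The same principle applies to imputation and to SMOTE, which should only ever resample the training split.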
HealthWithSevgi/
|
+-- frontend/ # React 18 + Vite + TypeScript
| +-- src/
| | +-- pages/ # Step 1-7 wizard pages
| | | +-- Step1ClinicalContext.tsx
| | | +-- Step2DataExploration.tsx
| | | +-- Step3DataPreparation.tsx
| | | +-- Step4ModelParameters.tsx
| | | +-- Step5Results.tsx
| | | +-- Step6Explainability.tsx
| | | +-- Step7Ethics.tsx
| | +-- components/ # Reusable UI components
| | | +-- NavBar.tsx # Specialty switcher, glossary
| | | +-- WizardProgress.tsx # Step progress tracker
| | | +-- SpecialtySelector.tsx # 20-specialty grid
| | | +-- ColumnMapperModal.tsx # Target column confirmation
| | | +-- ErrorModal.tsx # Error display modal
| | | +-- charts/ # Visualization components
| | | +-- ConfusionMatrixChart.tsx # 2x2 confusion matrix
| | | +-- KNNScatterCanvas.tsx # KNN decision boundary
| | | +-- PRCurveChart.tsx # Precision-Recall curve
| | | +-- ROCCurveChart.tsx # ROC curve with AUC badge
| | +-- api/ # API client layer
| | | +-- client.ts # Axios instance + interceptors
| | | +-- specialties.ts # Specialty endpoints
| | | +-- data.ts # Explore + Prepare endpoints
| | | +-- ml.ts # Train + Compare endpoints
| | | +-- explain.ts # Explainability + Ethics + Certificate
| | +-- types/index.ts # Shared TypeScript interfaces
| | +-- styles/globals.css # Global CSS + theme variables
| | +-- App.tsx # Main wizard state manager
| | +-- main.tsx # Application entry point
| +-- package.json
| +-- vite.config.ts
|
+-- backend/ # FastAPI REST API + ML engine
| +-- app/
| | +-- main.py # FastAPI setup, CORS, routers
| | +-- routers/
| | | +-- data_router.py # /specialties, /explore, /prepare
| | | +-- ml_router.py # /train, /compare, /models
| | | +-- explain_router.py # /explain/*, /ethics, /certificate
| | +-- services/
| | | +-- data_service.py # Dataset loading, exploration, preprocessing
| | | +-- ml_service.py # Model building, training, evaluation
| | | +-- explain_service.py # SHAP explanations, clinical mapping
| | | +-- ethics_service.py # Fairness audit, bias detection
| | | +-- certificate_service.py # PDF certificate generation
| | | +-- specialty_registry.py # 20 specialty definitions + datasets
| | +-- models/
| | | +-- schemas.py # Data exploration/preparation DTOs
| | | +-- ml_schemas.py # Training/evaluation DTOs
| | | +-- explain_schemas.py # Explainability/ethics DTOs
| | +-- utils/ # Utility modules
| +-- data_cache/ # Cached clinical CSV datasets
| +-- datasets/ # Additional dataset storage
| +-- tests/ # pytest test suite (191 tests)
| | +-- conftest.py # Shared fixtures
| | +-- test_step1_clinical_context.py
| | +-- test_step2_data_exploration.py
| | +-- test_step3_data_preparation.py
| | +-- test_step6_explainability.py
| | +-- test_step7_ethics.py
| | +-- test_certificate.py
| +-- pytest.ini
| +-- requirements.txt
|
+-- hf-space/ # HuggingFace Spaces deployment
| +-- main_hf.py # Combined API + SPA entrypoint
| +-- Dockerfile # HF-specific Docker build
| +-- README.md # HF Space metadata
|
+-- docs/ # Documentation & design specs
| +-- ML_Tool_User_Guide.md # Course user manual
| +-- Sprint_1_Assignment.md # Sprint 1 requirements
| +-- Clinical_Specialties_Dataset_Collection.pdf
| +-- diagrams/ # C4 architecture + toolchain PDFs
| +-- drawio/ # Editable draw.io source files
| +-- mermaid/ # C4 architecture (Mermaid source)
| +-- iso42001/ # ISO 42001 AI governance report
| +-- seng430-sprints/ # Sprint requirements from instructor
| +-- qa/ # QA test reports (PDF)
| +-- reports/ # Progress reports + screenshots
|
+-- jira/ # Jira backlog documentation
| +-- JIRA.md # Product backlog report
| +-- SPRINT_1_TASK_BOARD.md # Sprint 1 task breakdown
|
+-- local/ # Local-only extensions
| +-- model-arena/ # Model Arena comparison feature
| +-- arena/ # Backend (router, service, schemas)
| +-- frontend/ # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
| +-- pull_request_template.md # PR template linked to Jira
| +-- workflows/deploy-hf.yml # Auto-deploy to HuggingFace on release
|
+-- Dockerfile # Multi-stage build (Node + Python)
+-- docker-compose.yml # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md # AI coding assistant context
+-- SETUP.md # Local development setup guide
+-- README.md
The application is deployed on HuggingFace Spaces — no installation required:
➡️ 0xbatuhan4-healthwithsevgi.hf.space
Pull and run the pre-built container image from GitHub Container Registry:
docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest
Then open http://localhost:7860; that's it.
Alternatively, build from source:
git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi
Or start the stack with Docker Compose:
git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker compose up -d
docker-compose.yml pulls the pre-built ghcr.io/eudalabs/healthwithsevgi:latest image when available and falls back to a local multi-stage build (Node → Vite → Python). Either way, the full stack (React SPA and FastAPI) is served from a single container on http://localhost:7860.
Measured startup (pre-built image, warm Docker daemon): ~8 seconds from docker compose up -d to HTTP 200 on /api/specialties — well inside the Sprint 5 30-second target (see docs/reports/Sprint5_Docker_Running.png).
First-time local build: ~3–6 minutes (installs pnpm + pip dependencies). Force a rebuild with docker compose up --build.
Container name is healthwithsevgi; the compose file also wires a healthcheck that probes /api/specialties every 10s.
To stop: docker compose down.
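The startup measurement quoted above (time until /api/specialties answers with HTTP 200) can be reproduced with a small polling script. This is a sketch using only the standard library; the function names, the 0.5-second polling interval, and the injectable `probe` parameter are assumptions made for illustration, and the 30-second default mirrors the stated Sprint 5 target.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None      # connection refused, timeout, DNS failure, ...


def wait_until_ready(
    url: str,
    probe: Callable[[str], Optional[int]] = http_status,
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
) -> Optional[float]:
    """Poll until `probe(url)` returns 200; return elapsed seconds or None on timeout."""
    start = time.monotonic()
    while True:
        if probe(url) == 200:
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://localhost:7860/api/specialties")` right after `docker compose up -d` returns the number of seconds until the first HTTP 200, or `None` if the container is not ready within the deadline.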
| Tool | Version | Required For |
|---|---|---|
| Python | >= 3.10 | Backend |
| Node.js | >= 18 | Frontend |
| Git | latest | Version control |
Backend:
cd backend
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Start the API server
uvicorn app.main:app --reload --port 8001
API docs available at: http://localhost:8001/docs (Swagger UI)
Frontend (in a separate terminal):
cd frontend
# Install dependencies
pnpm install
# Start the dev server
pnpm dev
App available at: http://localhost:5173 (proxies /api requests to port 8001)
Create a .env file in the project root:
# Backend
BACKEND_PORT=8001
DEBUG=true
# Frontend (Vite uses VITE_ prefix)
VITE_API_URL=http://localhost:8001
All endpoints are prefixed with /api. Full interactive documentation is available at /docs when the backend is running.
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/specialties | List all 20 specialties |
| GET | /api/specialties/{id} | Get specialty details (description, features, clinical context) |
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/explore | Upload CSV or load built-in dataset; returns column stats + class distribution |
| POST | /api/prepare | Preprocess data (split, normalize, SMOTE); returns session_id |
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/train | Train a model; returns model_id + evaluation metrics |
| POST | /api/compare/{model_id} | Add model to comparison table |
| GET | /api/compare/{session_id} | Get all compared models for a session |
| DELETE | /api/compare/{session_id} | Clear comparison table |
| GET | /api/models/{model_id} | Get model metadata |
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/explain/global/{model_id} | Global feature importance (top 10 features + clinical names) |
| GET | /api/explain/patient/{model_id}/{index} | Single-patient SHAP waterfall explanation |
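The plain-language summaries that accompany the single-patient explanation (for example, "High glucose increases diabetes risk by 0.23") can be produced by ranking per-feature contributions and describing their direction. The sketch below takes the contributions as given rather than computing them with SHAP; the function name, the `CLINICAL_NAMES` mapping, and the sentence wording are hypothetical and will not match the project's exact output.

```python
from typing import Dict, List

# Hypothetical mapping from raw column names to clinician-friendly labels.
CLINICAL_NAMES: Dict[str, str] = {
    "glucose": "glucose",
    "bmi": "body mass index",
    "age": "age",
}


def summarize_contributions(
    contributions: Dict[str, float], outcome: str, top_n: int = 3
) -> List[str]:
    """Turn signed per-feature contributions into short sentences."""
    # Rank by absolute impact so strong negative effects are not hidden.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    sentences = []
    for feature, value in ranked[:top_n]:
        if value == 0:
            continue  # no effect, nothing to report
        name = CLINICAL_NAMES.get(feature, feature)
        direction = "increases" if value > 0 else "decreases"
        sentences.append(f"{name.capitalize()} {direction} {outcome} by {abs(value):.2f}")
    return sentences


# Made-up contributions for one patient.
patient = {"glucose": 0.23, "age": 0.05, "bmi": -0.11}
summary = summarize_contributions(patient, outcome="diabetes risk", top_n=2)
```

Sorting by absolute value keeps protective factors (negative contributions) in the summary alongside risk factors.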
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/ethics/{model_id} | Subgroup fairness audit + bias warnings + checklist |
| POST | /api/ethics/checklist | Update EU AI Act checklist item |
| POST | /api/generate-certificate | Generate and download PDF compliance certificate |
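The bias warnings returned by the fairness audit are described in the pipeline table as firing on performance gaps above 10%. One way to express that rule is sketched below. How the project actually measures the gap is not stated here, so comparing each subgroup with the best-performing subgroup is an assumption, as are the function name and the example scores.

```python
from typing import Dict, List


def bias_warnings(subgroup_scores: Dict[str, float], max_gap: float = 0.10) -> List[str]:
    """Flag subgroups whose score trails the best subgroup by more than `max_gap`."""
    if not subgroup_scores:
        return []
    best_group = max(subgroup_scores, key=subgroup_scores.get)
    best_score = subgroup_scores[best_group]
    warnings = []
    for group, score in sorted(subgroup_scores.items()):
        gap = best_score - score
        if gap > max_gap:
            warnings.append(f"{group}: {gap:.0%} below {best_group}")
    return warnings


# Made-up sensitivity values per subgroup.
scores = {"female": 0.91, "male": 0.78, "age_65_plus": 0.84}
flags = bias_warnings(scores)
```

With these sample scores only the "male" subgroup is flagged, because its 13-point gap exceeds the threshold while the 7-point gap for "age_65_plus" does not.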
Full endpoint reference (request/response schemas, error codes, typical flow) lives on the wiki: API.
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Status check ({status: "ok"}) |
| GET | /health | Health probe ({status: "healthy"}) |
The project includes a comprehensive pytest suite covering all 7 steps of the pipeline — 191 tests across 7 test files.
cd backend
# Run all tests
pytest -v
# Run a specific test file
pytest -v tests/test_step1_clinical_context.py
# Run only slow tests (domain context validation)
pytest -v -m slow
Test coverage:
| Test File | Covers | Key Assertions |
|---|---|---|
| test_step1_clinical_context.py | Specialty registry | All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling |
| test_step2_data_exploration.py | Data exploration | CSV upload validation, missing value detection, class distribution, imbalance warnings |
| test_step3_data_preparation.py | Preprocessing | Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention |
| test_step4_arena_latency.py | Model Arena | Training latency, cross-model comparison, session consistency |
| test_step6_explainability.py | SHAP explanations | Global importance, patient explanation, What-If analysis, sample patient selection |
| test_step7_ethics.py | Fairness audit | Ethics endpoint, case study severity, checklist toggle, bias detection thresholds |
| test_certificate.py | PDF generation | Certificate content type, PDF magic bytes, checklist state persistence |
Total: 191 tests — all passing.
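As an illustration of the "content type" and "PDF magic bytes" assertions listed for test_certificate.py, a check of that kind can be written in a few lines. This is a sketch of the general technique, not the project's test code: the helper names are hypothetical, and the byte string stands in for a real response body from the certificate endpoint.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """True when the payload starts with the PDF header marker."""
    return payload.startswith(b"%PDF-")


def check_certificate_response(content_type: str, payload: bytes) -> None:
    """Raise AssertionError unless the response looks like a PDF download."""
    # Ignore parameters such as "; charset=..." after the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    assert media_type == "application/pdf", f"unexpected content type: {content_type}"
    assert looks_like_pdf(payload), "body does not start with %PDF-"


# Stand-in for a response body; a real test would call the certificate endpoint.
fake_body = b"%PDF-1.4\n%%EOF\n"
check_certificate_response("application/pdf", fake_body)
```

Checking the leading bytes catches the case where an error page is returned with a PDF content type.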
The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile:
- Stage 1 — Builds the React frontend with pnpm
- Stage 2 — Installs Python dependencies
- Stage 3 — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860
hf-space/main_hf.py serves both the FastAPI backend and the static React build from a single process.
Live demo: 0xbatuhan4-healthwithsevgi.hf.space
| Branch | Purpose |
|---|---|
| main | Production-ready, protected |
| develop | Integration branch for sprint work |
| feature/US-XXX | One branch per user story |
Rules:
- All changes go through Pull Requests (use the PR template)
- PRs require at least 1 approval
- main and develop are protected; no direct pushes
- PR titles follow: feat/fix/docs(US-XXX): description
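The PR title convention lends itself to an automated check, for instance in a CI step. The sketch below is one possible validator and is not part of the repository; it assumes that the XXX placeholder is a numeric story ID and that feat, fix, and docs are the only allowed types.

```python
import re
from typing import Dict, Optional

# Assumed pattern: type, story ID in parentheses, colon, space, description.
PR_TITLE = re.compile(r"^(feat|fix|docs)\(US-(\d+)\): (\S.*)$")


def parse_pr_title(title: str) -> Optional[Dict[str, str]]:
    """Return the parts of a conforming PR title, or None if it does not match."""
    match = PR_TITLE.match(title)
    if match is None:
        return None
    kind, story, description = match.groups()
    return {"type": kind, "story": f"US-{story}", "description": description}


parsed = parse_pr_title("feat(US-042): add ROC curve chart")
```

A title such as "update readme" returns `None`, which a CI job could turn into a failed check.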
| Role | Name | Student ID |
|---|---|---|
| Product Owner + Developer | Efe Çelik | 202128016 |
| UX Designer | Burak Aydoğmuş | 202128028 |
| Lead Developer + Scrum Master | Batuhan Bayazıt | 202228008 |
| Developer | Berat Mert Gökkaya | 202228019 |
| QA / Documentation Lead | Berfin Duru Alkan | 202228005 |
- Live Demo: 0xbatuhan4-healthwithsevgi.hf.space
- Jira Board: Jira
- Figma Designs: Figma
- GitHub Wiki: Wiki
- API Docs: http://localhost:8001/docs (when running locally)
Released under the MIT License — you are free to use, copy, modify, and distribute this software with attribution.
Developed as part of the SENG 430 Software Quality Assurance course at Cankaya University by the EudaLabs team.