HealthWithSevgi

An interactive, browser-based machine learning education tool for healthcare professionals.

SENG 430 - Software Quality Assurance | Cankaya University | Spring 2025-2026 | Instructor: Dr. Sevgi Koyuncu Tunç

HealthWithSevgi guides clinicians through a complete ML pipeline in 7 steps — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with zero coding required.

Live Demo | Jira Board | Figma Designs | Setup Guide

Overview

Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.

Key capabilities:

20 medical specialties with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
8 ML classifiers with interactive hyperparameter tuning via sliders
SHAP-based explainability — global feature importance and single-patient waterfall explanations
Fairness auditing — subgroup performance analysis across demographics with bias detection
EU AI Act compliance checklist with downloadable PDF certificate
No persistent server-side data storage (all session data is held in memory and evicted automatically)
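
The in-memory eviction mentioned in the last item can be pictured with a small least-recently-used store. This is only a sketch: the class name `LRUSessionStore` and its methods are hypothetical, and the default capacity of 50 is borrowed from the "LRU 50" label in the architecture diagram rather than from the backend code.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUSessionStore:
    """Hypothetical store: keeps at most `capacity` sessions, evicting the least recently used."""

    def __init__(self, capacity: int = 50) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, session_id: str, payload: Any) -> None:
        # Re-inserting an existing key refreshes its recency.
        if session_id in self._items:
            self._items.move_to_end(session_id)
        self._items[session_id] = payload
        # Drop the oldest entries once the cap is exceeded.
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)

    def get(self, session_id: str) -> Optional[Any]:
        if session_id not in self._items:
            return None
        # Reading a session also counts as using it.
        self._items.move_to_end(session_id)
        return self._items[session_id]

    def __len__(self) -> int:
        return len(self._items)
```

Because nothing is written to disk in this design, restarting the process clears every session, which is consistent with the "no data storage" claim above.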

The 7-Step Pipeline

Step	Name	What Happens
1	Clinical Context	Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap.
2	Data Exploration	Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable.
3	Data Preparation	Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping).
4	Model & Parameters	Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest).
5	Results & Evaluation	View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison.
6	Explainability	Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., "High glucose increases diabetes risk by 0.23").
7	Ethics & Bias	Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate.
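
As an illustration of the Step 7 bias warning, the sketch below flags subgroups whose score trails the best-performing subgroup by more than 10 percentage points. The function name `bias_warnings`, the choice of the best subgroup as the reference, and the reading of ">10%" as an absolute gap are all assumptions; the backend may define the gap differently.

```python
from typing import Dict, List


def bias_warnings(subgroup_scores: Dict[str, float], max_gap: float = 0.10) -> List[str]:
    """Return one warning per subgroup that trails the best subgroup by more than `max_gap`.

    Scores are expected on a 0-1 scale (for example sensitivity per subgroup).
    """
    if not subgroup_scores:
        return []
    best_group, best_score = max(subgroup_scores.items(), key=lambda kv: kv[1])
    warnings = []
    for group, score in subgroup_scores.items():
        gap = best_score - score
        if gap > max_gap:
            warnings.append(
                f"{group}: {score:.2f} is {gap * 100:.0f} points below "
                f"{best_group} ({best_score:.2f})"
            )
    return warnings


# Hypothetical per-subgroup sensitivity values, for illustration only.
example = bias_warnings({"female": 0.91, "male": 0.78, "other": 0.85})
```

In this example only the `male` subgroup is flagged, since its gap to the best subgroup is 13 points while `other` trails by 6.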

Supported Specialties

#	Specialty	Prediction Task	Dataset	Samples
1	Cardiology	30-day heart failure mortality	Heart Failure Clinical Records	~300
2	Radiology	Pneumonia detection (chest X-ray metadata)	NIH Chest X-ray	100K+
3	Nephrology	Chronic kidney disease detection	UCI CKD	400
4	Oncology - Breast	Malignant vs. benign biopsy	Wisconsin Breast Cancer	569
5	Neurology - Parkinson's	Parkinson's from voice biomarkers	UCI Parkinson's	195
6	Endocrinology - Diabetes	Diabetes onset within 5 years	Pima Indians	768
7	Hepatology - Liver	Liver disease detection	Indian Liver Patient	583
8	Cardiology - Stroke	Stroke risk prediction	Kaggle Stroke Prediction	5,110
9	Mental Health	Depression severity (PHQ-9)	Kaggle Depression	~1,000
10	Pulmonology - COPD	COPD exacerbation risk	PhysioNet + Kaggle	~1,000
11	Haematology - Anaemia	Anaemia type classification	Kaggle Anaemia	~400
12	Dermatology	Benign vs. malignant skin lesion	HAM10000 metadata	~10K
13	Ophthalmology	Diabetic retinopathy detection	UCI Diabetic Retinopathy	1,151
14	Orthopaedics - Spine	Disc herniation / spondylolisthesis	UCI Vertebral Column	310
15	ICU / Sepsis	Sepsis onset within 6 hours	PhysioNet Sepsis	~40K
16	Obstetrics - Fetal Health	Fetal health classification (CTG)	UCI Fetal Health	2,126
17	Cardiology - Arrhythmia	Arrhythmia detection (ECG)	UCI Arrhythmia	452
18	Oncology - Cervical	Cervical cancer risk	UCI Cervical Cancer	858
19	Thyroid / Endocrinology	Thyroid function classification	UCI Thyroid	9,172
20	Pharmacy - Readmission	Hospital readmission risk	UCI Diabetes 130-US	101,766

ML Models

Model	Category	Key Hyperparameters
K-Nearest Neighbors	Instance-based	k (1-25), distance metric
Support Vector Machine	Boundary-based	C (0.01-100), kernel (linear/rbf/poly)
Decision Tree	Tree-based	max_depth (1-20), criterion (gini/entropy)
Random Forest	Ensemble	n_estimators (10-500), max_depth
Logistic Regression	Linear	C (0.001-100), solver (lbfgs/saga)
Naive Bayes	Probabilistic	var_smoothing (1e-12 to 1e-3)
XGBoost	Gradient Boosting	n_estimators, max_depth, learning_rate
LightGBM	Gradient Boosting	n_estimators, max_depth, learning_rate

All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).
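
If "balanced class weights" here refers to scikit-learn's `class_weight="balanced"` option (an assumption; the README does not name the mechanism), each class is weighted by `n_samples / (n_classes * class_count)`. The standalone sketch below reproduces that formula without scikit-learn so the effect on an imbalanced dataset is easy to see; `balanced_class_weights` is an illustrative helper, not a function from this project.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# 90 negative and 10 positive cases: the rare class receives the larger weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```

With a 90/10 split the minority class is weighted nine times more heavily than the majority class, so misclassifying a rare positive case costs the model more during training.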

Tech Stack

Layer	Technology	Purpose
Frontend	React 18, TypeScript, Vite	Single-page wizard application
UI Components	Recharts, Lucide Icons, react-dropzone	Charts, icons, file uploads
State Management	TanStack React Query	Server state caching and synchronization
Backend	FastAPI, Python 3.12	REST API with auto-generated OpenAPI docs
ML Engine	scikit-learn, XGBoost, LightGBM	Model training, evaluation, cross-validation
Explainability	SHAP	TreeExplainer (tree models), KernelExplainer (linear), permutation importance
Data Processing	pandas, numpy, imbalanced-learn	Data cleaning, normalization, SMOTE
PDF Generation	ReportLab	Compliance certificate export
Containerization	Docker (multi-stage)	Production deployment
Hosting	HuggingFace Spaces	Live demo environment
Package Manager	pnpm (frontend), pip (backend)	Dependency management

Architecture

📐 Full Architecture Diagrams (Google Drive) — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.

                          +---------------------+
                          |   Browser (React)   |
                          |   Wizard UI (SPA)   |
                          +----------+----------+
                                     |
                            HTTP/REST (JSON)
                                     |
                          +----------v----------+
                          |   FastAPI Backend   |
                          +----------+----------+
                                     |
              +----------------------+----------------------+
              |              |              |                |
     +--------v---+  +------v-----+  +-----v------+  +-----v--------+
     | DataService|  | MLService  |  |ExplainSvc  |  | EthicsService|
     |            |  |            |  |            |  |              |
     | - Explore  |  | - Train    |  | - SHAP     |  | - Subgroup   |
     | - Prepare  |  | - Evaluate |  | - Waterfall|  | - Bias detect|
     | - SMOTE    |  | - Compare  |  | - Clinical |  | - EU AI Act  |
     +-----+------+  +------+-----+  +------+-----+  +------+-------+
           |                |                |                |
           v                v                v                v
     +-----------+   +------------+   +------------+   +-----------+
     | In-Memory |   | In-Memory  |   |   SHAP     |   | ReportLab |
     | Sessions  |   | Models     |   |  Library   |   |  PDF Gen  |
     | (LRU 50)  |   | (LRU 100+) |   |            |   |           |
     +-----------+   +------------+   +------------+   +-----------+

Data flow: Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate
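
The order of the preprocessing stage matters: splitting before normalizing keeps test-set statistics out of training. The sketch below shows that ordering for a single numeric column with z-score normalization. All function names are hypothetical and the code uses only the standard library; the backend itself relies on pandas and scikit-learn.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def train_test_split(
    values: Sequence[float], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[float], List[float]]:
    """Shuffle a copy of the values and hold out the last `test_ratio` share as the test set."""
    if len(values) < 2:
        raise ValueError("need at least two values to split")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = min(len(shuffled) - 1, max(1, round(len(shuffled) * test_ratio)))
    return shuffled[:-n_test], shuffled[-n_test:]


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from the training data only."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train)
    return mean, std or 1.0  # fall back to 1.0 for constant columns


def apply_zscore(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Scale values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Illustrative glucose readings, not taken from any bundled dataset.
glucose = [85, 90, 110, 140, 95, 100, 180, 75, 120, 130]
train, test = train_test_split(glucose, test_ratio=0.2)
mean, std = fit_zscore(train)          # fitted on the training split only
train_z = apply_zscore(train, mean, std)
test_z = apply_zscore(test, mean, std)  # reuses the training statistics
```

Oversampling such as SMOTE follows the same rule and is normally applied to the training split only, so that synthetic samples never influence the evaluation data.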

Project Structure

HealthWithSevgi/
|
+-- frontend/                         # React 18 + Vite + TypeScript
|   +-- src/
|   |   +-- pages/                    # Step 1-7 wizard pages
|   |   |   +-- Step1ClinicalContext.tsx
|   |   |   +-- Step2DataExploration.tsx
|   |   |   +-- Step3DataPreparation.tsx
|   |   |   +-- Step4ModelParameters.tsx
|   |   |   +-- Step5Results.tsx
|   |   |   +-- Step6Explainability.tsx
|   |   |   +-- Step7Ethics.tsx
|   |   +-- components/               # Reusable UI components
|   |   |   +-- NavBar.tsx            # Specialty switcher, glossary
|   |   |   +-- WizardProgress.tsx    # Step progress tracker
|   |   |   +-- SpecialtySelector.tsx # 20-specialty grid
|   |   |   +-- ColumnMapperModal.tsx # Target column confirmation
|   |   |   +-- ErrorModal.tsx       # Error display modal
|   |   |   +-- charts/              # Visualization components
|   |   |       +-- ConfusionMatrixChart.tsx  # 2x2 confusion matrix
|   |   |       +-- KNNScatterCanvas.tsx     # KNN decision boundary
|   |   |       +-- PRCurveChart.tsx         # Precision-Recall curve
|   |   |       +-- ROCCurveChart.tsx        # ROC curve with AUC badge
|   |   +-- api/                      # API client layer
|   |   |   +-- client.ts            # Axios instance + interceptors
|   |   |   +-- specialties.ts       # Specialty endpoints
|   |   |   +-- data.ts              # Explore + Prepare endpoints
|   |   |   +-- ml.ts                # Train + Compare endpoints
|   |   |   +-- explain.ts           # Explainability + Ethics + Certificate
|   |   +-- types/index.ts           # Shared TypeScript interfaces
|   |   +-- styles/globals.css        # Global CSS + theme variables
|   |   +-- App.tsx                   # Main wizard state manager
|   |   +-- main.tsx                  # Application entry point
|   +-- package.json
|   +-- vite.config.ts
|
+-- backend/                          # FastAPI REST API + ML engine
|   +-- app/
|   |   +-- main.py                   # FastAPI setup, CORS, routers
|   |   +-- routers/
|   |   |   +-- data_router.py        # /specialties, /explore, /prepare
|   |   |   +-- ml_router.py          # /train, /compare, /models
|   |   |   +-- explain_router.py     # /explain/*, /ethics, /certificate
|   |   +-- services/
|   |   |   +-- data_service.py       # Dataset loading, exploration, preprocessing
|   |   |   +-- ml_service.py         # Model building, training, evaluation
|   |   |   +-- explain_service.py    # SHAP explanations, clinical mapping
|   |   |   +-- ethics_service.py     # Fairness audit, bias detection
|   |   |   +-- certificate_service.py # PDF certificate generation
|   |   |   +-- specialty_registry.py # 20 specialty definitions + datasets
|   |   +-- models/
|   |   |   +-- schemas.py            # Data exploration/preparation DTOs
|   |   |   +-- ml_schemas.py         # Training/evaluation DTOs
|   |   |   +-- explain_schemas.py    # Explainability/ethics DTOs
|   |   +-- utils/                    # Utility modules
|   +-- data_cache/                   # Cached clinical CSV datasets
|   +-- datasets/                     # Additional dataset storage
|   +-- tests/                        # pytest test suite (191 tests)
|   |   +-- conftest.py              # Shared fixtures
|   |   +-- test_step1_clinical_context.py
|   |   +-- test_step2_data_exploration.py
|   |   +-- test_step3_data_preparation.py
|   |   +-- test_step6_explainability.py
|   |   +-- test_step7_ethics.py
|   |   +-- test_certificate.py
|   +-- pytest.ini
|   +-- requirements.txt
|
+-- hf-space/                         # HuggingFace Spaces deployment
|   +-- main_hf.py                    # Combined API + SPA entrypoint
|   +-- Dockerfile                    # HF-specific Docker build
|   +-- README.md                     # HF Space metadata
|
+-- docs/                             # Documentation & design specs
|   +-- ML_Tool_User_Guide.md         # Course user manual
|   +-- Sprint_1_Assignment.md        # Sprint 1 requirements
|   +-- Clinical_Specialties_Dataset_Collection.pdf
|   +-- diagrams/                     # C4 architecture + toolchain PDFs
|   +-- drawio/                       # Editable draw.io source files
|   +-- mermaid/                      # C4 architecture (Mermaid source)
|   +-- iso42001/                     # ISO 42001 AI governance report
|   +-- seng430-sprints/              # Sprint requirements from instructor
|   +-- qa/                           # QA test reports (PDF)
|   +-- reports/                      # Progress reports + screenshots
|
+-- jira/                             # Jira backlog documentation
|   +-- JIRA.md                       # Product backlog report
|   +-- SPRINT_1_TASK_BOARD.md        # Sprint 1 task breakdown
|
+-- local/                            # Local-only extensions
|   +-- model-arena/                  # Model Arena comparison feature
|       +-- arena/                    # Backend (router, service, schemas)
|       +-- frontend/                 # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
|   +-- pull_request_template.md      # PR template linked to Jira
|   +-- workflows/deploy-hf.yml      # Auto-deploy to HuggingFace on release
|
+-- Dockerfile                        # Multi-stage build (Node + Python)
+-- docker-compose.yml                # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md                         # AI coding assistant context
+-- SETUP.md                          # Local development setup guide
+-- README.md

Live Demo & Docker

🌐 Live Demo

The application is deployed on HuggingFace Spaces — no installation required:

➡️ 0xbatuhan4-healthwithsevgi.hf.space

🐳 Docker (single command)

Pull and run the pre-built container image from GitHub Container Registry:

docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest

Open http://localhost:7860 — that's it.

Alternatively, build from source:

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi

Docker Compose (one-command start)

git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker compose up -d

docker-compose.yml pulls the pre-built ghcr.io/eudalabs/healthwithsevgi:latest image when available and falls back to a local multi-stage build (Node → Vite → Python). Either way, the full stack — React SPA and FastAPI — is served from a single container on http://localhost:7860.

Measured startup (pre-built image, warm Docker daemon): ~8 seconds from docker compose up -d to HTTP 200 on /api/specialties — well inside the Sprint 5 30-second target (see docs/reports/Sprint5_Docker_Running.png).

First-time local build: ~3–6 minutes (installs pnpm + pip dependencies). Force a rebuild with docker compose up --build.

Container name is healthwithsevgi; the compose file also wires a healthcheck that probes /api/specialties every 10s.

To stop: docker compose down.
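
To check the startup target on your own machine, a small poller can wait for `/api/specialties` to answer with HTTP 200. This is a sketch using only the standard library; `http_ok` and `wait_until_ready` are hypothetical helper names, and the 30-second deadline mirrors the Sprint 5 target quoted above.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # urllib.error.URLError and HTTPError are subclasses of OSError
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll `probe` until it succeeds and return the elapsed seconds.

    Raises TimeoutError if the probe is still failing once the deadline has passed.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            raise TimeoutError(f"service not ready after {deadline_s:.0f}s")
        sleep(interval_s)


if __name__ == "__main__":
    url = "http://localhost:7860/api/specialties"
    elapsed = wait_until_ready(lambda: http_ok(url))
    print(f"ready after {elapsed:.1f}s")
```

Passing the probe in as a callable keeps the polling loop independent of HTTP, so it can be exercised with a fake probe in tests.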

Quick Start

Prerequisites (for local development)

Tool	Version	Required For
Python	>= 3.10	Backend
Node.js	>= 18	Frontend
Git	latest	Version control

Local Development

Backend:

cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Start the API server
uvicorn app.main:app --reload --port 8001

API docs available at: http://localhost:8001/docs (Swagger UI)

Frontend (in a separate terminal):

cd frontend

# Install dependencies
pnpm install

# Start the dev server
pnpm dev

App available at: http://localhost:5173 (proxies /api requests to port 8001)

Environment Variables

Create a .env file in the project root:

# Backend
BACKEND_PORT=8001
DEBUG=true

# Frontend (Vite uses VITE_ prefix)
VITE_API_URL=http://localhost:8001
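
One way the backend variables could be consumed is shown below. This is a sketch under assumptions: it presumes the values have already been exported into the process environment (the README does not say which loader reads `.env`), and `BackendSettings` and `load_settings` are hypothetical names. The fallback values are illustrative defaults, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BackendSettings:
    port: int
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> BackendSettings:
    """Read BACKEND_PORT and DEBUG, falling back to assumed defaults when unset."""
    port = int(env.get("BACKEND_PORT", "8001"))
    # Accept the usual truthy spellings; anything else counts as False.
    debug = env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes", "on"}
    return BackendSettings(port=port, debug=debug)
```

Accepting any mapping rather than reading `os.environ` directly makes the parsing easy to test with a plain dictionary.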

API Reference

All endpoints except the two health probes (`/` and `/health`) are prefixed with /api. Full interactive documentation is available at /docs when the backend is running.

Specialties

Method	Endpoint	Description
`GET`	`/api/specialties`	List all 20 specialties
`GET`	`/api/specialties/{id}`	Get specialty details (description, features, clinical context)

Data

Method	Endpoint	Description
`POST`	`/api/explore`	Upload CSV or load built-in dataset; returns column stats + class distribution
`POST`	`/api/prepare`	Preprocess data (split, normalize, SMOTE); returns `session_id`

ML Training

Method	Endpoint	Description
`POST`	`/api/train`	Train a model; returns `model_id` + evaluation metrics
`POST`	`/api/compare/{model_id}`	Add model to comparison table
`GET`	`/api/compare/{session_id}`	Get all compared models for a session
`DELETE`	`/api/compare/{session_id}`	Clear comparison table
`GET`	`/api/models/{model_id}`	Get model metadata

Explainability

Method	Endpoint	Description
`GET`	`/api/explain/global/{model_id}`	Global feature importance (top 10 features + clinical names)
`GET`	`/api/explain/patient/{model_id}/{index}`	Single-patient SHAP waterfall explanation

Ethics & Certificate

Method	Endpoint	Description
`GET`	`/api/ethics/{model_id}`	Subgroup fairness audit + bias warnings + checklist
`POST`	`/api/ethics/checklist`	Update EU AI Act checklist item
`POST`	`/api/generate-certificate`	Generate and download PDF compliance certificate

Full endpoint reference (request/response schemas, error codes, typical flow) lives on the wiki: API.

Health

Method	Endpoint	Description
`GET`	`/`	Status check (`{status: "ok"}`)
`GET`	`/health`	Health probe (`{status: "healthy"}`)
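
The endpoint tables above can be exercised from a short script. The sketch below builds URLs from the documented paths and fetches the specialty list with the standard library. `api_url` and `get_json` are hypothetical helpers, the base URL assumes the local development port, and the response is printed as-is because its schema is documented on the wiki rather than here.

```python
import json
import urllib.request
from urllib.parse import quote

API_PREFIX = "/api"


def api_url(base_url: str, *segments: object) -> str:
    """Join path segments under the /api prefix, percent-encoding each segment."""
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return f"{base_url.rstrip('/')}{API_PREFIX}/{path}"


def get_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    base = "http://localhost:8001"  # local development backend
    specialties = get_json(api_url(base, "specialties"))
    # Print a truncated preview; the exact response shape is not assumed here.
    print(json.dumps(specialties, indent=2)[:500])
```

For example, `api_url(base, "explain", "patient", model_id, 3)` yields the single-patient explanation URL for row 3, where `model_id` is whatever `/api/train` returned.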

Testing

The backend ships a pytest suite of 191 tests across 7 test files, covering the pipeline steps listed in the table below along with certificate generation.

cd backend

# Run all tests
pytest -v

# Run a specific test file
pytest -v tests/test_step1_clinical_context.py

# Run only slow tests (domain context validation)
pytest -v -m slow

Test coverage:

Test File	Covers	Key Assertions
`test_step1_clinical_context.py`	Specialty registry	All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling
`test_step2_data_exploration.py`	Data exploration	CSV upload validation, missing value detection, class distribution, imbalance warnings
`test_step3_data_preparation.py`	Preprocessing	Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention
`test_step4_arena_latency.py`	Model Arena	Training latency, cross-model comparison, session consistency
`test_step6_explainability.py`	SHAP explanations	Global importance, patient explanation, What-If analysis, sample patient selection
`test_step7_ethics.py`	Fairness audit	Ethics endpoint, case study severity, checklist toggle, bias detection thresholds
`test_certificate.py`	PDF generation	Certificate content type, PDF magic bytes, checklist state persistence

Total: 191 tests — all passing.
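
As an example of the kind of assertion listed for `test_certificate.py`, a PDF can be recognized by its leading `%PDF-` marker. The sketch below is not copied from the test suite: `looks_like_pdf` and both test functions are hypothetical, and they operate on raw bytes rather than on a real API response.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """PDF files begin with the ASCII marker %PDF- followed by a version number."""
    return payload.startswith(b"%PDF-")


def test_accepts_pdf_header():
    assert looks_like_pdf(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")


def test_rejects_html_error_page():
    # An error page returned with the wrong content must not pass as a certificate.
    assert not looks_like_pdf(b"<!DOCTYPE html><html>error</html>")
```

In a real test the bytes would come from the body of the `/api/generate-certificate` response.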

Deployment

HuggingFace Spaces

The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile:

Stage 1 — Builds the React frontend with pnpm
Stage 2 — Installs Python dependencies
Stage 3 — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860

hf-space/main_hf.py serves both the FastAPI backend and the static React build from a single process.

Live demo: 0xbatuhan4-healthwithsevgi.hf.space

Branch Strategy

Branch	Purpose
`main`	Production-ready, protected
`develop`	Integration branch for sprint work
`feature/US-XXX`	One branch per user story

Rules:

All changes go through Pull Requests (use the PR template)
PRs require at least 1 approval
main and develop are protected — no direct pushes
PR titles follow: feat/fix/docs(US-XXX): description
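
The title convention in the last rule can be checked mechanically, for instance in a CI step. The sketch below assumes that `XXX` stands for a numeric story ID and that a single space follows the colon; `parse_pr_title` is a hypothetical helper and is not part of the repository's workflows.

```python
import re
from typing import Dict, Optional

# type(US-<digits>): description
PR_TITLE = re.compile(r"^(feat|fix|docs)\(US-(\d+)\): (\S.*)$")


def parse_pr_title(title: str) -> Optional[Dict[str, str]]:
    """Return the parsed parts of a conforming PR title, or None if it does not match."""
    match = PR_TITLE.match(title)
    if match is None:
        return None
    kind, story, description = match.groups()
    return {"type": kind, "story": f"US-{story}", "description": description}
```

A title such as `feat(US-012): add ROC curve chart` parses into its type, story ID, and description, while a free-form title like `update readme` is rejected.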

Team

Role	Name	Student ID
Product Owner + Developer	Efe Çelik	202128016
UX Designer	Burak Aydoğmuş	202128028
Lead Developer + Scrum Master	Batuhan Bayazıt	202228008
Developer	Berat Mert Gökkaya	202228019
QA / Documentation Lead	Berfin Duru Alkan	202228005

Links

Live Demo: 0xbatuhan4-healthwithsevgi.hf.space
Jira Board: Jira
Figma Designs: Figma
GitHub Wiki: Wiki
API Docs: http://localhost:8001/docs (when running locally)

License

Released under the MIT License — you are free to use, copy, modify, and distribute this software with attribution.

Developed as part of the SENG 430 Software Quality Assurance course at Cankaya University by the EudaLabs team.
