API
REST API for the HealthWithSevgi ML Visualization Tool. All endpoints are served by FastAPI at http://localhost:8001 (dev) or http://localhost:7860 (HuggingFace Spaces / Docker).
- Base URL: `/api` (all endpoints below are prefixed)
- Content type: `application/json` unless noted
- Live OpenAPI schema: `GET /openapi.json`
- Interactive Swagger UI: `GET /docs`
- ReDoc: `GET /redoc`
- API version: 1.3.1
Session model: all state (datasets, trained models, SHAP values) is kept in-memory on the backend with LRU eviction. No database. Call `/prepare` first to receive a `session_id`, then pass it to `/train`, which returns a `model_id` used by explain/ethics/certificate endpoints.
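To make the eviction behaviour concrete, here is a minimal sketch of an in-memory store with least-recently-used eviction. The class name `SessionStore`, its methods, and the `max_sessions` limit are hypothetical and chosen for illustration; the actual backend implementation and capacity are not documented here.

```python
from collections import OrderedDict
from typing import Any, Optional
import uuid


class SessionStore:
    """Hypothetical in-memory store with least-recently-used eviction."""

    def __init__(self, max_sessions: int = 3) -> None:
        # max_sessions is an assumed limit, not the backend's real setting.
        self.max_sessions = max_sessions
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, state: Any, session_id: Optional[str] = None) -> str:
        # Reuse the caller's id if given, otherwise generate a UUID.
        sid = session_id or str(uuid.uuid4())
        self._items[sid] = state
        self._items.move_to_end(sid)  # mark as most recently used
        while len(self._items) > self.max_sessions:
            self._items.popitem(last=False)  # evict the least recently used
        return sid

    def get(self, session_id: str) -> Optional[Any]:
        if session_id not in self._items:
            return None  # an API layer would likely turn this into a 404
        self._items.move_to_end(session_id)  # reading also refreshes recency
        return self._items[session_id]
```

A practical consequence of this kind of design is that an old `session_id` or `model_id` can disappear once enough newer sessions have been created, so clients should be ready to call `/prepare` again after a 404.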
- Health & Root
- Specialties
- Data — Explore & Prepare
- ML — Train & Compare
- Explainability
- Ethics & Bias
- Insights (LLM)
- Certificate (PDF)
- Error Format
| Method | Path | Response |
|---|---|---|
| `GET` | `/` | `{ "status": "ok", "project": "HealthWithSevgi", "version": "1.3.1" }` |
| `GET` | `/health` | `{ "status": "healthy" }` |
These endpoints are not prefixed with /api.
Registry of the 20 supported medical specialties.
| Method | Path | Response schema | Description |
|---|---|---|---|
| `GET` | `/api/specialties` | `list[SpecialtyInfo]` | List every registered specialty |
| `GET` | `/api/specialties/{specialty_id}` | `SpecialtyInfo` | Fetch a specialty by id (`endocrinology_diabetes`, `cardiology_heart_failure`, …) |
SpecialtyInfo includes: id, name, clinical_context, target_variable, data_source, what_ai_predicts, feature metadata.
Errors: 404 if specialty_id is unknown.
Both endpoints accept multipart/form-data. file is optional; if omitted, the built-in dataset for specialty_id is loaded.
Validates the dataset and returns column-level stats used by Step 2 (Data Exploration).
| Form field | Type | Required | Default |
|---|---|---|---|
| `specialty_id` | string | yes | |
| `target_col` | string | yes | |
| `file` | CSV (≤ 50 MB, ≥ 10 rows, ≥ 2 columns) | no | uses built-in dataset |
Response: DataExplorationResponse — per-column types, null counts, class balance, summary stats, sample rows.
Errors:
- `422`: non-CSV extension, parse failure, fewer than 10 rows / 2 columns, or unknown `target_col`
- `413`: file exceeds 50 MB
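The validation rules above can be mirrored on the client before uploading. The following is a sketch only: the function name `precheck_csv` is hypothetical, the order of the checks is a guess, and it assumes that "10 rows" means data rows excluding the header and that 50 MB means 50 × 1024 × 1024 bytes. The server remains the source of truth.

```python
import csv
import io
from typing import Tuple

MAX_BYTES = 50 * 1024 * 1024  # assumed interpretation of the 50 MB limit
MIN_ROWS = 10                 # assumed to count data rows, not the header
MIN_COLS = 2


def precheck_csv(filename: str, data: bytes, target_col: str) -> Tuple[int, str]:
    """Return the (status_code, reason) the documented rules suggest."""
    if len(data) > MAX_BYTES:
        return 413, "file exceeds 50 MB"
    if not filename.lower().endswith(".csv"):
        return 422, "non-CSV extension"
    try:
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
    except (UnicodeDecodeError, csv.Error):
        return 422, "parse failure"
    if not rows:
        return 422, "parse failure"
    header, body = rows[0], rows[1:]
    if len(body) < MIN_ROWS or len(header) < MIN_COLS:
        return 422, "fewer than 10 rows / 2 columns"
    if target_col not in header:
        return 422, "unknown target_col"
    return 200, "ok"
```

Running a check like this locally avoids uploading a large file only to receive a 413 or 422.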
Applies the Step 3 preprocessing pipeline (split / missing / normalize / SMOTE / outliers) and returns a session_id for use by /train.
| Form field | Type | Default |
|---|---|---|
| `specialty_id` | string | (none) |
| `target_col` | string | (none) |
| `test_size` | float (0.1–0.4) | `0.2` |
| `missing_strategy` | `median` \| `mode` \| `drop` | `median` |
| `normalization` | `zscore` \| `minmax` \| `none` | `zscore` |
| `use_smote` | bool | `false` |
| `outlier_handling` | `none` \| `iqr` \| `zscore_clip` | `none` |
| `session_id` | string (reuses existing session if provided) | auto-generated UUID |
| `file` | CSV (same limits as `/explore`) | built-in dataset |
Response: PrepResponse — session_id, train_size, test_size, features_count, class_distribution_before, class_distribution_after, smote_applied, normalization_applied, norm_samples (before/after values for a few features).
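As an illustration of the form fields listed for this endpoint, the sketch below assembles and range-checks them on the client side. The helper name `build_prepare_form` is hypothetical, and encoding booleans as the strings `"true"` / `"false"` is an assumption about how the form is parsed.

```python
from typing import Dict, Optional

MISSING = {"median", "mode", "drop"}
NORMALIZATION = {"zscore", "minmax", "none"}
OUTLIERS = {"none", "iqr", "zscore_clip"}


def build_prepare_form(
    specialty_id: str,
    target_col: str,
    test_size: float = 0.2,
    missing_strategy: str = "median",
    normalization: str = "zscore",
    use_smote: bool = False,
    outlier_handling: str = "none",
    session_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return the form fields as strings, rejecting values outside the documented options."""
    if not 0.1 <= test_size <= 0.4:
        raise ValueError("test_size must be between 0.1 and 0.4")
    if missing_strategy not in MISSING:
        raise ValueError(f"missing_strategy must be one of {sorted(MISSING)}")
    if normalization not in NORMALIZATION:
        raise ValueError(f"normalization must be one of {sorted(NORMALIZATION)}")
    if outlier_handling not in OUTLIERS:
        raise ValueError(f"outlier_handling must be one of {sorted(OUTLIERS)}")
    form = {
        "specialty_id": specialty_id,
        "target_col": target_col,
        "test_size": str(test_size),
        "missing_strategy": missing_strategy,
        "normalization": normalization,
        "use_smote": "true" if use_smote else "false",  # assumed boolean encoding
        "outlier_handling": outlier_handling,
    }
    if session_id is not None:
        form["session_id"] = session_id  # omit to let the server generate a UUID
    return form
```

The resulting dictionary can be sent as `multipart/form-data` with any HTTP client; adding a `file` part is only needed when not using the built-in dataset.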
Trains one of eight models on the prepared session.
Request body (TrainRequest):
```json
{
  "session_id": "uuid-from-/prepare",
  "model_type": "knn",
  "params": { "n_neighbors": 5, "metric": "euclidean" },
  "tune": false,
  "use_feature_selection": false
}
```

`model_type` enum: `knn`, `svm`, `decision_tree`, `random_forest`, `logistic_regression`, `naive_bayes`, `xgboost`, `lightgbm`.
Parameter schemas per model:
| Model | Params |
|---|---|
| `knn` | `n_neighbors` (1–25), `metric` (euclidean/manhattan) |
| `svm` | `kernel` (linear/rbf/poly/sigmoid), `C` (0.01–100) |
| `decision_tree` | `max_depth` (1–20), `criterion` (gini/entropy) |
| `random_forest` | `n_estimators` (10–500), `max_depth` (1–20) |
| `logistic_regression` | `C` (0.001–100), `max_iter` (50–2000) |
| `naive_bayes` | `var_smoothing` (1e-12–1e-3) |
| `xgboost` | `n_estimators` (10–500), `max_depth` (1–15), `learning_rate` (0.01–0.5) |
| `lightgbm` | `n_estimators` (10–500), `max_depth` (-1–15), `learning_rate` (0.01–0.5) |
Response: TrainResponse — model_id, metrics (accuracy, sensitivity, specificity, precision, F1, AUC-ROC, MCC), confusion matrix, ROC/PR curves, feature names, training time.
Errors: 404 if session_id unknown · 422 on training failure.
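The per-model parameter ranges can be checked before a training request is sent. This is a sketch under assumptions: `PARAM_SPECS` and `build_train_request` are hypothetical names, the ranges are copied from the parameter table above, and treating the bounds as inclusive is a guess.

```python
from typing import Any, Dict

# (low, high) tuples are numeric ranges, sets are allowed enum values.
PARAM_SPECS: Dict[str, Dict[str, Any]] = {
    "knn": {"n_neighbors": (1, 25), "metric": {"euclidean", "manhattan"}},
    "svm": {"kernel": {"linear", "rbf", "poly", "sigmoid"}, "C": (0.01, 100)},
    "decision_tree": {"max_depth": (1, 20), "criterion": {"gini", "entropy"}},
    "random_forest": {"n_estimators": (10, 500), "max_depth": (1, 20)},
    "logistic_regression": {"C": (0.001, 100), "max_iter": (50, 2000)},
    "naive_bayes": {"var_smoothing": (1e-12, 1e-3)},
    "xgboost": {"n_estimators": (10, 500), "max_depth": (1, 15),
                "learning_rate": (0.01, 0.5)},
    "lightgbm": {"n_estimators": (10, 500), "max_depth": (-1, 15),
                 "learning_rate": (0.01, 0.5)},
}


def build_train_request(
    session_id: str,
    model_type: str,
    params: Dict[str, Any],
    tune: bool = False,
    use_feature_selection: bool = False,
) -> Dict[str, Any]:
    """Return a TrainRequest-shaped dict, or raise ValueError on bad input."""
    if model_type not in PARAM_SPECS:
        raise ValueError(f"unknown model_type: {model_type}")
    spec = PARAM_SPECS[model_type]
    for name, value in params.items():
        if name not in spec:
            raise ValueError(f"{model_type} has no parameter {name!r}")
        rule = spec[name]
        if isinstance(rule, set):
            ok = value in rule
        else:
            low, high = rule
            ok = isinstance(value, (int, float)) and low <= value <= high
        if not ok:
            raise ValueError(f"{name}={value!r} is outside the documented range")
    return {
        "session_id": session_id,
        "model_type": model_type,
        "params": params,
        "tune": tune,
        "use_feature_selection": use_feature_selection,
    }
```

Serializing the returned dict with `json.dumps` yields a body of the same shape as the `TrainRequest` example.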
| Method | Path | Description |
|---|---|---|
| `POST` | `/api/compare/{model_id}` | Add a trained model to the comparison list |
| `GET` | `/api/compare/{session_id}` | Get the current comparison list (sorted by AUC-ROC) |
| `DELETE` | `/api/compare/{session_id}` | Clear the comparison list (returns 204) |
| `GET` | `/api/models/{model_id}` | Get minimal model metadata (`model_type`, `params`, `feature_names`, `classes`) |
SHAP-based global feature importance (descending), clinical names, top-feature clinical note, and cumulative explained variance for Step 6.
SHAP waterfall for a single patient — base value, predicted class/probability, and per-feature shap_value with plain-language narration. patient_index must be within [0, len(X_test)-1].
Returns three representative patients from the test set — low-risk (min predicted probability), mid-risk (closest to 0.5), and high-risk (max probability). Each entry carries index, risk_level, probability, and a one-line summary used in the Step 6 patient dropdown.
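The selection rule described above (minimum probability, closest to 0.5, maximum probability) can be sketched in a few lines. The function name `pick_sample_patients` and the `risk_level` labels `"low"`, `"mid"`, `"high"` are assumptions for illustration; the one-line `summary` field is omitted because its wording is not documented here.

```python
from typing import Dict, List, Sequence


def pick_sample_patients(probabilities: Sequence[float]) -> List[Dict[str, object]]:
    """Pick low-, mid- and high-risk test-set indices from predicted probabilities."""
    if not probabilities:
        raise ValueError("need at least one predicted probability")
    indices = range(len(probabilities))
    low = min(indices, key=lambda i: probabilities[i])
    mid = min(indices, key=lambda i: abs(probabilities[i] - 0.5))
    high = max(indices, key=lambda i: probabilities[i])
    return [
        {"index": low, "risk_level": "low", "probability": probabilities[low]},
        {"index": mid, "risk_level": "mid", "probability": probabilities[mid]},
        {"index": high, "risk_level": "high", "probability": probabilities[high]},
    ]
```

With very small test sets the same index can be returned for more than one risk level, which a caller may want to handle.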
Recomputes predicted probability when a single feature is overridden.
```json
{
  "model_id": "…",
  "patient_index": 12,
  "feature_name": "serum_creatinine",
  "new_value": 1.4
}
```

Errors: 400 if `patient_index` is out of range or `feature_name` is not in the trained feature list.
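A what-if computation of this kind can be sketched as "copy the patient row, override one feature, predict again". Everything below is illustrative: `what_if`, `WhatIfError`, the response keys, the second feature name `ejection_fraction`, and the toy logistic model with made-up weights are all assumptions, and the sketch ignores any rescaling the real pipeline might apply to `new_value`.

```python
import math
from typing import Callable, Dict, List, Sequence


class WhatIfError(ValueError):
    """Raised for input that the API would reject with a 400."""


def what_if(
    predict: Callable[[Sequence[float]], float],
    X_test: Sequence[Sequence[float]],
    feature_names: List[str],
    patient_index: int,
    feature_name: str,
    new_value: float,
) -> Dict[str, float]:
    if not 0 <= patient_index < len(X_test):
        raise WhatIfError("patient_index out of range")
    if feature_name not in feature_names:
        raise WhatIfError(f"{feature_name!r} is not a trained feature")
    row = list(X_test[patient_index])  # copy so the test set stays untouched
    original = predict(row)
    row[feature_names.index(feature_name)] = new_value
    return {"original_probability": original, "new_probability": predict(row)}


def toy_predict(row: Sequence[float]) -> float:
    # Toy logistic model with invented weights, for demonstration only.
    z = 0.8 * row[0] - 0.02 * row[1]
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `what_if(toy_predict, [[1.0, 40.0]], ["serum_creatinine", "ejection_fraction"], 0, "serum_creatinine", 1.4)` raises the toy model's predicted probability because the overridden feature has a positive weight.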
Subgroup fairness table (by gender + age bands), bias warnings (sensitivity gap > 10pp), representation warnings (demographic gap > 15pp), overall sensitivity, and EU AI Act checklist state.
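The bias warning rule can be illustrated with a small sketch. It assumes that the "sensitivity gap" is measured as overall sensitivity minus subgroup sensitivity and that only subgroups falling below the overall value are flagged; the real service may define the gap differently (for example, between the best and worst subgroup). The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """True positive rate, or None when the group has no positive cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else None


def bias_warnings(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
    threshold: float = 0.10,  # 10 percentage points
) -> List[Dict[str, object]]:
    overall = sensitivity(y_true, y_pred)
    warnings: List[Dict[str, object]] = []
    if overall is None:
        return warnings
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        s = sensitivity([y_true[i] for i in idx], [y_pred[i] for i in idx])
        # Assumed rule: flag subgroups more than `threshold` below overall.
        if s is not None and overall - s > threshold:
            warnings.append({"group": group, "sensitivity": s, "gap": overall - s})
    return warnings
```

Subgroups without any positive cases are skipped here because sensitivity is undefined for them.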
Toggles one of the eight EU AI Act checklist items for a given model.
```json
{ "model_id": "…", "item_id": "model_explainability", "checked": true }
```

Calls the InsightService (MedGemma / Gemini) with a fully-assembled clinical context (specialty, metrics, SHAP, fairness data, sample patients) and returns three parallel outputs:
```json
{
  "ethics_insight": "...",
  "case_studies": [ ... ],
  "eu_ai_act_insights": [ ... ]
}
```

Errors: 422 if metrics are not available (model never trained) · 500 on LLM failure.
Returns a ReportLab-rendered PDF (application/pdf, Content-Disposition: attachment) with the active domain, model, six core metrics, bias findings, and checklist state.
Request body (CertificateRequest):
```json
{
  "model_id": "…",
  "session_id": "…",
  "checklist_state": { "model_explainability": true, "data_transparency": true },
  "clinician_name": "Healthcare Professional",
  "institution": "Healthcare Institution"
}
```

`clinician_name` and `institution` are optional (defaults shown). Typical generation time: < 1 s (measured 0.69 s in Sprint 4 QA).
FastAPI HTTPException responses share the same JSON shape:
```json
{ "detail": "Target column 'age' not found. Available: [\"glucose\", \"bmi\", ...]" }
```

| Status | Meaning |
|---|---|
| `400` | Malformed request (bad patient index, missing feature name) |
| `404` | Unknown `specialty_id`, `session_id`, or `model_id` |
| `413` | Uploaded CSV exceeds 50 MB |
| `422` | Dataset validation failure, training failure, unknown target column |
| `500` | Unhandled server-side error (explainability, insights, certificate generation) |
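On the client side, the shared `detail` shape makes error handling straightforward. The sketch below is one possible approach; `ApiError` and `parse_error` are hypothetical names. It also tolerates a non-string `detail`, since FastAPI's built-in request validation errors return a list under the same key.

```python
import json

STATUS_MEANING = {
    400: "Malformed request",
    404: "Unknown specialty_id, session_id, or model_id",
    413: "Uploaded CSV exceeds 50 MB",
    422: "Dataset validation failure, training failure, unknown target column",
    500: "Unhandled server-side error",
}


class ApiError(Exception):
    """Carries the HTTP status, its documented meaning, and the server's detail."""

    def __init__(self, status: int, detail: str) -> None:
        self.status = status
        self.detail = detail
        self.meaning = STATUS_MEANING.get(status, "Unexpected status")
        super().__init__(f"{status} ({self.meaning}): {detail}")


def parse_error(status: int, body: bytes) -> ApiError:
    """Build an ApiError from a raw response body."""
    try:
        detail = json.loads(body.decode("utf-8")).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: fall back to the raw text.
        detail = body.decode("utf-8", errors="replace")
    if not isinstance(detail, str):
        detail = json.dumps(detail)  # e.g. a list of validation errors
    return ApiError(status, detail)
```

A caller can then branch on `err.status`, for instance re-running `/prepare` on a 404 for an evicted session.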
```text
POST /api/explore                          → validate + stats  (Step 2)
POST /api/prepare                          → session_id        (Step 3)
POST /api/train                            → model_id          (Step 4)
GET  /api/explain/global/{model_id}                            (Step 6)
GET  /api/explain/sample-patients/{model_id}                   (Step 6)
GET  /api/explain/patient/{model_id}/{idx}                     (Step 6 waterfall)
POST /api/explain/what-if                                      (Step 6 what-if)
GET  /api/ethics/{model_id}                                    (Step 7)
POST /api/ethics/checklist                                     (Step 7 checklist toggle)
POST /api/generate-certificate                                 (Step 7 download)
```
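The call sequence above can be chained from a script using only the Python standard library. This is a sketch, not a reference client: the helper names are hypothetical, the base URL is the dev address from the top of this page, the `random_forest` parameters are arbitrary values within the documented ranges, and no error handling is included.

```python
import json
import urllib.request
import uuid
from typing import Dict, Tuple

BASE = "http://localhost:8001/api"  # dev URL; adjust for Docker / Spaces


def encode_multipart(fields: Dict[str, str]) -> Tuple[bytes, str]:
    """Encode plain text fields as multipart/form-data (no file part)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def post_form(path: str, fields: Dict[str, str]) -> dict:
    body, content_type = encode_multipart(fields)
    req = urllib.request.Request(BASE + path, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def post_json(path: str, payload: dict) -> dict:
    req = urllib.request.Request(BASE + path, method="POST",
                                 data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_json(path: str) -> dict:
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)


def run_workflow(specialty_id: str, target_col: str) -> dict:
    """Prepare the built-in dataset, train one model, fetch explanations."""
    prep = post_form("/prepare", {"specialty_id": specialty_id,
                                  "target_col": target_col})
    trained = post_json("/train", {
        "session_id": prep["session_id"],
        "model_type": "random_forest",
        "params": {"n_estimators": 100, "max_depth": 5},  # arbitrary example values
        "tune": False,
        "use_feature_selection": False,
    })
    model_id = trained["model_id"]
    return {
        "global": get_json(f"/explain/global/{model_id}"),
        "ethics": get_json(f"/ethics/{model_id}"),
    }
```

`run_workflow` needs a running backend; `target_col` must name a column that exists in the chosen specialty's built-in dataset.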
See Architecture for the layered view and service map.