Batuhan4 edited this page Apr 20, 2026 · 2 revisions

API Reference

REST API for the HealthWithSevgi ML Visualization Tool. All endpoints are served by FastAPI at http://localhost:8001 (dev) or http://localhost:7860 (HuggingFace Spaces / Docker).

  • Base URL: /api (all endpoints below are prefixed, except the root endpoints noted under Health & Root)
  • Content type: application/json unless noted
  • Live OpenAPI schema: GET /openapi.json
  • Interactive Swagger UI: GET /docs
  • ReDoc: GET /redoc
  • API version: 1.3.1

Session model: all state (datasets, trained models, SHAP values) is kept in-memory on the backend with LRU eviction. No database. Call /prepare first to receive a session_id, then pass it to /train, which returns a model_id used by explain/ethics/certificate endpoints.
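
To make the session model concrete, here is a minimal sketch of a bounded in-memory store with least-recently-used eviction. It is not the backend's code: the class name SessionStore and the default capacity of 32 entries are assumptions chosen for illustration.

```python
import uuid
from collections import OrderedDict
from typing import Any, Optional


class SessionStore:
    """Hypothetical bounded in-memory store with least-recently-used eviction."""

    def __init__(self, max_sessions: int = 32) -> None:
        self.max_sessions = max_sessions  # illustrative capacity, not the backend's value
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def put(self, value: Any, key: Optional[str] = None) -> str:
        # Reuse the caller's key (e.g. an existing session_id) or mint a UUID.
        key = key or str(uuid.uuid4())
        self._items[key] = value
        self._items.move_to_end(key)  # mark as most recently used
        # Evict the least recently used entries once over capacity.
        while len(self._items) > self.max_sessions:
            self._items.popitem(last=False)
        return key

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None  # an API layer would translate this into a 404
        self._items.move_to_end(key)  # reading also counts as a use
        return self._items[key]
```

Reading a session refreshes it, so an actively used session survives while idle ones are evicted first; a client that gets a 404 for an old session_id has to call /prepare again.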

Health & Root

| Method | Path | Response |
|---|---|---|
| GET | / | { "status": "ok", "project": "HealthWithSevgi", "version": "1.3.1" } |
| GET | /health | { "status": "healthy" } |

These endpoints are not prefixed with /api.
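
Because the root endpoints sit outside the /api prefix, a client may want a single place that applies the rule. A minimal sketch; the helper name api_url is hypothetical, and the set of unprefixed paths is taken from the Health & Root table.

```python
# Root-level endpoints documented as living outside /api (hypothetical client helper).
UNPREFIXED = {"/", "/health"}


def api_url(base_url: str, path: str) -> str:
    """Build a full URL, adding the /api prefix except for root-level endpoints."""
    if not path.startswith("/"):
        path = "/" + path
    full_path = path if path in UNPREFIXED else "/api" + path
    return base_url.rstrip("/") + full_path
```

For example, api_url("http://localhost:7860", "/specialties") yields http://localhost:7860/api/specialties, while /health stays at the root.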


Specialties

Registry of the 20 supported medical specialties.

| Method | Path | Response schema | Description |
|---|---|---|---|
| GET | /api/specialties | list[SpecialtyInfo] | List every registered specialty |
| GET | /api/specialties/{specialty_id} | SpecialtyInfo | Fetch a specialty by id (endocrinology_diabetes, cardiology_heart_failure, …) |

SpecialtyInfo includes: id, name, clinical_context, target_variable, data_source, what_ai_predicts, feature metadata.

Errors: 404 if specialty_id is unknown.
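
A client can treat the documented 404 as "no such specialty" rather than as a failure. The sketch below uses only the Python standard library; get_specialty is a hypothetical helper, and the opener argument is an assumption added so the function can be exercised without a running server.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Optional


def get_specialty(
    base_url: str,
    specialty_id: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Optional[dict]:
    """Fetch one SpecialtyInfo object, returning None when the API answers 404."""
    url = f"{base_url.rstrip('/')}/api/specialties/{urllib.parse.quote(specialty_id)}"
    try:
        with opener(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unknown specialty_id
        raise  # any other status is a genuine error
```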


Data — Explore & Prepare

Both endpoints accept multipart/form-data. file is optional; if omitted, the built-in dataset for specialty_id is loaded.

POST /api/explore

Validates the dataset and returns column-level stats used by Step 2 (Data Exploration).

| Form field | Type | Required | Default |
|---|---|---|---|
| specialty_id | string | yes | |
| target_col | string | yes | |
| file | CSV (≤ 50 MB, ≥ 10 rows, ≥ 2 columns) | no | uses built-in dataset |

Response: DataExplorationResponse — per-column types, null counts, class balance, summary stats, sample rows.

Errors:

  • 422 — non-CSV extension, parse failure, fewer than 10 rows / 2 columns, or unknown target_col
  • 413 — file exceeds 50 MB
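
The limits above can also be checked client-side before uploading, which saves a round trip for obviously invalid files. This is a sketch under assumptions: the check order, UTF-8 decoding, counting rows excluding the header, and reading 50 MB as 50 × 1024 × 1024 bytes are all guesses, and the server remains the authority. The name precheck_csv is hypothetical.

```python
import csv
import io
from typing import Optional, Tuple

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as MiB
MIN_ROWS = 10  # data rows, header excluded (assumption)
MIN_COLS = 2


def precheck_csv(filename: str, data: bytes, target_col: str) -> Optional[Tuple[int, str]]:
    """Return (status, reason) mirroring the documented /explore errors, or None if OK."""
    if not filename.lower().endswith(".csv"):
        return 422, "file must have a .csv extension"
    if len(data) > MAX_BYTES:
        return 413, "file exceeds 50 MB"
    try:
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
    except (UnicodeDecodeError, csv.Error) as exc:
        return 422, f"could not parse CSV: {exc}"
    if not rows:
        return 422, "file is empty"
    header, body = rows[0], rows[1:]
    if len(body) < MIN_ROWS or len(header) < MIN_COLS:
        return 422, f"need at least {MIN_ROWS} rows and {MIN_COLS} columns"
    if target_col not in header:
        return 422, f"target column {target_col!r} not found"
    return None
```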

POST /api/prepare

Applies the Step 3 preprocessing pipeline (split / missing / normalize / SMOTE / outliers) and returns a session_id for use by /train.

| Form field | Type | Default |
|---|---|---|
| specialty_id | string | |
| target_col | string | |
| test_size | float (0.1–0.4) | 0.2 |
| missing_strategy | median / mode / drop | median |
| normalization | zscore / minmax / none | zscore |
| use_smote | bool | false |
| outlier_handling | none / iqr / zscore_clip | none |
| session_id | string (reuses existing session if provided) | auto-generated UUID |
| file | CSV (same limits as /explore) | built-in dataset |

Response: PrepResponse, with session_id, train_size, test_size, features_count, class_distribution_before, class_distribution_after, smote_applied, normalization_applied, and norm_samples (before/after values for a few features).
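
The /prepare fields can be validated and encoded before they are sent. In this sketch, prepare_form is a hypothetical helper that returns string form values, suitable for example as the data part of a multipart POST alongside an optional file part; the ranges and enum values come from the form-field table above.

```python
from typing import Dict, Optional

# Allowed values copied from the /prepare form-field table.
MISSING = {"median", "mode", "drop"}
NORMALIZATION = {"zscore", "minmax", "none"}
OUTLIERS = {"none", "iqr", "zscore_clip"}


def prepare_form(
    specialty_id: str,
    target_col: str,
    test_size: float = 0.2,
    missing_strategy: str = "median",
    normalization: str = "zscore",
    use_smote: bool = False,
    outlier_handling: str = "none",
    session_id: Optional[str] = None,
) -> Dict[str, str]:
    """Validate and stringify /prepare form fields (the file upload is handled separately)."""
    if not 0.1 <= test_size <= 0.4:
        raise ValueError("test_size must be between 0.1 and 0.4")
    for name, value, allowed in (
        ("missing_strategy", missing_strategy, MISSING),
        ("normalization", normalization, NORMALIZATION),
        ("outlier_handling", outlier_handling, OUTLIERS),
    ):
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    form = {
        "specialty_id": specialty_id,
        "target_col": target_col,
        "test_size": str(test_size),
        "missing_strategy": missing_strategy,
        "normalization": normalization,
        "use_smote": "true" if use_smote else "false",  # form values travel as strings
        "outlier_handling": outlier_handling,
    }
    if session_id is not None:
        form["session_id"] = session_id  # reuse an existing session
    return form
```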


ML — Train & Compare

POST /api/train

Trains one of eight models on the prepared session.

Request body (TrainRequest):

{
  "session_id": "uuid-from-/prepare",
  "model_type": "knn",
  "params": { "n_neighbors": 5, "metric": "euclidean" },
  "tune": false,
  "use_feature_selection": false
}

model_type enum: knn, svm, decision_tree, random_forest, logistic_regression, naive_bayes, xgboost, lightgbm.

Parameter schemas per model:

| Model | Params |
|---|---|
| knn | n_neighbors (1–25), metric (euclidean/manhattan) |
| svm | kernel (linear/rbf/poly/sigmoid), C (0.01–100) |
| decision_tree | max_depth (1–20), criterion (gini/entropy) |
| random_forest | n_estimators (10–500), max_depth (1–20) |
| logistic_regression | C (0.001–100), max_iter (50–2000) |
| naive_bayes | var_smoothing (1e-12 to 1e-3) |
| xgboost | n_estimators (10–500), max_depth (1–15), learning_rate (0.01–0.5) |
| lightgbm | n_estimators (10–500), max_depth (-1 to 15; -1 means no depth limit), learning_rate (0.01–0.5) |
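
The parameter table can be transcribed into a lookup for client-side validation before calling /train. A sketch; PARAM_SPECS and check_params are hypothetical names, and flagging parameters outside the table is an assumption, since this page does not say whether the backend rejects or ignores them.

```python
from typing import Any, Dict, FrozenSet, List, Tuple, Union

# A spec is either an inclusive numeric (low, high) range or a set of allowed strings.
Spec = Union[Tuple[float, float], FrozenSet[str]]

# Hypothetical transcription of the parameter table.
PARAM_SPECS: Dict[str, Dict[str, Spec]] = {
    "knn": {"n_neighbors": (1, 25), "metric": frozenset({"euclidean", "manhattan"})},
    "svm": {"kernel": frozenset({"linear", "rbf", "poly", "sigmoid"}), "C": (0.01, 100)},
    "decision_tree": {"max_depth": (1, 20), "criterion": frozenset({"gini", "entropy"})},
    "random_forest": {"n_estimators": (10, 500), "max_depth": (1, 20)},
    "logistic_regression": {"C": (0.001, 100), "max_iter": (50, 2000)},
    "naive_bayes": {"var_smoothing": (1e-12, 1e-3)},
    "xgboost": {"n_estimators": (10, 500), "max_depth": (1, 15), "learning_rate": (0.01, 0.5)},
    "lightgbm": {"n_estimators": (10, 500), "max_depth": (-1, 15), "learning_rate": (0.01, 0.5)},
}


def check_params(model_type: str, params: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the params look valid."""
    if model_type not in PARAM_SPECS:
        return [f"unknown model_type {model_type!r}"]
    specs = PARAM_SPECS[model_type]
    problems = []
    for name, value in params.items():
        spec = specs.get(name)
        if spec is None:
            problems.append(f"{name} is not a documented parameter for {model_type}")
        elif isinstance(spec, frozenset):
            if value not in spec:
                problems.append(f"{name} must be one of {sorted(spec)}")
        else:
            low, high = spec
            numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not numeric or not low <= value <= high:
                problems.append(f"{name} must be a number in [{low}, {high}]")
    return problems
```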

Response: TrainResponse, with model_id, metrics (accuracy, sensitivity, specificity, precision, F1, AUC-ROC, MCC), confusion matrix, ROC/PR curves, feature names, and training time.

Errors: 404 if session_id is unknown · 422 on training failure.
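
For a binary task, most of the TrainResponse metrics follow from the confusion matrix alone (AUC-ROC additionally needs the predicted scores). The sketch assumes scikit-learn's layout [[TN, FP], [FN, TP]]; this page does not specify how the API orders the matrix, so check that before reusing it. The function name binary_metrics is hypothetical.

```python
import math
from typing import Dict, List


def binary_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Derive TrainResponse-style metrics from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # avoid division by zero on degenerate splits

    sensitivity = ratio(tp, tp + fn)  # recall / true-positive rate
    specificity = ratio(tn, tn + fp)  # true-negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient: balanced even when classes are skewed.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }
```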

Compare

| Method | Path | Description |
|---|---|---|
| POST | /api/compare/{model_id} | Add a trained model to the comparison list |
| GET | /api/compare/{session_id} | Get the current comparison list (sorted by AUC-ROC) |
| DELETE | /api/compare/{session_id} | Clear the comparison list (returns 204) |
| GET | /api/models/{model_id} | Get minimal model metadata (model_type, params, feature_names, classes) |
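
The comparison list behaves like a per-session collection ranked by AUC-ROC. A minimal in-memory sketch; ComparisonBoard and the auc_roc key are hypothetical, and because POST /api/compare/{model_id} takes only a model_id, the real server presumably looks up the owning session itself.

```python
from collections import defaultdict
from typing import Dict, List


class ComparisonBoard:
    """Hypothetical per-session comparison list, ordered by AUC-ROC (best first)."""

    def __init__(self) -> None:
        self._by_session: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def add(self, session_id: str, model_id: str, metrics: dict) -> None:
        # Keyed by model_id, so re-adding a model replaces its entry instead of duplicating it.
        self._by_session[session_id][model_id] = {"model_id": model_id, **metrics}

    def ranked(self, session_id: str) -> List[dict]:
        entries = self._by_session.get(session_id, {}).values()
        return sorted(entries, key=lambda e: e["auc_roc"], reverse=True)

    def clear(self, session_id: str) -> None:
        self._by_session.pop(session_id, None)  # mirrors DELETE returning 204
```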

Explainability

GET /api/explain/global/{model_id} → GlobalExplainabilityResponse

SHAP-based global feature importance (descending), clinical names, top-feature clinical note, and cumulative explained variance for Step 6.

GET /api/explain/patient/{model_id}/{patient_index} → SinglePatientExplainResponse

SHAP waterfall for a single patient — base value, predicted class/probability, and per-feature shap_value with plain-language narration. patient_index must be within [0, len(X_test)-1].
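
A waterfall chart starts at the base value and adds one SHAP contribution per feature. By SHAP's local-accuracy property, the last running total equals the model output for that patient in the explainer's output space, which for classifiers is often log-odds rather than probability; this page does not say which space the API uses. A sketch; waterfall_steps is a hypothetical helper and the values in the example are illustrative.

```python
from typing import Dict, List, Tuple


def waterfall_steps(base_value: float, shap_values: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (feature, contribution, running_total) rows, largest |contribution| first."""
    ordered = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    steps = []
    running = base_value
    for feature, contribution in ordered:
        running += contribution  # each bar moves the total by its SHAP value
        steps.append((feature, contribution, running))
    return steps
```

With base_value 0.3 and contributions serum_creatinine +0.2, bmi -0.1, age +0.05, the bars run 0.5, 0.4, 0.45, and 0.45 is the patient's model output in that space.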

GET /api/explain/sample-patients/{model_id} → SamplePatientsResponse

Returns three representative patients from the test set — low-risk (min predicted probability), mid-risk (closest to 0.5), and high-risk (max probability). Each entry carries index, risk_level, probability, and a one-line summary used in the Step 6 patient dropdown.
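
The selection rule reduces to three argmin/argmax passes over the test-set probabilities. A sketch under assumptions: the input is a list of positive-class probabilities, the risk_level strings are illustrative, ties resolve to the first index, and the one-line summary is omitted. The name pick_sample_patients is hypothetical.

```python
from typing import Dict, List, Sequence


def pick_sample_patients(probabilities: Sequence[float]) -> List[Dict[str, object]]:
    """Pick low / mid / high risk test-set indices from predicted probabilities."""
    if not probabilities:
        return []
    indices = range(len(probabilities))
    low = min(indices, key=lambda i: probabilities[i])  # minimum probability
    mid = min(indices, key=lambda i: abs(probabilities[i] - 0.5))  # closest to 0.5
    high = max(indices, key=lambda i: probabilities[i])  # maximum probability
    return [
        {"index": i, "risk_level": level, "probability": probabilities[i]}
        for level, i in (("low", low), ("mid", mid), ("high", high))
    ]
```

On a very small test set the same index can be picked twice; the sketch does not try to de-duplicate.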

POST /api/explain/what-if → WhatIfResponse

Recomputes predicted probability when a single feature is overridden.

{
  "model_id": "",
  "patient_index": 12,
  "feature_name": "serum_creatinine",
  "new_value": 1.4
}

Errors: 400 if patient_index is out of range or feature_name is not in the trained feature list.
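
Server-side, the what-if check amounts to copying one test row, overriding a single feature, and predicting again. A sketch; what_if, WhatIfError, and the dict-based rows are hypothetical. Whether new_value is expected in raw units or in the normalized scale chosen at /prepare is not stated here; if raw, a real implementation would have to apply the same scaler before predicting.

```python
from typing import Callable, Dict, List, Tuple


class WhatIfError(ValueError):
    """Raised for inputs the endpoint documents as 400 errors."""


def what_if(
    X_test: List[Dict[str, float]],
    feature_names: List[str],
    predict_proba: Callable[[Dict[str, float]], float],
    patient_index: int,
    feature_name: str,
    new_value: float,
) -> Tuple[float, float]:
    """Return (original_probability, new_probability) after overriding one feature."""
    if not 0 <= patient_index < len(X_test):
        raise WhatIfError(f"patient_index must be in [0, {len(X_test) - 1}]")
    if feature_name not in feature_names:
        raise WhatIfError(f"unknown feature {feature_name!r}")
    original = X_test[patient_index]
    modified = dict(original)  # copy so the stored test row is not mutated
    modified[feature_name] = new_value
    return predict_proba(original), predict_proba(modified)
```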


Ethics & Bias

GET /api/ethics/{model_id} → EthicsResponse

Subgroup fairness table (by gender + age bands), bias warnings (sensitivity gap > 10pp), representation warnings (demographic gap > 15pp), overall sensitivity, and EU AI Act checklist state.
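
The bias rule can be expressed as a threshold on percentage-point gaps. A sketch; comparing each subgroup against the overall sensitivity, and flagging only subgroups that fall below it, are assumptions, since the page gives the 10pp threshold but not the reference point. The representation warning would follow the same pattern with a 15pp threshold on demographic shares. The name bias_warnings is hypothetical.

```python
from typing import Dict, List

BIAS_GAP_PP = 10.0  # sensitivity gap threshold documented for bias warnings


def bias_warnings(
    overall_sensitivity: float,
    subgroup_sensitivity: Dict[str, float],
    threshold_pp: float = BIAS_GAP_PP,
) -> List[str]:
    """Flag subgroups whose sensitivity trails the overall figure by more than threshold_pp."""
    warnings = []
    for group, sens in sorted(subgroup_sensitivity.items()):
        # Sensitivities are fractions in [0, 1]; round away float noise so exactly 10pp is not flagged.
        gap_pp = round((overall_sensitivity - sens) * 100, 6)
        if gap_pp > threshold_pp:
            warnings.append(
                f"{group}: sensitivity {sens:.0%} is {gap_pp:.1f}pp below overall {overall_sensitivity:.0%}"
            )
    return warnings
```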

POST /api/ethics/checklist

Toggles one of the eight EU AI Act checklist items for a given model.

{ "model_id": "", "item_id": "model_explainability", "checked": true }

Insights (LLM)

GET /api/insights/{model_id}

Calls the InsightService (MedGemma / Gemini) with a fully-assembled clinical context (specialty, metrics, SHAP, fairness data, sample patients) and returns three parallel outputs:

{
  "ethics_insight":    "...",
  "case_studies":      [ ... ],
  "eu_ai_act_insights": [ ... ]
}

Errors: 422 if metrics are not available (model never trained) · 500 on LLM failure.
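
Producing the three outputs in parallel maps naturally onto asyncio.gather. A sketch; gather_insights and the three coroutine parameters are hypothetical stand-ins for whatever InsightService actually calls. With gather's default behavior the first exception propagates to the caller, which would surface as the documented 500.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Prompt = Callable[[Dict[str, Any]], Awaitable[Any]]


async def gather_insights(
    context: Dict[str, Any],
    ethics: Prompt,
    case_studies: Prompt,
    eu_ai_act: Prompt,
) -> Dict[str, Any]:
    """Run the three LLM prompts concurrently and merge them into the documented response keys."""
    ethics_text, cases, act_items = await asyncio.gather(
        ethics(context), case_studies(context), eu_ai_act(context)
    )
    return {
        "ethics_insight": ethics_text,
        "case_studies": cases,
        "eu_ai_act_insights": act_items,
    }
```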


Certificate (PDF)

POST /api/generate-certificate

Returns a ReportLab-rendered PDF (application/pdf, Content-Disposition: attachment) with the active domain, model, six core metrics, bias findings, and checklist state.

Request body (CertificateRequest):

{
  "model_id": "",
  "session_id": "",
  "checklist_state": { "model_explainability": true, "data_transparency": true },
  "clinician_name": "Healthcare Professional",
  "institution": "Healthcare Institution"
}

clinician_name and institution are optional (defaults shown). Typical generation time: < 1 s (measured 0.69 s in Sprint 4 QA).
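
A client needs to send the JSON body and save the binary response. This standard-library sketch builds, but does not send, the request and reads the filename from Content-Disposition; certificate_request, attachment_filename, and the certificate.pdf fallback are hypothetical, since this page does not document the filename the server chooses. Omitting the optional fields lets the server apply its defaults.

```python
import json
import urllib.request
from email.message import Message
from typing import Any, Dict, Optional


def certificate_request(
    base_url: str,
    model_id: str,
    session_id: str,
    checklist_state: Dict[str, bool],
    clinician_name: Optional[str] = None,
    institution: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST for /api/generate-certificate."""
    body: Dict[str, Any] = {
        "model_id": model_id,
        "session_id": session_id,
        "checklist_state": checklist_state,
    }
    # Optional fields are left out so the server-side defaults apply.
    if clinician_name is not None:
        body["clinician_name"] = clinician_name
    if institution is not None:
        body["institution"] = institution
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate-certificate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/pdf"},
        method="POST",
    )


def attachment_filename(content_disposition: str, fallback: str = "certificate.pdf") -> str:
    """Extract the filename parameter from a Content-Disposition header, if present."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    return msg.get_filename() or fallback
```

Passing the request to urllib.request.urlopen and writing resp.read() to that filename would save the PDF.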


Error Format

FastAPI HTTPException responses share the same JSON shape:

{ "detail": "Target column 'age' not found. Available: [\"glucose\", \"bmi\", ...]" }

| Status | Meaning |
|---|---|
| 400 | Malformed request (bad patient index, unknown feature name) |
| 404 | Unknown specialty_id, session_id, or model_id |
| 413 | Uploaded CSV exceeds 50 MB |
| 422 | Dataset validation failure, training failure, unknown target column |
| 500 | Unhandled server-side error (explainability, insights, certificate generation) |
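
Because every error shares the {"detail": ...} shape, a client can format them in one place. A sketch; describe_error and STATUS_MEANING are hypothetical. One caveat worth handling: FastAPI's own request-validation 422 responses carry a list of error objects in detail rather than a string.

```python
import json
from typing import Any

# Short labels taken from the status table.
STATUS_MEANING = {
    400: "Malformed request",
    404: "Unknown specialty_id, session_id, or model_id",
    413: "Uploaded CSV exceeds 50 MB",
    422: "Dataset validation or training failure",
    500: "Server-side error",
}


def describe_error(status: int, body: bytes) -> str:
    """Turn an error response into a one-line message using the {"detail": ...} shape."""
    try:
        detail: Any = json.loads(body.decode("utf-8")).get("detail", "")
    except (ValueError, AttributeError):
        # Not JSON (e.g. a proxy error page) or not an object: fall back to raw text.
        detail = body.decode("utf-8", errors="replace")
    if not isinstance(detail, str):
        detail = json.dumps(detail)  # validation errors arrive as a list of objects
    meaning = STATUS_MEANING.get(status, "HTTP error")
    return f"{status} {meaning}: {detail}"
```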

Typical End-to-End Flow

POST /api/explore        → validate + stats      (Step 2)
POST /api/prepare        → session_id            (Step 3)
POST /api/train          → model_id              (Step 4)
GET  /api/explain/global/{model_id}              (Step 6)
GET  /api/explain/sample-patients/{model_id}     (Step 6)
GET  /api/explain/patient/{model_id}/{idx}       (Step 6 waterfall)
POST /api/explain/what-if                        (Step 6 what-if)
GET  /api/ethics/{model_id}                      (Step 7)
POST /api/ethics/checklist                       (Step 7 checklist toggle)
POST /api/generate-certificate                   (Step 7 download)
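
The same sequence can be driven programmatically. The sketch threads session_id and model_id through the calls; run_flow and the call transport are hypothetical, the transport is assumed to handle multipart encoding for /explore and /prepare and JSON elsewhere, and the training body reuses the knn example from the /train section. The patient, what-if, checklist, and certificate steps follow the same pattern.

```python
from typing import Any, Callable, Dict, Optional

# transport(method, path, payload) -> parsed JSON response (hypothetical seam).
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]

# Training body reused from the /train example.
KNN_TRAIN_BODY = {
    "model_type": "knn",
    "params": {"n_neighbors": 5, "metric": "euclidean"},
    "tune": False,
    "use_feature_selection": False,
}


def run_flow(call: Transport, specialty_id: str, target_col: str) -> Dict[str, Any]:
    """Walk Steps 2 to 7 in order, threading session_id and model_id between calls."""
    form = {"specialty_id": specialty_id, "target_col": target_col}
    call("POST", "/api/explore", form)  # Step 2: validate + stats
    session_id = call("POST", "/api/prepare", form)["session_id"]  # Step 3
    train_body = {"session_id": session_id, **KNN_TRAIN_BODY}
    model_id = call("POST", "/api/train", train_body)["model_id"]  # Step 4
    global_view = call("GET", f"/api/explain/global/{model_id}", None)  # Step 6
    samples = call("GET", f"/api/explain/sample-patients/{model_id}", None)  # Step 6
    ethics = call("GET", f"/api/ethics/{model_id}", None)  # Step 7
    return {
        "session_id": session_id,
        "model_id": model_id,
        "global": global_view,
        "samples": samples,
        "ethics": ethics,
    }
```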

See Architecture for the layered view and service map.
