This tutorial demonstrates how to deploy MONAI model bundles as production-ready REST APIs using FastAPI.
Learn how to:
- Load and serve MONAI model bundles
- Create FastAPI endpoints for medical image inference
- Handle medical image uploads (NIfTI format)
- Deploy with Docker for production
- Test and monitor your deployed model
You'll build a complete REST API service that:
- ✅ Loads a pre-trained MONAI model (spleen CT segmentation)
- ✅ Accepts medical image uploads via HTTP
- ✅ Returns inference results in JSON format
- ✅ Includes auto-generated API documentation
- ✅ Runs in Docker containers for easy deployment
- Python 3.9+ installed
- Docker installed (for containerization)
- Basic knowledge of Python and REST APIs
- Familiarity with medical imaging (helpful but not required)
```bash
pip install -r requirements.txt
```

```bash
# From the fastapi_inference directory
python -m uvicorn app.main:app --reload
```

The API will be available at http://localhost:8000
Health Check:

```bash
curl http://localhost:8000/health
```

View API Documentation:
Open http://localhost:8000/docs in your browser

Make a Prediction:

```bash
curl -X POST http://localhost:8000/predict \
  -F "file=@path/to/your/image.nii.gz"
```

```
fastapi_inference/
├── README.md                   # This file
├── requirements.txt            # Python dependencies
├── app/                        # FastAPI application
│   ├── __init__.py
│   ├── main.py                 # FastAPI app and routes
│   ├── model_loader.py         # MONAI model loading (singleton)
│   ├── inference.py            # Inference logic
│   └── schemas.py              # Pydantic models for validation
├── tests/                      # Unit tests
│   ├── __init__.py
│   └── test_api.py             # API endpoint tests
├── docker/                     # Docker configuration
│   ├── Dockerfile              # Container definition
│   └── docker-compose.yml      # Orchestration
├── notebooks/                  # Interactive tutorials
│   └── fastapi_tutorial.ipynb  # Step-by-step walkthrough
└── examples/                   # Usage examples
    ├── client.py               # Python client example
    └── sample_requests.http    # HTTP request examples
```
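For orientation, the JSON payloads this API returns can be modeled as plain classes. The real `app/schemas.py` presumably defines them as Pydantic models so FastAPI can validate responses and generate the `/docs` page; this standard-library dataclass version is only a sketch of the same shapes, with field names mirroring the example responses in this README.

```python
from dataclasses import dataclass, asdict

# Sketch of the response shapes served by the API. The actual
# app/schemas.py presumably uses Pydantic models instead, so that
# FastAPI can validate and document them automatically.

@dataclass
class HealthResponse:
    status: str
    model_loaded: bool
    device: str

@dataclass
class PredictionStats:
    shape: list
    min_value: float
    max_value: float
    unique_labels: list
    num_labels: int

@dataclass
class PredictResponse:
    success: bool
    prediction: PredictionStats
    segmentation_shape: list
    metadata: dict
    message: str

health = HealthResponse(status="healthy", model_loaded=True, device="cuda")
print(asdict(health))
```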
GET /
Returns API information

GET /health
Health check endpoint:
- Returns service status
- Indicates whether the model is loaded
- Shows the computation device (CPU/GPU)

Example Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda"
}
```

POST /predict
Run inference on an uploaded medical image
Request:
- Method: POST
- Content-Type: multipart/form-data
- Body: file (NIfTI format: .nii or .nii.gz)

Response:

```json
{
  "success": true,
  "prediction": {
    "shape": [1, 2, 96, 96, 96],
    "min_value": 0.0,
    "max_value": 1.0,
    "unique_labels": [0, 1],
    "num_labels": 2
  },
  "segmentation_shape": [1, 2, 96, 96, 96],
  "metadata": {
    "image_shape": [1, 1, 96, 96, 96],
    "processing_time": 2.345,
    "device": "cuda"
  },
  "message": "Inference completed successfully in 2.345s"
}
```

GET /docs
Interactive API documentation (Swagger UI)

GET /redoc
Alternative API documentation (ReDoc)
```bash
# Build the image
docker build -t monai-fastapi -f docker/Dockerfile .

# Run the container
docker run -p 8000:8000 monai-fastapi
```

```bash
# Start the service
docker-compose -f docker/docker-compose.yml up -d

# View logs
docker-compose -f docker/docker-compose.yml logs -f

# Stop the service
docker-compose -f docker/docker-compose.yml down
```

```python
from examples.client import MONAIClient

# Initialize client
client = MONAIClient(base_url="http://localhost:8000")

# Check health
health = client.health_check()
print(health)

# Make prediction
result = client.predict("path/to/image.nii.gz")
print(result)
```

```bash
# Check health
python examples/client.py --health

# Run prediction
python examples/client.py --image path/to/image.nii.gz
```

```bash
# Health check
curl http://localhost:8000/health

# Prediction
curl -X POST http://localhost:8000/predict \
  -F "file=@tests/sample_image.nii.gz"
```

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=app --cov-report=html
```

Default Model: spleen_ct_segmentation
This tutorial uses MONAI's spleen CT segmentation bundle, which:
- Segments the spleen from CT scans
- Is pre-trained on the Medical Segmentation Decathlon dataset
- Runs fast inference (~2-3 seconds on GPU)
- Is a good starting point for learning deployment
To use a different model, edit app/main.py and change the model name in the lifespan function:

```python
model_loader.load_model(
    model_name="your_model_name",  # Change this
    bundle_dir="./models",
)
```

Create a .env file for configuration:

```bash
# Server configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=info

# Model configuration
MODEL_NAME=spleen_ct_segmentation
MODEL_DIR=./models

# Performance
WORKERS=1
```

The application automatically detects and uses a GPU if available:
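A small helper can read these variables with sensible defaults. This is a standard-library sketch only; the actual app may use `pydantic-settings` or `python-dotenv` instead. The variable names and defaults match the `.env` example above, but `load_config` itself is an illustrative name, not a function from the tutorial code.

```python
import os

def load_config(env=os.environ):
    """Read server/model settings from environment variables,
    falling back to the defaults shown in the .env example."""
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "log_level": env.get("LOG_LEVEL", "info"),
        "model_name": env.get("MODEL_NAME", "spleen_ct_segmentation"),
        "model_dir": env.get("MODEL_DIR", "./models"),
        "workers": int(env.get("WORKERS", "1")),
    }

config = load_config(env={"PORT": "9000"})  # override just the port
print(config["port"], config["model_name"])
```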
- With GPU: Faster inference, handles larger images
- Without GPU: Runs on CPU (slower but works)
- Add authentication (JWT, API keys)
- Validate file sizes and types
- Use HTTPS in production
- Set CORS origins explicitly
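For the "validate file sizes and types" point above, a pre-check can run before the upload ever reaches the model. This is a hedged sketch: the helper name `validate_upload` and the 100 MB cap are illustrative, not taken from the tutorial code.

```python
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # illustrative 100 MB cap

def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject files that are not NIfTI (.nii / .nii.gz) or too large.
    Raises ValueError; a FastAPI route would map this to HTTP 400/413."""
    name = filename.lower()
    if not (name.endswith(".nii") or name.endswith(".nii.gz")):
        raise ValueError("Invalid file format: expected .nii or .nii.gz")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File too large")

validate_upload("scan.nii.gz", 5 * 1024 * 1024)  # passes silently
```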
- Use multiple worker processes for scaling
- Add caching for frequently used models
- Implement request queuing for high load
- Consider model quantization for speed
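On request queuing: a minimal in-process version can be sketched with the standard library, where a single worker thread serializes access to the one loaded model so bursts of requests wait in line. Real deployments more often use an external broker (e.g. Celery or a Redis queue), and `fake_inference` below is a stand-in for the actual MONAI call, so treat this purely as an illustration.

```python
import queue
import threading

jobs = queue.Queue()   # pending (job_id, payload) pairs
results = {}           # job_id -> inference result

def fake_inference(name: str) -> str:
    # Stand-in for the real (slow, GPU-bound) MONAI inference call.
    return f"segmented:{name}"

def worker() -> None:
    # One worker thread means one inference at a time on the model.
    while True:
        job_id, payload = jobs.get()
        if job_id is None:  # shutdown signal
            break
        results[job_id] = fake_inference(payload)
        jobs.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()
for i, image in enumerate(["a.nii.gz", "b.nii.gz"]):
    jobs.put((i, image))
jobs.join()            # wait until the queue is drained
jobs.put((None, None)) # tell the worker to exit
t.join()
print(results)
```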
- Add logging and metrics
- Track inference times
- Monitor memory usage
- Set up health check endpoints
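One lightweight way to track inference times without a full metrics stack is a timing decorator around the inference function. This sketch uses only the standard library; `TIMINGS` and `track_time` are illustrative names, not part of the tutorial code.

```python
import functools
import time

TIMINGS = []  # in-memory metric store (illustrative)

def track_time(fn):
    """Record the wall-clock duration of each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            TIMINGS.append(time.perf_counter() - start)
    return wrapper

@track_time
def run_inference(image):
    time.sleep(0.01)  # stand-in for the real model call
    return {"shape": [1, 2, 96, 96, 96]}

run_inference("scan.nii.gz")
print(f"last inference took {TIMINGS[-1]:.3f}s")
```

In a real service, `TIMINGS` would typically be replaced by a Prometheus histogram or structured log lines rather than an in-memory list.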
```bash
uvicorn app.main:app \
  --host 0.0.0.0 \
  --port 8000 \
  --workers 4 \
  --log-level info \
  --proxy-headers \
  --forwarded-allow-ips='*'
```

Error: Failed to download model bundle
Solution: Check your internet connection and the MONAI bundle name.

Error: CUDA out of memory
Solution: Reduce the batch size, or run on CPU with a smaller model.

Error: Invalid file format
Solution: Ensure the file is in NIfTI format (.nii or .nii.gz).

Error: Address already in use
Solution: Change the port, or kill the process using port 8000.
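For the "Address already in use" case, one option is to let the OS pick an unused port and pass the result to uvicorn's `--port`. This small standard-library sketch is an assumption about your workflow, not part of the tutorial code:

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = find_free_port()
print(f"run: uvicorn app.main:app --port {port}")
```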
This tutorial is part of the MONAI tutorials collection. Contributions welcome!
Copyright 2025 MONAI Consortium. Licensed under the Apache License, Version 2.0.
For questions about this tutorial:
- Open an issue on GitHub
- Visit MONAI community forums
- Check existing tutorials for similar examples
Next Steps:
- Run through the tutorial
- Experiment with different models
- Deploy to your infrastructure
- Build your own medical AI application!