# MLSharp-3D-Maker
MLSharp-3D-Maker is a 3D Gaussian Splatting generation tool based on Apple's ml-sharp model. It generates high-quality 3D models from a single photo.
| Module | Status | Completion | Description |
|---|---|---|---|
| Core Function | Completed | 100% | Image to 3D model conversion |
| GPU Acceleration | Completed | 100% | NVIDIA/AMD/Intel Support |
| Configuration Management | Completed | 100% | Command line + Configuration file |
| Logging System | Completed | 100% | loguru Professional Logging + Color Output + Detailed Context |
| Asynchronous Processing | Completed | 100% | ProcessPoolExecutor |
| Unit Testing | Completed | 95% | Core class testing + Stability testing |
| API Interface | Completed | 100% | Prediction + Health Check + Cache Management |
| Monitoring Metrics | Completed | 95% | Prometheus Integration + Performance Monitoring + Stability Improvements |
| Inference Cache | Completed | 100% | LRU Cache + Redis Distributed Cache |
| Performance Auto-Tuning | Completed | 100% | Intelligent Benchmarking + Optimal Configuration Selection |
| Webhook | Completed | 100% | Asynchronous Notification + Event Management + Error Recovery |
| Documentation | Completed | 100% | README + Configuration Examples + API Documentation |
| API Documentation | Completed | 100% | Swagger/OpenAPI + Version Control |
| Authentication Authorization | Planned | 0% | API Key/JWT |
| GPU Memory Reclamation | Completed | 100% | Automatic Garbage Collection + Smart Memory Management + Monitoring |
| Stability Improvements | Completed | 100% | Exception Handling + Resource Management + File Operation Stability |
| Error Handling | Completed | 100% | Comprehensive Exception Capture + Graceful Degradation + Detailed Logging |
| Multi-language Support | Completed | 100% | Chinese and English interface support + Configuration file support |
Overall completion: all modules complete except authentication/authorization (planned).
```text
MLSharp-3D-Maker-GPU-by-Chidc/
├── app.py                     # Main application (refactored version) ⭐
├── config/                    # Configuration file directory (recommended)
│   ├── config.yaml            # YAML configuration file
│   └── config.json            # JSON configuration file
├── gpu_utils.py               # GPU utilities module
├── logger.py                  # Logging module
├── metrics.py                 # Monitoring metrics module ⭐
├── test_gpu_gc.py             # GPU memory reclamation test script ⭐
├── demo_gpu_gc.py             # GPU memory reclamation demo script ⭐
├── GPU_MEMORY_GC_README.md    # GPU memory reclamation documentation ⭐
├── optimistic.md              # Performance optimization documentation ⭐
├── Start.bat                  # Windows startup script
├── Start.ps1                  # PowerShell startup script
├── model_assets/              # Model files and resources
│   ├── sharp_2572gikvuh.pt    # ml-sharp model weights
│   ├── inputs/                # Input examples
│   └── outputs/               # Output examples
├── python_env/                # Python environment
├── logs/                      # Log files
├── tmp/                       # Temporary files and backups
│   └── 1.28/                  # 2026-01-28 backup
└── temp_workspace/            # Temporary workspace
```
### Latest Updates
**Code Optimization, Error Fixes, Multilingual Module Completed — 02.28.1500**
- Code Robustness Significantly Improved
- Fixed CLIArgs missing no_cache field issue
- Fixed Logger method duplicate definition issue
- Fixed Pydantic v2 deprecated parameters (min_items → min_length)
- Fixed metrics.py thread safety issue (using threading.Event)
- Fixed app.py GPUManager thread safety issue
- Fixed traceback.format_exc() non-exception context call issue
- Security Enhancements
- Added RestrictedUnpickler to prevent pickle deserialization attacks
- Added file upload type validation (magic number check)
- Added path traversal attack protection
- Added request size limit
- Added sensitive information leak protection
- Added configuration file path validation
- Stability Improvements
- Fixed gpu_utils.py silent exception issue
- Fixed file handle leak issue
- Fixed monitoring middleware race condition
- Added logger.py file handle close mechanism
- Multi-language Support Improvements
- Fixed all hardcoded Chinese strings
- Complete Chinese and English translation support
- Added translation key missing warning feature
- CLI parameter help text internationalization
- API error message internationalization
- Startup banner and log message internationalization
**Logging and Error Handling Enhancement — 02.20.1425**
- Logging Style Enhancement - Added color output, icons and detailed context information (filename, function name, line number)
- Diversified Logging Methods - Added new methods such as `styled_section`, `progress_info`, `performance`, `gpu_info`, `cache_info`
- Error Handling Enhancement - Replaced all bare `except:` clauses with specific exception handling
- File Operation Stability - Enhanced error handling and recovery mechanisms for file saving, loading, and renaming operations
- Model Loading Protection - Added error recovery for model loading, image processing, PLY saving and other critical operations
- Webhook Resilience - Improved error handling for Webhook notifications, failure doesn't affect main flow
- Cache Operation Protection - Added error handling for Redis and local cache operations with graceful fallback
- VRAM Management Optimization - Improved handling of VRAM shortage errors with more user-friendly error messages and solutions
**Stability Enhancement and Bug Fixes — 02.20.1200**
- Input Validation Fix - Fixed `validate_input_size` calling logging methods before the logging system was initialized
- File Operation Improvement - Added exception handling for the PLY file renaming operation, improving fault tolerance
- Directory Cleanup Optimization - Improved exception handling for temporary directory cleanup operations, provided better error information
- Logging System Improvement - Unified logging method for GPU monitoring loop, maintained logging format consistency
- Startup Script Fix - Removed the hardcoded IP address in `Start.ps1`, improving portability
- Resource Management Optimization - Improved temporary file and resource cleanup mechanisms to prevent resource leaks
- Test Coverage - Added stability test cases to ensure stability of key functions
Double-click `Start.ps1` to start.

Features:
- Automatic Detection: GPU type (NVIDIA/AMD/Intel), environment configuration, dependencies
- Smart Recommendation: Automatically recommend best startup script based on graphics card
- Comprehensive Diagnostics: 100+ error handling, intelligent problem identification
- Solutions: Each error provides detailed solution suggestions
- Log Recording: All run logs saved in logs/ folder
- Color Output: Clear visual feedback, easy to read
```bash
# Auto-detect mode (default)
python app.py

# Force GPU mode
python app.py --mode gpu

# Force CPU mode
python app.py --mode cpu

# Custom port
python app.py --port 8080

# Don't auto-open browser
python app.py --no-browser
```

After startup, visit: http://127.0.0.1:8000

Install dependencies:

```bash
pip install -r requirements.txt
```

### Command Line Parameters
| Parameter | Abbreviation | Type | Default Value | Description |
|---|---|---|---|---|
| `--mode` | `-m` | string | `auto` | Startup mode |
| `--port` | `-p` | int | `8000` | Web service port |
| `--host` | | string | `127.0.0.1` | Web service host address |
| `--input-size` | | int[] | `[1536, 1536]` | Input image size [width, height] |
| `--no-browser` | | flag | false | Don't auto-open browser |
| `--no-amp` | | flag | false | Disable mixed precision inference (AMP) |
| `--no-cudnn-benchmark` | | flag | false | Disable cuDNN Benchmark |
| `--config` | `-c` | string | - | Configuration file path (YAML or JSON) |
| `--enable-cache` | | flag | true | Enable inference cache (default: enabled) |
| `--no-cache` | | flag | false | Disable inference cache |
| `--cache-size` | | int | `100` | Maximum cache entries |
| `--clear-cache` | | flag | false | Clear cache on startup |
| `--enable-auto-tune` | | flag | false | Enable performance auto-tuning |
| `--redis-url` | | string | - | Redis connection URL (distributed cache) |
| `--enable-webhook` | | flag | false | Enable Webhook asynchronous notifications |
| `--enable-auto-gc` | | flag | true | Enable GPU auto garbage collection (default: enabled) |
| `--no-auto-gc` | | flag | false | Disable GPU auto garbage collection |
| `--auto-gc-interval` | | int | `30` | GPU auto garbage collection check interval (seconds) |
| `--auto-gc-threshold` | | float | `85.0` | GPU memory usage threshold (%); auto-clean when exceeded |
| `--enable-smart-reclaim` | | flag | true | Enable smart memory reclamation (default: enabled) |
| `--no-smart-reclaim` | | flag | false | Disable smart memory reclamation |
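The `--auto-gc-interval` / `--auto-gc-threshold` pair describes a periodic check-and-reclaim loop. Below is a minimal sketch of such a loop; the function name and pluggable callbacks are illustrative, not the project's actual API:

```python
import threading

def auto_gc_loop(get_usage_percent, reclaim, stop_event,
                 threshold=85.0, interval=30.0):
    """Poll GPU memory usage every `interval` seconds and reclaim
    memory whenever usage exceeds `threshold` percent."""
    # Event.wait doubles as an interruptible sleep: shutdown is immediate
    while not stop_event.wait(interval):
        if get_usage_percent() > threshold:
            reclaim()  # e.g. gc.collect() plus torch.cuda.empty_cache()
```

In the real application `reclaim` would presumably wrap `gc.collect()` and `torch.cuda.empty_cache()`; using an `Event` instead of `time.sleep` lets a shutdown signal interrupt the wait without delay.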
| Mode | Description |
|---|---|
| `auto` | Auto-detect and select the best mode (default) |
| `gpu` | Force GPU mode (auto-detect vendor) |
| `cpu` | Force CPU mode |
| `nvidia` | Force NVIDIA GPU mode |
| `amd` | Force AMD GPU mode (ROCm) |
Sets the input image size for inference. The default is 1536x1536, which is the size used during model training.
Usage example:
```bash
# Use default size 1536x1536
python app.py

# Use custom size 1024x1024
python app.py --input-size 1024 1024

# Use 768x768 for quick testing
python app.py --input-size 768 768
```

Constraints:
- Input size must be divisible by 64 (model encoder uses patch-based splitting)
- Width and height must be equal (model uses square input)
- Maximum supported size is 1536x1536 (SPN encoder has patch splitting errors with larger sizes)
- If provided size doesn't meet requirements, program will automatically adjust to closest valid size
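The adjustment rule (square, multiple of 64, at most 1536) can be sketched as follows. `adjust_input_size` is an illustrative helper, not the program's actual function, and it strictly snaps to multiples of 64 (the document's 1200x1200 example suggests the real rule may differ slightly):

```python
def adjust_input_size(width: int, height: int,
                      multiple: int = 64, max_size: int = 1536) -> int:
    """Round a requested size to the nearest valid square side length."""
    side = max(width, height)                  # enforce a square input
    side = round(side / multiple) * multiple   # snap to a multiple of 64
    return min(max(side, multiple), max_size)  # clamp to the supported range

adjust_input_size(1000, 1000)  # 1024 (nearest multiple of 64)
adjust_input_size(2048, 2048)  # 1536 (clamped to the maximum)
```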
Automatic Adjustment Example:
```bash
# 1000x1000 → automatically adjusted to 1024x1024
python app.py --input-size 1000 1000

# 1200x800 → automatically adjusted to 1200x1200 (kept square)
python app.py --input-size 1200 800
```

Recommended Sizes:
| Size | Purpose | Memory Requirement | Output Quality |
|---|---|---|---|
| 512x512 | Quick testing | Low | Basic |
| 768x768 | Balanced mode | Medium | Good |
| 1024x1024 | Standard mode | Medium | Excellent |
| 1536x1536 | High quality (default/maximum) | High | Best |
Note: Maximum supported size is 1536x1536, exceeding this will cause patch splitting errors in SPN encoder.
Notes:

- Larger input sizes improve output quality but require more memory and compute time
- Smaller input sizes speed up inference and reduce memory usage but may lower output quality
- Recommended range: 512x512 to 1536x1536
- Sizes above 1536x1536 cause patch-splitting errors in the SPN encoder
- If memory is insufficient, use a smaller size
- Non-standard sizes are auto-adjusted, with a warning displayed
```bash
# Basic use
python app.py
python app.py --mode gpu
python app.py --mode cpu

# Specify GPU vendor
python app.py --mode nvidia
python app.py --mode amd

# Custom port and host
python app.py --port 8080
python app.py --host 0.0.0.0 --port 8000

# Custom input size
python app.py --input-size 1024 1024
python app.py --input-size 768 768

# Disable optimization options (for debugging)
python app.py --no-browser
python app.py --no-amp
python app.py --no-cudnn-benchmark

# Enable gradient checkpointing (reduce VRAM usage)
python app.py --gradient-checkpointing

# Cache management (enabled by default)
python app.py                    # Cache enabled by default
python app.py --no-cache         # Disable cache
python app.py --cache-size 200   # Set cache size to 200
python app.py --clear-cache      # Clear cache on startup

# Performance auto-tuning (advanced feature)
python app.py --enable-auto-tune # Benchmark and select the optimal configuration on startup

# Combined usage
python app.py --mode nvidia --port 8080 --no-browser --input-size 1024 1024
python app.py --gradient-checkpointing --input-size 1536 1536
python app.py --cache-size 200 --mode gpu
python app.py --clear-cache --mode gpu

# Using a configuration file
python app.py --config config.yaml
python app.py --config config.json
python app.py -c config.yaml

# Configuration file + command line parameters (command line takes priority)
python app.py --config config.yaml --port 8080 --input-size 1024 1024

# Multi-language support
python app.py --lang zh          # Chinese interface (default)
python app.py --lang en          # English interface

# Show help
python app.py --help
python app.py -h
```

### GPU Support
| Architecture | Graphics Series | Compute Capability | Support Status | Optimization |
|---|---|---|---|---|
| Ampere | RTX 30/40 Series | 8.0+ | Full Support | AMP, TF32, cuDNN |
| Turing | RTX 20 Series | 7.5 | Full Support | AMP, cuDNN |
| Pascal | GTX 10/16 Series | 6.1 | Full Support | AMP, cuDNN |
| Maxwell | GTX 9xx Series | 5.2 | Support | AMP |
| Kepler | GTX 7xx Series | 3.0-3.7 | Basic | |
| Fermi | GTX 6xx Series | 2.1 | ❌ Not Recommended | - |
| Architecture | Graphics Series | ROCm Support | Support Status |
|---|---|---|---|
| RDNA 2 | RX 6000 Series | Full Support | Full Support |
| RDNA 1 | RX 5000 Series | Full Support | Full Support |
| GCN 5 | Vega Series | Full Support | Support |
| GCN 4 | RX 400/500 Series | | |
| GCN 3 | RX 300 Series | ❌ | ❌ Not Supported |

| Architecture | Graphics Series | Support Status |
|---|---|---|
| Xe | Arc Series | |
| Iris Xe | Integrated Graphics | |
| UHD | Integrated Graphics | |
### Logging System
MLSharp uses Loguru as the logging system, providing professional logging management:
- Structured Logging: Includes timestamp, logging level, source information
- Color Output: Console color display, easy to distinguish different levels
- File Logging: Automatically saved to the `logs/` directory
- Log Rotation: Automatic rotation and compression of log files (10 MB rotation, 7-day retention)
- Error Tracking: Complete error stack trace and diagnostic information
- Multi-Level: DEBUG, INFO, WARNING, ERROR, CRITICAL
Log files saved in logs/ directory:
- File naming:
mlsharp_YYYYMMDD.log - Compressed files:
mlsharp_YYYYMMDD.log.zip - Retention time: 7 days
| Level | Purpose | Example |
|---|---|---|
| DEBUG | Debug information | Variable values, function calls |
| INFO | General information | Startup information, processing progress |
| WARNING | Warning information | Performance warnings, compatibility issues |
| ERROR | Error information | Processing failures, exceptions |
| CRITICAL | Serious errors | System crashes, fatal errors |
```text
2026-01-28 20:00:00 | INFO    | MLSharp:run:10 - Service started
2026-01-28 20:00:01 | SUCCESS | MLSharp:load_model:50 - Model loaded successfully
2026-01-28 20:00:02 | WARNING | MLSharp:detect_gpu:30 - Less than 4GB VRAM
2026-01-28 20:00:03 | ERROR   | MLSharp:predict:100 - Processing failed: Out of memory
```
```bat
:: View today's logs
type logs\mlsharp_20260128.log

:: View all log files
dir logs\

:: View error logs
findstr /C:"ERROR" logs\mlsharp_*.log
```

### Configuration File Usage
Supports both YAML and JSON format configuration files.
Default Configuration File: If the `--config` parameter is not specified, the system automatically uses `config.yaml` in the project root directory.
```yaml
# MLSharp-3D-Maker Configuration File
# Supported Format: YAML

# Service Configuration
server:
  host: "127.0.0.1"   # Service host address
  port: 8000          # Service port

# Startup Mode
mode: "auto"          # Startup mode: auto, gpu, cpu, nvidia, amd

# Language Configuration
language: "zh"        # Interface language: zh (Chinese), en (English)

# Browser Configuration
browser:
  auto_open: true     # Auto-open browser

# GPU Optimization Configuration
gpu:
  enable_amp: true              # Enable mixed precision inference (AMP)
  enable_cudnn_benchmark: true  # Enable cuDNN Benchmark
  enable_tf32: true             # Enable TensorFloat32

# Logging Configuration
logging:
  level: "INFO"       # Logging level: DEBUG, INFO, WARNING, ERROR
  console: true       # Console output
  file: false         # File output

# Model Configuration
model:
  checkpoint: "model_assets/sharp_2572gikvuh.pt"  # Model weights path
  temp_dir: "temp_workspace"                      # Temporary workspace directory

# Inference Configuration
inference:
  input_size: [1536, 1536]  # Input image size [width, height] (default: 1536x1536)

# Optimization Configuration
optimization:
  gradient_checkpointing: false  # Enable gradient checkpointing (lower VRAM usage, slightly slower inference)
  checkpoint_segments: 3         # Gradient checkpointing segments (not used yet)

# Cache Configuration
cache:
  enabled: true       # Enable inference cache (default: enabled)
  size: 100           # Maximum cache entries (default: 100)

# Redis Cache Configuration
redis:
  enabled: false                   # Enable Redis cache (default: disabled)
  url: "redis://localhost:6379/0"  # Redis connection URL
  prefix: "mlsharp"                # Cache key prefix

# Webhook Configuration
webhook:
  enabled: false      # Enable Webhook notification (default: disabled)
  task_completed: ""  # Task-completed notification URL
  task_failed: ""     # Task-failed notification URL

# Monitoring Configuration
monitoring:
  enabled: true             # Enable monitoring
  enable_gpu: true          # Enable GPU monitoring
  metrics_path: "/metrics"  # Prometheus metrics endpoint path

# Performance Configuration
performance:
  max_workers: 4            # Maximum worker threads
  max_concurrency: 10       # Maximum concurrency
  timeout_keep_alive: 30    # Keep-alive timeout (seconds)
  max_requests: 1000        # Maximum requests

# Performance Cache Configuration (auto-generated, no manual editing needed)
performance_cache:
  last_test: null     # Last test time (ISO 8601 format)
  best_config: null   # Optimal configuration
  gpu: null           # GPU information
```

The same configuration in JSON:

```json
{
  "server": {
    "host": "127.0.0.1",
    "port": 8000
  },
  "mode": "auto",
  "browser": {
    "auto_open": true
  },
  "gpu": {
    "enable_amp": true,
    "enable_cudnn_benchmark": true,
    "enable_tf32": true
  },
  "logging": {
    "level": "INFO",
    "console": true,
    "file": false
  },
  "model": {
    "checkpoint": "model_assets/sharp_2572gikvuh.pt",
    "temp_dir": "temp_workspace"
  },
  "inference": {
    "input_size": [1536, 1536]
  },
  "optimization": {
    "gradient_checkpointing": false,
    "checkpoint_segments": 3
  },
  "cache": {
    "enabled": true,
    "size": 100
  },
  "redis": {
    "enabled": false,
    "url": "redis://localhost:6379/0",
    "prefix": "mlsharp"
  },
  "webhook": {
    "enabled": false,
    "task_completed": "",
    "task_failed": ""
  },
  "monitoring": {
    "enabled": true,
    "enable_gpu": true,
    "metrics_path": "/metrics"
  },
  "performance": {
    "max_workers": 4,
    "max_concurrency": 10,
    "timeout_keep_alive": 30,
    "max_requests": 1000
  }
}
```

Basic Usage:
```bash
# Use a YAML configuration file
python app.py --config config.yaml

# Use a JSON configuration file
python app.py --config config.json

# Abbreviation
python app.py -c config.yaml

# Recommended: use the config folder to manage configuration files
python app.py --config config/performance.yaml
python app.py --config config/settings.json
```

Configuration File + Command Line Parameters:

```bash
# Command line parameters override the corresponding settings in the configuration file
python app.py --config config.yaml --port 8080 --mode gpu
```

Configuration File Auto-Creation/Update:

```bash
# If the configuration file doesn't exist, it is auto-created with default configuration
# If it exists, only the performance tuning cache is updated; other settings remain unchanged
python app.py --enable-auto-tune --config config/auto_tune.json
```

Priority: Command line parameters > Configuration file > Default values

For example:

```bash
# config.yaml sets port: 8000
# The command line specifies --port 8080
# The final port is 8080
python app.py --config config.yaml --port 8080
```

| Configuration Item | Description | Optional Values |
|---|---|---|
| `server.host` | Service host address | IP address |
| `server.port` | Service port | 1-65535 |
| `mode` | Startup mode | auto, gpu, cpu, nvidia, amd |
| `browser.auto_open` | Auto-open browser | true, false |
| `gpu.enable_amp` | Enable mixed precision inference | true, false |
| `gpu.enable_cudnn_benchmark` | Enable cuDNN Benchmark | true, false |
| `gpu.enable_tf32` | Enable TensorFloat32 | true, false |
| `logging.level` | Logging level | DEBUG, INFO, WARNING, ERROR |
| `logging.console` | Console output | true, false |
| `logging.file` | File output | true, false |
| `model.checkpoint` | Model weights path | File path |
| `model.temp_dir` | Temporary workspace directory | Directory path |
| `inference.input_size` | Input image size | [width, height], default [1536, 1536] |
| `monitoring.enabled` | Enable monitoring | true, false |
| `monitoring.enable_gpu` | Enable GPU monitoring | true, false |
| `monitoring.metrics_path` | Prometheus metrics endpoint path | Path string |
| `optimization.gradient_checkpointing` | Enable gradient checkpointing | true, false |
| `optimization.checkpoint_segments` | Gradient checkpointing segments | Positive integer |
| `performance.max_workers` | Maximum worker threads | Positive integer |
| `performance.max_concurrency` | Maximum concurrency | Positive integer |
| `performance.timeout_keep_alive` | Keep-alive timeout (seconds) | Positive integer |
| `performance.max_requests` | Maximum requests | Positive integer |
| `auto_tune.enabled` | Enable performance auto-tuning | true, false |
| `auto_tune.test_size` | Test image size | [width, height] |
| `auto_tune.warmup_runs` | Warm-up run count | Positive integer |
| `auto_tune.test_runs` | Test run count | Positive integer |
| `performance_cache.last_test` | Last test time | ISO 8601 timestamp (auto-generated) |
| `performance_cache.best_config` | Optimal configuration | Configuration dictionary (auto-generated) |
| `performance_cache.gpu` | GPU information | GPU info (auto-generated) |
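The precedence chain (command line > configuration file > default values) amounts to layered dictionary merging. A sketch of that idea, with `merge` as a hypothetical helper and `None` standing for "not given on the command line":

```python
def merge(base: dict, override: dict) -> dict:
    """Overlay `override` onto `base`; later layers win, nested dicts recurse."""
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)   # merge nested sections
        elif val is not None:                 # None means "not specified"
            out[key] = val
    return out

defaults = {"server": {"host": "127.0.0.1", "port": 8000}, "mode": "auto"}
file_cfg = {"mode": "gpu"}                           # from config.yaml
cli_args = {"server": {"port": 8080}, "mode": None}  # only --port 8080 given

config = merge(merge(defaults, file_cfg), cli_args)
# config == {"server": {"host": "127.0.0.1", "port": 8080}, "mode": "gpu"}
```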
### Performance Auto-Tuning
MLSharp provides an intelligent performance auto-tuner that automatically benchmarks and selects the optimal optimization configuration.
- Intelligent Benchmarking: Automatically test various optimization configuration combinations
- Optimal Configuration Selection: Automatically select best configuration based on test results
- GPU Adaptation: Automatically filter out unsupported configurations based on GPU capability
- Quick Testing: Use small size to complete testing quickly (about 10 seconds)
- Detailed Logging: Output complete test process and results
- Performance Improvement: 30-50% performance improvement compared to non-optimized configuration
- Result Caching: Automatically save test results to configuration file, valid for 7 days
- Smart Skip: Automatically skip testing when detecting valid cache, speed up startup
The auto-tuner tests the following configuration combinations:
| Configuration | Description | Applicable Scenario |
|---|---|---|
| Baseline Configuration | No optimizations | All GPUs |
| AMP Only | Only enable mixed precision | Compute capability ≥ 5.3 |
| cuDNN Only | Only enable cuDNN Benchmark | NVIDIA, Compute capability ≥ 6.0 |
| TF32 Only | Only enable TensorFloat32 | NVIDIA, Compute capability ≥ 8.0 |
| AMP + cuDNN | Mixed precision + cuDNN | NVIDIA, Compute capability ≥ 6.0 |
| AMP + TF32 | Mixed precision + TF32 | NVIDIA, Compute capability ≥ 8.0 |
| All Optimizations | Enable all optimizations | High-end NVIDIA GPU |
```bash
# Enable performance auto-tuning (uses the default configuration file config.yaml)
python app.py --enable-auto-tune

# Combined usage
python app.py --enable-auto-tune --mode gpu --input-size 1024 1024

# Specify a configuration file (results will be saved to this file)
python app.py --enable-auto-tune --config config.yaml

# Use the config folder to store configuration (recommended)
python app.py --enable-auto-tune --config config/performance.yaml

# If the configuration file doesn't exist, it is auto-created with defaults
python app.py --enable-auto-tune --config config/auto_tune.json
```

Note: If the `--config` parameter is not specified, the system uses `config.yaml` in the project root as the default configuration file.
Auto-tuning results are automatically saved to configuration file to avoid repeated testing:
- Cache Validity: 7 days
- Cache Condition: GPU model, vendor, compute capability must match
- Auto Skip: Automatically skip testing when detecting valid cache
- Auto Apply: Directly use cached optimal configuration
- Auto Creation/Update: Auto-create configuration file if doesn't exist (with default configuration), only update performance tuning cache if exists
- Directory Support: Auto-create configuration directory (such as config folder)
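The cache-validity rules above (7-day window, matching GPU) could be checked with logic along these lines. This is an illustrative sketch; the field names mirror the documented `performance_cache` structure, but the function itself is hypothetical:

```python
from datetime import datetime, timedelta, timezone

def cache_valid(perf_cache, current_gpu, max_age_days=7):
    """Return True if a cached tuning result is fresh and matches this GPU."""
    if not perf_cache or not perf_cache.get("last_test"):
        return False
    last = datetime.fromisoformat(perf_cache["last_test"])
    if datetime.now(timezone.utc) - last > timedelta(days=max_age_days):
        return False  # cache older than the 7-day validity window
    cached_gpu = perf_cache.get("gpu") or {}
    # GPU model, vendor, and compute capability must all match
    return all(cached_gpu.get(k) == current_gpu.get(k)
               for k in ("name", "vendor", "compute_capability"))
```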
Log Output Example (when using cache):

```text
[INFO] Found valid performance tuning cache (3 days ago)
============================================================
[INFO] Using cached performance configuration
============================================================
Configuration Name: All Optimizations
Description: Enable all optimizations
```

Log Output Example (when creating the configuration file):

```text
[INFO] Configuration file doesn't exist; creating new configuration file: config.yaml
[SUCCESS] Performance tuning results added to configuration file: config.yaml
```

Log Output Example (when updating an existing configuration file):

```text
[INFO] Configuration file exists; updating performance tuning cache: config.yaml
[SUCCESS] Performance tuning results updated in configuration file: config.yaml
```
Configuration File Processing:

- Configuration file exists: only the `performance_cache` field is updated; other settings remain unchanged
- Configuration file doesn't exist: a new configuration file is created with the complete default configuration
Tuning results are saved in the `performance_cache` field of the configuration file:

```yaml
# config.yaml
performance_cache:
  last_test: "2026-01-31T12:00:00+00:00"
  best_config:
    name: "All Optimizations"
    amp: true
    cudnn_benchmark: true
    tf32: true
    description: "Enable all optimizations"
  gpu:
    name: "NVIDIA GeForce RTX 4090"
    vendor: "NVIDIA"
    compute_capability: 89
```

Tuning process:

- Cache Check: Check whether a valid tuning cache exists in the configuration file (within 7 days)
- Cache Hit: If the cache is valid and the GPU matches, use the cached results directly
- Benchmark Testing: If the cache is invalid or expired, perform a complete test
- Warm-up Phase: Run 2 warm-ups to stabilize performance
- Test Phase: Run 3 tests for each configuration
- Result Statistics: Calculate average inference time and throughput
- Optimal Selection: Select the fastest configuration and apply it
- Cache Save: Save the optimal configuration to the configuration file
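The warm-up / timed-run / selection flow can be sketched as follows. This is illustrative only: `run_inference` stands in for the actual model call, and the helper name is an assumption:

```python
import time
from statistics import mean

def pick_best_config(run_inference, configs, warmup_runs=2, test_runs=3):
    """Time each optimization config and return (best_config, avg_seconds)."""
    results = []
    for cfg in configs:
        for _ in range(warmup_runs):              # warm-up: stabilize caches/clocks
            run_inference(cfg)
        times = []
        for _ in range(test_runs):                # timed runs
            start = time.perf_counter()
            run_inference(cfg)
            times.append(time.perf_counter() - start)
        results.append((cfg, mean(times)))
    return min(results, key=lambda r: r[1])       # fastest average wins
```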
```text
============================================================
[INFO] Performance Auto-Tuning
============================================================
Testing different optimization configurations...

Test Configuration: Baseline Configuration
Description: No optimizations
  Run 1/3: 2.543 seconds
  Run 2/3: 2.512 seconds
  Run 3/3: 2.528 seconds
  Average Inference Time: 2.528 seconds

Test Configuration: AMP Only
Description: Only enable mixed precision inference
  Run 1/3: 1.892 seconds
  Run 2/3: 1.876 seconds
  Run 3/3: 1.884 seconds
  Average Inference Time: 1.884 seconds

Test Configuration: All Optimizations
Description: Enable all optimizations
  Run 1/3: 1.245 seconds
  Run 2/3: 1.238 seconds
  Run 3/3: 1.241 seconds
  Average Inference Time: 1.241 seconds

============================================================
[INFO] Tuning Results
============================================================
[SUCCESS] Optimal Configuration: All Optimizations
[INFO] Description: Enable all optimizations
[INFO] Average Inference Time: 1.241 seconds
[INFO] Throughput: 0.81 FPS
[SUCCESS] Performance auto-tuning completed!
[INFO] Optimal configuration applied
```
- Initial Run: Recommended to enable auto-tuning on first run
- Hardware Changes: Re-run auto-tuning after changing GPU
- Driver Updates: Re-test after GPU driver updates
- Regular Tuning: Recommended to run auto-tuning monthly
- Cache Management: System automatically caches tuning results for 7 days, no manual management needed
- Configuration File: Recommended to use the `config/` folder to manage configuration files, e.g. `config/performance.yaml`
- Auto Creation/Update: If the configuration file doesn't exist, it is auto-created with default configuration; if it exists, only the performance tuning cache is updated
- Clear Cache: To force re-testing, delete the `performance_cache` field from the configuration file or use a new configuration file
### Performance Optimization Suggestions
- **Use an Appropriate Image Size**
  - Recommended: 512x512 to 1024x1024
  - Avoid exceeding 1536x1536 (the maximum supported size)
- **Enable All Optimizations**
  - AMP (mixed precision) enabled by default
  - cuDNN Benchmark enabled by default
  - TF32 enabled by default (Ampere architecture)
- **Enable Gradient Checkpointing When VRAM Is Insufficient**
  - Use the `--gradient-checkpointing` parameter
  - Reduces VRAM usage by 30-50%
  - Slightly reduces speed by 10-20% (acceptable)
- **Close Other GPU-Occupying Programs**
  - Disable browser hardware acceleration
  - Close other AI applications
  - Close games or graphics-intensive applications
- **Use Smaller Images**
  - Recommended: 512x512 or smaller
- **Reduce Concurrency**
  - Modify `max_workers` in the configuration
  - Recommended value: CPU core count / 2
- **Use a Faster Startup Script**
  - `Start_CPU_Fast.bat` - fast mode
- **Increase Virtual Memory**
  - Set to 1.5-2x physical memory
- **Use an SSD**
  - Faster model loading and I/O operations
- **Close Unnecessary Background Programs**
  - Frees up more system resources
### Inference Cache
MLSharp provides an intelligent inference cache that can significantly speed up processing of repeated inputs.
- Smart Hashing: Generate unique cache key based on image content and focal length
- LRU Elimination: Least recently used algorithm automatically eliminates old cache
- Statistical Monitoring: Real-time cache hit rate, hit/miss count statistics
- Thread Safety: Use lock mechanism to ensure multi-thread safety
- Memory Management: Configurable cache size limit
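The features above (content-based key, LRU eviction, statistics, locking) combine into a fairly standard structure. A minimal sketch, assuming SHA-256 over image bytes plus focal length as the key; the class and method names are illustrative, not the project's actual API:

```python
import hashlib
import threading
from collections import OrderedDict

class InferenceCache:
    """LRU cache keyed by image content and focal length (sketch)."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._data = OrderedDict()       # insertion order tracks recency
        self._lock = threading.Lock()
        self.hits = self.misses = 0

    @staticmethod
    def make_key(image_bytes: bytes, focal_length: float) -> str:
        # Unique key derived from image content and focal length
        h = hashlib.sha256(image_bytes)
        h.update(str(focal_length).encode())
        return h.hexdigest()

    def get(self, key: str):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as recently used
                self.hits += 1
                return self._data[key]
            self.misses += 1
            return None

    def put(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.max_size:
                self._data.popitem(last=False)  # evict least recently used
```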
The cache is enabled by default and can be controlled via command line parameters or the configuration file:

```bash
# Command line parameters
python app.py                   # Cache enabled by default
python app.py --no-cache        # Disable cache
python app.py --cache-size 200  # Set cache size to 200
```

```yaml
# config.yaml
cache:
  enabled: true  # Enable cache (default: true)
  size: 100      # Maximum cache entries (default: 100)
```

Query cache statistics:

```bash
curl http://127.0.0.1:8000/v1/cache
```

Return Example:
```json
{
  "enabled": true,
  "size": 45,
  "max_size": 100,
  "hits": 120,
  "misses": 30,
  "hit_rate": 80.0
}
```

Clear the cache:

```bash
curl -X POST http://127.0.0.1:8000/v1/cache/clear
```

Return Example:

```json
{
  "status": "success",
  "message": "Cache cleared"
}
```

The cache can significantly improve processing speed, especially for repeated scenarios:
| Cache Hit Rate | Speed Improvement | Applicable Scenario |
|---|---|---|
| 30% | 30% | Small amount of repeated images |
| 50% | 50% | Medium repeated scenario |
| 80% | 80% | Large amount of repeated images |
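The statistics endpoint can be polled to track the hit rate over time. A small client sketch using only the standard library; `BASE` and the helper names are illustrative, while the `/v1/cache` endpoints are the ones documented above:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:8000"

def cache_stats() -> dict:
    """GET /v1/cache — returns the statistics JSON."""
    with request.urlopen(f"{BASE}/v1/cache") as resp:
        return json.load(resp)

def clear_cache() -> dict:
    """POST /v1/cache/clear — empties the inference cache."""
    req = request.Request(f"{BASE}/v1/cache/clear", method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)

def hit_rate(stats: dict) -> float:
    """Hit rate in percent, matching the `hit_rate` field: hits / (hits + misses)."""
    total = stats["hits"] + stats["misses"]
    return round(100 * stats["hits"] / total, 1) if total else 0.0

hit_rate({"hits": 120, "misses": 30})  # 80.0, as in the example response
```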
- Adjust Cache Size Appropriately: Adjust cache size based on memory and actual needs
- Monitor Cache Hit Rate: Regularly check cache hit rate, evaluate cache effectiveness
- Clear Cache Regularly: If memory tight, clear cache regularly
- Disable Cache Scenario: When processing completely different images, can disable cache
**Redis Cache**
- Distributed Cache: multiple instances can share one cache
- Persistence: cache data is persisted in Redis
- TTL Support: entries expire automatically
- Mixed Usage: can be combined with the local cache
- High Performance: backed by the Redis in-memory database
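A minimal sketch of how a prefixed, TTL-backed cache could wrap a Redis client. The wrapper accepts any redis-py-compatible client object; all names here are hypothetical, not the project's actual code:

```python
import json

class RedisCache:
    """Sketch of a TTL-backed distributed cache over a Redis-like client."""

    def __init__(self, client, prefix: str = "mlsharp", ttl: int = 3600):
        self.client = client   # expects get()/setex(), as in redis-py
        self.prefix = prefix
        self.ttl = ttl

    def _key(self, key: str) -> str:
        # Namespace keys so multiple apps can share one Redis instance
        return f"{self.prefix}:{key}"

    def get(self, key: str):
        raw = self.client.get(self._key(key))
        return json.loads(raw) if raw is not None else None

    def put(self, key: str, value) -> None:
        # SETEX stores the value with an expiry, so stale entries age out
        self.client.setex(self._key(key), self.ttl, json.dumps(value))
```

With the real client this would be constructed as `RedisCache(redis.Redis.from_url("redis://localhost:6379/0"))`; the TTL is what gives the automatic expiration mentioned above.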
```bash
# Use Redis cache
python app.py --redis-url redis://localhost:6379/0

# Use Redis cache + Webhook
python app.py --redis-url redis://localhost:6379/0 --enable-webhook
```

```yaml
# config.yaml
redis:
  enabled: true
  url: "redis://localhost:6379/0"
  prefix: "mlsharp"
```

| Cache Type | Hit Speed | Distributed Support | Persistence | Applicable Scenario |
|---|---|---|---|---|
| Local Cache | Fastest | ❌ | ❌ | Single-instance deployment |
| Redis Cache | Fast | ✅ | ✅ | Multi-instance deployment |
- Production: use the Redis cache to support multi-instance deployment
- Local Development: use the local cache; no Redis service needed
**Version History**
Code Health Check and Fix 02.05.1914
- Code Quality Improvement - Removed an unused ProcessPoolExecutor, optimizing resource usage
- Pydantic v2 Update - Migrated to Pydantic v2 syntax, using `@field_validator` instead of `@validator`
- Resource Management Optimization - Added a `cleanup()` method to ensure GPU monitoring threads and Webhook clients close properly
- Redis Connection Management - Added a `__del__` method that automatically closes Redis connections
- Test File Added - Added test_app.py with core function tests
- Test Script Update - Updated run_tests.bat and run_tests.ps1 to support Windows and PowerShell
- Test Coverage - Module import, configuration validation, GPU detection, monitoring metrics, and other core functions
- Test Results - All tests passed (4/4)
- New Format - Adopted [Month].[Day].[HHMM] format (e.g., 02.05.1900)
- Description - Month.Day.HourMinute (24-hour format)
Snapdragon GPU Support Removal 02.03.1851
- Main Branch - Removed Snapdragon/Adreno series GPU support
GPU Memory Auto Reclamation 02.03.1851
- Memory Information Query - Real-time GPU memory usage (total, used, available, usage rate)
- Cache Cleanup - Automatically clear PyTorch reserved but unused memory
- Force Garbage Collection - Complete garbage collection process (clear cache → sync GPU → Python GC → clear again)
- Smart Memory Reclamation - Automatically clean when memory usage exceeds threshold (default 85%)
- Auto Memory Monitoring - Background thread regularly checks and automatically clears memory (default every 30 seconds)
- Command Line Parameters - Supports `--enable-auto-gc`, `--auto-gc-interval`, `--auto-gc-threshold` parameters
- Configuration File Support - Configure memory reclamation strategy in config.yaml
- Performance Optimization - Prevent memory leaks, improve system stability
- Logging - Detailed memory cleanup logs for debugging
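The clear-cache → sync GPU → Python GC → clear-again sequence described above can be sketched with PyTorch's CUDA API. This is a minimal illustration of the flow, not the project's actual code; the function name and the 85% default mirror the description:

```python
import gc

def reclaim_gpu_memory(threshold: float = 0.85) -> bool:
    """Free GPU memory when usage exceeds `threshold` (fraction of total).

    Returns True if a cleanup was performed, False otherwise.
    """
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        total = torch.cuda.get_device_properties(0).total_memory
        used = torch.cuda.memory_reserved(0)
        if used / total < threshold:
            return False              # below threshold: nothing to do
        torch.cuda.empty_cache()      # 1. drop reserved-but-unused blocks
        torch.cuda.synchronize()      # 2. wait for in-flight kernels
        gc.collect()                  # 3. release Python-side references
        torch.cuda.empty_cache()      # 4. clear again after GC
        return True
    except ImportError:
        return False                  # no PyTorch: nothing to reclaim
```

In the described design, a background thread would call a function like this on a fixed interval (default every 30 seconds), so memory is reclaimed without manual intervention.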
Snapdragon GPU Adaptation 01.31.1931
- Adreno GPU Detection - Automatically detect Snapdragon/Adreno series GPU
- Qualcomm Mode - Added `--mode qualcomm` startup mode
- ONNX Runtime Support - Added ONNX Runtime + DirectML acceleration solution
- Smart Fallback - Automatically use CPU mode when detecting Snapdragon GPU
- Platform Support - Windows/Android platform identification
- Documentation Update - Added Snapdragon GPU support instructions and limitations
**Future Improvements**
- Unit Testing: add unit tests for each class
- Configuration Files: support loading configuration from files
- Logging System: use a professional logging library (e.g., loguru)
- Asynchronous Optimization: further optimize asynchronous processing
- Authentication & Authorization - Add user authentication
  - API Key authentication
  - JWT Token support
  - Rate limiting
- Task Queue - Asynchronous task processing
  - Redis queue support
  - Task status tracking
  - Batch processing support
- Batch Processing API - Batch image processing
  - Multiple file uploads
  - Batch prediction
  - Result packaging and download
- Internationalization - Multi-language support ✅ Completed
  - i18n support ✅
  - Chinese and English interface ✅
  - Expandable language packs ✅
  - Configuration file support ✅
- Plugin System - Extensible architecture
  - Custom plugins
  - Model plugins
  - Post-processing plugins
Issues and Pull Requests are welcome!
- Configuration File Example - YAML format configuration file
- API Documentation - Swagger/OpenAPI auto-generated API documentation
- Project Homepage: https://github.com/ChidcGithub/MLSharp-3D-Maker-GPU
- Issue Feedback: Issues