          _____                    _____            _____                    _____                    _____                    _____                    _____          
         /\    \                  /\    \          /\    \                  /\    \                  /\    \                  /\    \                  /\    \         
        /::\____\                /::\____\        /::\    \                /::\____\                /::\    \                /::\    \                /::\    \        
       /::::|   |               /:::/    /       /::::\    \              /:::/    /               /::::\    \              /::::\    \              /::::\    \       
      /:::::|   |              /:::/    /       /::::::\    \            /:::/    /               /::::::\    \            /::::::\    \            /::::::\    \      
     /::::::|   |             /:::/    /       /:::/\:::\    \          /:::/    /               /:::/\:::\    \          /:::/\:::\    \          /:::/\:::\    \     
    /:::/|::|   |            /:::/    /       /:::/__\:::\    \        /:::/____/               /:::/__\:::\    \        /:::/__\:::\    \        /:::/__\:::\    \    
   /:::/ |::|   |           /:::/    /        \:::\   \:::\    \      /::::\    \              /::::\   \:::\    \      /::::\   \:::\    \      /::::\   \:::\    \   
  /:::/  |::|___|______    /:::/    /       ___\:::\   \:::\    \    /::::::\    \   _____    /::::::\   \:::\    \    /::::::\   \:::\    \    /::::::\   \:::\    \  
 /:::/   |::::::::\    \  /:::/    /       /\   \:::\   \:::\    \  /:::/\:::\    \ /\    \  /:::/\:::\   \:::\    \  /:::/\:::\   \:::\____\  /:::/\:::\   \:::\____\ 
/:::/    |:::::::::\____\/:::/____/       /::\   \:::\   \:::\____\/:::/  \:::\    /::\____\/:::/  \:::\   \:::\____\/:::/  \:::\   \:::|    |/:::/  \:::\   \:::|    |
\::/    / ~~~~~/:::/    /\:::\    \       \:::\   \:::\   \::/    /\::/    \:::\  /:::/    /\::/    \:::\  /:::/    /\::/   |::::\  /:::|____|\::/    \:::\  /:::|____|
 \/____/      /:::/    /  \:::\    \       \:::\   \:::\   \/____/  \/____/ \:::\/:::/    /  \/____/ \:::\/:::/    /  \/____|:::::\/:::/    /  \/_____/\:::\/:::/    /  
             /:::/    /    \:::\    \       \:::\   \:::\    \               \::::::/    /            \::::::/    /         |:::::::::/    /            \::::::/    /   
            /:::/    /      \:::\    \       \:::\   \:::\____\               \::::/    /              \::::/    /          |::|\::::/    /              \::::/    /    
           /:::/    /        \:::\    \       \:::\  /:::/    /               /:::/    /               /:::/    /           |::| \::/____/                \::/____/     
          /:::/    /          \:::\    \       \:::\/:::/    /               /:::/    /               /:::/    /            |::|  ~|                       ~~           
         /:::/    /            \:::\    \       \::::::/    /               /:::/    /               /:::/    /             |::|   |                                   
        /:::/    /              \:::\____\       \::::/    /               /:::/    /               /:::/    /              \::|   |                                   
        \::/    /                \::/    /        \::/    /                \::/    /                \::/    /                \:|   |                                   
         \/____/                  \/____/          \/____/                  \/____/                  \/____/                  \|___|                                    

MLSharp 3D Maker

Instructions

Project Overview

MLSharp-3D-Maker is a 3D Gaussian Splatting generation tool built on Apple's ml-sharp model; it generates high-quality 3D models from a single photo.

Project Completion

| Module | Status | Completion | Description |
| --- | --- | --- | --- |
| Core Function | Completed | 100% | Image to 3D model conversion |
| GPU Acceleration | Completed | 100% | NVIDIA/AMD/Intel support |
| Configuration Management | Completed | 100% | Command line + configuration file |
| Logging System | Completed | 100% | loguru professional logging + color output + detailed context |
| Asynchronous Processing | Completed | 100% | ProcessPoolExecutor |
| Unit Testing | Completed | 95% | Core class testing + stability testing |
| API Interface | Completed | 100% | Prediction + health check + cache management |
| Monitoring Metrics | Completed | 95% | Prometheus integration + performance monitoring + stability improvements |
| Inference Cache | Completed | 100% | LRU cache + Redis distributed cache |
| Performance Auto-Tuning | Completed | 100% | Intelligent benchmarking + optimal configuration selection |
| Webhook | Completed | 100% | Asynchronous notification + event management + error recovery |
| Documentation | Completed | 100% | README + configuration examples + API documentation |
| API Documentation | Completed | 100% | Swagger/OpenAPI + version control |
| Authentication/Authorization | Planned | 0% | API Key/JWT |
| GPU Memory Reclamation | Completed | 100% | Automatic garbage collection + smart memory management + monitoring |
| Stability Improvements | Completed | 100% | Exception handling + resource management + file operation stability |
| Error Handling | Completed | 100% | Comprehensive exception capture + graceful degradation + detailed logging |
| Multi-language Support | Completed | 100% | Chinese and English interface + configuration file support |

Overall Completion: all modules complete except Authentication/Authorization (0%)


Project Structure and Updates

MLSharp-3D-Maker-GPU-by-Chidc/
├── app.py # Main application (refactored version) ⭐
├── config/ # Configuration file directory (recommended to use)
│ ├── config.yaml # YAML format configuration file
│ └── config.json # JSON format configuration file
├── gpu_utils.py # GPU tools module
├── logger.py # Logging module
├── metrics.py # Monitoring metrics module ⭐
├── test_gpu_gc.py # GPU memory reclamation test script ⭐
├── demo_gpu_gc.py # GPU memory reclamation demo script ⭐
├── GPU_MEMORY_GC_README.md # GPU memory reclamation function documentation ⭐
├── optimistic.md # Performance optimization solution documentation ⭐
├── Start.bat # Windows startup script
├── Start.ps1 # PowerShell startup script
├── model_assets/ # Model files and resources
│ ├── sharp_2572gikvuh.pt # ml-sharp model weights
│ ├── inputs/ # Input examples
│ └── outputs/ # Output examples
├── python_env/ # Python environment
├── logs/ # Log files folder
├── tmp/ # Temporary files and backups
│ └── 1.28/ # 2026-01-28 Backup
└── temp_workspace/ # Temporary workspace

Latest Update (2026-02-28)

Code optimization, bug fixes, and multilingual module completion (2.28.1500)

  • Code Robustness Significantly Improved
    • Fixed CLIArgs missing no_cache field issue
    • Fixed Logger method duplicate definition issue
    • Fixed Pydantic v2 deprecated parameters (min_items → min_length)
    • Fixed metrics.py thread safety issue (using threading.Event)
    • Fixed app.py GPUManager thread safety issue
    • Fixed traceback.format_exc() non-exception context call issue
  • Security Enhancements
    • Added RestrictedUnpickler to prevent pickle deserialization attacks
    • Added file upload type validation (magic number check)
    • Added path traversal attack protection
    • Added request size limit
    • Added sensitive information leak protection
    • Added configuration file path validation
  • Stability Improvements
    • Fixed gpu_utils.py silent exception issue
    • Fixed file handle leak issue
    • Fixed monitoring middleware race condition
    • Added logger.py file handle close mechanism
  • Multi-language Support Improvements
    • Fixed all hardcoded Chinese strings
    • Complete Chinese and English translation support
    • Added translation key missing warning feature
    • CLI parameter help text internationalization
    • API error message internationalization
    • Startup banner and log message internationalization

Logging and Error Handling Enhancement 02.20.1425

  • Logging Style Enhancement - Added color output, icons and detailed context information (filename, function name, line number)
  • Diversified Logging Methods - Added new methods like styled_section, progress_info, performance, gpu_info, cache_info
  • Error Handling Enhancement - Improved all empty except: clauses, replaced with specific exception handling
  • File Operation Stability - Enhanced error handling and recovery mechanisms for file saving, loading, and renaming operations
  • Model Loading Protection - Added error recovery for model loading, image processing, PLY saving and other critical operations
  • Webhook Resilience - Improved error handling for Webhook notifications, failure doesn't affect main flow
  • Cache Operation Protection - Added error handling for Redis and local cache operations with graceful fallback
  • VRAM Management Optimization - Improved handling of VRAM shortage errors with more user-friendly error messages and solutions

Stability Enhancement and Bug Fixes 02.20.1200

  • Input Validation Fix - Fixed issue with validate_input_size function calling logging methods before logging system initialization
  • File Operation Improvement - Added exception handling for PLY file renaming operation, improved file operation fault tolerance
  • Directory Cleanup Optimization - Improved exception handling for temporary directory cleanup operations, provided better error information
  • Logging System Improvement - Unified logging method for GPU monitoring loop, maintained logging format consistency
  • Startup Script Fix - Removed hardcoded IP address in Start.ps1, improved portability
  • Resource Management Optimization - Improved temporary file and resource cleanup mechanisms, prevented resource leaks
  • Test Coverage - Added stability test cases to ensure stability of key functions

Quick Start

Recommended Startup Method

Smart Run (Recommended for beginners):

Double-click Start.ps1

Features:

  • Automatic Detection: GPU type (NVIDIA/AMD/Intel), environment configuration, dependencies
  • Smart Recommendation: Automatically recommend best startup script based on graphics card
  • Comprehensive Diagnostics: 100+ error handling, intelligent problem identification
  • Solutions: Each error provides detailed solution suggestions
  • Log Recording: All run logs saved in logs/ folder
  • Color Output: Clear visual feedback, easy to read

Using Command Line Parameters (Advanced Users):

# Auto-detect mode (default)
python app.py

# Force GPU mode
python app.py --mode gpu

# Force CPU mode
python app.py --mode cpu

# Custom port
python app.py --port 8080

# Don't auto-open browser
python app.py --no-browser

Access Address

After startup, visit: http://127.0.0.1:8000


Dependency Installation

Basic Dependencies

pip install -r requirements.txt

Command Line Parameters


Basic Parameters

| Parameter | Abbreviation | Type | Default Value | Description |
| --- | --- | --- | --- | --- |
| --mode | -m | string | auto | Startup mode |
| --port | -p | int | 8000 | Web service port |
| --host | - | string | 127.0.0.1 | Web service host address |
| --input-size | - | int[] | [1536, 1536] | Input image size [width, height] |
| --no-browser | - | flag | false | Don't auto-open browser |
| --no-amp | - | flag | false | Disable mixed precision inference (AMP) |
| --no-cudnn-benchmark | - | flag | false | Disable cuDNN Benchmark |
| --config | -c | string | - | Configuration file path (supports YAML and JSON) |
| --enable-cache | - | flag | true | Enable inference cache (default: enabled) |
| --no-cache | - | flag | false | Disable inference cache |
| --cache-size | - | int | 100 | Maximum cache entries |
| --clear-cache | - | flag | false | Clear cache on startup |
| --enable-auto-tune | - | flag | false | Enable performance auto-tuning |
| --redis-url | - | string | - | Redis connection URL (distributed cache) |
| --enable-webhook | - | flag | false | Enable Webhook asynchronous notification |
| --enable-auto-gc | - | flag | true | Enable GPU auto garbage collection (default: enabled) |
| --no-auto-gc | - | flag | false | Disable GPU auto garbage collection |
| --auto-gc-interval | - | int | 30 | GPU auto garbage collection check interval (seconds) |
| --auto-gc-threshold | - | float | 85.0 | GPU memory usage threshold (%); auto-clean when exceeded |
| --enable-smart-reclaim | - | flag | true | Enable smart memory reclamation (default: enabled) |
| --no-smart-reclaim | - | flag | false | Disable smart memory reclamation |
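The `--auto-gc-*` flags describe a background loop that periodically checks GPU memory usage and reclaims memory when it crosses the threshold. A minimal sketch of that decision logic, with hypothetical function names (the actual implementation lives in app.py / gpu_utils.py and would presumably also call `torch.cuda.empty_cache()`):

```python
import gc

def should_reclaim(used_mb: float, total_mb: float, threshold_pct: float = 85.0) -> bool:
    """Return True when GPU memory usage meets or exceeds the threshold (default 85%)."""
    if total_mb <= 0:
        return False
    return (used_mb / total_mb) * 100.0 >= threshold_pct

def auto_gc_step(used_mb: float, total_mb: float,
                 threshold_pct: float = 85.0, reclaim=gc.collect) -> bool:
    """One iteration of the auto-GC loop: reclaim memory if over threshold.

    The real loop would run this every --auto-gc-interval seconds and,
    on a CUDA device, likely pair gc.collect() with torch.cuda.empty_cache().
    """
    if should_reclaim(used_mb, total_mb, threshold_pct):
        reclaim()
        return True
    return False
```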

Startup Modes (--mode)

| Mode | Description |
| --- | --- |
| auto | Auto-detect and select best mode (default) |
| gpu | Force GPU mode (auto-detect vendor) |
| cpu | Force CPU mode |
| nvidia | Force NVIDIA GPU mode |
| amd | Force AMD GPU mode (ROCm) |

Input Size (--input-size)

Set input image size for inference. Default is 1536x1536, which is the size used during model training.

Usage example:

# Use default size 1536x1536
python app.py

# Use custom size 1024x1024
python app.py --input-size 1024 1024

# Use 768x768 for quick testing
python app.py --input-size 768 768

Constraints:

  • Input size must be divisible by 64 (model encoder uses patch-based splitting)
  • Width and height must be equal (model uses square input)
  • Maximum supported size is 1536x1536 (SPN encoder has patch splitting errors with larger sizes)
  • If provided size doesn't meet requirements, program will automatically adjust to closest valid size

Automatic Adjustment Example:

# 1000x1000 → Automatically adjusted to 1024x1024
python app.py --input-size 1000 1000

# 1200x800 → Automatically adjusted to 1200x1200 (maintaining square)
python app.py --input-size 1200 800
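The stated constraints (square input, side divisible by 64, capped at 1536) can be sketched as a small rounding helper. `adjust_input_size` is a hypothetical name, and the app's exact rounding rule may differ slightly from this sketch:

```python
def adjust_input_size(width: int, height: int, step: int = 64, max_size: int = 1536) -> int:
    """Clamp and round a requested size to the nearest valid square side.

    The model requires a square input whose side is a multiple of 64,
    capped at 1536 (larger sizes break the SPN encoder's patch splitting).
    """
    side = max(width, height)              # force a square input
    side = min(max(side, step), max_size)  # clamp to [64, 1536]
    rounded = round(side / step) * step    # nearest multiple of 64
    return min(max(rounded, step), max_size)

# e.g. a 1000x1000 request rounds up to the nearest multiple of 64: 1024
```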

Recommended Sizes:

| Size | Purpose | Memory Requirement | Output Quality |
| --- | --- | --- | --- |
| 512x512 | Quick testing | Low | Basic |
| 768x768 | Balanced mode | Medium | Good |
| 1024x1024 | Standard mode | Medium | Excellent |
| 1536x1536 | High quality (default/maximum) | High | Best |

Note: Maximum supported size is 1536x1536, exceeding this will cause patch splitting errors in SPN encoder.

Notes:

  • Larger input sizes improve model output quality but require more memory and computing time
  • Smaller input sizes can speed up inference and reduce memory usage but may lower output quality
  • Recommended range: 512x512 to 1536x1536
  • Maximum supported size is 1536x1536, exceeding this causes patch splitting errors
  • If memory insufficient, use smaller sizes
  • If using non-standard sizes, program will auto-adjust and display warning

Usage Examples

# Basic use
python app.py
python app.py --mode gpu
python app.py --mode cpu

# Specify GPU vendor
python app.py --mode nvidia
python app.py --mode amd

# Custom port and host
python app.py --port 8080
python app.py --host 0.0.0.0 --port 8000

# Custom input size
python app.py --input-size 1024 1024
python app.py --input-size 768 768

# Disable optimization options (for debugging)
python app.py --no-browser
python app.py --no-amp
python app.py --no-cudnn-benchmark

# Enable gradient checkpointing (reduce memory usage)
python app.py --gradient-checkpointing

# Cache management (enabled by default)
python app.py # Default cache enabled
python app.py --no-cache # Disable cache
python app.py --cache-size 200 # Set cache size to 200
python app.py --clear-cache # Clear cache on startup

# Performance auto-tuning (advanced feature)
python app.py --enable-auto-tune # Auto-test and select optimal optimization configuration on startup

# Combining usage
python app.py --mode nvidia --port 8080 --no-browser --input-size 1024 1024
python app.py --gradient-checkpointing --input-size 1536 1536
python app.py --cache-size 200 --mode gpu
python app.py --clear-cache --mode gpu

# Using configuration file
python app.py --config config.yaml
python app.py --config config.json
python app.py -c config.yaml

# Configuration file + Command line parameters (command line parameters take priority)
python app.py --config config.yaml --port 8080 --input-size 1024 1024

# Multi-language support
python app.py --lang zh  # Chinese interface (default)
python app.py --lang en  # English interface

Get Help

python app.py --help
python app.py -h

GPU Support Status


NVIDIA GPU

| Architecture | Graphics Series | Compute Capability | Support Status | Optimizations |
| --- | --- | --- | --- | --- |
| Ampere | RTX 30/40 Series | 8.0+ | Full Support | AMP, TF32, cuDNN |
| Turing | RTX 20 Series | 7.5 | Full Support | AMP, cuDNN |
| Pascal | GTX 10/16 Series | 6.1 | Full Support | AMP, cuDNN |
| Maxwell | GTX 9xx Series | 5.2 | Supported | AMP |
| Kepler | GTX 7xx Series | 3.0-3.7 | ⚠️ Legacy GPU | Basic |
| Fermi | GTX 6xx Series | 2.1 | ❌ Not Recommended | - |

AMD GPU

| Architecture | Graphics Series | ROCm Support | Support Status |
| --- | --- | --- | --- |
| RDNA 2 | RX 6000 Series | Full Support | Full Support |
| RDNA 1 | RX 5000 Series | Full Support | Full Support |
| GCN 5 | Vega Series | Full Support | Supported |
| GCN 4 | RX 400/500 Series | ⚠️ | ⚠️ Partial Support |
| GCN 3 | RX 300 Series | ❌ | Not Supported |

Intel GPU

| Architecture | Graphics Series | Support Status |
| --- | --- | --- |
| Xe | Arc Series | ⚠️ CPU Mode Only |
| Iris Xe | Integrated Graphics | ⚠️ CPU Mode Only |
| UHD | Integrated Graphics | ⚠️ CPU Mode Only |

Logging System


Logging Features

MLSharp uses Loguru as the logging system, providing professional logging management:

  • Structured Logging: Includes timestamp, logging level, source information
  • Color Output: Console color display, easy to distinguish different levels
  • File Logging: Automatically saved to logs/ directory
  • Log Rotation: Automatic rotation and compression of log files (10MB rotation, keep 7 days)
  • Error Tracking: Complete error stack trace and diagnostic information
  • Multi-Level: DEBUG, INFO, WARNING, ERROR, CRITICAL

Log Files

Log files saved in logs/ directory:

  • File naming: mlsharp_YYYYMMDD.log
  • Compressed files: mlsharp_YYYYMMDD.log.zip
  • Retention time: 7 days

Logging Levels

| Level | Purpose | Example |
| --- | --- | --- |
| DEBUG | Debug information | Variable values, function calls |
| INFO | General information | Startup information, processing progress |
| WARNING | Warnings | Performance warnings, compatibility issues |
| ERROR | Errors | Processing failures, exceptions |
| CRITICAL | Critical errors | System crashes, fatal errors |

Log Output Example

2026-01-28 20:00:00 | INFO | MLSharp:run:10 - Service started
2026-01-28 20:00:01 | SUCCESS | MLSharp:load_model:50 - Model loaded successfully
2026-01-28 20:00:02 | WARNING | MLSharp:detect_gpu:30 - Less than 4GB VRAM
2026-01-28 20:00:03 | ERROR | MLSharp:predict:100 - Processing failed: Out of memory

View Logs

# View today's logs
type logs\mlsharp_20260128.log

# View all log files
dir logs\

# View error logs
findstr /C:"ERROR" logs\mlsharp_*.log

Configuration File Usage


Configuration File Format

Supports both YAML and JSON format configuration files.

Default Configuration File: If the --config parameter is not specified, the system automatically uses config.yaml in the project root directory.

YAML Format (config.yaml)

# MLSharp-3D-Maker Configuration File
# Supported Format: YAML

# Service Configuration
server:
  host: "127.0.0.1" # Service host address
  port: 8000 # Service port

# Startup Mode
mode: "auto" # Startup mode: auto, gpu, cpu, nvidia, amd

# Language Configuration
language: "zh"             # Interface language: zh(Chinese), en(English)

# Browser Configuration
browser:
  auto_open: true # Auto-open browser

# GPU Optimization Configuration
gpu:
  enable_amp: true # Enable mixed precision inference (AMP)
  enable_cudnn_benchmark: true # Enable cuDNN Benchmark
  enable_tf32: true # Enable TensorFloat32

# Logging Configuration
logging:
  level: "INFO" # Logging level: DEBUG, INFO, WARNING, ERROR
  console: true # Console output
  file: false # File output

# Model Configuration
model:
  checkpoint: "model_assets/sharp_2572gikvuh.pt" # Model weights path
  temp_dir: "temp_workspace" # Temporary workspace directory

# Inference Configuration
inference:
  input_size: [1536, 1536] # Input image size [width, height] (default: 1536x1536)

# Optimization Configuration
optimization:
  gradient_checkpointing: false # Enable gradient checkpointing (reduce memory usage, slightly decrease inference speed)
  checkpoint_segments: 3 # Gradient checkpointing segments (not used yet)

# Cache Configuration
cache:
  enabled: true # Enable inference cache (default: enabled)
  size: 100 # Maximum cache entries (default: 100)

# Redis Cache Configuration
redis:
  enabled: false # Enable Redis cache (default: disabled)
  url: "redis://localhost:6379/0" # Redis connection URL
  prefix: "mlsharp" # Cache key prefix

# Webhook Configuration
webhook:
  enabled: false # Enable Webhook notification (default: disabled)
  task_completed: "" # Task completed notification URL
  task_failed: "" # Task failed notification URL

# Monitoring Configuration
monitoring:
  enabled: true # Enable monitoring
  enable_gpu: true # Enable GPU monitoring
  metrics_path: "/metrics" # Prometheus metrics endpoint path

# Performance Configuration
performance:
  max_workers: 4 # Maximum worker threads
  max_concurrency: 10 # Maximum concurrency
  timeout_keep_alive: 30 # Keep-alive timeout(seconds)
  max_requests: 1000 # Maximum requests

# Performance Cache Configuration (auto-generated, no manual configuration needed)
performance_cache:
  last_test: null # Last test time (ISO 8601 format)
  best_config: null # Optimal configuration
  gpu: null # GPU information

JSON Format (config.json)

{
  "server": {
    "host": "127.0.0.1",
    "port": 8000
  },
  "mode": "auto",
  "browser": {
    "auto_open": true
  },
  "gpu": {
    "enable_amp": true,
    "enable_cudnn_benchmark": true,
    "enable_tf32": true
  },
  "logging": {
    "level": "INFO",
    "console": true,
    "file": false
  },
  "model": {
    "checkpoint": "model_assets/sharp_2572gikvuh.pt",
    "temp_dir": "temp_workspace"
  },
  "inference": {
    "input_size": [1536, 1536]
  },
  "optimization": {
    "gradient_checkpointing": false,
    "checkpoint_segments": 3
  },
  "cache": {
    "enabled": true,
    "size": 100
  },
  "redis": {
    "enabled": false,
    "url": "redis://localhost:6379/0",
    "prefix": "mlsharp"
  },
  "webhook": {
    "enabled": false,
    "task_completed": "",
    "task_failed": ""
  },
  "monitoring": {
    "enabled": true,
    "enable_gpu": true,
    "metrics_path": "/metrics"
  },
  "performance": {
    "max_workers": 4,
    "max_concurrency": 10,
    "timeout_keep_alive": 30,
    "max_requests": 1000
  }
}

Using Configuration Files

Basic Usage:

# Use YAML configuration file
python app.py --config config.yaml

# Use JSON configuration file
python app.py --config config.json

# Abbreviation
python app.py -c config.yaml

# Recommended: Use config folder to manage configuration files
python app.py --config config/performance.yaml
python app.py --config config/settings.json

Configuration File + Command Line Parameters:

# Command line parameters override corresponding settings in configuration file
python app.py --config config.yaml --port 8080 --mode gpu

Configuration File Auto-Creation/Update:

# If configuration file doesn't exist, auto-create with default configuration
# If configuration file exists, only update performance tuning cache, other configurations remain unchanged
python app.py --enable-auto-tune --config config/auto_tune.json

Parameter Priority

Command line parameters > Configuration file > Default values

For example:

# config.yaml sets port: 8000
# Command line parameter specifies --port 8080
# Final uses 8080
python app.py --config config.yaml --port 8080
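The priority rule can be sketched as a three-layer merge where each higher-priority layer only overrides keys it explicitly sets. `resolve` is a hypothetical helper, not the app's actual function:

```python
def resolve(cli: dict, config: dict, defaults: dict) -> dict:
    """Merge settings: CLI arguments > configuration file > built-in defaults.

    Keys left unset (None) at a higher-priority layer fall through
    to the layer below.
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in config.items() if v is not None})
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged

# Mirrors the example above: config.yaml sets port 8000, CLI passes --port 8080.
settings = resolve(
    cli={"port": 8080},
    config={"port": 8000},
    defaults={"port": 8000, "host": "127.0.0.1"},
)
# settings["port"] is 8080; settings["host"] falls through to the default
```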

Configuration Items

| Configuration Item | Description | Allowed Values |
| --- | --- | --- |
| server.host | Service host address | IP address |
| server.port | Service port | 1-65535 |
| mode | Startup mode | auto, gpu, cpu, nvidia, amd |
| browser.auto_open | Auto-open browser | true, false |
| gpu.enable_amp | Enable mixed precision inference | true, false |
| gpu.enable_cudnn_benchmark | Enable cuDNN Benchmark | true, false |
| gpu.enable_tf32 | Enable TensorFloat32 | true, false |
| logging.level | Logging level | DEBUG, INFO, WARNING, ERROR |
| logging.console | Console output | true, false |
| logging.file | File output | true, false |
| model.checkpoint | Model weights path | File path |
| model.temp_dir | Temporary workspace directory | Directory path |
| inference.input_size | Input image size | [width, height], default [1536, 1536] |
| monitoring.enabled | Enable monitoring | true, false |
| monitoring.enable_gpu | Enable GPU monitoring | true, false |
| monitoring.metrics_path | Prometheus metrics endpoint path | Path string |
| optimization.gradient_checkpointing | Enable gradient checkpointing | true, false |
| optimization.checkpoint_segments | Gradient checkpointing segments | Positive integer |
| performance.max_workers | Maximum worker threads | Positive integer |
| performance.max_concurrency | Maximum concurrency | Positive integer |
| performance.timeout_keep_alive | Keep-alive timeout (seconds) | Positive integer |
| performance.max_requests | Maximum requests | Positive integer |
| auto_tune.enabled | Enable performance auto-tuning | true, false |
| auto_tune.test_size | Test image size | [width, height] |
| auto_tune.warmup_runs | Warm-up run count | Positive integer |
| auto_tune.test_runs | Test run count | Positive integer |
| performance_cache.last_test | Last test time | ISO 8601 timestamp (auto-generated) |
| performance_cache.best_config | Optimal configuration | Configuration dictionary (auto-generated) |
| performance_cache.gpu | GPU information | GPU information (auto-generated) |

Performance Auto-Tuning


MLSharp provides an intelligent performance auto-tuning feature that automatically tests candidate optimization configurations and selects the best one.

Tuning Features

  • Intelligent Benchmarking: Automatically test various optimization configuration combinations
  • Optimal Configuration Selection: Automatically select best configuration based on test results
  • GPU Adaptation: Automatically filter out unsupported configurations based on GPU capability
  • Quick Testing: Use small size to complete testing quickly (about 10 seconds)
  • Detailed Logging: Output complete test process and results
  • Performance Improvement: 30-50% performance improvement compared to non-optimized configuration
  • Result Caching: Automatically save test results to configuration file, valid for 7 days
  • Smart Skip: Automatically skip testing when detecting valid cache, speed up startup

Test Configurations

The auto-tuner tests the following configuration combinations:

| Configuration | Description | Applicable Scenario |
| --- | --- | --- |
| Baseline Configuration | No optimizations | All GPUs |
| AMP Only | Only mixed precision | Compute capability ≥ 5.3 |
| cuDNN Only | Only cuDNN Benchmark | NVIDIA, compute capability ≥ 6.0 |
| TF32 Only | Only TensorFloat32 | NVIDIA, compute capability ≥ 8.0 |
| AMP + cuDNN | Mixed precision + cuDNN | NVIDIA, compute capability ≥ 6.0 |
| AMP + TF32 | Mixed precision + TF32 | NVIDIA, compute capability ≥ 8.0 |
| All Optimizations | Enable all optimizations | High-end NVIDIA GPUs |

Enable Auto-Tuning

# Enable performance auto-tuning (using default configuration file config.yaml)
python app.py --enable-auto-tune

# Combining usage
python app.py --enable-auto-tune --mode gpu --input-size 1024 1024

# Specify configuration file (results will be saved to this file)
python app.py --enable-auto-tune --config config.yaml

# Use config folder to save configuration (recommended)
python app.py --enable-auto-tune --config config/performance.yaml

# If configuration file doesn't exist, auto-create with default configuration
python app.py --enable-auto-tune --config config/auto_tune.json

Note: If --config parameter is not specified, system automatically uses config.yaml in project root directory as default configuration file.

Caching Mechanism

Auto-tuning results are automatically saved to configuration file to avoid repeated testing:

  • Cache Validity: 7 days
  • Cache Condition: GPU model, vendor, compute capability must match
  • Auto Skip: Automatically skip testing when detecting valid cache
  • Auto Apply: Directly use cached optimal configuration
  • Auto Creation/Update: Auto-create configuration file if doesn't exist (with default configuration), only update performance tuning cache if exists
  • Directory Support: Auto-create configuration directory (such as config folder)
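The cache conditions above (7-day validity; matching GPU model, vendor, and compute capability) could be checked roughly as follows. `cache_is_valid` is a hypothetical name, keyed on the `performance_cache` fields (`last_test`, `best_config`, `gpu`):

```python
from datetime import datetime, timedelta, timezone

def cache_is_valid(cache: dict, current_gpu: dict, max_age_days: int = 7) -> bool:
    """Return True if a saved tuning cache can be reused.

    The cache must be younger than max_age_days and recorded on the
    same GPU: name, vendor, and compute capability all have to match.
    """
    if not cache.get("last_test") or not cache.get("best_config"):
        return False
    tested = datetime.fromisoformat(cache["last_test"])
    if datetime.now(timezone.utc) - tested > timedelta(days=max_age_days):
        return False
    saved_gpu = cache.get("gpu") or {}
    return all(saved_gpu.get(k) == current_gpu.get(k)
               for k in ("name", "vendor", "compute_capability"))
```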

Log Output Example (when using cache):

[INFO] Found valid performance tuning cache (3 days ago)
============================================================
[INFO] Using cached performance configuration
============================================================
Configuration Name: All Optimizations
Description: Enable all optimizations

Log Output Example (when creating configuration file):

[INFO] Configuration file doesn't exist, auto-create new configuration file: config.yaml
[SUCCESS] Performance tuning results added to configuration file: config.yaml

Log Output Example (when updating existing configuration file):

[INFO] Configuration file exists, update performance tuning cache: config.yaml
[SUCCESS] Performance tuning results updated to configuration file: config.yaml

Configuration File Processing Description:

  • Configuration file exists: Only update performance_cache field, other configurations remain unchanged
  • Configuration file doesn't exist: Create new configuration file, containing complete default configuration

Configuration File Format

Tuning results saved in performance_cache field of configuration file:

# config.yaml
performance_cache:
  last_test: "2026-01-31T12:00:00+00:00"
  best_config:
    name: "All Optimizations"
    amp: true
    cudnn_benchmark: true
    tf32: true
    description: "Enable all optimizations"
  gpu:
    name: "NVIDIA GeForce RTX 4090"
    vendor: "NVIDIA"
    compute_capability: 89

Tuning Process

  1. Cache Check: Check if valid tuning cache exists in configuration file (within 7 days)
  2. Cache Hit: If cache is valid and GPU matches, directly use cached results
  3. Benchmark Testing: If cache invalid or expired, perform complete test
  4. Warm-up Phase: Run 2 warm-ups to stabilize performance
  5. Test Phase: Run 3 tests for each configuration
  6. Result Statistics: Calculate average inference time and throughput
  7. Optimal Selection: Select fastest configuration and apply
  8. Cache Save: Save optimal configuration to configuration file
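Steps 4 to 7 above amount to a warm-up-then-average timing loop over each candidate configuration. A minimal sketch with hypothetical function names (the real tuner runs actual model inference rather than arbitrary callables):

```python
import time
from statistics import mean

def benchmark(run, warmup_runs: int = 2, test_runs: int = 3) -> float:
    """Time one configuration: 2 warm-up runs, then the average of 3 timed runs."""
    for _ in range(warmup_runs):
        run()  # warm-up: stabilize caches, cuDNN autotuning, clocks
    timings = []
    for _ in range(test_runs):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return mean(timings)

def pick_best(configs: dict) -> str:
    """configs maps name -> zero-arg callable; return the fastest config's name."""
    results = {name: benchmark(run) for name, run in configs.items()}
    return min(results, key=results.get)
```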

Tuning Output Example

============================================================
[INFO] Performance Auto-Tuning
============================================================

Testing different optimization configurations...

Test Configuration: Baseline Configuration
  Description: No optimizations
  Run 1/3: 2.543 seconds
  Run 2/3: 2.512 seconds
  Run 3/3: 2.528 seconds
  Average Inference Time: 2.528 seconds

Test Configuration: AMP Only
  Description: Only enable mixed precision inference
  Run 1/3: 1.892 seconds
  Run 2/3: 1.876 seconds
  Run 3/3: 1.884 seconds
  Average Inference Time: 1.884 seconds

Test Configuration: All Optimizations
  Description: Enable all optimizations
  Run 1/3: 1.245 seconds
  Run 2/3: 1.238 seconds
  Run 3/3: 1.241 seconds
  Average Inference Time: 1.241 seconds

============================================================
[INFO] Tuning Results
============================================================
[SUCCESS] Optimal Configuration: All Optimizations
[INFO]   Description: Enable all optimizations
[INFO]   Average Inference Time: 1.241 seconds
[INFO]   Throughput: 0.81 FPS

[SUCCESS] Performance auto-tuning completed!
[INFO] Optimal configuration applied

Best Practices

  1. Initial Run: Recommended to enable auto-tuning on first run
  2. Hardware Changes: Re-run auto-tuning after changing GPU
  3. Driver Updates: Re-test after GPU driver updates
  4. Regular Tuning: Recommended to run auto-tuning monthly
  5. Cache Management: System automatically caches tuning results for 7 days, no manual management needed
  6. Configuration File: Recommended to use config/ folder to manage configuration files, such as config/performance.yaml
  7. Auto Creation/Update: Configuration file doesn't exist: auto-create (with default configuration), exists: only update performance tuning cache
  8. Clear Cache: To force re-testing, delete performance_cache field in configuration file or use new configuration file

Performance Optimization Suggestions


GPU Mode Optimization

  1. Use an Appropriate Image Size

    • Recommended: 512x512 to 1024x1024
    • Avoid exceeding 2048x2048
  2. Enable All Optimizations

    • AMP (mixed precision) enabled by default
    • cuDNN Benchmark enabled by default
    • TF32 enabled by default (Ampere architecture)
  3. Enable Gradient Checkpointing When VRAM Is Insufficient

    • Use the --gradient-checkpointing parameter
    • Reduces VRAM usage by 30-50%
    • Slightly slower (10-20%), usually an acceptable trade-off
  4. Close Other Programs That Occupy the GPU

    • Disable browser hardware acceleration
    • Close other AI applications
    • Close games and graphics-intensive applications

CPU Mode Optimization

  1. Use Smaller Images

    • Recommended: 512x512 or smaller
  2. Reduce Concurrency

    • Modify max_workers in the configuration file
    • Recommended value: half the CPU core count
  3. Use the Fast Startup Script

    • Start_CPU_Fast.bat - fast mode
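The max_workers guideline above can be sketched as follows. The ThreadPoolExecutor and the process function are illustrative assumptions; the document does not show MLSharp's actual worker pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Recommended concurrency for CPU mode: half the core count, at least 1.
cores = os.cpu_count() or 1
max_workers = max(1, cores // 2)

def process(image_name: str) -> str:
    # Stand-in for a real per-image inference call.
    return f"processed {image_name}"

with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = list(pool.map(process, ["a.png", "b.png"]))
print(results)  # ['processed a.png', 'processed b.png']
```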

System-Level Optimization

  1. Increase Virtual Memory

    • Set to 1.5-2 times physical memory
  2. Use SSD

    • Faster model loading and I/O operations
  3. Close Unnecessary Background Programs

    • Free up more system resources

Inference Cache

Click to expand inference cache details

MLSharp provides an intelligent inference cache that can significantly speed up processing of repeated scenarios.

Cache Features

  • Smart Hashing: Generates a unique cache key from the image content and focal length
  • LRU Eviction: A least-recently-used policy automatically evicts old entries
  • Statistics Monitoring: Real-time hit rate and hit/miss counters
  • Thread Safety: A lock ensures safe access from multiple threads
  • Memory Management: Configurable cache size limit
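The features above can be sketched with Python's OrderedDict. This is a minimal illustration of the design, not MLSharp's actual implementation; the class and method names are assumptions.

```python
import hashlib
import threading
from collections import OrderedDict

class InferenceCache:
    """Minimal LRU cache with hit/miss statistics (illustrative)."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._data = OrderedDict()
        self._lock = threading.Lock()          # thread safety
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(image_bytes: bytes, focal_length: float) -> str:
        # Smart hashing: key derived from image content and focal length.
        h = hashlib.sha256(image_bytes)
        h.update(str(focal_length).encode())
        return h.hexdigest()

    def get(self, key: str):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)    # mark as recently used
                self.hits += 1
                return self._data[key]
            self.misses += 1
            return None

    def put(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.max_size:
                self._data.popitem(last=False) # LRU eviction

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0

cache = InferenceCache(max_size=2)
k = cache.make_key(b"image-bytes", 35.0)
cache.get(k)            # miss
cache.put(k, "depth-map")
cache.get(k)            # hit
print(cache.hit_rate)   # 50.0
```

The hit_rate property matches the percentage reported by the /v1/cache endpoint below.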

Enable Cache

The cache is enabled by default and can be controlled via command-line parameters or the configuration file:

# Command line parameters
python app.py                  # Cache enabled by default
python app.py --no-cache       # Disable cache
python app.py --cache-size 200 # Set cache size to 200

# config.yaml
cache:
  enabled: true # Enable cache (default: true)
  size: 100     # Maximum cache entries (default: 100)

API Endpoints

Get Cache Statistics

curl http://127.0.0.1:8000/v1/cache

Example Response:

{
  "enabled": true,
  "size": 45,
  "max_size": 100,
  "hits": 120,
  "misses": 30,
  "hit_rate": 80.0
}

Clear Cache

curl -X POST http://127.0.0.1:8000/v1/cache/clear

Example Response:

{
  "status": "success",
  "message": "Cache cleared"
}

Performance Improvement

The cache can significantly improve processing speed, especially in repeated scenarios:

| Cache Hit Rate | Speed Improvement | Applicable Scenario |
|----------------|-------------------|---------------------|
| 30% | 30% | A small number of repeated images |
| 50% | 50% | Moderately repetitive scenarios |
| 80% | 80% | A large number of repeated images |
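The table's numbers follow from simple arithmetic: if a fraction h of requests hit the cache and a hit costs roughly nothing compared to a full inference, average time drops to (1 − h) of the original, so the speed improvement matches the hit rate. A quick check, using a hypothetical 2-second full inference time:

```python
def expected_speedup(hit_rate: float) -> float:
    """Fraction of time saved, assuming cache hits are effectively free."""
    # average time = (1 - hit_rate) * full_inference_time
    return hit_rate

full_time = 2.0  # hypothetical full inference time in seconds
for h in (0.30, 0.50, 0.80):
    avg = (1 - h) * full_time
    print(f"hit rate {h:.0%}: avg {avg:.2f}s, saved {expected_speedup(h):.0%}")
```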

Best Practices

  1. Size the Cache Appropriately: Adjust the cache size based on available memory and actual needs
  2. Monitor the Hit Rate: Check the hit rate regularly to evaluate cache effectiveness
  3. Clear the Cache When Memory Is Tight: Clear the cache periodically if memory pressure is high
  4. Disable the Cache When It Cannot Help: When processing entirely distinct images, the cache can be disabled

Redis Distributed Cache

Click to expand Redis cache details

MLSharp supports Redis as a distributed cache for multi-instance deployments and cache persistence.

Redis Cache Features

  • Distributed Cache: Multiple instances can share one cache
  • Persistence: Cache data is persisted in Redis
  • TTL Support: Entries expire automatically
  • Mixed Usage: Can be combined with the local cache
  • High Performance: Backed by the Redis in-memory database
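The "mixed usage" idea — check the local cache first, then fall back to the shared cache — can be sketched as below. A plain dict stands in for the Redis client so the example is self-contained; in a real deployment these lookups would be Redis GET/SETEX calls, and the class name is an assumption, not MLSharp's API.

```python
import time

class TwoTierCache:
    """Local-first lookup with a Redis-like fallback (illustrative)."""

    def __init__(self, prefix: str = "mlsharp", ttl: int = 3600):
        self.local = {}
        self.remote = {}          # stand-in for a Redis client
        self.prefix = prefix
        self.ttl = ttl

    def _key(self, key: str) -> str:
        # Namespacing, mirroring the `prefix` field in the config below.
        return f"{self.prefix}:{key}"

    def get(self, key: str):
        if key in self.local:              # tier 1: local memory
            return self.local[key]
        entry = self.remote.get(self._key(key))
        if entry is not None:              # tier 2: shared "Redis"
            value, expires = entry
            if time.time() < expires:      # TTL check
                self.local[key] = value    # promote to the local tier
                return value
            del self.remote[self._key(key)]
        return None

    def put(self, key: str, value) -> None:
        self.local[key] = value
        self.remote[self._key(key)] = (value, time.time() + self.ttl)

cache = TwoTierCache()
cache.put("scene-42", "depth-map")
cache.local.clear()                        # simulate a fresh instance
print(cache.get("scene-42"))               # depth-map (served from "Redis")
```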

Enable Redis Cache

# Use Redis cache
python app.py --redis-url redis://localhost:6379/0

# Use Redis cache + Webhook
python app.py --redis-url redis://localhost:6379/0 --enable-webhook

Configuration File

# config.yaml
redis:
  enabled: true
  url: "redis://localhost:6379/0"
  prefix: "mlsharp"

Performance Comparison

| Cache Type | Hit Speed | Distributed Support | Persistence | Applicable Scenario |
|------------|-----------|---------------------|-------------|---------------------|
| Local Cache | Fastest | ❌ | ❌ | Single-instance deployment |
| Redis Cache | Fast | ✅ | ✅ | Multi-instance deployment |

Best Practices

  1. Production: Use the Redis cache to support multi-instance deployment
  2. Local Development: Use the local cache; no Redis service is needed

Version History

Click to expand version history

Code Health Check and Fix 02.05.1914

  • Code Quality Improvement - Removed an unused ProcessPoolExecutor, optimizing resource usage
  • Pydantic v2 Update - Migrated to Pydantic v2 syntax, using @field_validator instead of @validator
  • Resource Management Optimization - Added a cleanup() method to ensure GPU monitoring threads and Webhook clients close properly
  • Redis Connection Management - Added a __del__ method that automatically closes Redis connections
  • Test File Added - Added test_app.py with core function tests
  • Test Script Update - Updated run_tests.bat and run_tests.ps1 to support both Windows CMD and PowerShell
  • Test Coverage - Module import, configuration validation, GPU detection, monitoring metrics, and other core functions
  • Test Results - All tests passed (4/4)
  • New Version Format - Adopted the [Month].[Day].[HHMM] format (e.g., 02.05.1900)
  • Description - Month.Day.HourMinute (24-hour clock)

Snapdragon GPU Adaptation 02.03.1851

  • Adreno GPU Support Removed - The main branch no longer supports Snapdragon/Adreno series GPUs

GPU Memory Auto Reclamation 02.03.1851

  • Memory Information Query - Real-time GPU memory usage (total, used, available, usage rate)
  • Cache Cleanup - Automatically clears memory reserved but unused by PyTorch
  • Force Garbage Collection - Full collection pass (clear cache → sync GPU → Python GC → clear again)
  • Smart Memory Reclamation - Cleans automatically when memory usage exceeds a threshold (default 85%)
  • Auto Memory Monitoring - A background thread checks and clears memory periodically (default: every 30 seconds)
  • Command Line Parameters - Supports --enable-auto-gc, --auto-gc-interval, and --auto-gc-threshold
  • Configuration File Support - Memory reclamation strategy configurable in config.yaml
  • Performance Optimization - Prevents memory leaks and improves system stability
  • Logging - Detailed memory cleanup logs for debugging
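The threshold/interval logic described above can be sketched as follows. get_usage and reclaim are stand-ins (the document does not show MLSharp's internals); a real implementation would query and free GPU memory, e.g. via PyTorch.

```python
import threading
import time

def auto_gc_loop(get_usage, reclaim, threshold=0.85, interval=30.0,
                 stop_event=None):
    """Background loop: reclaim memory whenever usage exceeds threshold."""
    stop_event = stop_event or threading.Event()
    while not stop_event.wait(interval):   # wake up every `interval` seconds
        if get_usage() > threshold:
            reclaim()

# Demo with stand-in functions and a fast interval.
usage = [0.90]                             # pretend 90% of memory is in use
reclaimed = []

def fake_reclaim():
    reclaimed.append(True)
    usage[0] = 0.40                        # usage drops after cleanup

stop = threading.Event()
t = threading.Thread(target=auto_gc_loop,
                     args=(lambda: usage[0], fake_reclaim),
                     kwargs={"threshold": 0.85, "interval": 0.01,
                             "stop_event": stop})
t.start()
time.sleep(0.1)                            # let the loop run a few cycles
stop.set(); t.join()
print(reclaimed, usage[0])                 # [True] 0.4
```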

Snapdragon GPU Adaptation 01.31.1931

  • Adreno GPU Detection - Automatically detects Snapdragon/Adreno series GPUs
  • Qualcomm Mode - Added the --mode qualcomm startup mode
  • ONNX Runtime Support - Added an ONNX Runtime + DirectML acceleration path
  • Smart Fallback - Falls back to CPU mode automatically when a Snapdragon GPU is detected
  • Platform Support - Windows/Android platform identification
  • Documentation Update - Added Snapdragon GPU support notes and limitations

Future Improvements

Click to expand future improvements plan

Completed

  • Unit Testing: Added unit tests for each class
  • Configuration Files: Support loading configuration from files
  • Logging System: Use professional logging library (e.g., loguru)
  • Asynchronous Optimization: Further optimize asynchronous processing

To Improve

High Priority

  1. Authentication Authorization - Add user authentication
    • API Key authentication
    • JWT Token support
    • Rate limiting

Medium Priority

  1. Task Queue - Asynchronous task processing

    • Redis queue support
    • Task status tracking
    • Batch processing support
  2. Batch Processing API - Batch image processing

    • Multiple file uploads
    • Batch prediction
    • Result packaging and download

Low Priority

  1. Internationalization - Multi-language support ✅ Completed

    • i18n support ✅
    • Chinese and English interface ✅
    • Expandable language packs ✅
    • Configuration file support ✅
  2. Plugin System - Extensible architecture

    • Custom plugins
    • Model plugins
    • Post-processing plugins

Contribution

Issues and Pull Requests are welcome!


📚 Related Documentation


Contact


Version Number Naming Rule

This project uses the [Month].[Day].[HHMM] version numbering format.

If this project is helpful to you, please give a ⭐️ Star! Modded with ❤️ by Chidc with CPU-Mode-Provider GemosDoDo

README.md Version Code 02.28.1500