          _____                    _____            _____                    _____                    _____                    _____                    _____          
         /\    \                  /\    \          /\    \                  /\    \                  /\    \                  /\    \                  /\    \         
        /::\____\                /::\____\        /::\    \                /::\____\                /::\    \                /::\    \                /::\    \        
       /::::|   |               /:::/    /       /::::\    \              /:::/    /               /::::\    \              /::::\    \              /::::\    \       
      /:::::|   |              /:::/    /       /::::::\    \            /:::/    /               /::::::\    \            /::::::\    \            /::::::\    \      
     /::::::|   |             /:::/    /       /:::/\:::\    \          /:::/    /               /:::/\:::\    \          /:::/\:::\    \          /:::/\:::\    \     
    /:::/|::|   |            /:::/    /       /:::/__\:::\    \        /:::/____/               /:::/__\:::\    \        /:::/__\:::\    \        /:::/__\:::\    \    
   /:::/ |::|   |           /:::/    /        \:::\   \:::\    \      /::::\    \              /::::\   \:::\    \      /::::\   \:::\    \      /::::\   \:::\    \   
  /:::/  |::|___|______    /:::/    /       ___\:::\   \:::\    \    /::::::\    \   _____    /::::::\   \:::\    \    /::::::\   \:::\    \    /::::::\   \:::\    \  
 /:::/   |::::::::\    \  /:::/    /       /\   \:::\   \:::\    \  /:::/\:::\    \ /\    \  /:::/\:::\   \:::\    \  /:::/\:::\   \:::\____\  /:::/\:::\   \:::\____\ 
/:::/    |:::::::::\____\/:::/____/       /::\   \:::\   \:::\____\/:::/  \:::\    /::\____\/:::/  \:::\   \:::\____\/:::/  \:::\   \:::|    |/:::/  \:::\   \:::|    |
\::/    / ~~~~~/:::/    /\:::\    \       \:::\   \:::\   \::/    /\::/    \:::\  /:::/    /\::/    \:::\  /:::/    /\::/   |::::\  /:::|____|\::/    \:::\  /:::|____|
 \/____/      /:::/    /  \:::\    \       \:::\   \:::\   \/____/  \/____/ \:::\/:::/    /  \/____/ \:::\/:::/    /  \/____|:::::\/:::/    /  \/_____/\:::\/:::/    /  
             /:::/    /    \:::\    \       \:::\   \:::\    \               \::::::/    /            \::::::/    /         |:::::::::/    /            \::::::/    /   
            /:::/    /      \:::\    \       \:::\   \:::\____\               \::::/    /              \::::/    /          |::|\::::/    /              \::::/    /    
           /:::/    /        \:::\    \       \:::\  /:::/    /               /:::/    /               /:::/    /           |::| \::/____/                \::/____/     
          /:::/    /          \:::\    \       \:::\/:::/    /               /:::/    /               /:::/    /            |::|  ~|                       ~~           
         /:::/    /            \:::\    \       \::::::/    /               /:::/    /               /:::/    /             |::|   |                                   
        /:::/    /              \:::\____\       \::::/    /               /:::/    /               /:::/    /              \::|   |                                   
        \::/    /                \::/    /        \::/    /                \::/    /                \::/    /                \:|   |                                   
         \/____/                  \/____/          \/____/                  \/____/                  \/____/                  \|___|                                    

MLSharp 3D Maker

Instructions

Project Overview

MLSharp-3D-Maker is a 3D Gaussian Splatting generation tool built on Apple's ml-sharp model; it generates high-quality 3D models from a single photo.

Project Completion

| Module | Status | Completion | Description |
| --- | --- | --- | --- |
| Core Function | Completed | 100% | Image to 3D model conversion |
| GPU Acceleration | Completed | 100% | NVIDIA/AMD/Intel support |
| Configuration Management | Completed | 100% | Command line + configuration file |
| Logging System | Completed | 100% | loguru professional logging + color output + detailed context |
| Asynchronous Processing | Completed | 100% | ProcessPoolExecutor |
| Unit Testing | Completed | 95% | Core class testing + stability testing |
| API Interface | Completed | 100% | Prediction + health check + cache management |
| Monitoring Metrics | Completed | 95% | Prometheus integration + performance monitoring + stability improvements |
| Inference Cache | Completed | 100% | LRU cache + Redis distributed cache |
| Performance Auto-Tuning | Completed | 100% | Intelligent benchmarking + optimal configuration selection |
| Webhook | Completed | 100% | Asynchronous notification + event management + error recovery |
| Documentation | Completed | 100% | README + configuration examples + API documentation |
| API Documentation | Completed | 100% | Swagger/OpenAPI + version control |
| Authentication/Authorization | Planned | 0% | API Key/JWT |
| GPU Memory Reclamation | Completed | 100% | Automatic garbage collection + smart memory management + monitoring |
| Stability Improvements | Completed | 100% | Exception handling + resource management + file operation stability |
| Error Handling | Completed | 100% | Comprehensive exception capture + graceful degradation + detailed logging |
| Multi-language Support | Completed | 100% | Chinese and English interface + configuration file support |

Overall Completion: all modules complete except Authentication/Authorization (0%)


Project Structure and Updates

MLSharp-3D-Maker-GPU-by-Chidc/
├── app.py # Main application (refactored version) ⭐
├── config/ # Configuration file directory (recommended to use)
│ ├── config.yaml # YAML format configuration file
│ └── config.json # JSON format configuration file
├── gpu_utils.py # GPU tools module
├── logger.py # Logging module
├── metrics.py # Monitoring metrics module ⭐
├── test_gpu_gc.py # GPU memory reclamation test script ⭐
├── demo_gpu_gc.py # GPU memory reclamation demo script ⭐
├── GPU_MEMORY_GC_README.md # GPU memory reclamation function documentation ⭐
├── optimistic.md # Performance optimization solution documentation ⭐
├── Start.bat # Windows startup script
├── Start.ps1 # PowerShell startup script
├── model_assets/ # Model files and resources
│ ├── sharp_2572gikvuh.pt # ml-sharp model weights
│ ├── inputs/ # Input examples
│ └── outputs/ # Output examples
├── python_env/ # Python environment
├── logs/ # Log files folder
├── tmp/ # Temporary files and backups
│ └── 1.28/ # 2026-01-28 Backup
└── temp_workspace/ # Temporary workspace

Latest Update (2026-02-28)

Code optimization, bug fixes, and multilingual module completion (2.28.1500)

  • Code Robustness Significantly Improved
    • Fixed CLIArgs missing no_cache field issue
    • Fixed Logger method duplicate definition issue
    • Fixed Pydantic v2 deprecated parameters (min_items → min_length)
    • Fixed metrics.py thread safety issue (using threading.Event)
    • Fixed app.py GPUManager thread safety issue
    • Fixed traceback.format_exc() non-exception context call issue
  • Security Enhancements
    • Added RestrictedUnpickler to prevent pickle deserialization attacks
    • Added file upload type validation (magic number check)
    • Added path traversal attack protection
    • Added request size limit
    • Added sensitive information leak protection
    • Added configuration file path validation
  • Stability Improvements
    • Fixed gpu_utils.py silent exception issue
    • Fixed file handle leak issue
    • Fixed monitoring middleware race condition
    • Added logger.py file handle close mechanism
  • Multi-language Support Improvements
    • Fixed all hardcoded Chinese strings
    • Complete Chinese and English translation support
    • Added translation key missing warning feature
    • CLI parameter help text internationalization
    • API error message internationalization
    • Startup banner and log message internationalization

Logging and Error Handling Enhancement 02.20.1425

  • Logging Style Enhancement - Added color output, icons and detailed context information (filename, function name, line number)
  • Diversified Logging Methods - Added new methods like styled_section, progress_info, performance, gpu_info, cache_info
  • Error Handling Enhancement - Improved all empty except: clauses, replaced with specific exception handling
  • File Operation Stability - Enhanced error handling and recovery mechanisms for file saving, loading, and renaming operations
  • Model Loading Protection - Added error recovery for model loading, image processing, PLY saving and other critical operations
  • Webhook Resilience - Improved error handling for Webhook notifications, failure doesn't affect main flow
  • Cache Operation Protection - Added error handling for Redis and local cache operations with graceful fallback
  • VRAM Management Optimization - Improved handling of VRAM shortage errors with more user-friendly error messages and solutions

Stability Enhancement and Bug Fixes 02.20.1200

  • Input Validation Fix - Fixed issue with validate_input_size function calling logging methods before logging system initialization
  • File Operation Improvement - Added exception handling for PLY file renaming operation, improved file operation fault tolerance
  • Directory Cleanup Optimization - Improved exception handling for temporary directory cleanup operations, provided better error information
  • Logging System Improvement - Unified logging method for GPU monitoring loop, maintained logging format consistency
  • Startup Script Fix - Removed hardcoded IP address in Start.ps1, improved portability
  • Resource Management Optimization - Improved temporary file and resource cleanup mechanisms, prevented resource leaks
  • Test Coverage - Added stability test cases to ensure stability of key functions

Quick Start

Recommended Startup Method

Smart Run (Recommended for beginners):

Double-click Start.ps1

Features:

  • Automatic Detection: GPU type (NVIDIA/AMD/Intel), environment configuration, dependencies
  • Smart Recommendation: Automatically recommend best startup script based on graphics card
  • Comprehensive Diagnostics: 100+ error handling, intelligent problem identification
  • Solutions: Each error provides detailed solution suggestions
  • Log Recording: All run logs saved in logs/ folder
  • Color Output: Clear visual feedback, easy to read

Using Command Line Parameters (Advanced Users):

# Auto-detect mode (default)
python app.py

# Force GPU mode
python app.py --mode gpu

# Force CPU mode
python app.py --mode cpu

# Custom port
python app.py --port 8080

# Don't auto-open browser
python app.py --no-browser

Access Address

After startup, visit: http://127.0.0.1:8000


Dependency Installation

Basic Dependencies

pip install -r requirements.txt

Command Line Parameters


Basic Parameters

| Parameter | Abbreviation | Type | Default Value | Description |
| --- | --- | --- | --- | --- |
| --mode | -m | string | auto | Startup mode |
| --port | -p | int | 8000 | Web service port |
| --host | - | string | 127.0.0.1 | Web service host address |
| --input-size | - | int[] | [1536, 1536] | Input image size [width, height] |
| --no-browser | - | flag | false | Don't auto-open browser |
| --no-amp | - | flag | false | Disable mixed precision inference (AMP) |
| --no-cudnn-benchmark | - | flag | false | Disable cuDNN Benchmark |
| --config | -c | string | - | Configuration file path (supports YAML and JSON) |
| --enable-cache | - | flag | true | Enable inference cache (default: enabled) |
| --no-cache | - | flag | false | Disable inference cache |
| --cache-size | - | int | 100 | Maximum cache entries |
| --clear-cache | - | flag | false | Clear cache on startup |
| --enable-auto-tune | - | flag | false | Enable performance auto-tuning |
| --redis-url | - | string | - | Redis connection URL (distributed cache) |
| --enable-webhook | - | flag | false | Enable Webhook asynchronous notification |
| --enable-auto-gc | - | flag | true | Enable GPU auto garbage collection (default: enabled) |
| --no-auto-gc | - | flag | false | Disable GPU auto garbage collection |
| --auto-gc-interval | - | int | 30 | GPU auto garbage collection check interval (seconds) |
| --auto-gc-threshold | - | float | 85.0 | GPU memory usage threshold (%); auto-clean when exceeded |
| --enable-smart-reclaim | - | flag | true | Enable smart memory reclamation (default: enabled) |
| --no-smart-reclaim | - | flag | false | Disable smart memory reclamation |
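The `--auto-gc-*` flags describe a background loop that periodically checks GPU memory usage and reclaims memory when it crosses the threshold. A minimal sketch of that decision logic, with hypothetical function names (the actual implementation lives in app.py / gpu_utils.py and would presumably also call `torch.cuda.empty_cache()`):

```python
import gc

def should_reclaim(used_mb: float, total_mb: float, threshold_pct: float = 85.0) -> bool:
    """Return True when GPU memory usage meets or exceeds the threshold (default 85%)."""
    if total_mb <= 0:
        return False
    return (used_mb / total_mb) * 100.0 >= threshold_pct

def auto_gc_step(used_mb: float, total_mb: float,
                 threshold_pct: float = 85.0, reclaim=gc.collect) -> bool:
    """One iteration of the auto-GC loop: reclaim memory if over threshold.

    The real loop would run this every --auto-gc-interval seconds and,
    on a CUDA device, likely pair gc.collect() with torch.cuda.empty_cache().
    """
    if should_reclaim(used_mb, total_mb, threshold_pct):
        reclaim()
        return True
    return False
```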

Startup Modes (--mode)

| Mode | Description |
| --- | --- |
| auto | Auto-detect and select best mode (default) |
| gpu | Force GPU mode (auto-detect vendor) |
| cpu | Force CPU mode |
| nvidia | Force NVIDIA GPU mode |
| amd | Force AMD GPU mode (ROCm) |

Input Size (--input-size)

Set input image size for inference. Default is 1536x1536, which is the size used during model training.

Usage example:

# Use default size 1536x1536
python app.py

# Use custom size 1024x1024
python app.py --input-size 1024 1024

# Use 768x768 for quick testing
python app.py --input-size 768 768

Constraints:

  • Input size must be divisible by 64 (model encoder uses patch-based splitting)
  • Width and height must be equal (model uses square input)
  • Maximum supported size is 1536x1536 (SPN encoder has patch splitting errors with larger sizes)
  • If provided size doesn't meet requirements, program will automatically adjust to closest valid size

Automatic Adjustment Example:

# 1000x1000 → Automatically adjusted to 1024x1024
python app.py --input-size 1000 1000

# 1200x800 → Automatically adjusted to 1200x1200 (maintaining square)
python app.py --input-size 1200 800
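The stated constraints (square input, side divisible by 64, capped at 1536) can be sketched as a small rounding helper. `adjust_input_size` is a hypothetical name, and the app's exact rounding rule may differ slightly from this sketch:

```python
def adjust_input_size(width: int, height: int, step: int = 64, max_size: int = 1536) -> int:
    """Clamp and round a requested size to the nearest valid square side.

    The model requires a square input whose side is a multiple of 64,
    capped at 1536 (larger sizes break the SPN encoder's patch splitting).
    """
    side = max(width, height)              # force a square input
    side = min(max(side, step), max_size)  # clamp to [64, 1536]
    rounded = round(side / step) * step    # nearest multiple of 64
    return min(max(rounded, step), max_size)

# e.g. a 1000x1000 request rounds up to the nearest multiple of 64: 1024
```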

Recommended Sizes:

| Size | Purpose | Memory Requirement | Output Quality |
| --- | --- | --- | --- |
| 512x512 | Quick testing | Low | Basic |
| 768x768 | Balanced mode | Medium | Good |
| 1024x1024 | Standard mode | Medium | Excellent |
| 1536x1536 | High quality (default/maximum) | High | Best |

Note: Maximum supported size is 1536x1536, exceeding this will cause patch splitting errors in SPN encoder.

Notes:

  • Larger input sizes improve model output quality but require more memory and computing time
  • Smaller input sizes can speed up inference and reduce memory usage but may lower output quality
  • Recommended range: 512x512 to 1536x1536
  • Maximum supported size is 1536x1536, exceeding this causes patch splitting errors
  • If memory insufficient, use smaller sizes
  • If using non-standard sizes, program will auto-adjust and display warning

Usage Examples

# Basic use
python app.py
python app.py --mode gpu
python app.py --mode cpu

# Specify GPU vendor
python app.py --mode nvidia
python app.py --mode amd

# Custom port and host
python app.py --port 8080
python app.py --host 0.0.0.0 --port 8000

# Custom input size
python app.py --input-size 1024 1024
python app.py --input-size 768 768

# Disable optimization options (for debugging)
python app.py --no-browser
python app.py --no-amp
python app.py --no-cudnn-benchmark

# Enable gradient checkpointing (reduce memory usage)
python app.py --gradient-checkpointing

# Cache management (enabled by default)
python app.py # Default cache enabled
python app.py --no-cache # Disable cache
python app.py --cache-size 200 # Set cache size to 200
python app.py --clear-cache # Clear cache on startup

# Performance auto-tuning (advanced feature)
python app.py --enable-auto-tune # Auto-test and select optimal optimization configuration on startup

# Combining usage
python app.py --mode nvidia --port 8080 --no-browser --input-size 1024 1024
python app.py --gradient-checkpointing --input-size 1536 1536
python app.py --cache-size 200 --mode gpu
python app.py --clear-cache --mode gpu

# Using configuration file
python app.py --config config.yaml
python app.py --config config.json
python app.py -c config.yaml

# Configuration file + Command line parameters (command line parameters take priority)
python app.py --config config.yaml --port 8080 --input-size 1024 1024

# Multi-language support
python app.py --lang zh  # Chinese interface (default)
python app.py --lang en  # English interface

Get Help

python app.py --help
python app.py -h

GPU Support Status


NVIDIA GPU

| Architecture | Graphics Series | Compute Capability | Support Status | Optimizations |
| --- | --- | --- | --- | --- |
| Ampere | RTX 30/40 Series | 8.0+ | Full Support | AMP, TF32, cuDNN |
| Turing | RTX 20 Series | 7.5 | Full Support | AMP, cuDNN |
| Pascal | GTX 10/16 Series | 6.1 | Full Support | AMP, cuDNN |
| Maxwell | GTX 9xx Series | 5.2 | Supported | AMP |
| Kepler | GTX 7xx Series | 3.0-3.7 | ⚠️ Legacy GPU | Basic |
| Fermi | GTX 6xx Series | 2.1 | ❌ Not Recommended | - |

AMD GPU

| Architecture | Graphics Series | ROCm Support | Support Status |
| --- | --- | --- | --- |
| RDNA 2 | RX 6000 Series | Full Support | Full Support |
| RDNA 1 | RX 5000 Series | Full Support | Full Support |
| GCN 5 | Vega Series | Full Support | Supported |
| GCN 4 | RX 400/500 Series | ⚠️ | ⚠️ Partial Support |
| GCN 3 | RX 300 Series | ❌ | Not Supported |

Intel GPU

| Architecture | Graphics Series | Support Status |
| --- | --- | --- |
| Xe | Arc Series | ⚠️ CPU Mode Only |
| Iris Xe | Integrated Graphics | ⚠️ CPU Mode Only |
| UHD | Integrated Graphics | ⚠️ CPU Mode Only |

Logging System


Logging Features

MLSharp uses Loguru as the logging system, providing professional logging management:

  • Structured Logging: Includes timestamp, logging level, source information
  • Color Output: Console color display, easy to distinguish different levels
  • File Logging: Automatically saved to logs/ directory
  • Log Rotation: Automatic rotation and compression of log files (10MB rotation, keep 7 days)
  • Error Tracking: Complete error stack trace and diagnostic information
  • Multi-Level: DEBUG, INFO, WARNING, ERROR, CRITICAL

Log Files

Log files saved in logs/ directory:

  • File naming: mlsharp_YYYYMMDD.log
  • Compressed files: mlsharp_YYYYMMDD.log.zip
  • Retention time: 7 days

Logging Levels

| Level | Purpose | Example |
| --- | --- | --- |
| DEBUG | Debug information | Variable values, function calls |
| INFO | General information | Startup information, processing progress |
| WARNING | Warnings | Performance warnings, compatibility issues |
| ERROR | Errors | Processing failures, exceptions |
| CRITICAL | Critical errors | System crashes, fatal errors |

Log Output Example

2026-01-28 20:00:00 | INFO | MLSharp:run:10 - Service started
2026-01-28 20:00:01 | SUCCESS | MLSharp:load_model:50 - Model loaded successfully
2026-01-28 20:00:02 | WARNING | MLSharp:detect_gpu:30 - Less than 4GB VRAM
2026-01-28 20:00:03 | ERROR | MLSharp:predict:100 - Processing failed: Out of memory

View Logs

# View today's logs
type logs\mlsharp_20260128.log

# View all log files
dir logs\

# View error logs
findstr /C:"ERROR" logs\mlsharp_*.log

Configuration File Usage


Configuration File Format

Supports both YAML and JSON format configuration files.

Default Configuration File: If the --config parameter is not specified, the system automatically uses config.yaml in the project root directory.

YAML Format (config.yaml)

# MLSharp-3D-Maker Configuration File
# Supported Format: YAML

# Service Configuration
server:
  host: "127.0.0.1" # Service host address
  port: 8000 # Service port

# Startup Mode
mode: "auto" # Startup mode: auto, gpu, cpu, nvidia, amd

# Language Configuration
language: "zh"             # Interface language: zh(Chinese), en(English)

# Browser Configuration
browser:
  auto_open: true # Auto-open browser

# GPU Optimization Configuration
gpu:
  enable_amp: true # Enable mixed precision inference (AMP)
  enable_cudnn_benchmark: true # Enable cuDNN Benchmark
  enable_tf32: true # Enable TensorFloat32

# Logging Configuration
logging:
  level: "INFO" # Logging level: DEBUG, INFO, WARNING, ERROR
  console: true # Console output
  file: false # File output

# Model Configuration
model:
  checkpoint: "model_assets/sharp_2572gikvuh.pt" # Model weights path
  temp_dir: "temp_workspace" # Temporary workspace directory

# Inference Configuration
inference:
  input_size: [1536, 1536] # Input image size [width, height] (default: 1536x1536)

# Optimization Configuration
optimization:
  gradient_checkpointing: false # Enable gradient checkpointing (reduce memory usage, slightly decrease inference speed)
  checkpoint_segments: 3 # Gradient checkpointing segments (not used yet)

# Cache Configuration
cache:
  enabled: true # Enable inference cache (default: enabled)
  size: 100 # Maximum cache entries (default: 100)

# Redis Cache Configuration
redis:
  enabled: false # Enable Redis cache (default: disabled)
  url: "redis://localhost:6379/0" # Redis connection URL
  prefix: "mlsharp" # Cache key prefix

# Webhook Configuration
webhook:
  enabled: false # Enable Webhook notification (default: disabled)
  task_completed: "" # Task completed notification URL
  task_failed: "" # Task failed notification URL

# Monitoring Configuration
monitoring:
  enabled: true # Enable monitoring
  enable_gpu: true # Enable GPU monitoring
  metrics_path: "/metrics" # Prometheus metrics endpoint path

# Performance Configuration
performance:
  max_workers: 4 # Maximum worker threads
  max_concurrency: 10 # Maximum concurrency
  timeout_keep_alive: 30 # Keep-alive timeout(seconds)
  max_requests: 1000 # Maximum requests

# Performance Cache Configuration (auto-generated, no manual configuration needed)
performance_cache:
  last_test: null # Last test time (ISO 8601 format)
  best_config: null # Optimal configuration
  gpu: null # GPU information

JSON Format (config.json)

{
  "server": {
    "host": "127.0.0.1",
    "port": 8000
  },
  "mode": "auto",
  "browser": {
    "auto_open": true
  },
  "gpu": {
    "enable_amp": true,
    "enable_cudnn_benchmark": true,
    "enable_tf32": true
  },
  "logging": {
    "level": "INFO",
    "console": true,
    "file": false
  },
  "model": {
    "checkpoint": "model_assets/sharp_2572gikvuh.pt",
    "temp_dir": "temp_workspace"
  },
  "inference": {
    "input_size": [1536, 1536]
  },
  "optimization": {
    "gradient_checkpointing": false,
    "checkpoint_segments": 3
  },
  "cache": {
    "enabled": true,
    "size": 100
  },
  "redis": {
    "enabled": false,
    "url": "redis://localhost:6379/0",
    "prefix": "mlsharp"
  },
  "webhook": {
    "enabled": false,
    "task_completed": "",
    "task_failed": ""
  },
  "monitoring": {
    "enabled": true,
    "enable_gpu": true,
    "metrics_path": "/metrics"
  },
  "performance": {
    "max_workers": 4,
    "max_concurrency": 10,
    "timeout_keep_alive": 30,
    "max_requests": 1000
  }
}

Using Configuration Files

Basic Usage:

# Use YAML configuration file
python app.py --config config.yaml

# Use JSON configuration file
python app.py --config config.json

# Abbreviation
python app.py -c config.yaml

# Recommended: Use config folder to manage configuration files
python app.py --config config/performance.yaml
python app.py --config config/settings.json

Configuration File + Command Line Parameters:

# Command line parameters override corresponding settings in configuration file
python app.py --config config.yaml --port 8080 --mode gpu

Configuration File Auto-Creation/Update:

# If configuration file doesn't exist, auto-create with default configuration
# If configuration file exists, only update performance tuning cache, other configurations remain unchanged
python app.py --enable-auto-tune --config config/auto_tune.json

Parameter Priority

Command line parameters > Configuration file > Default values

For example:

# config.yaml sets port: 8000
# Command line parameter specifies --port 8080
# Final uses 8080
python app.py --config config.yaml --port 8080
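The priority rule can be sketched as a three-layer merge where each higher-priority layer only overrides keys it explicitly sets. `resolve` is a hypothetical helper, not the app's actual function:

```python
def resolve(cli: dict, config: dict, defaults: dict) -> dict:
    """Merge settings: CLI arguments > configuration file > built-in defaults.

    Keys left unset (None) at a higher-priority layer fall through
    to the layer below.
    """
    merged = dict(defaults)
    merged.update({k: v for k, v in config.items() if v is not None})
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged

# Mirrors the example above: config.yaml sets port 8000, CLI passes --port 8080.
settings = resolve(
    cli={"port": 8080},
    config={"port": 8000},
    defaults={"port": 8000, "host": "127.0.0.1"},
)
# settings["port"] is 8080; settings["host"] falls through to the default
```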

Configuration Items

| Configuration Item | Description | Allowed Values |
| --- | --- | --- |
| server.host | Service host address | IP address |
| server.port | Service port | 1-65535 |
| mode | Startup mode | auto, gpu, cpu, nvidia, amd |
| browser.auto_open | Auto-open browser | true, false |
| gpu.enable_amp | Enable mixed precision inference | true, false |
| gpu.enable_cudnn_benchmark | Enable cuDNN Benchmark | true, false |
| gpu.enable_tf32 | Enable TensorFloat32 | true, false |
| logging.level | Logging level | DEBUG, INFO, WARNING, ERROR |
| logging.console | Console output | true, false |
| logging.file | File output | true, false |
| model.checkpoint | Model weights path | File path |
| model.temp_dir | Temporary workspace directory | Directory path |
| inference.input_size | Input image size | [width, height], default [1536, 1536] |
| monitoring.enabled | Enable monitoring | true, false |
| monitoring.enable_gpu | Enable GPU monitoring | true, false |
| monitoring.metrics_path | Prometheus metrics endpoint path | Path string |
| optimization.gradient_checkpointing | Enable gradient checkpointing | true, false |
| optimization.checkpoint_segments | Gradient checkpointing segments | Positive integer |
| performance.max_workers | Maximum worker threads | Positive integer |
| performance.max_concurrency | Maximum concurrency | Positive integer |
| performance.timeout_keep_alive | Keep-alive timeout (seconds) | Positive integer |
| performance.max_requests | Maximum requests | Positive integer |
| auto_tune.enabled | Enable performance auto-tuning | true, false |
| auto_tune.test_size | Test image size | [width, height] |
| auto_tune.warmup_runs | Warm-up run count | Positive integer |
| auto_tune.test_runs | Test run count | Positive integer |
| performance_cache.last_test | Last test time | ISO 8601 timestamp (auto-generated) |
| performance_cache.best_config | Optimal configuration | Configuration dictionary (auto-generated) |
| performance_cache.gpu | GPU information | GPU information (auto-generated) |

Performance Auto-Tuning


MLSharp provides an intelligent performance auto-tuning feature that automatically tests candidate optimization configurations and selects the best one.

Tuning Features

  • Intelligent Benchmarking: Automatically test various optimization configuration combinations
  • Optimal Configuration Selection: Automatically select best configuration based on test results
  • GPU Adaptation: Automatically filter out unsupported configurations based on GPU capability
  • Quick Testing: Use small size to complete testing quickly (about 10 seconds)
  • Detailed Logging: Output complete test process and results
  • Performance Improvement: 30-50% performance improvement compared to non-optimized configuration
  • Result Caching: Automatically save test results to configuration file, valid for 7 days
  • Smart Skip: Automatically skip testing when detecting valid cache, speed up startup

Test Configurations

The auto-tuner tests the following configuration combinations:

| Configuration | Description | Applicable Scenario |
| --- | --- | --- |
| Baseline Configuration | No optimizations | All GPUs |
| AMP Only | Only mixed precision | Compute capability ≥ 5.3 |
| cuDNN Only | Only cuDNN Benchmark | NVIDIA, compute capability ≥ 6.0 |
| TF32 Only | Only TensorFloat32 | NVIDIA, compute capability ≥ 8.0 |
| AMP + cuDNN | Mixed precision + cuDNN | NVIDIA, compute capability ≥ 6.0 |
| AMP + TF32 | Mixed precision + TF32 | NVIDIA, compute capability ≥ 8.0 |
| All Optimizations | Enable all optimizations | High-end NVIDIA GPUs |

Enable Auto-Tuning

# Enable performance auto-tuning (using default configuration file config.yaml)
python app.py --enable-auto-tune

# Combining usage
python app.py --enable-auto-tune --mode gpu --input-size 1024 1024

# Specify configuration file (results will be saved to this file)
python app.py --enable-auto-tune --config config.yaml

# Use config folder to save configuration (recommended)
python app.py --enable-auto-tune --config config/performance.yaml

# If configuration file doesn't exist, auto-create with default configuration
python app.py --enable-auto-tune --config config/auto_tune.json

Note: If --config parameter is not specified, system automatically uses config.yaml in project root directory as default configuration file.

Caching Mechanism

Auto-tuning results are automatically saved to configuration file to avoid repeated testing:

  • Cache Validity: 7 days
  • Cache Condition: GPU model, vendor, compute capability must match
  • Auto Skip: Automatically skip testing when detecting valid cache
  • Auto Apply: Directly use cached optimal configuration
  • Auto Creation/Update: Auto-create configuration file if doesn't exist (with default configuration), only update performance tuning cache if exists
  • Directory Support: Auto-create configuration directory (such as config folder)
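The cache conditions above (7-day validity; matching GPU model, vendor, and compute capability) could be checked roughly as follows. `cache_is_valid` is a hypothetical name, keyed on the `performance_cache` fields (`last_test`, `best_config`, `gpu`):

```python
from datetime import datetime, timedelta, timezone

def cache_is_valid(cache: dict, current_gpu: dict, max_age_days: int = 7) -> bool:
    """Return True if a saved tuning cache can be reused.

    The cache must be younger than max_age_days and recorded on the
    same GPU: name, vendor, and compute capability all have to match.
    """
    if not cache.get("last_test") or not cache.get("best_config"):
        return False
    tested = datetime.fromisoformat(cache["last_test"])
    if datetime.now(timezone.utc) - tested > timedelta(days=max_age_days):
        return False
    saved_gpu = cache.get("gpu") or {}
    return all(saved_gpu.get(k) == current_gpu.get(k)
               for k in ("name", "vendor", "compute_capability"))
```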

Log Output Example (when using cache):

[INFO] Found valid performance tuning cache (3 days ago)
============================================================
[INFO] Using cached performance configuration
============================================================
Configuration Name: All Optimizations
Description: Enable all optimizations

Log Output Example (when creating configuration file):

[INFO] Configuration file doesn't exist, auto-create new configuration file: config.yaml
[SUCCESS] Performance tuning results added to configuration file: config.yaml

Log Output Example (when updating existing configuration file):

[INFO] Configuration file exists, update performance tuning cache: config.yaml
[SUCCESS] Performance tuning results updated to configuration file: config.yaml

Configuration File Processing Description:

  • Configuration file exists: Only update performance_cache field, other configurations remain unchanged
  • Configuration file doesn't exist: Create new configuration file, containing complete default configuration

Configuration File Format

Tuning results saved in performance_cache field of configuration file:

# config.yaml
performance_cache:
  last_test: "2026-01-31T12:00:00+00:00"
  best_config:
    name: "All Optimizations"
    amp: true
    cudnn_benchmark: true
    tf32: true
    description: "Enable all optimizations"
  gpu:
    name: "NVIDIA GeForce RTX 4090"
    vendor: "NVIDIA"
    compute_capability: 89

Tuning Process

  1. Cache Check: Check if valid tuning cache exists in configuration file (within 7 days)
  2. Cache Hit: If cache is valid and GPU matches, directly use cached results
  3. Benchmark Testing: If cache invalid or expired, perform complete test
  4. Warm-up Phase: Run 2 warm-ups to stabilize performance
  5. Test Phase: Run 3 tests for each configuration
  6. Result Statistics: Calculate average inference time and throughput
  7. Optimal Selection: Select fastest configuration and apply
  8. Cache Save: Save optimal configuration to configuration file
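Steps 4 to 7 above amount to a warm-up-then-average timing loop over each candidate configuration. A minimal sketch with hypothetical function names (the real tuner runs actual model inference rather than arbitrary callables):

```python
import time
from statistics import mean

def benchmark(run, warmup_runs: int = 2, test_runs: int = 3) -> float:
    """Time one configuration: 2 warm-up runs, then the average of 3 timed runs."""
    for _ in range(warmup_runs):
        run()  # warm-up: stabilize caches, cuDNN autotuning, clocks
    timings = []
    for _ in range(test_runs):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return mean(timings)

def pick_best(configs: dict) -> str:
    """configs maps name -> zero-arg callable; return the fastest config's name."""
    results = {name: benchmark(run) for name, run in configs.items()}
    return min(results, key=results.get)
```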

Tuning Output Example

============================================================
[INFO] Performance Auto-Tuning
============================================================

Testing different optimization configurations...

Test Configuration: Baseline Configuration
  Description: No optimizations
  Run 1/3: 2.543 seconds
  Run 2/3: 2.512 seconds
  Run 3/3: 2.528 seconds
  Average Inference Time: 2.528 seconds

Test Configuration: AMP Only
  Description: Only enable mixed precision inference
  Run 1/3: 1.892 seconds
  Run 2/3: 1.876 seconds
  Run 3/3: 1.884 seconds
  Average Inference Time: 1.884 seconds

Test Configuration: All Optimizations
  Description: Enable all optimizations
  Run 1/3: 1.245 seconds
  Run 2/3: 1.238 seconds
  Run 3/3: 1.241 seconds
  Average Inference Time: 1.241 seconds

============================================================
[INFO] Tuning Results
============================================================
[SUCCESS] Optimal Configuration: All Optimizations
[INFO]   Description: Enable all optimizations
[INFO]   Average Inference Time: 1.241 seconds
[INFO]   Throughput: 0.81 FPS

[SUCCESS] Performance auto-tuning completed!
[INFO] Optimal configuration applied

Best Practices

  1. Initial Run: Recommended to enable auto-tuning on first run
  2. Hardware Changes: Re-run auto-tuning after changing GPU
  3. Driver Updates: Re-test after GPU driver updates
  4. Regular Tuning: Recommended to run auto-tuning monthly
  5. Cache Management: System automatically caches tuning results for 7 days, no manual management needed
  6. Configuration File: Recommended to use config/ folder to manage configuration files, such as config/performance.yaml
  7. Auto Creation/Update: Configuration file doesn't exist: auto-create (with default configuration), exists: only update performance tuning cache
  8. Clear Cache: To force re-testing, delete performance_cache field in configuration file or use new configuration file

Performance Optimization Suggestions


GPU Mode Optimization

  1. Use an Appropriate Image Size

    • Recommended: 512x512 to 1024x1024
    • Avoid exceeding 2048x2048
  2. Enable All Optimizations

    • AMP (mixed precision) enabled by default
    • cuDNN Benchmark enabled by default
    • TF32 enabled by default (Ampere architecture)
  3. Enable Gradient Checkpointing When VRAM Is Insufficient

    • Use the --gradient-checkpointing parameter
    • Reduces VRAM usage by 30-50%
    • Slightly slower (10-20%), usually an acceptable trade-off
  4. Close Other Programs That Occupy the GPU

    • Disable browser hardware acceleration
    • Close other AI applications
    • Close games and graphics-intensive applications

CPU Mode Optimization

  1. Use Smaller Images

    • Recommended: 512x512 or smaller
  2. Reduce Concurrency

    • Modify max_workers in the configuration file
    • Recommended value: half the CPU core count
  3. Use the Fast Startup Script

    • Start_CPU_Fast.bat - fast mode
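The max_workers guideline above can be sketched as follows. The ThreadPoolExecutor and the process function are illustrative assumptions; the document does not show MLSharp's actual worker pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Recommended concurrency for CPU mode: half the core count, at least 1.
cores = os.cpu_count() or 1
max_workers = max(1, cores // 2)

def process(image_name: str) -> str:
    # Stand-in for a real per-image inference call.
    return f"processed {image_name}"

with ThreadPoolExecutor(max_workers=max_workers) as pool:
    results = list(pool.map(process, ["a.png", "b.png"]))
print(results)  # ['processed a.png', 'processed b.png']
```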

System-Level Optimization

  1. Increase Virtual Memory

    • Set to 1.5-2 times physical memory
  2. Use SSD

    • Faster model loading and I/O operations
  3. Close Unnecessary Background Programs

    • Free up more system resources

Inference Cache

Click to expand inference cache details

MLSharp provides an intelligent inference cache that can significantly speed up processing of repeated scenarios.

Cache Features

  • Smart Hashing: Generates a unique cache key from the image content and focal length
  • LRU Eviction: A least-recently-used policy automatically evicts old entries
  • Statistics Monitoring: Real-time hit rate and hit/miss counters
  • Thread Safety: A lock ensures safe access from multiple threads
  • Memory Management: Configurable cache size limit
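The features above can be sketched with Python's OrderedDict. This is a minimal illustration of the design, not MLSharp's actual implementation; the class and method names are assumptions.

```python
import hashlib
import threading
from collections import OrderedDict

class InferenceCache:
    """Minimal LRU cache with hit/miss statistics (illustrative)."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._data = OrderedDict()
        self._lock = threading.Lock()          # thread safety
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(image_bytes: bytes, focal_length: float) -> str:
        # Smart hashing: key derived from image content and focal length.
        h = hashlib.sha256(image_bytes)
        h.update(str(focal_length).encode())
        return h.hexdigest()

    def get(self, key: str):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)    # mark as recently used
                self.hits += 1
                return self._data[key]
            self.misses += 1
            return None

    def put(self, key: str, value) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.max_size:
                self._data.popitem(last=False) # LRU eviction

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0

cache = InferenceCache(max_size=2)
k = cache.make_key(b"image-bytes", 35.0)
cache.get(k)            # miss
cache.put(k, "depth-map")
cache.get(k)            # hit
print(cache.hit_rate)   # 50.0
```

The hit_rate property matches the percentage reported by the /v1/cache endpoint below.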

Enable Cache

The cache is enabled by default and can be controlled via command-line parameters or the configuration file:

# Command line parameters
python app.py                  # Cache enabled by default
python app.py --no-cache       # Disable cache
python app.py --cache-size 200 # Set cache size to 200

# config.yaml
cache:
  enabled: true # Enable cache (default: true)
  size: 100     # Maximum cache entries (default: 100)

API Endpoints

Get Cache Statistics

curl http://127.0.0.1:8000/v1/cache

Example Response:

{
  "enabled": true,
  "size": 45,
  "max_size": 100,
  "hits": 120,
  "misses": 30,
  "hit_rate": 80.0
}

Clear Cache

curl -X POST http://127.0.0.1:8000/v1/cache/clear

Example Response:

{
  "status": "success",
  "message": "Cache cleared"
}

Performance Improvement

The cache can significantly improve processing speed, especially in repeated scenarios:

| Cache Hit Rate | Speed Improvement | Applicable Scenario |
|----------------|-------------------|---------------------|
| 30% | 30% | A small number of repeated images |
| 50% | 50% | Moderately repetitive scenarios |
| 80% | 80% | A large number of repeated images |
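The table's numbers follow from simple arithmetic: if a fraction h of requests hit the cache and a hit costs roughly nothing compared to a full inference, average time drops to (1 − h) of the original, so the speed improvement matches the hit rate. A quick check, using a hypothetical 2-second full inference time:

```python
def expected_speedup(hit_rate: float) -> float:
    """Fraction of time saved, assuming cache hits are effectively free."""
    # average time = (1 - hit_rate) * full_inference_time
    return hit_rate

full_time = 2.0  # hypothetical full inference time in seconds
for h in (0.30, 0.50, 0.80):
    avg = (1 - h) * full_time
    print(f"hit rate {h:.0%}: avg {avg:.2f}s, saved {expected_speedup(h):.0%}")
```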

Best Practices

  1. Size the Cache Appropriately: Adjust the cache size based on available memory and actual needs
  2. Monitor the Hit Rate: Check the hit rate regularly to evaluate cache effectiveness
  3. Clear the Cache When Memory Is Tight: Clear the cache periodically if memory pressure is high
  4. Disable the Cache When It Cannot Help: When processing entirely distinct images, the cache can be disabled

Redis Distributed Cache

Click to expand Redis cache details

MLSharp supports Redis as a distributed cache for multi-instance deployments and cache persistence.

Redis Cache Features

  • Distributed Cache: Multiple instances can share one cache
  • Persistence: Cache data is persisted in Redis
  • TTL Support: Entries expire automatically
  • Mixed Usage: Can be combined with the local cache
  • High Performance: Backed by the Redis in-memory database
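The "mixed usage" idea — check the local cache first, then fall back to the shared cache — can be sketched as below. A plain dict stands in for the Redis client so the example is self-contained; in a real deployment these lookups would be Redis GET/SETEX calls, and the class name is an assumption, not MLSharp's API.

```python
import time

class TwoTierCache:
    """Local-first lookup with a Redis-like fallback (illustrative)."""

    def __init__(self, prefix: str = "mlsharp", ttl: int = 3600):
        self.local = {}
        self.remote = {}          # stand-in for a Redis client
        self.prefix = prefix
        self.ttl = ttl

    def _key(self, key: str) -> str:
        # Namespacing, mirroring the `prefix` field in the config below.
        return f"{self.prefix}:{key}"

    def get(self, key: str):
        if key in self.local:              # tier 1: local memory
            return self.local[key]
        entry = self.remote.get(self._key(key))
        if entry is not None:              # tier 2: shared "Redis"
            value, expires = entry
            if time.time() < expires:      # TTL check
                self.local[key] = value    # promote to the local tier
                return value
            del self.remote[self._key(key)]
        return None

    def put(self, key: str, value) -> None:
        self.local[key] = value
        self.remote[self._key(key)] = (value, time.time() + self.ttl)

cache = TwoTierCache()
cache.put("scene-42", "depth-map")
cache.local.clear()                        # simulate a fresh instance
print(cache.get("scene-42"))               # depth-map (served from "Redis")
```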

Enable Redis Cache

# Use Redis cache
python app.py --redis-url redis://localhost:6379/0

# Use Redis cache + Webhook
python app.py --redis-url redis://localhost:6379/0 --enable-webhook

Configuration File

# config.yaml
redis:
  enabled: true
  url: "redis://localhost:6379/0"
  prefix: "mlsharp"

Performance Comparison

| Cache Type | Hit Speed | Distributed Support | Persistence | Applicable Scenario |
|------------|-----------|---------------------|-------------|---------------------|
| Local Cache | Fastest | ❌ | ❌ | Single-instance deployment |
| Redis Cache | Fast | ✅ | ✅ | Multi-instance deployment |

Best Practices

  1. Production: Use the Redis cache to support multi-instance deployment
  2. Local Development: Use the local cache; no Redis service is needed

Version History

Click to expand version history

Code Health Check and Fix 02.05.1914

  • Code Quality Improvement - Removed an unused ProcessPoolExecutor, optimizing resource usage
  • Pydantic v2 Update - Migrated to Pydantic v2 syntax, using @field_validator instead of @validator
  • Resource Management Optimization - Added a cleanup() method to ensure GPU monitoring threads and Webhook clients close properly
  • Redis Connection Management - Added a __del__ method that automatically closes Redis connections
  • Test File Added - Added test_app.py with core function tests
  • Test Script Update - Updated run_tests.bat and run_tests.ps1 to support both Windows CMD and PowerShell
  • Test Coverage - Module import, configuration validation, GPU detection, monitoring metrics, and other core functions
  • Test Results - All tests passed (4/4)
  • New Version Format - Adopted the [Month].[Day].[HHMM] format (e.g., 02.05.1900)
  • Description - Month.Day.HourMinute (24-hour clock)

Snapdragon GPU Adaptation 02.03.1851

  • Adreno GPU Support Removed - The main branch no longer supports Snapdragon/Adreno series GPUs

GPU Memory Auto Reclamation 02.03.1851

  • Memory Information Query - Real-time GPU memory usage (total, used, available, usage rate)
  • Cache Cleanup - Automatically clears memory reserved but unused by PyTorch
  • Force Garbage Collection - Full collection pass (clear cache → sync GPU → Python GC → clear again)
  • Smart Memory Reclamation - Cleans automatically when memory usage exceeds a threshold (default 85%)
  • Auto Memory Monitoring - A background thread checks and clears memory periodically (default: every 30 seconds)
  • Command Line Parameters - Supports --enable-auto-gc, --auto-gc-interval, and --auto-gc-threshold
  • Configuration File Support - Memory reclamation strategy configurable in config.yaml
  • Performance Optimization - Prevents memory leaks and improves system stability
  • Logging - Detailed memory cleanup logs for debugging
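The threshold/interval logic described above can be sketched as follows. get_usage and reclaim are stand-ins (the document does not show MLSharp's internals); a real implementation would query and free GPU memory, e.g. via PyTorch.

```python
import threading
import time

def auto_gc_loop(get_usage, reclaim, threshold=0.85, interval=30.0,
                 stop_event=None):
    """Background loop: reclaim memory whenever usage exceeds threshold."""
    stop_event = stop_event or threading.Event()
    while not stop_event.wait(interval):   # wake up every `interval` seconds
        if get_usage() > threshold:
            reclaim()

# Demo with stand-in functions and a fast interval.
usage = [0.90]                             # pretend 90% of memory is in use
reclaimed = []

def fake_reclaim():
    reclaimed.append(True)
    usage[0] = 0.40                        # usage drops after cleanup

stop = threading.Event()
t = threading.Thread(target=auto_gc_loop,
                     args=(lambda: usage[0], fake_reclaim),
                     kwargs={"threshold": 0.85, "interval": 0.01,
                             "stop_event": stop})
t.start()
time.sleep(0.1)                            # let the loop run a few cycles
stop.set(); t.join()
print(reclaimed, usage[0])                 # [True] 0.4
```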

Snapdragon GPU Adaptation 01.31.1931

  • Adreno GPU Detection - Automatically detects Snapdragon/Adreno series GPUs
  • Qualcomm Mode - Added the --mode qualcomm startup mode
  • ONNX Runtime Support - Added an ONNX Runtime + DirectML acceleration path
  • Smart Fallback - Falls back to CPU mode automatically when a Snapdragon GPU is detected
  • Platform Support - Windows/Android platform identification
  • Documentation Update - Added Snapdragon GPU support notes and limitations

Future Improvements

Click to expand future improvements plan

Completed

  • Unit Testing: Added unit tests for each class
  • Configuration Files: Support loading configuration from files
  • Logging System: Use professional logging library (e.g., loguru)
  • Asynchronous Optimization: Further optimize asynchronous processing

To Improve

High Priority

  1. Authentication Authorization - Add user authentication
    • API Key authentication
    • JWT Token support
    • Rate limiting

Medium Priority

  1. Task Queue - Asynchronous task processing

    • Redis queue support
    • Task status tracking
    • Batch processing support
  2. Batch Processing API - Batch image processing

    • Multiple file uploads
    • Batch prediction
    • Result packaging and download

Low Priority

  1. Internationalization - Multi-language support ✅ Completed

    • i18n support ✅
    • Chinese and English interface ✅
    • Expandable language packs ✅
    • Configuration file support ✅
  2. Plugin System - Extensible architecture

    • Custom plugins
    • Model plugins
    • Post-processing plugins

Contribution

Issues and Pull Requests are welcome!


📚 Related Documentation


Contact


Version Number Naming Rule

This project uses the [Month].[Day].[HHMM] version numbering format.

If this project is helpful to you, please give a ⭐️ Star! Modded with ❤️ by Chidc with CPU-Mode-Provider GemosDoDo

README.md Version Code 02.28.1500