Skip to content

Commit e48c12a

Browse files
authored
V3.8 (#81)
* UI Changes and new Encoder and Process options - Added buttons for encoder settings and processing options. - Implemented multi-language support and tooltips for new dialog boxes. - Adjusted preview image window size and video info layout to prevent window overflow. - 3D tab columns now stack correctly when resizing the window on smaller screens. * v3.8 3D Backend - Reworked Auto Crop Black Bars to use first-frame detection with cached crop reuse. - Prevents per-frame crop jitter and depth/frame misalignment. - Improves stability for cinema content with subtle letterboxing. - Keep Audio checkbox now respects the user-selected output container instead of forcing MP4. * v3.8 - Depth Pipeline ### Depth Models - Fixed ONNX model loading: - Distill-Any-Depth (inference resolution 518×518, batch size 8) - Video Depth Anything (inference resolution 512×288, batch size 8) - Implemented LBM depth model (development version). Thanks to Aether for the implementation fix. - Removed depth models from the dropdown that returned no `d_type`. - Fixed Hugging Face model downloads and caching so zoo models consistently save inside the app `weights/` directory (no more extra `.cache` downloads). - Updated Transformers image processor loading to prefer `use_fast=True` when available (with automatic fallback when unsupported). ### Depth Backend - Implemented temporal smoothing in the depth pipeline to reduce flicker and improve temporal stability of depth map output. - Packaged VisionDepth3D.exe with Distill-Any-Depth (ONNX), Video Depth Anything (ONNX), and Depth Anything v2 Giant weights. * v3.8 External 3D - Changed entry box to be easier to input - Console loading text changed * Change log level for xFormers availability warning * Change xFormers warning to debug log level * v3.8 Frametool Pipeline - changed processor output text to match rest of system - Added SSResNet Model support for future implementation * v3.8 Depth Blender - Standardized startup console messages to clearly reflect which subsystems are initializing (Torch, depth estimation, upscaler, external 3D pipeline, language, settings). * v3.8 Preview GUI - Changed size of frame to smaller size so Drop down doesnt get cropped to small * v3.8 * v3.8 * v3.8 * v3.8 * v3.8 * v3.8 New Language * Revise Changelog for VisionDepth3D v3.8 Updated changelog for version 3.8, detailing new features, improvements, and fixes across various tabs including Depth Estimation and 3D Render. Emphasized stability, model compatibility, and UI enhancements.
1 parent a92532d commit e48c12a

16 files changed

Lines changed: 2362 additions & 1537 deletions

File tree

Changelog.md

Lines changed: 39 additions & 175 deletions
Original file line numberDiff line numberDiff line change
@@ -1,204 +1,68 @@
1-
# VisionDepth3D v3.7 – Changelog
1+
# VisionDepth3D v3.8 – Changelog
22

33
---
44

5-
## 1) Live 3D Capture – Real-Time Overhaul
5+
## 1) Depth Estimation Tab
66

7-
### Audio Support Added
7+
### Depth Models
88

9-
* Added optional live audio passthrough monitor for external capture devices
10-
* New flags:
11-
* `--audio-device "<device name>"`
12-
* `--audio-delay-ms <value>`
13-
* Supports DirectShow and fallback to WASAPI
14-
* FFplay-based audio pipeline integrated for low latency monitoring
15-
* Audio sync delay control for VR headset streaming
9+
- Fixed ONNX model loading:
10+
- Distill-Any-Depth (inference resolution 518×518, batch size 8)
11+
- Video Depth Anything (inference resolution 512×288, batch size 8)
12+
- Implemented LBM depth model (development version). Thanks to Aether for the implementation fix.
13+
- Removed depth models from the dropdown that returned no `d_type`.
14+
- Fixed Hugging Face model downloads and caching so zoo models consistently save inside the app `weights/` directory (no more extra `.cache` downloads).
15+
- Updated Transformers image processor loading to prefer `use_fast=True` when available (with automatic fallback when unsupported).
1616

17-
### Color Channel Controls
17+
### Depth Backend
1818

19-
* Added `--pixelshift-rgb` to properly route RGB output from CUDA warp kernel
20-
* Added `--force-bgr-swap` fallback for cameras that provide BGR swapped frames
21-
* Fixed purple and red tint issues on some capture devices
22-
23-
### Performance and FPS Optimization
24-
25-
* Tuned live depth inference resolution defaults for smoother real-time processing
26-
* Recommended real-time profile: `infer-w 448`, `infer-h 256`, depth FPS ~ 8 to 12
27-
* Better GPU scheduling for live stereo warp when audio is active
28-
* More stable stream with reduced stutter on high resolution capture input (1080p HDMI sources)
29-
30-
### Network Streaming Prep (Experimental)
31-
32-
* Added headless running mode (`--no-preview`) to remove local display overhead
33-
* Prepared architecture for browser-based SBS VR streaming
34-
* In development: WebRTC path for synchronized audio + video to Quest browser
35-
36-
### General Stability
37-
38-
* Improved diagnostic output and runtime logs
39-
* Verified working flags for external capture cards (Elgato HD60 S+)
40-
* Ensured fallback modes for low resource conditions
41-
42-
---
43-
44-
## 2) Initial Bugfixes & Issues Resolved (Live 3D Cleanup)
45-
46-
### UI Settings Not Applying
47-
48-
* GUI sliders and dropdowns weren’t being respected when starting live capture
49-
* Fixed runtime sync so all selected GUI settings apply properly
50-
51-
### Manual CLI vs GUI Discrepancy
52-
53-
* Runtime ignored resolution, backend, FPS, and other settings when launched via GUI
54-
* Now fixed, all selected parameters propagate as expected
55-
56-
### Video Tint Correction
57-
58-
* Purple/red tint caused by incorrect channel routing
59-
* Fixed with `--pixelshift-rgb` and `--force-bgr-swap` fallbacks
60-
61-
### Capture Failures (Webcam/HDMI)
62-
63-
* Fixed `no frames arriving` error by enforcing `--fourcc MJPG` and proper MSMF backend handling
64-
65-
### No Audio Output
66-
67-
* Live mode initially had no sound output
68-
* Added full audio routing support for monitoring devices (DirectShow/WASAPI)
69-
70-
### FPS Performance Bottlenecks
71-
72-
* Live stereo + depth ran around 6.5–7 FPS at first
73-
* Now tuned with smaller inference size, better scheduling, and CUDA-side warp boost
74-
75-
---
76-
77-
## 3) Floating-Window and Stability Fixes
78-
79-
### Dynamic Floating Window (DFW) Logic
80-
81-
* Fixed undefined `zero_parallax_offset` references and long term stabilization drift.
82-
* Rebuilt the DFW into an asymmetric, side only window that masks a single edge (left or right) based on the dominant parallax direction.
83-
* Added a minimum parallax threshold so the window stays completely off when depth is very close to the screen plane.
84-
* Computes bar width as a blend of parallax magnitude and how far the tracked subject is from mid depth, then clamps it to a small fraction of the per eye width for a subtle mask.
85-
* Uses `FloatingWindowTracker` to smooth the horizontal offset over time and reduce small frame to frame jitters.
86-
* Uses `FloatingBarEaser` to ease the bar width in and out so it expands and collapses gradually instead of popping on and off.
87-
* Supports both soft faded masks and solid black cinema bars through a single toggle, with faded mode as the default for VR and monitor playback.
88-
* Result: a more invisible and cinema friendly floating window that reduces window violations, keeps edges clean, and stays out of the viewer’s way.
89-
90-
### Frame Jitter and Temporal Stability
91-
92-
* Fixed flickering, depth “breathing,” and jitter caused by unsmoothed convergence and subject tracking.
93-
* Applied `SubjectDepthEMA`, `DepthPercentileEMA`, and `ConvergenceEMA` for improved temporal stability.
94-
* Added a global `ShiftSmoother` to stabilize foreground, midground, and background parallax.
95-
* Result: consistent parallax motion across frames, no more in-and-out depth pulsing, and a much cleaner, more comfortable stereo flow.
96-
97-
### Auto-Crop Black Bar Logic
98-
99-
* Fixed crop mis detections during fade ins and fade outs.
100-
* Added a mean brightness guard so black bar detection does not update on very dark transition frames.
101-
* Added per frame re evaluation when black bar height changes significantly.
102-
* Result: letterboxed content (2.35:1 and similar) now auto crops reliably without vertical drift.
19+
- Implemented temporal smoothing in the depth pipeline to reduce flicker and improve temporal stability of depth map output.
20+
- Packaged VisionDepth3D.exe with Distill-Any-Depth (ONNX), Video Depth Anything (ONNX), and Depth Anything v2 Giant weights.
10321

10422
---
105-
## 4) Unified Depth Pipeline Upgrade & Platform Stability Improvements
10623

107-
v3.7 introduces the largest Depth tab upgrade yet, combining:
24+
## 2) 3D Render Tab
10825

109-
- **GPU backend support across NVIDIA, AMD, and Apple Silicon**
110-
- **FFmpeg codec selection for hardware-accelerated depth exports**
111-
- **Pause / Resume / Cancel controls** for long video renders
26+
### UI Fixes
11227

113-
---
114-
115-
### GPU Backend & Platform Support (CUDA / ROCm / MPS / CPU)
116-
117-
* Full rewrite of device detection — CUDA is no longer assumed
118-
* Automatic selection of best available compute backend
28+
- Added buttons for encoder settings and processing options.
29+
- Implemented multi-language support and tooltips for new dialog boxes.
30+
- Adjusted preview image window size and video info layout to prevent window overflow.
31+
- 3D tab columns now stack correctly when resizing the window on smaller screens.
11932

120-
#### Supported Depth Inference Backends
121-
- **CUDA** — NVIDIA GPUs
122-
- **ROCm** — AMD GPUs
123-
- **MPS** — Apple Silicon GPUs
124-
- **CPU fallback** when no GPU is detected
33+
### 3D Backend
12534

126-
#### Benefits
127-
* Prevents CPU-only fallback on capable GPUs
128-
* Removes CUDA-only links that caused AMD/macOS crashes
129-
* Core foundation for **Linux** deployment moving forward
130-
131-
> Result: VD3D depth inference now runs properly on **NVIDIA, AMD, Apple, and CPU-only** systems.
35+
- Reworked Auto Crop Black Bars to use first-frame detection with cached crop reuse.
36+
- Prevents per-frame crop jitter and depth/frame misalignment.
37+
- Improves stability for cinema content with subtle letterboxing.
38+
- Keep Audio checkbox now respects the user-selected output container instead of forcing MP4.
13239

13340
---
13441

135-
### FFmpeg Codec Selection for Depth Video Rendering
136-
137-
* New Video Codec dropdown added to the Depth Tab
138-
* Hardware and software encoding now user-selectable
139-
140-
#### Supported Encoders
42+
## Frametool Backend
14143

142-
**NVIDIA NVENC**
143-
- `h264_nvenc`, `hevc_nvenc`, `av1_nvenc`
144-
145-
**AMD AMF**
146-
- `h264_amf`, `hevc_amf`, `av1_amf`
147-
148-
**Intel QSV**
149-
- `h264_qsv`, `hevc_qsv`, `vp9_qsv`, `av1_qsv`
150-
151-
**CPU Fallback**
152-
- `libx264`, `libx265`, `libaom-av1`, `libsvtav1`, `mp4v`, `XVID`, `DIVX`
153-
154-
---
155-
156-
#### Safety & Compatibility Enhancements
157-
* Fixes XVID encoding failures on AMD/Intel systems
158-
* Unifies codec support with 3D Converter & FrameTools
159-
* AV1 warning system alerts users when OpenCV cannot decode
160-
* Built to allow full FFmpeg pipeline in next release
44+
- Reworked Frametool backend to support SSResNet models for feature model integration.
16145

16246
---
16347

164-
### Depth Pipeline Control System
48+
## Console Improvements
16549

166-
* **Pause**, **Resume**, and **Cancel** depth rendering
167-
* Real-time resource management during pauses
168-
* Safe termination prevents file corruption
169-
* Clear progress status states:
170-
- Running
171-
- Paused
172-
- Canceling
173-
- Completed
174-
175-
> Enables fast workflow adjustments and prevents wasted GPU/CPU time.
50+
- Standardized startup console messages to clearly reflect which subsystems are initializing (Torch, depth estimation, upscaler, external 3D pipeline, language, settings).
51+
- Unified compute device reporting across pipelines for consistent and clearer console output.
52+
- Suppressed optional xFormers dependency warning on startup.
53+
- Prevented duplicate language loading during settings restore.
17654

17755
---
17856

179-
## 5) 3D Pipeline & UX Polish
180-
181-
- Added a **Keep Original Audio** checkbox to optionally copy the source video’s audio into the final 3D render (no re-encoding).
182-
- Hooked a new **image-based 3D pipeline** directly into the main 3D renderer for single-frame conversions.
183-
- Wired the **Mode** selector so it cleanly switches between **Single**, **Batch**, and **Image** workflows.
184-
- Implemented an automatic **3D filename suffix system** so exports are labeled by format and eye mode
185-
(for example: `_LRF_Full_SBS`, `_LRF_Half_SBS`, `_VR`, `_Anaglyph`, `_Interlaced`, `_LRF_Left`, `_LRF_Right`).
186-
- Reviewed and cleaned up **multi-language labels and tooltips** across all supported locales.
187-
188-
## 6) Depth Blender – Live Preview & Scrubber
189-
190-
- The **Depth Blender** tab now has a fully working live preview, showing the V2 base map and the blended result side by side.
191-
- All blend parameters (white strength, feather blur, CLAHE, bilateral filters) now update the preview in real time, making it easier to dial in a clean, stable depth mix before running a full batch.
192-
193-
19457
## Summary
19558

196-
v3.7 focuses on stabilizing **Live 3D Capture**, fixing sync and GUI issues, and refining floating window behavior.
197-
Combined with unified **GPU backend support** and **hardware-accelerated codecs** for depth rendering, VD3D is now significantly more stable across:
59+
v3.8 focuses on stabilizing depth estimation, improving model compatibility,
60+
and refining the 3D Render tab UI with better layout behavior, clearer diagnostics, and improved localization support.
19861

199-
- NVIDIA systems
200-
- AMD ROCm systems
201-
- Apple Silicon (M1/M2/M3)
202-
- CPU-only environments
62+
> Back up your `weights/` and `presets/` folders before uninstalling v3.7.
63+
> Then run VisionDepth3D_Setup_Downloader to download the official
64+
> VisionDepth3D v3.8 Windows installer and required `.bin` files.
20365
204-
These improvements build the foundation for **Linux** and broader cross-platform support.
66+
> (Optional but recommended) Clear the Hugging Face cache to free space and
67+
> avoid duplicate model downloads:
68+
> `C:\Users\YOUR_USERNAME\.cache\huggingface`

0 commit comments

Comments
 (0)