|
1 | | -# VisionDepth3D v3.7 – Changelog |
| 1 | +# VisionDepth3D v3.8 – Changelog |
2 | 2 |
|
3 | 3 | --- |
4 | 4 |
|
5 | | -## 1) Live 3D Capture – Real-Time Overhaul |
| 5 | +## 1) Depth Estimation Tab |
6 | 6 |
|
7 | | -### Audio Support Added |
| 7 | +### Depth Models |
8 | 8 |
|
9 | | -* Added optional live audio passthrough monitor for external capture devices |
10 | | -* New flags: |
11 | | - * `--audio-device "<device name>"` |
12 | | - * `--audio-delay-ms <value>` |
13 | | -* Supports DirectShow and fallback to WASAPI |
14 | | -* FFplay-based audio pipeline integrated for low latency monitoring |
15 | | -* Audio sync delay control for VR headset streaming |
| 9 | +- Fixed ONNX model loading: |
| 10 | + - Distill-Any-Depth (inference resolution 518×518, batch size 8) |
| 11 | + - Video Depth Anything (inference resolution 512×288, batch size 8) |
| 12 | +- Implemented LBM depth model (development version). Thanks to Aether for the implementation fix. |
| 13 | +- Removed depth models that returned no `d_type` from the dropdown. |
| 14 | +- Fixed Hugging Face model downloads and caching so zoo models consistently save inside the app `weights/` directory (no more extra `.cache` downloads). |
| 15 | +- Updated Transformers image processor loading to prefer `use_fast=True` when available (with automatic fallback when unsupported). |
16 | 16 |
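The "prefer `use_fast=True`, fall back when unsupported" behavior described above could look roughly like the sketch below. The helper name `load_image_processor` and the set of caught exceptions are assumptions made for illustration, not the project's actual code. The loader is passed in as a callable so the sketch runs without Transformers installed.

```python
from typing import Any, Callable


def load_image_processor(loader: Callable[..., Any], model_id: str, **kwargs: Any) -> Any:
    """Try the fast image processor first, then retry without the flag.

    `loader` is any callable with a from_pretrained-style signature,
    for example transformers.AutoImageProcessor.from_pretrained.
    """
    try:
        return loader(model_id, use_fast=True, **kwargs)
    except (TypeError, ValueError, ImportError):
        # Assumed failure modes: the loader rejects the keyword, or no fast
        # variant is available for this model. Retry with library defaults.
        return loader(model_id, **kwargs)
```

With Transformers installed, a call would be `load_image_processor(AutoImageProcessor.from_pretrained, model_id)`, where `model_id` is whichever depth model is selected.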
|
17 | | -### Color Channel Controls |
| 17 | +### Depth Backend |
18 | 18 |
|
19 | | -* Added `--pixelshift-rgb` to properly route RGB output from CUDA warp kernel |
20 | | -* Added `--force-bgr-swap` fallback for cameras that provide BGR swapped frames |
21 | | -* Fixed purple and red tint issues on some capture devices |
22 | | - |
23 | | -### Performance and FPS Optimization |
24 | | - |
25 | | -* Tuned live depth inference resolution defaults for smoother real-time processing |
26 | | -* Recommended real-time profile: `infer-w 448`, `infer-h 256`, depth FPS ~ 8 to 12 |
27 | | -* Better GPU scheduling for live stereo warp when audio is active |
28 | | -* More stable stream with reduced stutter on high resolution capture input (1080p HDMI sources) |
29 | | - |
30 | | -### Network Streaming Prep (Experimental) |
31 | | - |
32 | | -* Added headless running mode (`--no-preview`) to remove local display overhead |
33 | | -* Prepared architecture for browser-based SBS VR streaming |
34 | | -* In development: WebRTC path for synchronized audio + video to Quest browser |
35 | | - |
36 | | -### General Stability |
37 | | - |
38 | | -* Improved diagnostic output and runtime logs |
39 | | -* Verified working flags for external capture cards (Elgato HD60 S+) |
40 | | -* Ensured fallback modes for low resource conditions |
41 | | - |
42 | | ---- |
43 | | - |
44 | | -## 2) Initial Bugfixes & Issues Resolved (Live 3D Cleanup) |
45 | | - |
46 | | -### UI Settings Not Applying |
47 | | - |
48 | | -* GUI sliders and dropdowns weren’t being respected when starting live capture |
49 | | -* Fixed runtime sync so all selected GUI settings apply properly |
50 | | - |
51 | | -### Manual CLI vs GUI Discrepancy |
52 | | - |
53 | | -* Runtime ignored resolution, backend, FPS, and other settings when launched via GUI |
54 | | -* Now fixed, all selected parameters propagate as expected |
55 | | - |
56 | | -### Video Tint Correction |
57 | | - |
58 | | -* Purple/red tint caused by incorrect channel routing |
59 | | -* Fixed with `--pixelshift-rgb` and `--force-bgr-swap` fallbacks |
60 | | - |
61 | | -### Capture Failures (Webcam/HDMI) |
62 | | - |
63 | | -* Fixed `no frames arriving` error by enforcing `--fourcc MJPG` and proper MSMF backend handling |
64 | | - |
65 | | -### No Audio Output |
66 | | - |
67 | | -* Live mode initially had no sound output |
68 | | -* Added full audio routing support for monitoring devices (DirectShow/WASAPI) |
69 | | - |
70 | | -### FPS Performance Bottlenecks |
71 | | - |
72 | | -* Live stereo + depth ran around 6.5–7 FPS at first |
73 | | -* Now tuned with smaller inference size, better scheduling, and CUDA-side warp boost |
74 | | - |
75 | | ---- |
76 | | - |
77 | | -## 3) Floating-Window and Stability Fixes |
78 | | - |
79 | | -### Dynamic Floating Window (DFW) Logic |
80 | | - |
81 | | -* Fixed undefined `zero_parallax_offset` references and long term stabilization drift. |
82 | | -* Rebuilt the DFW into an asymmetric, side only window that masks a single edge (left or right) based on the dominant parallax direction. |
83 | | -* Added a minimum parallax threshold so the window stays completely off when depth is very close to the screen plane. |
84 | | -* Computes bar width as a blend of parallax magnitude and how far the tracked subject is from mid depth, then clamps it to a small fraction of the per eye width for a subtle mask. |
85 | | -* Uses `FloatingWindowTracker` to smooth the horizontal offset over time and reduce small frame to frame jitters. |
86 | | -* Uses `FloatingBarEaser` to ease the bar width in and out so it expands and collapses gradually instead of popping on and off. |
87 | | -* Supports both soft faded masks and solid black cinema bars through a single toggle, with faded mode as the default for VR and monitor playback. |
88 | | -* Result: a more invisible and cinema friendly floating window that reduces window violations, keeps edges clean, and stays out of the viewer’s way. |
89 | | - |
90 | | -### Frame Jitter and Temporal Stability |
91 | | - |
92 | | -* Fixed flickering, depth “breathing,” and jitter caused by unsmoothed convergence and subject tracking. |
93 | | -* Applied `SubjectDepthEMA`, `DepthPercentileEMA`, and `ConvergenceEMA` for improved temporal stability. |
94 | | -* Added a global `ShiftSmoother` to stabilize foreground, midground, and background parallax. |
95 | | -* Result: consistent parallax motion across frames, no more in-and-out depth pulsing, and a much cleaner, more comfortable stereo flow. |
96 | | - |
97 | | -### Auto-Crop Black Bar Logic |
98 | | - |
99 | | -* Fixed crop mis detections during fade ins and fade outs. |
100 | | -* Added a mean brightness guard so black bar detection does not update on very dark transition frames. |
101 | | -* Added per frame re evaluation when black bar height changes significantly. |
102 | | -* Result: letterboxed content (2.35:1 and similar) now auto crops reliably without vertical drift. |
| 19 | +- Implemented temporal smoothing in the depth pipeline to reduce flicker and improve temporal stability of depth map output. |
| 20 | +- Packaged VisionDepth3D.exe with Distill-Any-Depth (ONNX), Video Depth Anything (ONNX), and Depth Anything v2 Giant weights. |
103 | 21 |
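The changelog does not say how the temporal smoothing is implemented. One common approach is an exponential moving average (EMA) over consecutive depth maps, sketched below. The class name `DepthEMA` and the `alpha` default are hypothetical and may differ from what VisionDepth3D does.

```python
import numpy as np


class DepthEMA:
    """Exponential moving average over consecutive depth maps (illustrative only)."""

    def __init__(self, alpha: float = 0.8) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight of the newest frame (hypothetical default)
        self._state = None

    def update(self, depth: np.ndarray) -> np.ndarray:
        depth = depth.astype(np.float32)
        if self._state is None or self._state.shape != depth.shape:
            # First frame, or a resolution change: pass through unsmoothed.
            self._state = depth.copy()
        else:
            self._state = self.alpha * depth + (1.0 - self.alpha) * self._state
        return self._state.copy()
```

A lower `alpha` suppresses more flicker but makes depth lag behind fast motion, so the value is a trade-off rather than a fixed constant.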
|
104 | 22 | --- |
105 | | -## 4) Unified Depth Pipeline Upgrade & Platform Stability Improvements |
106 | 23 |
|
107 | | -v3.7 introduces the largest Depth tab upgrade yet, combining: |
| 24 | +## 2) 3D Render Tab |
108 | 25 |
|
109 | | -- **GPU backend support across NVIDIA, AMD, and Apple Silicon** |
110 | | -- **FFmpeg codec selection for hardware-accelerated depth exports** |
111 | | -- **Pause / Resume / Cancel controls** for long video renders |
| 26 | +### UI Fixes |
112 | 27 |
|
113 | | ---- |
114 | | - |
115 | | -### GPU Backend & Platform Support (CUDA / ROCm / MPS / CPU) |
116 | | - |
117 | | -* Full rewrite of device detection — CUDA is no longer assumed |
118 | | -* Automatic selection of best available compute backend |
| 28 | +- Added buttons for encoder settings and processing options. |
| 29 | +- Implemented multi-language support and tooltips for new dialog boxes. |
| 30 | +- Adjusted preview image window size and video info layout to prevent window overflow. |
| 31 | +- 3D tab columns now stack correctly when resizing the window on smaller screens. |
119 | 32 |
|
120 | | -#### Supported Depth Inference Backends |
121 | | -- **CUDA** — NVIDIA GPUs |
122 | | -- **ROCm** — AMD GPUs |
123 | | -- **MPS** — Apple Silicon GPUs |
124 | | -- **CPU fallback** when no GPU is detected |
| 33 | +### 3D Backend |
125 | 34 |
|
126 | | -#### Benefits |
127 | | -* Prevents CPU-only fallback on capable GPUs |
128 | | -* Removes CUDA-only links that caused AMD/macOS crashes |
129 | | -* Core foundation for **Linux** deployment moving forward |
130 | | - |
131 | | -> Result: VD3D depth inference now runs properly on **NVIDIA, AMD, Apple, and CPU-only** systems. |
| 35 | +- Reworked Auto Crop Black Bars to use first-frame detection with cached crop reuse: |
| 36 | + - Prevents per-frame crop jitter and depth/frame misalignment. |
| 37 | + - Improves stability for cinema content with subtle letterboxing. |
| 38 | +- Keep Audio checkbox now respects the user-selected output container instead of forcing MP4. |
132 | 39 |
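First-frame detection with cached crop reuse can be sketched as follows. This is a minimal illustration: the names `detect_letterbox` and `CachedCrop` and the brightness threshold are assumptions, not the project's actual implementation.

```python
import numpy as np


def detect_letterbox(frame: np.ndarray, threshold: float = 16.0) -> tuple[int, int]:
    """Return (top, bottom) row bounds of the non-black area of one frame."""
    # Mean brightness per row; works for grayscale (H, W) and color (H, W, C).
    row_means = frame.reshape(frame.shape[0], -1).mean(axis=1)
    active = np.flatnonzero(row_means > threshold)
    if active.size == 0:
        return 0, frame.shape[0]  # fully dark frame: do not crop
    return int(active[0]), int(active[-1]) + 1


class CachedCrop:
    """Detect the crop once, then reuse it for every later frame."""

    def __init__(self) -> None:
        self.bounds = None

    def apply(self, frame: np.ndarray) -> np.ndarray:
        if self.bounds is None:
            self.bounds = detect_letterbox(frame)  # runs on the first frame only
        top, bottom = self.bounds
        return frame[top:bottom]
```

Applying the same cached bounds to both the video frame and its depth map keeps the two aligned, and dark transition frames later in the clip cannot change the crop.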
|
133 | 40 | --- |
134 | 41 |
|
135 | | -### FFmpeg Codec Selection for Depth Video Rendering |
136 | | - |
137 | | -* New Video Codec dropdown added to the Depth Tab |
138 | | -* Hardware and software encoding now user-selectable |
139 | | - |
140 | | -#### Supported Encoders |
| 42 | +## Frametool Backend |
141 | 43 |
|
142 | | -**NVIDIA NVENC** |
143 | | -- `h264_nvenc`, `hevc_nvenc`, `av1_nvenc` |
144 | | - |
145 | | -**AMD AMF** |
146 | | -- `h264_amf`, `hevc_amf`, `av1_amf` |
147 | | - |
148 | | -**Intel QSV** |
149 | | -- `h264_qsv`, `hevc_qsv`, `vp9_qsv`, `av1_qsv` |
150 | | - |
151 | | -**CPU Fallback** |
152 | | -- `libx264`, `libx265`, `libaom-av1`, `libsvtav1`, `mp4v`, `XVID`, `DIVX` |
153 | | - |
154 | | ---- |
155 | | - |
156 | | -#### Safety & Compatibility Enhancements |
157 | | -* Fixes XVID encoding failures on AMD/Intel systems |
158 | | -* Unifies codec support with 3D Converter & FrameTools |
159 | | -* AV1 warning system alerts users when OpenCV cannot decode |
160 | | -* Built to allow full FFmpeg pipeline in next release |
| 44 | +- Reworked Frametool backend to support SSResNet models for feature model integration. |
161 | 45 |
|
162 | 46 | --- |
163 | 47 |
|
164 | | -### Depth Pipeline Control System |
| 48 | +## Console Improvements |
165 | 49 |
|
166 | | -* **Pause**, **Resume**, and **Cancel** depth rendering |
167 | | -* Real-time resource management during pauses |
168 | | -* Safe termination prevents file corruption |
169 | | -* Clear progress status states: |
170 | | - - Running |
171 | | - - Paused |
172 | | - - Canceling |
173 | | - - Completed |
174 | | - |
175 | | -> Enables fast workflow adjustments and prevents wasted GPU/CPU time. |
| 50 | +- Standardized startup console messages to clearly reflect which subsystems are initializing (Torch, depth estimation, upscaler, external 3D pipeline, language, settings). |
| 51 | +- Unified compute device reporting across pipelines for consistent and clearer console output. |
| 52 | +- Suppressed optional xFormers dependency warning on startup. |
| 53 | +- Prevented duplicate language loading during settings restore. |
176 | 54 |
|
177 | 55 | --- |
178 | 56 |
|
179 | | -## 5) 3D Pipeline & UX Polish |
180 | | - |
181 | | -- Added a **Keep Original Audio** checkbox to optionally copy the source video’s audio into the final 3D render (no re-encoding). |
182 | | -- Hooked a new **image-based 3D pipeline** directly into the main 3D renderer for single-frame conversions. |
183 | | -- Wired the **Mode** selector so it cleanly switches between **Single**, **Batch**, and **Image** workflows. |
184 | | -- Implemented an automatic **3D filename suffix system** so exports are labeled by format and eye mode |
185 | | - (for example: `_LRF_Full_SBS`, `_LRF_Half_SBS`, `_VR`, `_Anaglyph`, `_Interlaced`, `_LRF_Left`, `_LRF_Right`). |
186 | | -- Reviewed and cleaned up **multi-language labels and tooltips** across all supported locales. |
187 | | - |
188 | | -## 6) Depth Blender – Live Preview & Scrubber |
189 | | - |
190 | | -- The **Depth Blender** tab now has a fully working live preview, showing the V2 base map and the blended result side by side. |
191 | | -- All blend parameters (white strength, feather blur, CLAHE, bilateral filters) now update the preview in real time, making it easier to dial in a clean, stable depth mix before running a full batch. |
192 | | - |
193 | | - |
194 | 57 | ## Summary |
195 | 58 |
|
196 | | -v3.7 focuses on stabilizing **Live 3D Capture**, fixing sync and GUI issues, and refining floating window behavior. |
197 | | -Combined with unified **GPU backend support** and **hardware-accelerated codecs** for depth rendering, VD3D is now significantly more stable across: |
| 59 | +v3.8 focuses on stabilizing depth estimation, improving model compatibility, |
| 60 | +and refining the 3D Render tab UI with better layout behavior, clearer diagnostics, and improved localization support. |
198 | 61 |
|
199 | | -- NVIDIA systems |
200 | | -- AMD ROCm systems |
201 | | -- Apple Silicon (M1/M2/M3) |
202 | | -- CPU-only environments |
| 62 | +> Back up your `weights/` and `presets/` folders before uninstalling v3.7. |
| 63 | +> Then run VisionDepth3D_Setup_Downloader to download the official |
| 64 | +> VisionDepth3D v3.8 Windows installer and required `.bin` files. |
203 | 65 |
|
204 | | -These improvements build the foundation for **Linux** and broader cross-platform support. |
| 66 | +> (Optional but recommended) Clear the Hugging Face cache to free space and |
| 67 | +> avoid duplicate model downloads: |
| 68 | +> `C:\Users\YOUR_USERNAME\.cache\huggingface` |
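Before clearing the cache, it may help to see how much space it occupies. The sketch below only reports the size and deletes nothing. The function name `cache_size_bytes` is hypothetical, and the default path assumes the standard `.cache/huggingface` location under the user's home directory.

```python
from pathlib import Path


def cache_size_bytes(cache_dir: Path) -> int:
    """Sum the sizes of regular files under cache_dir (0 if it does not exist)."""
    if not cache_dir.is_dir():
        return 0
    # Skip symlinks so files linked from snapshot folders are not counted twice.
    return sum(
        p.stat().st_size
        for p in cache_dir.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


if __name__ == "__main__":
    default_cache = Path.home() / ".cache" / "huggingface"
    print(f"{default_cache}: {cache_size_bytes(default_cache) / 1e6:.1f} MB")
```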