Fix deadlock on swapchain recreation #250

Merged
SaschaWillems merged 1 commit into KhronosGroup:main from mjrapson:fix_fence_deadlock on Dec 20, 2025

@mjrapson
Contributor

I'm working through the Vulkan tutorial and noticed the following.

In the deadlock section: https://docs.vulkan.org/tutorial/latest/03_Drawing_a_triangle/04_Swap_chain_recreation.html#_fixing_a_deadlock

It gives the example of a deadlock where the device fence is reset and the drawFrame function is exited early. On the next call to drawFrame, the call to waitForFences never returns, as the fence is left in an unsignaled state without anything to signal it.

The tutorial docs seem correct in that explanation, but the C++ code attachment still has the deadlock condition.

This fixes the C++ attachments where I was able to find the deadlocking code still present.

Now we reset the fence only when we are ready to use it. The location was chosen to match the attachments in which the deadlock does not appear, i.e. where the reset was already in the right place.
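A minimal sketch of the ordering problem described above, using a mock fence rather than the real Vulkan API. `SimFence`, `AcquireResult`, and `firstStuckFrame` are hypothetical names introduced only for illustration, and the sketch assumes the fence starts out signaled and that submitted work always completes and signals it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a Vulkan fence: it only tracks whether it is signaled.
// Assumption: the fence is created in the signaled state.
struct SimFence {
    bool signaled = true;
};

// Simulated outcome of acquiring the next swapchain image.
enum class AcquireResult { Success, OutOfDate };

// Plays a sequence of acquire results through a simulated drawFrame.
// Returns the index of the first frame whose fence wait could never
// complete, or -1 if every frame got past its wait.
int firstStuckFrame(const std::vector<AcquireResult>& acquireResults,
                    bool resetBeforeAcquire) {
    SimFence fence;
    for (std::size_t frame = 0; frame < acquireResults.size(); ++frame) {
        // Simulated wait: an unsignaled fence with no work in flight
        // would block forever, so report this frame as stuck.
        if (!fence.signaled) {
            return static_cast<int>(frame);
        }

        // Problematic order: reset immediately after the wait.
        if (resetBeforeAcquire) {
            fence.signaled = false;
        }

        // Early return path: the swapchain is recreated, nothing is submitted.
        if (acquireResults[frame] == AcquireResult::OutOfDate) {
            continue;
        }

        // Fixed order: reset only once we know work will be submitted.
        if (!resetBeforeAcquire) {
            fence.signaled = false;
        }

        // Simulated submit: the work completes and signals the fence.
        assert(!fence.signaled);
        fence.signaled = true;
    }
    return -1;
}
```

With the sequence Success, OutOfDate, Success, resetting before the acquire leaves the third frame stuck on its wait, while resetting after the early-return check lets all three frames proceed.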

@SaschaWillems
Collaborator

Not sure about this. I don't think the code does deadlock, also see #246, which is reworking the fence setup.

@mjrapson
Contributor Author

> Not sure about this. I don't think the code does deadlock, also see #246, which is reworking the fence setup.

Hi @SaschaWillems

I was able to reproduce the deadlock (on a Linux build) in the original attachment examples (e.g. attachments/17_swap_chain_recreation.cpp).

The deadlock occurs when you resize the window, as explained in the documentation.

device.resetFences is called immediately after waitForFences. If acquiring the swapchain image then returns eErrorOutOfDateKHR, drawFrame exits early. The next call to drawFrame blocks on waitForFences, since the fence is now in an unsignaled state with nothing to signal it.

@SaschaWillems, I wonder: in your rework, if you resize the window, does it hang until the timeout is reached and then throw an exception, since fenceResult is likely to be eTimeout? (Not a deadlock, but still not the ideal case.)

It may appear like a deadlock if the timeout is UINT64_MAX, as in the attachment.
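The difference between a finite timeout and an effectively infinite one can be sketched with a mock fence that is polled against a deadline. This is not the Vulkan API: `PollableFence`, `WaitResult`, `waitForFence`, and `beginFrame` are hypothetical names, and the exception message is only illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

enum class WaitResult { Success, Timeout };

// Mock fence left unsignaled, as after a reset followed by an early return.
struct PollableFence {
    bool signaled = false;
};

// Polls the fence until it is signaled or the timeout expires.
// In this single-threaded sketch nothing can signal the fence during
// the wait, so an unsignaled fence always ends in Timeout.
WaitResult waitForFence(const PollableFence& fence,
                        std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!fence.signaled) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return WaitResult::Timeout;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return WaitResult::Success;
}

// Start of a simulated drawFrame that treats a timed-out wait as an error.
void beginFrame(const PollableFence& fence, std::chrono::milliseconds timeout) {
    if (waitForFence(fence, timeout) != WaitResult::Success) {
        throw std::runtime_error("failed to wait for fence!");
    }
}
```

With a short timeout the stuck fence surfaces as an exception after the timeout elapses. With a very large timeout the same wait would simply never return in practice, which is indistinguishable from a deadlock to the user.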

@SaschaWillems
Collaborator

Nope. Not seeing that issue here on Windows.

@mjrapson
Contributor Author

mjrapson commented Dec 14, 2025

> Nope. Not seeing that issue here on Windows.

Hi @SaschaWillems

I can reproduce it in the following case only, and it depends entirely on the ErrorOutOfDateKHR condition being met, since that is the only time drawFrame takes the early return that leaves the fence unsignaled:

Build attachments

# For example
cd attachments
cmake -B build
cmake --build build

Run 17_swap_chain_recreation

cd build/17_swap_chain_recreation
./17_swap_chain_recreation

If the GPU is a discrete GPU, the hang occurs.

If the GPU is not a discrete device (I modified pickPhysicalDevice to prefer a non-discrete device; otherwise it picks the first device that matches all conditions), the application runs normally.

For me, this means:
GPU: NVIDIA GeForce RTX 3050 Laptop GPU <-- lock occurs
GPU: Intel(R) Iris(R) Xe Graphics (ADL GT2) <-- no lock

So the issue seems driver-dependent (and not necessarily platform-dependent, e.g. Linux vs. Windows).
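The device-selection change mentioned above (prefer one device type, otherwise take the first suitable device) might look roughly like the following. This is a sketch over mock data, not the tutorial's pickPhysicalDevice: `MockDevice`, `MockDeviceType`, and `pickDevice` are hypothetical names, and the `suitable` flag stands in for whatever suitability checks the real function performs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class MockDeviceType { Integrated, Discrete };

// Mock description of a physical device; `suitable` stands in for the
// real suitability checks (queue families, extensions, features).
struct MockDevice {
    std::string name;
    MockDeviceType type;
    bool suitable;
};

// Returns the first suitable device of the preferred type, falling back
// to the first suitable device of any type.
const MockDevice& pickDevice(const std::vector<MockDevice>& devices,
                             MockDeviceType preferred) {
    assert(!devices.empty() && "enumerate devices before picking one");
    const MockDevice* fallback = nullptr;
    for (const MockDevice& device : devices) {
        if (!device.suitable) {
            continue;
        }
        if (device.type == preferred) {
            return device;
        }
        if (fallback == nullptr) {
            fallback = &device;
        }
    }
    if (fallback == nullptr) {
        throw std::runtime_error("failed to find a suitable GPU!");
    }
    return *fallback;
}
```

Switching the `preferred` argument between the two device types is enough to run the same binary against either driver when comparing behavior.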

@mjrapson
Contributor Author

In either case, the changes match the tutorial documentation - unless I misunderstand it?

https://docs.vulkan.org/tutorial/latest/03_Drawing_a_triangle/04_Swap_chain_recreation.html#_fixing_a_deadlock

There is a simple fix thankfully. Delay resetting the fence until after we know for sure, we will be submitting work with it. Thus, if we return early, the fence is still signaled and vkWaitForFences wont deadlock the next time we use the same fence object.
The beginning of drawFrame should now look like this:

vkWaitForFences(device, 1, &inFlightFences[frameIndex], VK_TRUE, UINT64_MAX);

uint32_t imageIndex;
VkResult result = vkAcquireNextImageKHR(device, swapChain, UINT64_MAX, imageAvailableSemaphores[frameIndex], VK_NULL_HANDLE, &imageIndex);

if (result == VK_ERROR_OUT_OF_DATE_KHR) {
    recreateSwapChain();
    return;
} else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
    throw std::runtime_error("failed to acquire swap chain image!");
}

// Only reset the fence if we are submitting work
vkResetFences(device, 1, &inFlightFences[frameIndex]);

@SaschaWillems
Collaborator

That's one of the biggest problems with the tutorial right now. The documentation and the code aren't matching in many places, and fixing that is kinda hard. Pinging @gpx1000, as he might have ideas on how to fix all of this.

@mjrapson
Contributor Author

Thanks @SaschaWillems

Agreed. I haven't updated the literal code in the docs, but the attachments now at least follow the instruction not to call resetFences until after any early returns.

@SaschaWillems
Collaborator

Taking a second look I think I can see how the current code can lead to such a deadlock and I think your PR is correct. We'll merge #246 first, as that also contains other fixes and changes to the documentation. Will review your PR afterwards.

@SaschaWillems
Collaborator

#246 has been merged. Can you fix the merge conflicts? Will review afterwards.

@mjrapson
Contributor Author

@SaschaWillems I have now rebased and resolved the conflicts

Collaborator

@SaschaWillems SaschaWillems left a comment

Thanks. Very much appreciated.

Can't provoke out of date on Windows, but code looks good and matches the documentation.

@mjrapson
Contributor Author

mjrapson commented Dec 20, 2025

> Thanks. Very much appreciated.
>
> Can't provoke out of date on Windows, but code looks good and matches the documentation.

Thanks for your help @SaschaWillems

I found the tutorials very useful in general. I don't have a lot of spare time, but I'd like to keep contributing where I can, especially updating some of the written sections.

I don't have permission to merge, so could someone do that please? :)

@SaschaWillems SaschaWillems merged commit 4e36540 into KhronosGroup:main Dec 20, 2025
6 checks passed