Commit 7be8457

Add Synchronization 2 tutorial series with validation and advanced topics
Introduce comprehensive Synchronization 2 tutorial covering modern Vulkan synchronization patterns. Add chapters on dependency anatomy, pipeline barriers, timeline semaphores, frame-in-flight architecture, async compute, transfer queues, dynamic rendering sync, host image copies, and synchronization validation. Include practical examples from Simple Engine and guidance on debugging with validation layers and interpreting VUIDs.
1 parent 1778b46 commit 7be8457

39 files changed

Lines changed: 2157 additions & 0 deletions

antora/modules/ROOT/nav.adoc

Lines changed: 22 additions & 0 deletions
@@ -61,6 +61,28 @@
 ** xref:courses/18_Ray_tracing/05_Shadow_transparency.adoc[Shadow transparency]
 ** xref:courses/18_Ray_tracing/06_Reflections.adoc[Reflections]
 ** xref:courses/18_Ray_tracing/07_Conclusion.adoc[Conclusion]
+* Synchronization 2
+** xref:Synchronization/introduction.adoc[Introduction]
+** Anatomy of a Dependency
+*** xref:Synchronization/Anatomy_of_a_Dependency/01_introduction.adoc[Introduction]
+** Pipeline Barriers and Transitions
+*** xref:Synchronization/Pipeline_Barriers_Transitions/01_introduction.adoc[Introduction]
+** Timeline Semaphores: The Master Clock
+*** xref:Synchronization/Timeline_Semaphores/01_introduction.adoc[Introduction]
+** Frame-in-Flight Architecture
+*** xref:Synchronization/Frame_in_Flight/01_introduction.adoc[Introduction]
+** Asynchronous Compute & Execution Overlap
+*** xref:Synchronization/Async_Compute_Overlap/01_introduction.adoc[Introduction]
+** Transfer Queues & Asset Streaming Sync
+*** xref:Synchronization/Transfer_Queues_Streaming/01_introduction.adoc[Introduction]
+** Synchronization in Dynamic Rendering
+*** xref:Synchronization/Dynamic_Rendering_Sync/01_introduction.adoc[Introduction]
+** Host Image Copies & Memory Mapped Sync
+*** xref:Synchronization/Host_Image_Copies_Memory_Sync/01_introduction.adoc[Introduction]
+** Debugging with Synchronization Validation
+*** xref:Synchronization/Synchronization_Validation/01_introduction.adoc[Introduction]
+** Profiling, Batching, and Optimization
+*** xref:Synchronization/Profiling_Optimization/01_introduction.adoc[Introduction]
 * xref:90_FAQ.adoc[FAQ]
 * link:https://github.com/KhronosGroup/Vulkan-Tutorial[GitHub Repository, window=_blank]
antora/modules/ROOT/pages/Synchronization/Anatomy_of_a_Dependency/01_introduction.adoc

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
:pp: {plus}{plus}
= Anatomy of a Dependency: Introduction

== Overview

Every Vulkan operation, from a simple color clear to a complex ray-traced reflections pass, lives and breathes by the dependencies we define. In this chapter, we take a deep dive into the core mechanics of how data actually moves through the Vulkan pipeline and why synchronization is about much more than just "setting a bitmask."

image::../../../images/rendering_pipeline_flowchart.png[Rendering Pipeline Flowchart, width=600, alt="Flowchart showing the stages of a modern Vulkan rendering pipeline"]

To truly master synchronization, we first need to break down what happens when the GPU processes your commands. We often talk about the GPU as a "massive parallel processor," but what does that mean for data integrity? We'll start by deconstructing the fundamental differences between **Execution Dependencies** (the "when" of GPU work) and **Memory Dependencies** (the "where" and "visibility" of data).

=== What You'll Learn in This Chapter

This chapter is designed to move you from "making it work" to "knowing why it works." We'll explore:

* **The Hardware Perspective**: Understanding why execution barriers alone are not enough to prevent data corruption on modern, multi-cache GPUs.
* **Execution vs. Memory Dependencies**: Learning how to distinguish between stopping a stage and ensuring its data is actually readable by the next one.
* **The Synchronization 2 Advantage**: Why the new `vk::DependencyInfo` and `vkCmdPipelineBarrier2` are more than just a syntax cleanup—they are a fundamental shift in how we express intent to the driver.
* **Surgical Precision with Pipeline Stages**: Mastering `vk::PipelineStageFlagBits2` and `vk::AccessFlagBits2` to target specific hardware units, ensuring maximum GPU occupancy by avoiding unnecessary pipeline bubbles.

By the end of this chapter, you'll have a clear understanding of the "handshake" that must occur between any two pieces of GPU work. This foundation is crucial for everything that follows, from simple image layout transitions to complex asynchronous compute architectures.

== Navigation

Previous: xref:../introduction.adoc[Introduction] | Next: xref:02_execution_vs_memory.adoc[Execution vs. Memory Dependencies]
antora/modules/ROOT/pages/Synchronization/Anatomy_of_a_Dependency/02_execution_vs_memory.adoc

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
:pp: {plus}{plus}
= Execution vs. Memory Dependencies

== Introduction

To understand why synchronization is so critical, we first need to look at what's happening under the hood when a GPU processes your work. Unlike a CPU, which generally executes instructions in a linear, predictable fashion, the GPU is a massive, highly parallel array of specialized hardware units. When you submit a command buffer, the GPU doesn't just start at the top and finish at the bottom; it distributes tasks across various stages of its pipeline—geometry, rasterization, fragment shading, and more—often all at once.

This parallelism is what makes Vulkan powerful, but it's also where the danger lies. If you want a fragment shader to read data that was just written by a compute shader, you must define exactly how that dependency works. In Vulkan, this is split into two distinct concepts: **Execution Dependencies** and **Memory Dependencies**.

=== The "When": Execution Dependencies

An **Execution Dependency** is the simplest form of synchronization. It answers the question: "When can this work start?"

Imagine you have two commands: Command A and Command B. An execution dependency from A to B simply tells the GPU: "Don't start the specified pipeline stages of Command B until the specified pipeline stages of Command A have finished."

This sounds straightforward, but here's the catch: on modern hardware, Command A finishing its work is *not* the same thing as its data being ready for Command B. Execution is just the trigger; memory is the substance.
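As a concrete sketch of "execution only," here is what such a dependency looks like with the Synchronization 2 structures. The helper name is our own invention, and we deliberately leave both access masks empty:

[source,c++]
----
#include <vulkan/vulkan.hpp>

// Hypothetical helper: an execution-only dependency from compute to fragment.
// The empty access masks mean this orders *execution* but does nothing to
// make the compute shader's writes readable by the fragment shader.
vk::MemoryBarrier2 makeExecutionOnlyBarrier() {
    vk::MemoryBarrier2 barrier{};
    barrier.srcStageMask = vk::PipelineStageFlagBits2::eComputeShader;
    barrier.dstStageMask = vk::PipelineStageFlagBits2::eFragmentShader;
    return barrier; // srcAccessMask / dstAccessMask intentionally left empty
}
----

Recording it is then a matter of wrapping the barrier in a `vk::DependencyInfo` (via `pMemoryBarriers`) and calling `pipelineBarrier2` on a command buffer that is currently recording.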
=== Architectural Realities: Caches and Memory Types

Vulkan memory isn't just one big bucket where you store textures and buffers. Depending on your hardware, it's a complex landscape of different physical locations and access speeds. To sync effectively, you need to know what you're syncing against.

On a **Discrete GPU**, you have dedicated Video RAM (VRAM) that is physically separate from your system's RAM. Moving data between these two is the job of the **DMA (Direct Memory Access)** engine—a specialized unit that can copy data across the PCI Express bus without bothering the main shader cores. When you upload a texture, you're often syncing the DMA engine with the Graphics pipeline.

On the other hand, many laptops and mobile devices use a **Unified Memory Architecture (UMA)**, where the CPU and GPU share the same physical RAM. While this sounds like it should make things easier, it actually adds a hidden layer of complexity: **Caches**. Even if they share the RAM, the CPU has its own L1/L2/L3 caches, and the GPU has its own L1/L2 caches. If the GPU writes data to a shared buffer, that data might stay in the GPU's L2 cache and never actually reach the physical RAM. When the CPU tries to read it, it will see the old, stale value from the RAM or its own cache.

In Vulkan, we categorize these behaviors into three primary memory types:

* **Device Local**: This is memory that is "fastest" for the GPU to access. On a discrete card, this is the VRAM. On UMA, it's just a portion of the shared RAM.
* **Host Visible**: This memory can be "mapped" into your C{pp} application's address space, allowing the CPU to read and write to it directly.
* **Host Coherent**: A special type of Host Visible memory where the hardware automatically ensures that CPU and GPU see the same data without you needing to manually flush caches (though you still need an execution dependency to ensure the write has *finished*!).
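When allocating, you search the device's memory types for one that matches the properties you need. Below is a minimal sketch of the classic lookup helper (the function name is our own; real engines usually hide this inside an allocator):

[source,c++]
----
#include <vulkan/vulkan.hpp>
#include <stdexcept>

// Pick a memory type index that the resource allows (typeFilter) and that
// has all the properties we want, e.g. eDeviceLocal for a texture, or
// eHostVisible | eHostCoherent for a persistently mapped uniform buffer.
uint32_t findMemoryType(const vk::PhysicalDeviceMemoryProperties& props,
                        uint32_t typeFilter, vk::MemoryPropertyFlags wanted) {
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        bool allowedByResource = typeFilter & (1u << i);
        bool hasAllProperties =
            (props.memoryTypes[i].propertyFlags & wanted) == wanted;
        if (allowedByResource && hasAllProperties) return i;
    }
    throw std::runtime_error("no suitable memory type");
}
----

Keep in mind that if the type you end up with is Host Visible but *not* Host Coherent, every CPU write must be followed by `flushMappedMemoryRanges` before the GPU can rely on it.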
=== The "Where": Memory Dependencies

This is where many Vulkan developers get caught. Even if Command A has finished, its output might still be sitting in a local L1 cache on a specific shader core, or it might be in a shared L2 cache that hasn't been written back to the main pool. If Command B—perhaps running on a completely different part of the GPU or even the CPU—tries to read that data from main memory before it has been "made available," it will read stale data.

This is why we say execution is not enough. You can tell the hardware "Wait for the Compute Shader to finish before starting the Fragment Shader," and the hardware will happily oblige. But the Fragment Shader will then go to read the texture and find the old data because the Compute Shader's writes are still trapped in a local cache somewhere.

A **Memory Dependency** ensures that data is properly moved between caches and main memory so it can be safely read. This involves two critical steps:

1. **Availability**: This operation "flushes" the data from the source's local caches so that it is visible to a shared memory pool (like L2 cache or main memory).
2. **Visibility**: This operation "invalidates" the local caches of the destination stage, forcing it to read the fresh data from the shared memory pool rather than using whatever stale bits it might already have.
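To make the two steps tangible, here is a deliberately simplified toy model in plain C{pp}. This is not Vulkan API; every name here is invented for illustration. The producer's write sits in a private cache until it is made available, and the consumer keeps returning its stale cached copy until its cache is invalidated:

[source,c++]
----
#include <optional>

// Toy model of the availability/visibility handshake. Each side has a
// private cache; "shared" stands in for the L2 cache or main memory.
struct MemoryModel {
    int shared = 0;                    // the shared memory pool
    std::optional<int> srcCache;       // producer's un-flushed write
    std::optional<int> dstCache;       // consumer's possibly stale copy

    void producerWrite(int v) { srcCache = v; }  // write lands in a cache
    void makeAvailable() {                       // step 1: flush to shared
        if (srcCache) { shared = *srcCache; srcCache.reset(); }
    }
    void makeVisible() { dstCache.reset(); }     // step 2: invalidate reader
    int consumerRead() {                         // uses cached copy if present
        if (!dstCache) dstCache = shared;
        return *dstCache;
    }
};
----

Skipping either step reproduces the stale-read problem described above: without `makeAvailable()` the write never reaches the shared pool, and without `makeVisible()` the consumer happily keeps reading its old copy.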
Without both an execution dependency AND a memory dependency, you are living in a world of **hazards**. The most common is the "Read-After-Write" (RAW) hazard, where your fragment shader reads a texture before the compute shader's writes have landed, resulting in the flickering and stale-data artifacts that are so common in early Vulkan implementations.

=== The Practical Handshake

Think of it as a professional handshake. An execution dependency is the two people agreeing to meet. A memory dependency is one person actually handing the document to the other and the other person making sure they are looking at the new document, not their old notes.

In Synchronization 2, we define this handshake using `vk::PipelineStageFlagBits2` and `vk::AccessFlagBits2`. The stage flags define the *when* (the execution dependency), and the access flags define the *how* (the memory dependency). By pairing these correctly, you ensure that your data is not only processed in the right order but is also actually there when you go to look for it.
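Putting the two halves together, a typical compute-writes-then-fragment-samples dependency might look like the following sketch (the helper name is ours; we assume the compute pass wrote the image while it was in the `eGeneral` layout):

[source,c++]
----
#include <vulkan/vulkan.hpp>

// Pair the "when" (stage masks) with the "how" (access masks) for a
// compute-writes -> fragment-samples dependency on a storage image.
vk::ImageMemoryBarrier2 makeComputeToFragmentBarrier(vk::Image image) {
    vk::ImageMemoryBarrier2 b{};
    b.srcStageMask  = vk::PipelineStageFlagBits2::eComputeShader;  // when: wait on compute
    b.srcAccessMask = vk::AccessFlagBits2::eShaderStorageWrite;    // how: make writes available
    b.dstStageMask  = vk::PipelineStageFlagBits2::eFragmentShader; // when: before fragment work
    b.dstAccessMask = vk::AccessFlagBits2::eShaderSampledRead;     // how: make them visible to sampling
    b.oldLayout = vk::ImageLayout::eGeneral;
    b.newLayout = vk::ImageLayout::eShaderReadOnlyOptimal;
    b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    b.image = image;
    b.subresourceRange = { vk::ImageAspectFlagBits::eColor, 0, 1, 0, 1 };
    return b;
}
----

The returned struct is attached to a `vk::DependencyInfo` through `pImageMemoryBarriers` and recorded with `pipelineBarrier2`.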
== Simple Engine Implementation: Caches and Safety

In `Simple Engine`, we handle these architectural realities through our `MemoryPool` class (`memory_pool.cpp`). When we allocate memory for a buffer or image, we specify the `vk::MemoryPropertyFlags` to decide its role. For example, our `UniformBuffer` objects are typically allocated as `HostVisible | HostCoherent`. This means the CPU can write to them and they are automatically visible to the GPU without a manual `flushMappedMemoryRanges` call.

However, just because they are **coherent** doesn't mean we can ignore execution dependencies! Even in `Simple Engine`, if the CPU updates a `HostCoherent` uniform buffer while the GPU is in the middle of a fragment shader reading from it, we will encounter a **data race**. This is why we still use `inFlightFences` and semaphores to ensure the GPU has finished using a frame's resources before the CPU starts modifying them for the next frame.
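The shape of that protection can be seen in a plain C{pp} analogy (nothing here is Vulkan API): the worker thread plays the GPU consuming a coherent buffer, and an atomic flag plays the in-flight fence the CPU must wait on before writing the next frame's data:

[source,c++]
----
#include <atomic>
#include <thread>

// Analogy only: coherence means both sides SEE the same bytes, but without
// the "fence" the CPU could overwrite the buffer mid-read -- a data race.
int simulateFrame() {
    int uniformBuffer = 1;              // frame N's data, "host coherent"
    std::atomic<bool> gpuDone{false};   // stand-in for an in-flight fence
    int gpuSaw = 0;

    std::thread gpu([&] {
        gpuSaw = uniformBuffer;         // the "fragment shader" reads frame N
        gpuDone.store(true, std::memory_order_release);  // signal the fence
    });

    while (!gpuDone.load(std::memory_order_acquire)) {}  // CPU waits on fence
    uniformBuffer = 2;                  // only now is it safe to write N+1

    gpu.join();
    return gpuSaw;                      // always frame N's value: no race
}
----

Remove the wait loop and the program has exactly the bug described above: the write of frame N+1's data races with the read of frame N's.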
For our textures and vertex buffers, we use `DeviceLocal` memory for maximum performance. Because these are not host-coherent, we must use `vk::DependencyInfo` and `vk::ImageMemoryBarrier2` to explicitly manage the "Availability" and "Visibility" handshakes. This ensures that after a `vkCmdCopyBufferToImage` command, the data is properly flushed from the transfer unit's caches and invalidated for the fragment shader's caches.
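As a sketch, the post-upload barrier for a sampled texture typically looks like this (the helper name is ours; `mipLevels` is assumed to match the image's mip count):

[source,c++]
----
#include <vulkan/vulkan.hpp>

// After copying into a DeviceLocal image: make the transfer writes
// available, transition the layout, and make the data visible to
// sampled reads in the fragment stage.
vk::ImageMemoryBarrier2 makePostUploadBarrier(vk::Image image,
                                              uint32_t mipLevels) {
    vk::ImageMemoryBarrier2 b{};
    b.srcStageMask  = vk::PipelineStageFlagBits2::eCopy;        // the copy command
    b.srcAccessMask = vk::AccessFlagBits2::eTransferWrite;      // availability
    b.dstStageMask  = vk::PipelineStageFlagBits2::eFragmentShader;
    b.dstAccessMask = vk::AccessFlagBits2::eShaderSampledRead;  // visibility
    b.oldLayout = vk::ImageLayout::eTransferDstOptimal;
    b.newLayout = vk::ImageLayout::eShaderReadOnlyOptimal;
    b.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    b.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    b.image = image;
    b.subresourceRange = { vk::ImageAspectFlagBits::eColor, 0, mipLevels, 0, 1 };
    return b;
}
----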
== Navigation

Previous: xref:01_introduction.adoc[Introduction] | Next: xref:03_sync2_advantage.adoc[The Synchronization 2 Advantage]
