Welcome to the "Advanced Vulkan Compute" tutorial series! This series is designed for developers who have mastered the basics of Vulkan compute shaders and are looking to push the boundaries of what’s possible with modern GPU hardware.
Vulkan is not just a graphics API; it is a powerful, low-level framework for general-purpose GPU programming (GPGPU). While the initial tutorials covered how to dispatch a simple compute shader, this series dives deep into the architecture, memory models, and advanced features that enable high-performance simulations, complex data structures, and heterogeneous execution.
In a basic compute shader, you might just be multiplying an array of floats. In advanced compute, you are:
- Orchestrating thousands of threads to work together on a single problem.
- Managing memory consistency to ensure that data written by one thread is safely read by another.
- Leveraging specialized hardware features such as subgroup shuffles and cooperative matrices to avoid round-trips through slow VRAM (Video Random Access Memory).
- Building GPU-resident data structures such as BVHs (Bounding Volume Hierarchies) and octrees that never need to touch the CPU.
To do this effectively, you need more than just a passing knowledge of GLSL or Slang; you need to understand the underlying hardware architecture and the Vulkan execution model.
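As a small taste of the thread cooperation described above, here is a CPU-side C++ sketch that models a butterfly (XOR) reduction across the lanes of one subgroup. It is an illustration only, not Vulkan or shader code: the names `kSubgroupSize`, `Lanes`, and `butterflySum` are hypothetical, and the width of 32 is an assumption (real devices report their own width in `VkPhysicalDeviceSubgroupProperties::subgroupSize`). In a shader, each step of the outer loop would roughly correspond to one `subgroupShuffleXor` followed by an add.

```cpp
#include <array>
#include <cstddef>

// Assumed subgroup width, for illustration only. The XOR butterfly below
// requires a power-of-two width.
constexpr std::size_t kSubgroupSize = 32;
static_assert((kSubgroupSize & (kSubgroupSize - 1)) == 0,
              "butterfly reduction assumes a power-of-two lane count");

// One float per lane (invocation) in the modeled subgroup.
using Lanes = std::array<float, kSubgroupSize>;

// CPU model of a butterfly reduction: in each step, every lane adds the value
// held by its partner lane (lane XOR offset). After log2(kSubgroupSize) steps,
// every lane holds the sum of all original lane values.
Lanes butterflySum(Lanes lanes) {
    for (std::size_t offset = kSubgroupSize / 2; offset > 0; offset /= 2) {
        Lanes next{};
        for (std::size_t lane = 0; lane < kSubgroupSize; ++lane) {
            // Read the partner's value from the previous step, so that all
            // lanes appear to exchange data simultaneously.
            next[lane] = lanes[lane] + lanes[lane ^ offset];
        }
        lanes = next;
    }
    return lanes;
}
```

Filling the lanes with the values 1 through 32 and calling `butterflySum` leaves 528 in every lane. The point of the pattern is that partial results move directly between lanes, so no intermediate value has to be written to shared memory or VRAM.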
This tutorial series is organized into several key areas:
- Compute Architecture: mapping workgroups to Compute Units (CUs) and Streaming Multiprocessors (SMs), and mastering occupancy.
- Memory Models and Consistency: understanding the Vulkan Memory Model, shared memory (LDS, or Local Data Share), and fine-grained synchronization.
- Subgroup Operations: using cross-invocation communication to avoid VRAM round-trips and maximize SIMD (Single Instruction, Multiple Data) throughput.
- Heterogeneous Ecosystems: running OpenCL C and SYCL code on top of Vulkan using clspv, clvk, and AdaptiveCpp.
- Advanced Data Structures: moving complex structures such as trees and linked lists entirely to the GPU using 64-bit atomics and BDA (Buffer Device Address).
- GPU-Driven Pipelines: moving command generation and workload management entirely to the GPU for autonomous execution.
- Asynchronous Orchestration: running compute and graphics concurrently using Synchronization 2 and multiple hardware queues.
- Advanced Math & Optimization: using Cooperative Matrices for linear algebra and auditing kernels for divergence and throughput.
This series assumes you are comfortable with:
- Standard Vulkan initialization (Instance, Device, Queues).
- Basic Compute Pipelines and Descriptor Sets.
- C++20 and the GLSL or Slang shading languages.
- The concepts covered in the Compute Shader chapter of the main tutorial.
Each chapter is written to stand on its own where possible, but later chapters build on concepts introduced in earlier ones. We recommend following them in order if you’re new to advanced compute, or jumping to specific chapters if you’re looking to solve a particular problem.
Let’s dive into the world of high-performance GPU computing!