# WebGPU Backend

Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).

> **Status: Prototype.** The backend supports a single operator today and is under active development. See [TODO.md](TODO.md) for the roadmap.

## Architecture

```
PyTorch model
      │ torch.export
      ▼
Exported Program
      │ VulkanPartitioner (tags supported fp32 ops)
      ▼
Edge Dialect IR
      │ VulkanBackend.preprocess (builds Vulkan FlatBuffer, buffer-only storage)
      ▼
.pte file (with VH00/VK00 delegate blob)
      │
      ▼
Native runtime (wgpu-native → Metal / Vulkan)
      │ WebGPUGraph::build → creates GPU buffers, pipelines, bind groups
      │ WebGPUGraph::execute → encodes + submits compute passes
      ▼
GPU output (mapped back to CPU via wgpuDevicePoll)
```
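
The last stage of the diagram copies GPU results back to host memory, where the mapped buffer is just raw little-endian bytes. A stdlib-only sketch of decoding such a buffer into floats (illustrative; the runtime performs this readback in C++ after `wgpuDevicePoll`):

```python
import struct

def decode_f32_buffer(raw: bytes) -> list[float]:
    """Decode a mapped GPU buffer holding little-endian float32 values."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))

# Round trip: pack four floats the way a GPU output buffer would hold them.
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
print(decode_f32_buffer(raw))  # → [1.0, 2.0, 3.0, 4.0]
```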

Key design choices:
- **Reuses Vulkan serialization** — the delegate blob is a Vulkan FlatBuffer (`VK00`) with a `VH00` header. All tensor storage is forced to `BUFFER` (WebGPU has no 3D storage textures).
- **Built-in WGSL shaders** — shader source is compiled in as C++ string constants. Future work will embed fused shaders in the FlatBuffer for compile-time mega-kernel fusion.
- **No Python AOT code** — directly consumes .pte files exported via `VulkanPartitioner`.

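For the built-in elementwise shaders, dispatch sizing follows the standard compute-shader pattern: cover every tensor element with fixed-size workgroups. A sketch of that arithmetic (the workgroup size of 64 is a hypothetical value, not taken from this backend's WGSL source):

```python
import math

# Hypothetical workgroup size; the real value is declared in the WGSL shader
# via @workgroup_size(...) and may differ.
WORKGROUP_SIZE = 64

def num_workgroups_1d(numel: int, workgroup_size: int = WORKGROUP_SIZE) -> int:
    """1D dispatch: how many workgroups are needed to cover `numel` elements."""
    return math.ceil(numel / workgroup_size)

print(num_workgroups_1d(4 * 4))  # 4x4 add example → 1 workgroup
print(num_workgroups_1d(1000))   # → 16 workgroups (the last one partially idle)
```

Out-of-range invocations in the final workgroup are typically masked with a bounds check inside the shader.
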
## Operator Support

| Operator | WGSL Shader | Notes |
|---|---|---|
| `aten.add.Tensor` | `binary_add.wgsl` | Element-wise with alpha: `out = in1 + alpha * in2` |

**Planned:** `sub`, `mul`, `relu`, `linear` (matmul), `softmax`, `layer_norm`

## Quick Start

### 1. Setup

```bash
bash backends/webgpu/scripts/setup-wgpu-native.sh
```

This downloads prebuilt wgpu-native binaries for your platform.
### 2. Export a model

```python
import torch
from executorch.backends.vulkan import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower

class AddModule(torch.nn.Module):
    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return a + b

ep = torch.export.export(AddModule(), (torch.randn(4, 4), torch.randn(4, 4)))
et_program = to_edge_transform_and_lower(
    ep, partitioner=[VulkanPartitioner()]
).to_executorch()

with open("add.pte", "wb") as f:
    f.write(et_program.buffer)
```

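After exporting, a quick sanity check is to confirm the delegate blob actually landed in the file. Since the blob is a Vulkan FlatBuffer, its `VK00` file identifier should appear verbatim in the serialized program (a heuristic sketch, not an API the runtime exposes):

```python
def contains_vulkan_blob(pte_bytes: bytes) -> bool:
    """Heuristic: look for the Vulkan FlatBuffer file identifier in the program bytes."""
    return b"VK00" in pte_bytes

# Synthetic demonstration; on a real export, read the bytes of add.pte instead.
print(contains_vulkan_blob(b"...VK00..."))        # → True
print(contains_vulkan_blob(b"no delegate here"))  # → False
```

On a real export, `contains_vulkan_blob(open("add.pte", "rb").read())` should return `True` when partitioning delegated the add.
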
### 3. Build and run

```bash
bash backends/webgpu/test/test_build_webgpu.sh
```

This runs the Python export tests, exports a .pte, builds the native runtime, and validates GPU output.

## Directory Structure

```
backends/webgpu/
├── CMakeLists.txt
├── README.md
├── TODO.md
├── runtime/
│   ├── WebGPUBackend.h/cpp          # BackendInterface (init/execute)
│   ├── WebGPUGraph.h/cpp            # GPU graph: buffers, pipelines, dispatch
│   ├── WebGPUDelegateHeader.h/cpp   # VH00 header parser
│   ├── WebGPUDevice.h/cpp           # wgpu-native device abstraction
│   └── ops/
│       ├── OperatorRegistry.h/cpp   # Op dispatch table
│       └── add/
│           ├── BinaryOp.cpp         # aten.add.Tensor implementation
│           ├── binary_add.wgsl      # WGSL shader source
│           └── binary_add_wgsl.h    # Shader as C++ string constant
├── scripts/
│   └── setup-wgpu-native.sh         # Download wgpu-native binaries
└── test/
    ├── conftest.py
    ├── test_build_webgpu.sh         # End-to-end build + test
    ├── test_webgpu_native.cpp       # C++ native test runner
    └── ops/
        └── add/
            └── test_add.py          # Python export tests
```

## Requirements

- **macOS**: Metal-capable GPU
- **Linux**: Vulkan-capable GPU + drivers
- **Build**: CMake 3.19+, conda environment with ExecuTorch installed