Commit 2a30dd0
Consolidate the TMA examples around the libcudacxx wrappers.
Keep the example surface smaller and closer to CUDA C++ by showing barrier/TMA helpers and replace_address() in one place instead of duplicating raw PTX snippets.
Made-with: Cursor
1 parent e9b1070 commit 2a30dd0
2 files changed
Lines changed: 79 additions & 277 deletions
This file was deleted.