Commit e67e9d3
Consolidate the TMA examples around the libcudacxx wrappers.
Keep the example surface smaller and closer to CUDA C++ by showing barrier/TMA helpers and replace_address() in one place instead of duplicating raw PTX snippets.
Made-with: Cursor
1 parent 358d975 commit e67e9d3
2 files changed
Lines changed: 79 additions & 277 deletions
This file was deleted.