CUDA execution for external patches #7376

Open

a10y wants to merge 12 commits into develop from aduffy/patched-fuse

Conversation

Contributor

@a10y commented Apr 9, 2026

Summary

This fixes up CUDA execution to adaptively support interior or exterior patches for bit unpacking.

This does not implement dynamic dispatch.

@a10y force-pushed the aduffy/patched-fuse branch from e16d1de to 721fa3d on April 9, 2026 19:22
@a10y marked this pull request as ready for review on April 10, 2026 16:52
@a10y added the feature (A feature request) label on Apr 10, 2026
@a10y requested a review from 0ax1 on April 10, 2026 16:52
@a10y added the changelog/feature (A new feature) label and removed the feature (A feature request) label on Apr 10, 2026
@a10y changed the title from "Aduffy/patched fuse" to "CUDA execution for external patches" on Apr 10, 2026
a10y added 5 commits on April 13, 2026 11:43 (each signed off by Andrew Duffy <andrew@a10y.dev>)
@a10y force-pushed the aduffy/patched-fuse branch from bc9c13e to e13c033 on April 13, 2026 15:43
a10y added 6 commits on April 13, 2026 14:42 (each signed off by Andrew Duffy <andrew@a10y.dev>)
@a10y force-pushed the aduffy/patched-fuse branch from 3acf500 to 88567cf on April 13, 2026 20:58
Contributor

@0ax1 left a comment

LGTM - Added a couple more thoughts.

/// This kernel uses a thread-per-lane model where each thread is assigned to
/// one (chunk, lane) slot and applies all patches in that slot.
template <typename ValueT>
__device__ void patched(ValueT *const output,

Nice! Ideally we could move to a world where we drop the other patches.cu eventually and only operate on transposed patches on the GPU, maybe.
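The thread-per-lane model described in the kernel's doc comment can be pictured on the CPU roughly as below. This is a hedged sketch only: `Patch`, `LANES`, and the flat index-modulo-lane mapping are illustrative assumptions, not the PR's actual FastLanes layout.

```rust
// Illustrative lane count; a real kernel would use the layout's lane width.
const LANES: usize = 4;

#[derive(Clone, Copy)]
struct Patch {
    index: usize, // absolute position in the output
    value: u32,   // exception value that did not fit the packed bit width
}

/// Apply every patch belonging to one (chunk, lane) slot: the work a single
/// GPU thread would do under the thread-per-lane model.
fn apply_patches_lane(
    output: &mut [u32],
    patches: &[Patch],
    chunk: usize,
    lane: usize,
    chunk_len: usize,
) {
    for p in patches {
        if p.index / chunk_len == chunk && p.index % LANES == lane {
            output[p.index] = p.value;
        }
    }
}

/// Loop over (chunk, lane) slots; each iteration stands in for one thread.
fn patched(output: &mut [u32], patches: &[Patch], chunk_len: usize) {
    let n_chunks = output.len() / chunk_len;
    for chunk in 0..n_chunks {
        for lane in 0..LANES {
            apply_patches_lane(output, patches, chunk, lane, chunk_len);
        }
    }
}
```

Because each output position maps to exactly one (chunk, lane) slot, no two simulated threads ever write the same element, which is what makes the GPU version race-free without atomics.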

match_each_integer_ptype!(bitpacked.ptype(bitpacked.dtype()), |P| {
return decode_bitpacked::<P>(
bitpacked.into_owned(),
P::default(),

We could also handle the fused case here, right: FoR + BP?
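A fused FoR + BP path would add the frame-of-reference base during unpacking rather than in a second pass. A scalar sketch of that idea follows; the LSB-first bit layout, the fixed `u32` width, and the name `decode_fused` are assumptions for illustration, not the crate's API.

```rust
/// Unpack `count` values of `bit_width` bits each from `packed` (LSB-first)
/// and add the frame-of-reference `reference` in the same pass.
fn decode_fused(packed: &[u8], bit_width: usize, count: usize, reference: u32) -> Vec<u32> {
    let mut out = Vec::with_capacity(count);
    let mut bit_pos = 0usize;
    for _ in 0..count {
        // Gather the value's bits one at a time (a real kernel would do
        // word-at-a-time unpacking).
        let mut v = 0u32;
        for b in 0..bit_width {
            let byte = packed[(bit_pos + b) / 8];
            let bit = (byte >> ((bit_pos + b) % 8)) & 1;
            v |= (bit as u32) << b;
        }
        bit_pos += bit_width;
        // Fusing FoR here avoids materializing the un-referenced deltas.
        out.push(v + reference);
    }
    out
}
```

For example, three 2-bit deltas 1, 2, 3 packed into the byte `0b0011_1001` with reference 10 decode to 11, 12, 13 under this layout.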

}

// Execute the components
let lane_offsets = array

This could go in a separate fn:

            let lane_offsets = array
                .lane_offsets()
                .clone()
                .execute_cuda(ctx)
                .await?
                .into_primitive()
                .into_data_parts()
                .buffer;

            let patch_indices = array
                .patch_indices()
                .clone()
                .execute_cuda(ctx)
                .await?
                .into_primitive()
                .into_data_parts()
                .buffer;

            let patch_values = array
                .patch_values()
                .clone()
                .execute_cuda(ctx)
                .await?
                .into_primitive()
                .into_data_parts()
                .buffer;
