Commit a61e00b
Refactoring of gemm, adding faster kernel
This change removes all non-batched functors, factors out
duplicated code, and implements non-batched functions as calls
to batched functors with a trivial constexpr batch indexer.
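
For illustration only, here is a minimal host-side C++ sketch of that idea. It is not the code from this commit: the names `TrivialBatchIndexer`, `StridedBatchIndexer`, `gemm_batch` and `gemm` are hypothetical, and the device kernel is replaced by plain loops. It assumes row-major, contiguous matrices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical indexer for the non-batched case: every batch id maps to
// offset 0, so the compiler can fold the batch offset away entirely.
struct TrivialBatchIndexer {
    constexpr std::size_t operator()(std::size_t) const { return 0; }
};

// Hypothetical indexer for a contiguous batch: matrices stored back to back.
struct StridedBatchIndexer {
    std::size_t stride;
    constexpr std::size_t operator()(std::size_t batch_id) const {
        return batch_id * stride;
    }
};

// Batched row-major GEMM: for each batch, C (n x m) = A (n x k) * B (k x m).
// The indexers translate a batch id into an element offset.
template <typename T, typename IdxA, typename IdxB, typename IdxC>
void gemm_batch(const T *a, const T *b, T *c, std::size_t batch_count,
                std::size_t n, std::size_t k, std::size_t m,
                IdxA idx_a, IdxB idx_b, IdxC idx_c)
{
    for (std::size_t bid = 0; bid < batch_count; ++bid) {
        const T *a_b = a + idx_a(bid);
        const T *b_b = b + idx_b(bid);
        T *c_b = c + idx_c(bid);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < m; ++j) {
                T acc = T(0);
                for (std::size_t s = 0; s < k; ++s) {
                    acc += a_b[i * k + s] * b_b[s * m + j];
                }
                c_b[i * m + j] = acc;
            }
        }
    }
}

// The non-batched entry point is a batch of one with the trivial indexer,
// so there is only one implementation to maintain.
template <typename T>
void gemm(const T *a, const T *b, T *c,
          std::size_t n, std::size_t k, std::size_t m)
{
    constexpr TrivialBatchIndexer idx{};
    gemm_batch(a, b, c, std::size_t(1), n, k, m, idx, idx, idx);
}
```

Because the trivial indexer is an empty type with a constexpr call operator, the non-batched path in this sketch should cost nothing extra compared with a dedicated non-batched implementation.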
This change also adds a faster gemm kernel that threads over the (n,m) space
and accumulates the entire range of K in a single work-item.
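
As a rough sketch of that work decomposition (not the kernel from this commit), the following plain C++ models each work-item as a function call that owns one output element. The names `gemm_thread_nm_work_item` and `gemm_thread_nm` are hypothetical, the launch is simulated by a host loop, and row-major contiguous storage is assumed.

```cpp
#include <cstddef>

// Hypothetical per-work-item body: work-item `gid` owns output element (i, j)
// and sums over the whole K range by itself, so in this sketch no reduction
// across work-items is needed.
template <typename T>
void gemm_thread_nm_work_item(std::size_t gid, const T *a, const T *b, T *c,
                              std::size_t k, std::size_t m)
{
    const std::size_t i = gid / m; // row of C
    const std::size_t j = gid % m; // column of C
    T acc = T(0);
    for (std::size_t s = 0; s < k; ++s) {
        acc += a[i * k + s] * b[s * m + j];
    }
    c[i * m + j] = acc;
}

// Host-side stand-in for launching n * m work-items over the (n,m) space.
template <typename T>
void gemm_thread_nm(const T *a, const T *b, T *c,
                    std::size_t n, std::size_t k, std::size_t m)
{
    for (std::size_t gid = 0; gid < n * m; ++gid) {
        gemm_thread_nm_work_item(gid, a, b, c, k, m);
    }
}
```

With this layout the available parallelism is n * m, which is why a small (n,m) space leaves a device underused unless the work over K is also split.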
The dispatch logic has changed too: we dispatch to the thread-K kernel only if
the (n,m) space is sufficiently small.
1 parent e6d3564 commit a61e00b
1 file changed
Lines changed: 2707 additions & 4132 deletions