Commit 3f190c1
committed
opencl: flash attention for Adreno — fused kernels, quant KV, flash-decoding
Fused FA kernels for f16, native q8_0, and native q4_0 KV on Qualcomm
Adreno. Covers prefill, n_q=1 decode, and a flash-decoding (K-split)
path for long-context decode.
- N_SPLIT splits DK/DV across threads in a WG; QK partials reduced via
sub_group_shuffle_xor, with a 3-phase local-memory fallback when
subgroup shuffle isn't available.
- Flash-Decoding fires at n_kv >= 2048 on n_q <= 8 non-causal shapes,
DK <= 128 (DK <= 64 for multi-q FD).
- Asymmetric KV (-ctk X -ctv Y, X != Y) via on-GPU dequant of the quant
side to F32.
- Per-(dk,dv) lazy compilation. f16 q1 / q1_split share Q via __local
to avoid private-array spill.
- One-pass online softmax (FA-2 style) in q8_0 and q4_0 q1 / q1_split
kernels: maintains per-thread (m_i, l_i, o_acc) on a single sweep,
cross-thread merge rescales by alpha=exp(m_i - m_final). Eliminates
the second K read of the original two-pass implementation. Measured
on gpt-oss-20b at d=16k decode (Adreno X2-90): q8_0 +16.6%, q4_0
+17.5% end-to-end vs two-pass baseline.
- Dim-table tuning across DK in {40, 64, 80, 96, 112, 128, 192, 256},
with a per-generation selector hook (X1, X2 stubs).
Tested on Snapdragon X Elite (Adreno X1-85) and Snapdragon X2 Elite
Extreme (Adreno X2-90), OpenCL 3.0.1 parent 58e68df commit 3f190c1
12 files changed
Lines changed: 5612 additions & 319 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
169 | 169 | | |
170 | 170 | | |
171 | 171 | | |
| 172 | + | |
172 | 173 | | |
| 174 | + | |
| 175 | + | |
173 | 176 | | |
174 | 177 | | |
175 | 178 | | |
| |||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
583 | 583 | | |
584 | 584 | | |
585 | 585 | | |
| 586 | + | |
| 587 | + | |
| 588 | + | |
| 589 | + | |
| 590 | + | |
| 591 | + | |
| 592 | + | |
| 593 | + | |
| 594 | + | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
| 636 | + | |
| 637 | + | |
| 638 | + | |
| 639 | + | |
| 640 | + | |
| 641 | + | |
| 642 | + | |
| 643 | + | |
| 644 | + | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
| 656 | + | |
| 657 | + | |
| 658 | + | |
| 659 | + | |
| 660 | + | |
| 661 | + | |
| 662 | + | |
| 663 | + | |
| 664 | + | |
| 665 | + | |
| 666 | + | |
| 667 | + | |
| 668 | + | |
| 669 | + | |
| 670 | + | |
| 671 | + | |
| 672 | + | |
| 673 | + | |
| 674 | + | |
| 675 | + | |
| 676 | + | |
| 677 | + | |
| 678 | + | |
| 679 | + | |
| 680 | + | |
| 681 | + | |
| 682 | + | |
| 683 | + | |
| 684 | + | |
| 685 | + | |
| 686 | + | |
| 687 | + | |
| 688 | + | |
| 689 | + | |
| 690 | + | |
| 691 | + | |
| 692 | + | |
| 693 | + | |
| 694 | + | |
| 695 | + | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
| 700 | + | |
| 701 | + | |
| 702 | + | |
| 703 | + | |
| 704 | + | |
| 705 | + | |
| 706 | + | |
| 707 | + | |
| 708 | + | |
| 709 | + | |
| 710 | + | |
| 711 | + | |
| 712 | + | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
| 721 | + | |
| 722 | + | |
| 723 | + | |
| 724 | + | |
| 725 | + | |
| 726 | + | |
| 727 | + | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
586 | 776 | | |
587 | 777 | | |
588 | 778 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
48 | | - | |
| 48 | + | |
49 | 49 | | |
50 | 50 | | |
51 | 51 | | |
| |||
61 | 61 | | |
62 | 62 | | |
63 | 63 | | |
64 | | - | |
| 64 | + | |
65 | 65 | | |
66 | 66 | | |
67 | 67 | | |
| |||
120 | 120 | | |
121 | 121 | | |
122 | 122 | | |
123 | | - | |
| 123 | + | |
124 | 124 | | |
125 | 125 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
| |||
49 | 52 | | |
50 | 53 | | |
51 | 54 | | |
52 | | - | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
53 | 59 | | |
54 | 60 | | |
55 | 61 | | |
| |||
108 | 114 | | |
109 | 115 | | |
110 | 116 | | |
111 | | - | |
| 117 | + | |
| 118 | + | |
112 | 119 | | |
113 | 120 | | |
0 commit comments