Added padding for 8-byte objects alignment in memory #1
kraglik wants to merge 1 commit into RSpliet:master
Thank you for investing a long night in debugging this issue, and my apologies for leaving your PR on the shelf for a little while. I'm glad to see there is some interest in this (experimental) code, especially since NVIDIA Ampere seems to permit implementing mutex-like synchronisation primitives that make dealing with globally shared data structures a lot more feasible! That surely adds some new use cases for malloc().
Thanks for the reply. I'll try to find some time to make this padding optional. Interestingly enough, KMA without any changes works perfectly fine on a MacBook's AMD GPU but fails on a GTX 970 and newer. Anyway, thank you for your work! It seems to be impossible to implement Hierarchical Temporal Memory with dynamic synapse allocation in OpenCL without KMA. Also, I'll check again whether there is any performance regression on the AMD GPU. If I recall correctly, there was no regression, but still.
I had a very long night trying to figure out why my (dead simple and therefore definitely correct) code was failing on my Nvidia GPU with 4 gigs of RAM.
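For readers skimming the thread, the kind of padding the PR title refers to can be sketched as rounding each object size up to the next multiple of 8 bytes. This is only an illustration in plain C, not the code from this PR: `align_up_8` and `padding_for_8` are hypothetical helper names, and how KMA actually lays out its blocks is not shown here.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from the PR): round `size` up to the
 * next multiple of 8, so that consecutive objects start on 8-byte
 * boundaries when the first one does. */
static size_t align_up_8(size_t size)
{
    return (size + 7u) & ~(size_t)7u;
}

/* Number of padding bytes that would follow an object of `size` bytes
 * under the rounding above. */
static size_t padding_for_8(size_t size)
{
    return align_up_8(size) - size;
}
```

Under this sketch, a 12-byte object would occupy 16 bytes, so an 8-byte value stored right after it lands on an aligned address. The cost is a few wasted bytes per object, which is presumably why making the padding optional came up.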