Added padding for 8-byte objects alignment in memory by kraglik · Pull Request #1 · RSpliet/KMA

kraglik · 2021-01-27T18:46:59Z

I had a very long night trying to figure out why my (dead simple and therefore definitely correct) code was failing on my NVIDIA GPU with 4 GB of RAM.

RSpliet · 2021-02-14T14:29:15Z

Thank you for investing a long night in debugging this issue, and my apologies for leaving your PR on the shelf for a while. I'm glad to see there is some interest in this (experimental) code, especially since NVIDIA Ampere seems to permit implementing mutex-like synchronisation primitives that make dealing with globally shared data structures a lot more feasible! That surely adds some new use-cases for malloc().
It looks like your fix hints at some additional constraints on GPUs with respect to 64-bit pointers and/or 64-bit alignment of struct elements. I vaguely recall there being some constraints on 64-bit pointers back in the day, but my memory is too hazy to say anything sensible about it. Still, I wonder whether we can come up with a solution that doesn't introduce this padding or extra space on 32-bit systems, or in other situations where it isn't required. Can these alignment properties be queried, and the padding made optional?
If so, for a quick example of how to use host-queried properties to define preprocessor symbols in the OpenCL kernel, so that platform-specific variations can be coded up, see this bit of code in another one of my projects, plus the consumer of the newly defined preprocessor symbol. Admittedly, I haven't thought through the mechanics of this when including KMA as a "library"...
I'd understand if it's beyond your scope to get a "perfect upstream" solution working, but if you can, I look forward to an updated patch. If not, I'll think about pulling this in wholesale, but I don't currently have a test set-up to triple-check that nothing regresses... so bear with me.

kraglik · 2021-02-14T17:18:20Z

Thanks for the reply. I'll try to find some time to make this padding optional. Interestingly enough, KMA without any changes works perfectly fine on a MacBook's AMD GPU but fails on a GTX 970 and newer. Anyway, thank you for your work! It seems to be impossible to implement Hierarchical Temporal Memory with dynamic synapse allocation in OpenCL without KMA.

Also, I'll check again whether there is any performance regression on the AMD GPU. If I recall correctly, there was none, but I'd like to make sure.

Added padding for 8-byte objects alignment in memory

6af9ff6

kraglik wants to merge 1 commit into RSpliet:master from kraglik:master
