How is data written from two different GPU cores to the same memory?

Question

Is each core’s data written to shared memory one at a time, or are both written at the same time? For instance, when two adjacent cores need to write to the same memory location, does the GPU write the left core’s data first and then the next core’s, or do both writes happen at the same time? I’m asking because I want to know whether I can get the second core to check a value that the first core changed before it writes its own data.

Like most concurrency, this quickly gets into complicated "it depends" territory. Can you show us what, specifically, your shader is doing at the moment? Note that if your shader relies on calculations behaving as though they were done serially between threads (thread 2 uses results from thread 1), that will often force the execution to be serial in places, meaning you have shading units burning time waiting instead of going full speed in parallel. There are often clever ways we can reshape our algorithms to avoid this serial dependency, though. Seeing your concrete application will help find options. — DMGregory, Aug 23 '20 at 12:53
I don’t have one. I was thinking about it to see if it is worth doing. All I can tell you is that I am trying to take an image, get rid of the useless pixels, and resize the image so that the next time the GPU looks through it, it doesn’t waste any power on empty pixels and skips straight to the ones I want. The only way I can see to do that is to have each core compare two pixels at a time, deleting the unimportant one and moving the important one a step closer to one of the image corners, so that after x steps all the important pixels are next to each other. — user11937382, Aug 23 '20 at 16:33
I think you'll get much better results then by asking "How can I quickly crop an image to remove unnecessary pixels?". Including an example of the kind of image you need to work with would help (e.g., is it one shape in a field of black/transparency? Or multiple small shapes you want to pack together? Do you need to preserve information about where the shapes were originally/where they moved to?). But also, note that a GPU does not read all the pixels in an image sequentially to find the ones you want. It is already able to compute the memory address needed for a sample and fetch it directly. — DMGregory, Aug 23 '20 at 16:38
That way of doing it, though, seems too slow, and I was trying to find a way to speed it up using the fact that (if it is true) the GPU cores write to the shared memory one at a time, so that I could get the next core in line to check where the last core wrote its data and write the second core’s data at that memory address +1. Or maybe have a value that tracks the last memory address a GPU core wrote to. — user11937382, Aug 23 '20 at 16:39
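Editorial sketch: the "value that tracks where the last core wrote" described in the comment above resembles what GPU APIs generally expose as an atomic counter (for example `atomicAdd` in CUDA and GLSL, or `InterlockedAdd` in HLSL). Plain, non-atomic writes from two threads to the same address generally come with no ordering guarantee, so the left core cannot be assumed to write first. The Python below is only a CPU-side illustration under stated assumptions: Python threads stand in for GPU invocations, a lock-protected `fetch_add` stands in for the hardware atomic, and the names `AtomicCounter` and `compact_with_counter` are hypothetical, not part of any GPU API.

```python
import threading


class AtomicCounter:
    """Hypothetical stand-in for a GPU atomic counter (not a real GPU API)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, amount=1):
        # Return the old value and add `amount` as one indivisible step,
        # which is the behaviour an atomic add provides on a GPU.
        with self._lock:
            old = self._value
            self._value += amount
            return old

    @property
    def value(self):
        return self._value


def compact_with_counter(pixels, is_important):
    """Each worker reserves a unique output slot instead of reading a neighbour's write."""
    out = [None] * len(pixels)
    counter = AtomicCounter()

    def worker(pixel):
        if is_important(pixel):
            slot = counter.fetch_add(1)  # unique slot, no two workers get the same one
            out[slot] = pixel

    threads = [threading.Thread(target=worker, args=(p,)) for p in pixels]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[:counter.value]


pixels = [0, 7, 0, 0, 3, 9, 0, 1]  # assumption: 0 marks an "empty" pixel
kept = compact_with_counter(pixels, lambda p: p != 0)
print(sorted(kept))  # sorted, because the slot order depends on thread scheduling
```

Every important pixel ends up in the output exactly once, but the order in which slots are handed out is not deterministic, and on real hardware heavy contention on a single counter can serialize the threads that hit it.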
You're thinking of the problem serially, and that's typically going to get you serial performance. I'd argue this is a dead end and an example of an X/Y problem. I think you can find much more satisfactory and high-performance solutions elsewhere. — DMGregory, Aug 23 '20 at 16:41
I included a recommendation for how to improve your question above. That is what I suggest as a next step. — DMGregory, Aug 23 '20 at 16:42
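Editorial sketch: one widely used way to avoid the serial dependency the comments warn about is stream compaction with an exclusive prefix sum (scan). This is an illustrative technique, not something the commenters proposed for this specific image. Each element derives its output index from the number of kept elements before it, so no thread needs to know where its neighbour wrote. In this Python sketch the scan is a plain sequential loop and the function names are hypothetical; on a GPU the scan itself would be done with a parallel algorithm.

```python
def exclusive_scan(flags):
    """Return (offsets, total), where offsets[i] is the sum of flags[0..i-1]."""
    offsets = []
    total = 0
    for f in flags:
        offsets.append(total)
        total += f
    return offsets, total


def compact_with_scan(pixels, is_important):
    # Step 1: flag each pixel independently (one GPU thread per pixel).
    flags = [1 if is_important(p) else 0 for p in pixels]

    # Step 2: the exclusive scan gives every kept pixel its output index.
    offsets, count = exclusive_scan(flags)

    # Step 3: scatter. Every iteration writes to a distinct index, so the
    # iterations are independent and could run in any order or in parallel.
    out = [None] * count
    for i, p in enumerate(pixels):
        if flags[i]:
            out[offsets[i]] = p
    return out


pixels = [0, 7, 0, 0, 3, 9, 0, 1]  # assumption: 0 marks an "empty" pixel
compacted = compact_with_scan(pixels, lambda p: p != 0)
print(compacted)  # [7, 3, 9, 1]
```

Because the output index comes from the scan rather than from write timing, the kept pixels stay in their original relative order and no two writes ever target the same address.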

0 Answers