I'm trying to implement tiled deferred lighting with OpenGL compute shaders. For that, I need to compute the minimum and maximum position of each tile.
My first approach was to use atomic operations on shared variables for the computation, like this:
shared int minX;
shared int minY;
shared int minZ;
shared int maxX;
shared int maxY;
shared int maxZ;
...
// shared-memory atomics only work on int/uint, so the float position
// is encoded as an int here (simply truncated)
atomicMin(minX, int(positionWorld.x));
atomicMax(maxY, int(positionWorld.y));
// ... same for the remaining components
barrier();
// now the shared variables contain the min/max
// position per tile
But this runs painfully slowly (I guess because the atomic operations serialize the invocations and aren't well suited to parallel processing).
So what would be the best / fastest way to do this?
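In case it helps, here is a rough sketch of the kind of shared-memory parallel reduction I've seen suggested for this sort of per-tile min/max: every invocation writes its own position into a shared array, then pairs of entries are merged over log2(n) steps, so no atomics are needed. The sketch assumes a 16x16 tile (256 invocations) and a hypothetical positionTexture containing the G-buffer world-space positions. Is something along these lines the way to go, or is there something faster?

#version 430
layout(local_size_x = 16, local_size_y = 16) in;

// hypothetical G-buffer texture with world-space positions
layout(binding = 0) uniform sampler2D positionTexture;

shared vec3 minPos[256];
shared vec3 maxPos[256];

void main()
{
    uint idx = gl_LocalInvocationIndex;

    // each invocation starts with its own position as both min and max
    vec3 positionWorld = texelFetch(positionTexture, ivec2(gl_GlobalInvocationID.xy), 0).xyz;
    minPos[idx] = positionWorld;
    maxPos[idx] = positionWorld;
    barrier();

    // merge pairs of entries; the active half shrinks every iteration,
    // so the whole tile is reduced in log2(256) = 8 steps
    for (uint stride = 128u; stride > 0u; stride >>= 1u)
    {
        if (idx < stride)
        {
            minPos[idx] = min(minPos[idx], minPos[idx + stride]);
            maxPos[idx] = max(maxPos[idx], maxPos[idx + stride]);
        }
        barrier();
    }

    // minPos[0] and maxPos[0] now hold the min/max position of the tile
}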