
With the GLM library we can do matrix computations on the CPU. However, the GPU seems better suited to this kind of work. So what if I put the matrix computations in a Compute Shader? Would it be faster? If it is possible, it would be really nice for C, since doing matrix computations by hand in C is a really tedious job.

I may have to implement glm::perspective(), glm::lookAt(), glm::rotate(), etc. myself in the Compute Shader, though.
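
For scale, here is roughly what the CPU-side version of those calls looks like with GLM. This is only a minimal sketch; the field of view, aspect ratio, camera placement, and the function name buildMvp are placeholders for illustration:

    // Minimal sketch of the CPU-side matrix setup the question refers to.
    // The concrete values (fov, aspect ratio, camera placement) are placeholders.
    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>  // glm::perspective, glm::lookAt, glm::rotate

    glm::mat4 buildMvp(float angleRadians)
    {
        glm::mat4 projection = glm::perspective(glm::radians(45.0f),   // vertical field of view
                                                16.0f / 9.0f,          // aspect ratio
                                                0.1f, 100.0f);         // near / far planes
        glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 3.0f),      // camera position
                                     glm::vec3(0.0f, 0.0f, 0.0f),      // look-at target
                                     glm::vec3(0.0f, 1.0f, 0.0f));     // up vector
        glm::mat4 model = glm::rotate(glm::mat4(1.0f), angleRadians,
                                      glm::vec3(0.0f, 1.0f, 0.0f));    // rotate around the Y axis
        return projection * view * model;  // a few 4x4 multiplies, done once per frame
    }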

wychmaster
  • What kind of "matrix computations"? How frequently do they change? Why are you putting them in a compute shader instead of a vertex shader? Are you talking about something that gets done once per frame, such that the cost of uploading the data to the CS will be no different from the cost of uploading a matrix to the VS? – Nicol Bolas May 22 '22 at 03:58
  • "as doing matrix computation is a really hard job in C." And FYI: None of the functions you cite are particularly burdensome for a CPU. Any modern CPU can perform such things millions of times per frame. – Nicol Bolas May 22 '22 at 03:59
  • The GPU is suitable for working on problems that are highly parallelizable - e.g. if you have many matrices, or a very large matrix (think hundreds of thousands of elements). Also this will be slower if the computation speed up (if there's any such speed up) doesn't offset the memory transfer overhead. If you haven't done so already, you should profile your program to figure out whether the matrix computations are slowing things down. If your question was a general question instead, then the answer depends on how many and how large your matrices are, and what hardware you're targeting. – lightxbulb May 22 '22 at 10:42
  • As someone who has written his own linear algebra library, I can only support what NicolBolas and lightxbulb said. To take advantage of the GPU, the things you are doing must be highly parallelisable with many computations. This means large matrices or a vast number of small ones. For small matrices like the ones we use in CG the transfer overheads are simply too high. Also, on desktop computers (x86) one can use SSE and AVX as a sort of parallelization which makes a 4x4 multiplication as fast as approximately 20 CPU cycles (IIRC). Would be surprised if GLM doesn't do this. – wychmaster May 22 '22 at 11:48
  • Additionally, almost all modern compilers are able to do automatic SSE/AVX vectorization so that your code runs almost as fast as possible. IIRC my manually vectorized code wasn't faster than the simple, triple-loop, non-vector implementation because of the smart compiler auto-vectorization. – wychmaster May 22 '22 at 11:52 (see the first sketch below)
  • You normally compute the matrices you mention on the CPU, but on the GPU (usually the vertex shader, but not only) you apply them per vertex in parallel. – user8469759 May 22 '22 at 15:13 (see the second sketch below)
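
To make the triple-loop remark above concrete, here is a hedged sketch of a plain column-major 4x4 multiply (the same storage layout GLM uses). It is not GLM's actual implementation, just the kind of loop nest a modern compiler will typically auto-vectorize with SSE/AVX:

    // Plain column-major 4x4 multiply: out = a * b.
    // Element (row r, column c) of a matrix m is stored at m[c * 4 + r].
    // Assumes 'out' does not alias 'a' or 'b'.
    typedef float mat4[16];

    void mat4_mul(mat4 out, const mat4 a, const mat4 b)
    {
        for (int c = 0; c < 4; ++c) {          // each column of the result
            for (int r = 0; r < 4; ++r) {      // each row of the result
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)    // row r of a dotted with column c of b
                    sum += a[k * 4 + r] * b[c * 4 + k];
                out[c * 4 + r] = sum;
            }
        }
    }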

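To illustrate the last comment: the usual pattern is to build the matrix once per frame on the CPU and upload it as a uniform, so the GPU only applies it per vertex in the vertex shader. A minimal OpenGL-flavoured sketch; the program handle and the uniform name "u_mvp" are assumptions, and glad is just one possible function loader:

    // Build the matrix on the CPU with GLM, upload it once per frame,
    // and let the vertex shader apply it to every vertex.
    #include <glad/glad.h>              // or any other OpenGL function loader
    #include <glm/glm.hpp>
    #include <glm/gtc/type_ptr.hpp>     // glm::value_ptr

    void uploadMvp(GLuint program, const glm::mat4& mvp)
    {
        GLint location = glGetUniformLocation(program, "u_mvp");  // uniform name is an assumption
        glUseProgram(program);
        // GLM and OpenGL are both column-major, so no transpose is needed.
        glUniformMatrix4fv(location, 1, GL_FALSE, glm::value_ptr(mvp));
        // Vertex shader side: gl_Position = u_mvp * vec4(in_position, 1.0);
    }

Either way the matrix ends up on the GPU once per frame; the question is only where the handful of arithmetic operations happens, which is why the commenters suggest moving them to a compute shader is unlikely to pay off.
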
0 Answers