95

I understand that GPUs are generally used to do LOTS of calculations in parallel. I understand why we would want to parallelize processes in order to speed things up. However, GPUs aren't always better than CPUs, as far as I know.

What kinds of tasks are GPUs bad at? When would we prefer CPU over GPU for processing?

Discrete lizard
ChocolateOverflow
  • 6
    Sounds like a dupe of https://superuser.com/questions/308771/why-are-we-still-using-cpus-instead-of-gpus – levininja Feb 25 '20 at 19:54

13 Answers

143

GPUs are bad at doing one thing at a time. A modern high-end GPU may have several thousand cores, but these are organized into SIMD blocks of 16 or 32. If you want to compute 2+2, you might have 32 cores each compute an addition operation, and then discard 31 of the results.
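
To make this concrete, here is a minimal CUDA sketch (the kernel name, launch configuration, and host code are made up for illustration, not part of the answer): a single 2+2 addition still occupies a full 32-lane warp, and 31 of the 32 identical results are simply dropped.

```cuda
// Illustrative sketch: computing one 2+2 still schedules a whole 32-lane warp.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one_pair(const int *a, const int *b, int *out) {
    // All 32 lanes of the warp run in lockstep and compute the same sum;
    // only lane 0's copy is kept, the other 31 results are thrown away.
    int sum = a[0] + b[0];
    if (threadIdx.x == 0) out[0] = sum;
}

int main() {
    int ha = 2, hb = 2, hout = 0, *da, *db, *dout;
    cudaMalloc(&da, sizeof(int));
    cudaMalloc(&db, sizeof(int));
    cudaMalloc(&dout, sizeof(int));
    cudaMemcpy(da, &ha, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(int), cudaMemcpyHostToDevice);

    // One block of 32 threads: the hardware schedules a whole warp even
    // though only one result is wanted.
    add_one_pair<<<1, 32>>>(da, db, dout);
    cudaMemcpy(&hout, dout, sizeof(int), cudaMemcpyDeviceToHost);
    printf("2 + 2 = %d\n", hout);

    cudaFree(da); cudaFree(db); cudaFree(dout);
    return 0;
}
```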

GPUs are bad at doing individual things fast. GPUs only recently topped the one-gigahertz mark, something that CPUs did more than twenty years ago. If your task involves doing many things to one piece of data, rather than one thing to many pieces of data, a CPU is far better.
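
As an illustration of such a serial workload, here is a CUDA sketch (the LCG chain is a hypothetical example, not from the answer): every step depends on the previous one, so the GPU side can only ever use a single thread, while one high-clocked CPU core runs the identical loop and will typically finish much sooner.

```cuda
// Hypothetical workload: a long chain of dependent updates to a single value.
// Step i needs the result of step i-1, so extra GPU threads cannot help.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__host__ __device__ unsigned long long lcg_step(unsigned long long v) {
    // One step of a linear congruential generator: each step depends on the last.
    return v * 6364136223846793005ULL + 1442695040888963407ULL;
}

__global__ void chain_gpu(unsigned long long *x, int steps) {
    unsigned long long v = *x;
    for (int i = 0; i < steps; ++i) v = lcg_step(v);
    *x = v;
}

int main() {
    const int steps = 10000000;
    unsigned long long seed = 42, hx, *dx;
    cudaMalloc(&dx, sizeof(hx));
    cudaMemcpy(dx, &seed, sizeof(hx), cudaMemcpyHostToDevice);
    cudaFree(0);                              // create the CUDA context before timing

    auto g0 = std::chrono::steady_clock::now();
    chain_gpu<<<1, 1>>>(dx, steps);           // a single GPU thread: nothing to parallelize
    cudaDeviceSynchronize();
    auto g1 = std::chrono::steady_clock::now();
    cudaMemcpy(&hx, dx, sizeof(hx), cudaMemcpyDeviceToHost);

    auto c0 = std::chrono::steady_clock::now();
    unsigned long long cv = seed;
    for (int i = 0; i < steps; ++i) cv = lcg_step(cv);   // same chain on one CPU core
    auto c1 = std::chrono::steady_clock::now();

    long long gpu_ms = std::chrono::duration_cast<std::chrono::milliseconds>(g1 - g0).count();
    long long cpu_ms = std::chrono::duration_cast<std::chrono::milliseconds>(c1 - c0).count();
    printf("GPU: %llu in %lld ms, CPU: %llu in %lld ms\n", hx, gpu_ms, cv, cpu_ms);
    cudaFree(dx);
    return 0;
}
```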

GPUs are bad at dealing with data non-locality. The hardware is optimized for working on contiguous blocks of data. If your task involves picking up individual pieces of data scattered around your data set, the GPU's incredible memory bandwidth is mostly wasted.
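
A rough CUDA sketch of the access-pattern difference (kernel names, sizes, and the scatter pattern are illustrative assumptions): a contiguous copy lets each warp's loads coalesce into a few wide transactions, while a gather through a random index array turns almost every load into its own 32-128 byte transaction, most of which goes unused. Profiling the two kernels (e.g. with Nsight Compute) should show the gap in achieved useful bandwidth.

```cuda
// Illustrative kernels: the same copy done with contiguous reads versus
// reads scattered through an index array.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copy_contiguous(const float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Neighbouring threads read neighbouring addresses: a warp's 32 loads
    // coalesce into a handful of wide memory transactions.
    if (i < n) b[i] = a[i];
}

__global__ void copy_scattered(const float *a, const int *idx, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Neighbouring threads read unrelated addresses: each load pulls in a
    // 32-128 byte chunk of which only 4 bytes are used.
    if (i < n) b[i] = a[idx[i]];
}

int main() {
    const int n = 1 << 24;                       // 16M floats (64 MB)
    float *a, *b;
    int *idx;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&idx, n * sizeof(int));
    unsigned int r = 12345u;
    for (int i = 0; i < n; ++i) {
        a[i] = (float)i;
        r = r * 1664525u + 1013904223u;          // simple LCG to build a scatter pattern
        idx[i] = (int)(r % (unsigned)n);
    }

    int threads = 256, blocks = (n + threads - 1) / threads;
    copy_contiguous<<<blocks, threads>>>(a, b, n);
    copy_scattered<<<blocks, threads>>>(a, idx, b, n);
    cudaDeviceSynchronize();

    printf("spot check: b[0] = %.0f\n", b[0]);   // force the result back to the host
    cudaFree(a); cudaFree(b); cudaFree(idx);
    return 0;
}
```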

Mark
  • Concerning "If your task involves doing many things to one piece of data... a CPU is far better": did you mean doing many things sequentially (one after another) to one piece of data? From what I understand of your answer, GPUs are in general better at doing many things in parallel, which could be interpreted as doing many things to copies of a single thing (and merging after computation). Given this ambiguity, perhaps you could also say whether GPUs can do many different computations in parallel almost as well as a single computation in parallel? – a.t. Feb 25 '20 at 13:33
  • 8
    @a.t.: If you copy a single thing, it becomes multiple things. Then you can perform operations on those multiple things, but if it's not pointless (e.g. discarding 31 of 32 results), you have to collect the results, which takes time (a rough sketch of this copy-compute-collect pattern follows the comments). – jamesqf Feb 25 '20 at 17:09
  • 2
    AMD GPUs have a scalar engine, although it's limited to operations useful for address arithmetic. OTOH, modern CPUs put most of their ALUs into SIMD engines, though with more limited functionality than GPU SIMDs. Moreover, Ice Lake has 512-bit CPU SIMD engines, while GPU SIMDs are only 256-bit. – Bulat Feb 25 '20 at 17:33
  • 2
    A modern GPU runs at 2 GHz executing 1 operation per cycle, while a modern CPU runs at 5 GHz performing 4 scalar or 2 SIMD operations per cycle. – Bulat Feb 25 '20 at 17:39
  • 3
    The claim about data non-locality is incorrect. Both CPU and GPU read memory in 32-128 byte chunks. But an NVidia GPU at 2 GHz can perform 2 billion random reads per second, while a desktop CPU can perform fewer than 100 million reads per second (I tested it on my Haswell 4770 with dual-channel DDR3 memory, so it may be higher on server/DDR4 CPUs). – Bulat Feb 25 '20 at 17:43
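
Picking up on jamesqf's copy-compute-collect point above, here is a minimal CUDA sketch (hypothetical kernel, not from any commenter): one value is broadcast to all 32 lanes of a warp, each lane does its own work on the copy, and merging the per-lane results then costs an explicit warp-level reduction.

```cuda
// Hypothetical example: broadcast one value to a warp, compute 32 different
// partial results, then pay for collecting them with a warp-level reduction.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void copy_compute_collect(const float *x, float *out) {
    // Every lane gets its own copy of the single input value...
    float v = x[0];
    // ...and applies a lane-specific operation to it.
    float partial = v * (threadIdx.x + 1);
    // Collecting the results is extra work: a warp-level tree reduction.
    for (int offset = 16; offset > 0; offset >>= 1)
        partial += __shfl_down_sync(0xffffffffu, partial, offset);
    if (threadIdx.x == 0) out[0] = partial;   // lane 0 holds the merged result
}

int main() {
    float hx = 2.0f, hout = 0.0f, *dx, *dout;
    cudaMalloc(&dx, sizeof(float));
    cudaMalloc(&dout, sizeof(float));
    cudaMemcpy(dx, &hx, sizeof(float), cudaMemcpyHostToDevice);
    copy_compute_collect<<<1, 32>>>(dx, dout);
    cudaMemcpy(&hout, dout, sizeof(float), cudaMemcpyDeviceToHost);
    printf("merged result: %.1f\n", hout);    // 2 * (1+2+...+32) = 1056
    cudaFree(dx); cudaFree(dout);
    return 0;
}
```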