How does cache work with tile based rendering?

Are there any tips on how to improve the cache hit ratio for it? (For instance, if tiles are processed horizontally and I have vertical segments of triangles with the same texture, does that work worse for the cache than if I had the triangles laid out horizontally?)

Felipe Lira
  • What do you mean by vertical vs horizontal layout of triangles? – Mokosha Aug 28 '15 at 04:04
  • @Mokosha sorry, I somehow missed this and only just saw it. This is more of a theoretical than a practical question, and I'm not even sure it still makes sense. Anyway, what I meant was: say a triangle intersects tiles (x, y) and (x+1, y), and these two tiles get processed one after the other. Would that be better for the texture cache than a triangle intersecting (x, y) and (x, y+1)? (Because of the border pixels, and because the triangles are not laid out in the same direction in which the tiles are processed.) – Felipe Lira Sep 02 '15 at 13:06

1 Answer

Whether it's a tile-based GPU or not doesn't really affect the texture cache architecture. On all GPUs, the memory layout of a texture looks like some flavor of Morton order or Hilbert curve, so texels that are close together in 2D are also close together in memory.
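
To make the layout concrete, here is a minimal sketch of Morton (Z-order) addressing in Python. The function names `part1by1` and `morton_encode` are made up for this example, and real hardware uses vendor-specific variants of this swizzle, so treat it as an illustration rather than any particular GPU's layout.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit sits between each original bit."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_encode(x: int, y: int) -> int:
    """Interleave the bits of x and y: x in even positions, y in odd positions."""
    return part1by1(x) | (part1by1(y) << 1)


# Texel index in memory for each (x, y) of a 4x4 texture.
for y in range(4):
    print([morton_encode(x, y) for x in range(4)])
# [0, 1, 4, 5]
# [2, 3, 6, 7]
# [8, 9, 12, 13]
# [10, 11, 14, 15]
```

Each aligned 2x2 block occupies four consecutive indices, and each aligned 4x4 block occupies sixteen, which is why a single cache line tends to cover a small square of the texture rather than a long row.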

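As a result, it's more efficient to render triangles that are close to equilateral, because the GPU memory system fetches whole cache lines of texels: a compact triangle uses most of the texels in every line it fetches, whereas a long, thin one touches many lines and uses only a few texels from each.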
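
The effect of triangle shape can be estimated with a rough model. The sketch below assumes a pure Morton layout, 64-byte cache lines holding 16 four-byte texels, and a 1:1 mapping between pixels and texels with no filtering. All of these are simplifying assumptions, and the helper names are made up for the example. It counts how many distinct cache lines two triangles of equal area touch.

```python
TEXELS_PER_LINE = 16  # assumption: 64-byte cache line / 4-byte texels


def morton_encode(x: int, y: int) -> int:
    """Interleave the low 16 bits of x and y (x in the even bit positions)."""
    code = 0
    for bit in range(16):
        code |= ((x >> bit) & 1) << (2 * bit)
        code |= ((y >> bit) & 1) << (2 * bit + 1)
    return code


def edge(a, b, p):
    """Signed area test: tells which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def texels_in_triangle(a, b, c):
    """Return the texels whose centers fall inside triangle abc."""
    xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
    covered = []
    for y in range(min(ys), max(ys) + 1):
        for x in range(min(xs), max(xs) + 1):
            p = (x + 0.5, y + 0.5)
            w = (edge(a, b, p), edge(b, c, p), edge(c, a, p))
            # Accept either winding order.
            if all(v >= 0 for v in w) or all(v <= 0 for v in w):
                covered.append((x, y))
    return covered


def lines_touched(texels):
    """Count distinct cache lines under the assumed Morton layout."""
    return len({morton_encode(x, y) // TEXELS_PER_LINE for x, y in texels})


# Two triangles with the same area (448 square texels).
compact = texels_in_triangle((0, 0), (32, 0), (16, 28))
sliver = texels_in_triangle((0, 0), (448, 0), (0, 2))

for name, texels in (("compact", compact), ("sliver", sliver)):
    lines = lines_touched(texels)
    print(f"{name}: {len(texels)} texels, {lines} cache lines, "
          f"{len(texels) / lines:.1f} texels used per line")
```

Under these assumptions the sliver touches noticeably more cache lines than the compact triangle for about the same number of useful texels. Real caches, filtering footprints, and texture-to-screen mappings will change the exact numbers.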

On tile borders, the same texels may therefore have to be fetched twice, once for each tile. The cost is small, because the border covers only a "few" pixels of each tile.

Arguably, desktop GPUs behave much the same way as tile-based GPUs, as experiments such as the following demonstrate: http://www.g-truc.net/post-0597.html

The tile sizes differ, but both architectures actually process fragments in a hierarchy of tiles of different sizes.

When coding for tile-based GPUs, my recommendation is to always keep two things in mind:

  1. Don't switch framebuffer objects unless you really need to.
  2. When binding a new framebuffer object, if you don't need to save the content of the current framebuffer, discard it, so that its tiles don't have to be written back to memory. If you don't need to load the content of the new framebuffer, clear it, so that the old content doesn't have to be read into tile memory.
Christophe
  • I updated the second item as the edit wasn't what I meant. Otherwise, it looks great! – Christophe Sep 01 '15 at 23:15
  • Hi Christophe, did you mean "equilateral" triangles rather than "isosceles"?

    Rather than "Hilbert" I'd have said "Morton" order as the addressing is much easier in hardware.

    – Simon F Sep 02 '15 at 11:04
  • @Christophe thanks! This is really helpful. So, for the border pixels, doesn't the texture cache matter? That was kind of what I was wondering. If I have a triangle that intersects tiles (x, y) and (x+1, y), and the GPU has just rasterized tile (x, y), then assuming tile (x+1, y) is next, even if a different execution unit processes it, won't I benefit from the texture cache when sampling texels for this triangle? – Felipe Lira Sep 02 '15 at 12:57
  • Also, I got curious about the Hilbert pattern. I always assumed this was true for block compressed textures. Is this true for all textures?

    PS: I also didn't follow the last paragraph.

    – Felipe Lira Sep 02 '15 at 13:00
  • PVRTC encodes texture blocks in Morton order – ashleysmithgpu Sep 07 '15 at 19:09
  • @yuumei No. Although early MBX PVRTC files were in Morton(ish) order, for later systems (e.g. SGX and Rogue) the data is in raster(ish) order, and the driver/GPU loads and rearranges it into whatever order is preferred for that particular GPU. – Simon F Sep 08 '15 at 12:23