I know that this function generates an optimized face remapping for a triangle list, but what actually is this optimized remapping? The triangle faces are remapped, but based on what?
2 Answers
Disclaimer: This is speculative. The DX specification doesn't define what it does, so it could be doing just about anything.
That said, the Wine implementation might provide some insight. The actual function implementation starts on line 7173, and I would like to note the comment just below that:
The face re-ordering does not use the vertex cache optimally.
One possible optimization, hinted at by that comment, is grouping indices that use the same vertex. Your GPU, like most CPUs, probably has a cache of data recently read from VRAM. Reading from that cache is much faster than fetching the data from VRAM again.
So by grouping indices, you group all accesses to that part of memory. Over the course of drawing a 2000-triangle mesh, the GPU would ideally need to read each vertex from memory only once. If the mesh is unoptimized, it might need to read a vertex once at the beginning, once in the middle, and once at the end, for a total of three reads of the same data from VRAM.
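To make the effect concrete, here is a minimal sketch that counts vertex fetches for a given index order. The function name `countCacheMisses`, the FIFO replacement policy, and the tiny cache size are all assumptions for illustration; real hardware caches differ and are generally not documented at this level.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Counts how many index lookups miss a simulated vertex cache when the
// indices are processed in order. FIFO replacement is an assumption.
std::size_t countCacheMisses(const std::vector<unsigned>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<unsigned> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (unsigned v : indices) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end()) {
            continue;  // hit: vertex is still in the cache
        }
        ++misses;  // miss: vertex must be fetched from memory
        cache.push_back(v);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest vertex
        }
    }
    return misses;
}
```

With a 4-entry cache and two small patches of four triangles each (12 distinct vertices), drawing the patches one after the other costs 12 fetches, one per vertex. Interleaving the triangles of the two patches costs 17, because vertices get evicted before they are reused.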
Bonus fun snippet (likely not implemented by DirectX)
You could group triangles based on normal direction. By doing this, you could batch the back-face culling operation. That could save you from processing thousands of triangles that are likely facing away from the camera, and do it all with a single if statement.
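As a rough sketch of that idea (again, not something DirectX is known to do), a group of triangles with similar normals can be summarized by a cone around their average normal and rejected with one comparison. The types `Vec3` and `NormalCone` and both function names are hypothetical, and the test assumes a single view direction for the whole group (a parallel projection); with a perspective camera the triangle positions would also matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Cone that contains every normal of a triangle group.
struct NormalCone {
    Vec3 axis;           // unit average normal
    float cosHalfAngle;  // cosine of the widest deviation from the axis
};

NormalCone buildNormalCone(const std::vector<Vec3>& unitNormals) {
    assert(!unitNormals.empty());
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (const Vec3& n : unitNormals) {
        sum.x += n.x; sum.y += n.y; sum.z += n.z;
    }
    const float len = std::sqrt(dot(sum, sum));
    if (len < 1e-6f) {
        // Normals cancel out: mark the group as never cullable.
        return NormalCone{Vec3{0.0f, 0.0f, 1.0f}, -1.0f};
    }
    const Vec3 axis{sum.x / len, sum.y / len, sum.z / len};
    float minCos = 1.0f;
    for (const Vec3& n : unitNormals) {
        minCos = std::min(minCos, dot(axis, n));
    }
    return NormalCone{axis, minCos};
}

// viewDir: unit vector pointing from the camera into the scene.
// Returns true only if every normal in the cone has a positive dot product
// with viewDir, i.e. the whole group faces away from the camera.
bool groupIsBackFacing(const NormalCone& cone, const Vec3& viewDir) {
    if (cone.cosHalfAngle <= 0.0f) {
        return false;  // cone spans 90 degrees or more: cannot decide in bulk
    }
    const float sinHalfAngle =
        std::sqrt(std::max(0.0f, 1.0f - cone.cosHalfAngle * cone.cosHalfAngle));
    return dot(cone.axis, viewDir) > sinHalfAngle;
}
```

The test is conservative: a group that fails it may still contain back-facing triangles, which then go through the normal per-triangle culling.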

It reorganizes the mesh data to achieve a better vertex cache hit rate by trying to group adjacent faces together in the buffer. The specifics of the algorithm are probably left vague intentionally, so that the implementation can be updated or tweaked over time.
I believe it is (or at one point was) based on the work presented in Hugues Hoppe's 1999 SIGGRAPH paper, "Optimization of mesh locality for transparent vertex caching", which
[presents] two reordering techniques, a fast greedy strip-growing algorithm and a local optimization algorithm. The strip-growing algorithm performs lookahead simulations of the cache to adapt strip lengths to the cache capacity. The local optimization algorithm improves this initial result by exploring a set of perturbations to the face ordering.
The algorithm starts by marking all mesh faces "unvisited," picking an initial face with the fewest neighbors, and trying to grow a triangle strip from that face. At each step, if there are adjacent faces, the algorithm runs a number of lookahead simulations to see how adding a face would affect the vertex cache (basically, whether it would overflow the cache). If possible, it keeps growing the strip until it optimally fits in the vertex cache. If the strip would overflow the cache or there are no more adjacent faces, the algorithm restarts, choosing a new face. When possible, new faces are chosen such that their vertices are already in the cache.
The second part of the algorithm involves perturbing the ordering of the face sequence and measuring the change in cost; the perturbation that gives the best vertex cache cost is chosen.
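The following is a much simpler greedy heuristic in the same spirit, not Hoppe's strip-growing algorithm: it has no lookahead and no adjacency information, and it just keeps picking the unvisited face with the most vertices already in a simulated cache. The name `greedyFaceOrder`, the FIFO cache, and the convention that the result maps each new position to an original face index are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Returns a face order where result[newPosition] = original face index.
// Greedy rule: emit the unvisited face with the most vertices currently
// in a simulated FIFO cache (ties go to the lowest face index).
// This is O(faces^2) and only meant as a sketch.
std::vector<std::size_t> greedyFaceOrder(const std::vector<unsigned>& indices,
                                         std::size_t cacheSize) {
    assert(indices.size() % 3 == 0 && cacheSize > 0);
    const std::size_t faceCount = indices.size() / 3;
    std::vector<bool> visited(faceCount, false);
    std::vector<std::size_t> order;
    order.reserve(faceCount);
    std::deque<unsigned> cache;  // front = oldest entry

    auto inCache = [&cache](unsigned v) {
        return std::find(cache.begin(), cache.end(), v) != cache.end();
    };

    for (std::size_t emitted = 0; emitted < faceCount; ++emitted) {
        std::size_t best = faceCount;
        int bestScore = -1;
        for (std::size_t f = 0; f < faceCount; ++f) {
            if (visited[f]) continue;
            int score = 0;  // number of this face's vertices already cached
            for (std::size_t k = 0; k < 3; ++k) {
                if (inCache(indices[3 * f + k])) ++score;
            }
            if (score > bestScore) {
                bestScore = score;
                best = f;
            }
        }
        visited[best] = true;
        order.push_back(best);
        // Feed the chosen face's vertices through the cache.
        for (std::size_t k = 0; k < 3; ++k) {
            const unsigned v = indices[3 * best + k];
            if (!inCache(v)) {
                cache.push_back(v);
                if (cache.size() > cacheSize) cache.pop_front();
            }
        }
    }
    return order;
}
```

On a mesh whose faces alternate between two separate patches, this pulls each patch's faces back together, which is the kind of grouping the paper's strip growing achieves far more carefully.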
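A minimal sketch of that kind of local search, assuming pairwise face swaps as the perturbation (the paper explores its own set of perturbations, which this does not reproduce). The names `missCost` and `improveBySwaps` and the FIFO cache model are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Cost of a face order: cache misses in a simulated FIFO vertex cache.
// order[newPosition] = original face index.
std::size_t missCost(const std::vector<unsigned>& indices,
                     const std::vector<std::size_t>& order,
                     std::size_t cacheSize) {
    std::deque<unsigned> cache;
    std::size_t misses = 0;
    for (std::size_t face : order) {
        for (std::size_t k = 0; k < 3; ++k) {
            const unsigned v = indices[3 * face + k];
            if (std::find(cache.begin(), cache.end(), v) != cache.end()) continue;
            ++misses;
            cache.push_back(v);
            if (cache.size() > cacheSize) cache.pop_front();
        }
    }
    return misses;
}

// Hill climbing: try swapping every pair of faces and keep a swap only if
// it strictly lowers the cost. Repeat until a full pass finds no improvement.
std::vector<std::size_t> improveBySwaps(const std::vector<unsigned>& indices,
                                        std::vector<std::size_t> order,
                                        std::size_t cacheSize) {
    assert(cacheSize > 0 && order.size() * 3 == indices.size());
    std::size_t best = missCost(indices, order, cacheSize);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i < order.size(); ++i) {
            for (std::size_t j = i + 1; j < order.size(); ++j) {
                std::swap(order[i], order[j]);
                const std::size_t cost = missCost(indices, order, cacheSize);
                if (cost < best) {
                    best = cost;
                    improved = true;
                } else {
                    std::swap(order[i], order[j]);  // undo: no gain
                }
            }
        }
    }
    return order;
}
```

Re-simulating the whole mesh for every candidate swap is far too slow for real meshes; a practical version would only re-evaluate the part of the sequence a perturbation can affect.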
The related ID3DXMesh::Optimize function can perform other, more explicit optimizations to the mesh. D3DXOptimizeFaces is very similar to calling Optimize with the D3DXMESHOPT_DEVICEINDEPENDENT flag, which uses a vertex cache size assumption that works better on older hardware.
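As for what a "face remapping" is in practice: it is a permutation of the faces, which you then apply to your index buffer yourself. The sketch below assumes each remap entry stores, for a new face position, the index of the original face; that direction is an assumption worth checking against the D3DX documentation, and `applyFaceRemap` is a hypothetical helper, not a D3DX function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a reordered index buffer from a face remap.
// Assumed convention: faceRemap[newFace] = originalFace.
std::vector<unsigned> applyFaceRemap(const std::vector<unsigned>& indices,
                                     const std::vector<unsigned>& faceRemap) {
    assert(indices.size() == faceRemap.size() * 3);
    std::vector<unsigned> reordered(indices.size());
    for (std::size_t newFace = 0; newFace < faceRemap.size(); ++newFace) {
        const std::size_t oldFace = faceRemap[newFace];
        assert(oldFace < faceRemap.size());
        // Copy the three indices of the original face to its new slot.
        for (std::size_t k = 0; k < 3; ++k) {
            reordered[3 * newFace + k] = indices[3 * oldFace + k];
        }
    }
    return reordered;
}
```

Only the order of the triangles changes; the vertex buffer and the indices within each triangle stay the same, so the rendered mesh is identical.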