If you are using XNA 4.0 you can use shaders with any SpriteSortMode. To do so, pass an Effect to one of the overloads of SpriteBatch.Begin that accepts it as a parameter.
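As a minimal sketch of that call, assuming an XNA 4.0 project: the asset names "MyEffect" and "MySprite" and the class name are hypothetical placeholders, and the effect is assumed to be written to work with SpriteBatch.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

public class ShaderSpriteGame : Game
{
    private readonly GraphicsDeviceManager graphics;
    private SpriteBatch spriteBatch;
    private Effect effect;
    private Texture2D texture;

    public ShaderSpriteGame()
    {
        graphics = new GraphicsDeviceManager(this);
        Content.RootDirectory = "Content";
    }

    protected override void LoadContent()
    {
        spriteBatch = new SpriteBatch(GraphicsDevice);
        effect = Content.Load<Effect>("MyEffect");      // hypothetical asset name
        texture = Content.Load<Texture2D>("MySprite");  // hypothetical asset name
    }

    protected override void Draw(GameTime gameTime)
    {
        GraphicsDevice.Clear(Color.CornflowerBlue);

        // Passing null for the sampler, depth-stencil and rasterizer states
        // asks SpriteBatch to use its defaults. The last argument is the shader.
        spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.AlphaBlend,
                          null, null, null, effect);
        spriteBatch.Draw(texture, Vector2.Zero, Color.White);
        spriteBatch.End();

        base.Draw(gameTime);
    }
}
```

Any other SpriteSortMode value can be substituted for Deferred in the Begin call.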
In XNA 3.1 and earlier, Immediate was the fastest mode. In XNA 4.0 it is the slowest.
SpriteBatch uses a dynamic vertex buffer internally. It is pretty much the best strategy for doing what it does (batching together arbitrary sprite draws).
You could recreate SpriteBatch, adding or removing functionality and potentially changing the performance profile. But you should probably avoid doing that.
SpriteBatch "uses" up performance in the following ways: It does some CPU-based transformations (cheap CPU), possibly some sorting depending on mode (cheap CPU), and sends data to the GPU in batches (somewhat heavy CPU per batch, cheap bandwidth). Then on the GPU it has negligible vertex costs, and pixel costs that will depend mostly on your pixel shader and what you are drawing (i.e. very little to do with SpriteBatch itself).
So if you can "bake" your vertices into a regular vertex buffer, instead of a dynamic one, you can shave off some time. You could also micro-optimise the draw calls and the transformations they do. These will get you very tiny performance improvements - generally not worth it unless you are doing large particle systems.
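For illustration only, here is a rough sketch of what "baking" could look like in XNA 4.0. The BakedSprites class is a hypothetical helper, not part of XNA. It assumes a fixed, non-empty set of quads that all share one texture and use the full texture, and it draws them with BasicEffect rather than SpriteBatch's own shader.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical helper: uploads a fixed set of textured quads once,
// then redraws them each frame without touching the vertex data again.
public sealed class BakedSprites
{
    private readonly GraphicsDevice device;
    private readonly VertexBuffer buffer;
    private readonly BasicEffect effect;
    private readonly int triangleCount;

    // Assumes quads contains at least one rectangle, in screen pixels.
    public BakedSprites(GraphicsDevice device, Texture2D texture, Rectangle[] quads)
    {
        this.device = device;

        var vertices = new VertexPositionTexture[quads.Length * 6];
        int i = 0;
        foreach (Rectangle q in quads)
        {
            var tl = new VertexPositionTexture(new Vector3(q.Left, q.Top, 0), new Vector2(0, 0));
            var tr = new VertexPositionTexture(new Vector3(q.Right, q.Top, 0), new Vector2(1, 0));
            var bl = new VertexPositionTexture(new Vector3(q.Left, q.Bottom, 0), new Vector2(0, 1));
            var br = new VertexPositionTexture(new Vector3(q.Right, q.Bottom, 0), new Vector2(1, 1));

            // Two triangles per quad.
            vertices[i++] = tl; vertices[i++] = tr; vertices[i++] = bl;
            vertices[i++] = bl; vertices[i++] = tr; vertices[i++] = br;
        }

        // A regular (non-dynamic) vertex buffer, filled exactly once.
        buffer = new VertexBuffer(device, typeof(VertexPositionTexture),
                                  vertices.Length, BufferUsage.WriteOnly);
        buffer.SetData(vertices);
        triangleCount = quads.Length * 2;

        effect = new BasicEffect(device);
        effect.TextureEnabled = true;
        effect.Texture = texture;
        // Map pixel coordinates to the screen, with y pointing down.
        effect.Projection = Matrix.CreateOrthographicOffCenter(
            0, device.Viewport.Width, device.Viewport.Height, 0, 0, 1);
    }

    public void Draw()
    {
        // Disable culling so the triangle winding order does not matter here.
        device.RasterizerState = RasterizerState.CullNone;
        device.SetVertexBuffer(buffer);
        foreach (EffectPass pass in effect.CurrentTechnique.Passes)
        {
            pass.Apply();
            device.DrawPrimitives(PrimitiveType.TriangleList, 0, triangleCount);
        }
    }
}
```

The sketch leaves blend and sampler state at whatever the device currently has set, and it only helps for sprites that never move or change, which is part of why the gain is usually small.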
You can probably get much better results by optimising your batching (see this question and answer): group textures together and, in particular, avoid Immediate sort mode in XNA 4.0. Beyond that, optimise your pixel shader if you are GPU-bound, or other areas of your program if you are CPU-bound.
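One way to let SpriteBatch group draws by texture is sketched below, assuming XNA 4.0. The Sprite struct and the BatchedDrawing class are hypothetical; only SpriteBatch, SpriteSortMode, BlendState, Texture2D, Vector2 and Color are XNA types.

```csharp
using System.Collections.Generic;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

public static class BatchedDrawing
{
    // Hypothetical record of what to draw; not an XNA type.
    public struct Sprite
    {
        public Texture2D Texture;
        public Vector2 Position;
    }

    public static void DrawAll(SpriteBatch spriteBatch, IEnumerable<Sprite> sprites)
    {
        // SpriteSortMode.Texture queues the sprites and sorts them by texture,
        // so sprites sharing a texture can be submitted together at End().
        spriteBatch.Begin(SpriteSortMode.Texture, BlendState.AlphaBlend);
        foreach (Sprite s in sprites)
        {
            spriteBatch.Draw(s.Texture, s.Position, Color.White);
        }
        spriteBatch.End();
    }
}
```

Sorting by texture changes the order in which sprites with different textures are drawn, so it suits cases where overlap order between textures does not matter. Where order does matter, packing sprites into a shared texture and using Deferred mode is another way to keep texture changes down.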
Rewriting SpriteBatch is almost never worth the effort.
As always with optimising for performance: Measure It!
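A very simple way to start measuring on the CPU side is a Stopwatch around the code in question. The Profiling class below is a hypothetical helper, sketched with only the .NET base class library.

```csharp
using System;
using System.Diagnostics;

public static class Profiling
{
    // Runs the action once as a warm-up, then returns the average
    // wall-clock time per call in milliseconds over the given iterations.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        action(); // warm-up, so JIT compilation is not included in the timing

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

This only captures CPU time spent in the call. GPU work runs asynchronously, so a GPU-bound frame needs a graphics profiler or a comparison of overall frame times instead.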