What is the advantage of OpenGL's direct state access mechanism?

Question

I've been reading about OpenGL 4.5 Direct State Access (DSA) at opengl.org and I'm not sure if I'm getting it right.

It seems to imply that the old way is less efficient:

glBind(something)
glSetA(..)
glSetB(..)
glSetC(..)

than the new way:

glSetA(something, ..)
glSetB(something, ..)
glSetC(something, ..)

From the looks of it, each glSet now has to include glBind(something) inside of it, and if OpenGL is still a state machine, it cannot take advantage of streamed changes applied to a single something.

Please explain the reasoning behind and advantages of the new DSA.

score 24 · Accepted Answer · edited Feb 04 '15 at 17:10

"From the looks of it, each glSet now has to include glBind(something) inside of it"

Not exactly. It's the other way around, as described several paragraphs below.

Even if it were true, remember that GL commands from the client app to the GL server (aka driver) have a lot of dispatch overhead compared to a regular function call. Even if we assume that the DSA functions are just wrappers around existing functions, they're wrappers that live inside the GL server and hence can have (a little) less overhead.

"if OpenGL is still a state machine, it cannot take advantage of streamed changes applied to a single something."

GPUs aren't state machines. The GL state machine interface is an emulation that wraps DSA-like driver internals, not the other way around.

Removing one layer of wrapping - a layer that requires an excessive number of calls into the GL server - is clearly a win, even if a small one.
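To make the layering concrete, here is a toy model in Python. It is not real OpenGL and not how any particular driver is written; `Driver`, `texture_parameter`, `bind_texture` and `tex_parameter` are hypothetical names invented for this sketch. It only illustrates the claim above: the internals address objects directly, and the classic bind-to-edit calls are a layer that resolves the current binding and forwards. In real GL 4.5 the corresponding pair would be glBindTexture + glTexParameteri versus glTextureParameteri.

```python
class Driver:
    """Toy stand-in for a GL server; every name here is hypothetical."""

    def __init__(self):
        self.textures = {}   # object name -> parameter dict
        self.bound = None    # the "selector" used by bind-to-edit calls
        self.calls = 0       # entry points crossed by the application

    def create_texture(self, name):
        self.textures[name] = {}

    # DSA-style entry point: the object is named in the call itself.
    def texture_parameter(self, name, key, value):
        self.calls += 1
        self._set(name, key, value)

    # Classic entry points: bind first, then edit "whatever is bound".
    def bind_texture(self, name):
        self.calls += 1
        self.bound = name

    def tex_parameter(self, key, value):
        self.calls += 1
        # Emulation layer: resolve the selector, then forward.
        self._set(self.bound, key, value)

    def _set(self, name, key, value):
        self.textures[name][key] = value


classic = Driver()
classic.create_texture(1)
classic.bind_texture(1)
classic.tex_parameter("min_filter", "linear")
classic.tex_parameter("wrap_s", "repeat")

dsa = Driver()
dsa.create_texture(1)
dsa.texture_parameter(1, "min_filter", "linear")
dsa.texture_parameter(1, "wrap_s", "repeat")

print(classic.calls, classic.bound)  # one extra call, texture 1 left bound
print(dsa.calls, dsa.bound)          # selector never touched
```

Both drivers end up with identical texture parameters; the classic path just crosses one more entry point and leaves a binding behind as a side effect.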

The state machine approach also doesn't make a ton of sense when dealing with multiple threads; GL is still terrible in this use case but drivers often use threads behind the scenes, and a state machine requires a lot of thread synchronization or really fancy parallel algorithms/constructs to make things work reliably.

The DSA extension continues to phrase its operation in terms of state changes because it is, after all, an extension to an existing state-based document and not an entirely new API, so it had to be ready to plug in to the existing GL specification document's language and terminology, even if that existing language is pretty terribly suited to its job as a modern graphics hardware API.

"Please explain the reasoning behind and advantages of the new DSA."

The biggest reason is that the old way was a pain. It made it very difficult to compose libraries that might each modify or rely on GL state. Its deep procedural state-management roots also made it difficult to wrap the GL API efficiently in an object-oriented or functional style, which made wrapping the API in various non-C languages difficult and made it hard to provide efficient graphics device wrappers that abstract over both OpenGL and Direct3D.

Second was the procedural state-machine API overhead, as described previously.

Third, the DSA functions changed semantics from the old APIs where doing so allowed for improved efficiency. Things that were previously mutable were made immutable, for instance, which removes a lot of book-keeping code from the GL server. Calls by the application can be dispatched to the hardware or validated sooner (or in more parallel fashions) when the GL server doesn't have to deal with mutable objects.
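A rough sketch of why immutability removes book-keeping, again as a toy Python model with hypothetical class names (`MutableTexture`, `ImmutableTexture`) rather than anything taken from a real driver. The real-world case this is meant to resemble, as far as I can tell, is texture storage: glTexImage2D lets each mip level be respecified at any time, whereas glTexStorage2D and its DSA form glTextureStorage2D fix the shape once.

```python
class MutableTexture:
    """Levels may be respecified at any time, so each use must re-check."""

    def __init__(self):
        self.levels = {}          # mip level -> (width, height)
        self.validations = 0

    def image(self, level, width, height):
        # May contradict the other levels; nothing is checked here.
        self.levels[level] = (width, height)

    def is_complete(self):
        self.validations += 1
        w, h = self.levels.get(0, (0, 0))
        for level in sorted(self.levels):
            expected = (max(w >> level, 1), max(h >> level, 1))
            if self.levels[level] != expected:
                return False
        return w > 0 and h > 0


class ImmutableTexture:
    """Shape is fixed at creation, so it is validated exactly once."""

    def __init__(self, num_levels, width, height):
        if num_levels < 1 or width < 1 or height < 1:
            raise ValueError("invalid storage")
        self.levels = {i: (max(width >> i, 1), max(height >> i, 1))
                       for i in range(num_levels)}
        self.validations = 1

    def is_complete(self):
        return True               # nothing can have changed since creation


def draw(texture, times):
    """Pretend to draw; the texture is checked before every use."""
    return all(texture.is_complete() for _ in range(times))


old = MutableTexture()
old.image(0, 64, 64)
old.image(1, 32, 32)
new = ImmutableTexture(2, 64, 64)

print(draw(old, 100), old.validations)
print(draw(new, 100), new.validations)
```

A real driver would avoid the repeated check with dirty flags and caches, but that tracking is exactly the book-keeping code the answer is talking about.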

--

Additional justification and explanation are given in the EXT_direct_state_access extension specification.

--

Hardware changes that are relevant to the API design are rather numerous.

Remember that OpenGL dates back to 1991. The target hardware wasn't consumer-grade graphics cards (those didn't exist) but big CAD workstations and the like. The hardware of that era had very different performance envelopes than today; multi-threading was rarer, memory buses and CPUs had less of a speed gap, and the GPU did little more than fixed-function triangle rendering.

More and more fixed-function features were added. Various lighting models, texture modes, etc. were all added, each needing their own piece of state. The simple state-based approach worked when you had a handful of states. As more and more states were added, the API started bursting at the seams. The API became more awkward but didn't diverge too far from hardware modes, as they were indeed based on a lot of state switches.

Then, along came programmable hardware. The hardware has become more and more programmable, to the point where now, the hardware supports a little state, some user-supplied programs, and a lot of buffers. All that state from the previous era had to be emulated, just as all the fixed-function features of that era were being emulated by the drivers.

Hardware also changed to be more and more parallel. This necessitated other hardware redesigns that made graphics state changes very expensive. The hardware works in big blocks of immutable state. Because of these changes, the driver couldn't simply apply each little bit of the state that the user set immediately, but had to batch the changes automatically and apply them when needed implicitly.
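The batching described above can be sketched as a toy Python model. `LazyStateDriver` and its state keys are hypothetical, and real drivers are far more involved; the point is only that individual state calls record intent, while the draw call is where an immutable block of state gets built or reused.

```python
class LazyStateDriver:
    """Toy model: state sets are recorded, then resolved at draw time."""

    def __init__(self):
        self.pending = {"blend": False, "depth_test": False, "cull": "back"}
        self.dirty = False
        self.block_cache = {}     # frozen state -> "hardware state block"
        self.blocks_built = 0
        self.current_block = None

    def set_state(self, key, value):
        # Nothing reaches the hardware here; the change is only recorded.
        if self.pending[key] != value:
            self.pending[key] = value
            self.dirty = True

    def draw(self):
        # State is applied implicitly, at the last possible moment.
        if self.dirty or self.current_block is None:
            frozen = tuple(sorted(self.pending.items()))
            if frozen not in self.block_cache:
                self.blocks_built += 1
                self.block_cache[frozen] = f"block#{self.blocks_built}"
            self.current_block = self.block_cache[frozen]
            self.dirty = False
        return self.current_block


drv = LazyStateDriver()
drv.set_state("blend", True)
drv.set_state("blend", False)      # toggled back before any draw
drv.set_state("depth_test", True)
first = drv.draw()
second = drv.draw()                # no state change: block is reused
drv.set_state("cull", "front")
third = drv.draw()

print(first, second, third, drv.blocks_built)
```

Three draws and four state calls result in only two state blocks being built.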

Modern hardware operates even further from the classic OpenGL model. DSA is one little change that was needed some 10+ years ago (it was originally promised as part of OpenGL 3.0), similar to what D3D10 did. Many of the hardware changes above need far more than just DSA to keep OpenGL relevant, which is why still more big extensions that drastically change the OpenGL model are available. Then there's the whole new GLnext API plus D3D12, Mantle, Metal, etc., not a single one of which keeps the outmoded state machine abstraction.

Thanks for the answer. So it seems that before some point state-machine (non-DSA) was a win, but at some point something has changed and now DSA is advantageous. Can you shed some light on what has changed? — Kromster, Feb 04 '15 at 07:48
@KromStern: did my best. If you need more details, someone more knowledgeable than I is going to have to supply it. — Sean Middleditch, Feb 04 '15 at 08:47
@KromStern I've seen (from my limited research into the history) openGL moving to less and less draw calls CPU side per frame; display lists (for what they were worth), glDrawArrays (draw in one call), VBOs (upload to GPU once), VAOs (bind buffers to attributes once), uniform buffer object (set uniforms in one go). There is more that I'm missing, I'm sure. — ratchet freak, Feb 04 '15 at 10:24
@ratchetfreak: funnily enough, we're moving the other way now. The modern APIs/extensions are focused on increasing our draw calls per frame, mostly by removing all that state that has to be set/dispatched per draw call and making the draw calls little more than "insert draw command into command queue" against a big set of static state and bindless resources. Oooh, bindless, I forgot to even mention that part in my answer. — Sean Middleditch, Feb 04 '15 at 18:18

david van brink · Answer 2 · 2015-02-04T07:25:29.750

1

The overview justifies it by:

"The intent of this extension is to make it more efficient for libraries to avoid disturbing selector and latched state. The extension also allows more efficient command usage by eliminating the need for selector update commands."

I think "more efficient" here refers both to less bookkeeping overhead for library authors and to the resulting higher performance. With the current API, to be "well behaved" you need to query the state, stash it, change the state to do what you need, then restore the original state.

Like

oldState = glGet()
glBind()
glDoThings...
glSet(oldState)  // restore, in case anyone needs it just as they left it

Presumably, older hardware could be made more performant with the explicit state-changing API; it's a pretty strange ritual otherwise. This extension implies (and just look at the authorship list!) that avoiding that fetch, set, restore dance is now more of a performance win on current hardware, even with the additional parameter on each call.
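Here is the fetch, set, restore dance as a small runnable toy in Python, next to a DSA-style alternative. `Context` and the three library functions are hypothetical stand-ins, not real GL; in actual OpenGL 4.5 the classic path would be roughly glGetIntegerv + glBindBuffer + glBufferData, and the DSA path glNamedBufferData.

```python
class Context:
    """Toy GL context with a single buffer binding point (hypothetical)."""

    def __init__(self):
        self.buffers = {1: b"", 2: b""}
        self.bound = None
        self.queries = 0

    def get_binding(self):
        self.queries += 1
        return self.bound

    def bind(self, name):
        self.bound = name

    def buffer_data(self, data):              # classic: edits what is bound
        self.buffers[self.bound] = data

    def named_buffer_data(self, name, data):  # DSA-style: names the object
        self.buffers[name] = data


def careless_library(ctx):
    ctx.bind(2)
    ctx.buffer_data(b"lib")       # leaves buffer 2 bound behind the app's back


def polite_library(ctx):
    old = ctx.get_binding()       # query and stash
    ctx.bind(2)
    ctx.buffer_data(b"lib")
    ctx.bind(old)                 # restore


def dsa_library(ctx):
    ctx.named_buffer_data(2, b"lib")   # binding is never touched


def app(library):
    ctx = Context()
    ctx.bind(1)
    library(ctx)
    ctx.buffer_data(b"app")       # app assumes buffer 1 is still bound
    return ctx.buffers, ctx.queries


for lib in (careless_library, polite_library, dsa_library):
    print(lib.__name__, app(lib))
```

The careless library makes the app overwrite the wrong buffer, the polite one pays for a query and two binds, and the DSA-style one gets the right result without either.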

edited Feb 04 '15 at 07:25

answered Feb 04 '15 at 06:34


"need to query/stash/change/restore" - how is it better with DSA? – Kromster Feb 04 '15 at 06:55
..added pseudo code to show. With DSA, none of that is necessary. Presumably current hardware doesn't really need "binding" state, can just access all of it as needed. – david van brink Feb 04 '15 at 07:27
The chain get/bind/do/set is rarely used, because 'Get' is very slow. Usually apps have to maintain a replica of the variables anyway, so it trims down to just bind/do. I see the point though. – Kromster Feb 04 '15 at 07:43
@krom get from driver state can be fast, some of the gettable state has no business being on the GPU so it can just be gotten from RAM which is fast. – ratchet freak Feb 04 '15 at 10:04
