Unexpected performance difference between D3D12 and Vulkan

Question

I'm experimenting with some use cases for structured buffers in a side project, where I'm building parallel implementations in Vulkan and DirectX 12. My performance stress test is brute-force lighting: the fragment shader of a simple PBR forward renderer iterates over all 256 point lights in the buffer. There is no tiled-forward or clustered-forward culling (yet), so this is the absolute worst case, meant purely to explore structured buffer performance.

For each frame, I update all the lights and copy positions/intensities/range (two float4s) into a buffer.
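
For illustration, the CPU-side layout of one light could look like the sketch below. The question only says each light is "two float4s"; the exact packing here (range in the first `w`, an unused second `w`) and the names `PointLight`, `kLightCount`, `kLightBufferBytes`, and `makeLight` are assumptions, not the actual code from the project.

```cpp
#include <cstddef>

// Hypothetical packing of "positions/intensities/range (two float4s)".
// The real project may arrange the components differently.
struct alignas(16) PointLight {
    float position[3];   // first float4, xyz: world-space position
    float range;         // first float4, w: falloff range
    float intensity[3];  // second float4, xyz: RGB intensity
    float padding;       // second float4, w: unused
};

// Two float4s are 32 bytes, which matches a 32-byte structured buffer stride.
static_assert(sizeof(PointLight) == 32, "PointLight must be two float4s");

constexpr std::size_t kLightCount = 256;
constexpr std::size_t kLightBufferBytes = kLightCount * sizeof(PointLight);

// Fills one light record; called once per light per frame before the upload.
inline PointLight makeLight(float x, float y, float z, float range,
                            float r, float g, float b) {
    PointLight light{};
    light.position[0] = x;
    light.position[1] = y;
    light.position[2] = z;
    light.range = range;
    light.intensity[0] = r;
    light.intensity[1] = g;
    light.intensity[2] = b;
    return light;
}
```

With this layout the per-frame upload is 256 × 32 = 8192 bytes, which is small enough that the copy itself is unlikely to dominate a 60 ms frame.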

On D3D12, this copy goes into a staging buffer, which is transferred to a GPU-side resource before use.

On Vulkan, I have two paths. The first mirrors the D3D12 pattern (write to a staging buffer, then call vkCmdCopyBuffer). The second round-robins through write-target buffers that are created as host-visible; I write to these and then just call vkFlushMappedMemoryRanges. These buffers are then bound directly to the pipeline for the fragment shader to read.
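
The bookkeeping for the second path can be sketched without any Vulkan calls. This is a minimal sketch that assumes the round-robin slots live at offsets inside one mapped allocation (the question may well use separate buffers instead); `slotForFrame`, `alignedFlushRange`, and `FlushRange` are hypothetical helper names. The one Vulkan rule it encodes is that, for non-coherent memory, the offset of a range passed to vkFlushMappedMemoryRanges must be a multiple of the device's `nonCoherentAtomSize`, and the size must be a multiple of it as well unless the range ends at the end of the allocation.

```cpp
#include <cassert>
#include <cstdint>

// Offset and size that would be copied into a VkMappedMemoryRange.
struct FlushRange {
    std::uint64_t offset;
    std::uint64_t size;
};

// Picks which slot to write this frame so the CPU does not overwrite
// a slot that an earlier in-flight frame may still be reading.
inline std::uint32_t slotForFrame(std::uint64_t frameIndex,
                                  std::uint32_t slotCount) {
    assert(slotCount > 0);
    return static_cast<std::uint32_t>(frameIndex % slotCount);
}

// Expands [offset, offset + size) outward to nonCoherentAtomSize boundaries,
// clamping the end to the allocation size (ending at the allocation end is
// also allowed by the spec).
inline FlushRange alignedFlushRange(std::uint64_t offset, std::uint64_t size,
                                    std::uint64_t atomSize,
                                    std::uint64_t allocationSize) {
    assert(atomSize > 0);
    assert(offset + size <= allocationSize);
    const std::uint64_t begin = (offset / atomSize) * atomSize;
    std::uint64_t end = ((offset + size + atomSize - 1) / atomSize) * atomSize;
    if (end > allocationSize) {
        end = allocationSize;
    }
    return FlushRange{begin, end - begin};
}
```

For example, with three slots of 8192 bytes each and an atom size of 256, frame 4 maps to slot 1, and flushing that slot yields offset 8192 and size 8192 with no extra padding. If the memory type also has VK_MEMORY_PROPERTY_HOST_COHERENT_BIT set, the flush is not required at all.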

I was expecting the write/transfer path to be faster on both DX12 and Vulkan, since I assumed the host-visible/flush memory would live on the CPU side and basically just be read over the PCIe bus. Instead, I'm seeing better performance on D3D12 (~60 ms/frame on my RTX 3090), and the same performance through either path on Vulkan (around 10% slower, at 66 ms/frame).

Profiling hasn't given me anything useful; does anyone have any ideas why performance on Vulkan is so much slower (and what I can do to mitigate it)? I generally prefer Vulkan as an API, so I'd like to make sure I get this set up correctly.

"Profiling hasn't given me anything useful" - were you using a CPU profiler (i.e. the one in your compiler's IDE, etc) or a GPU based profiler (such as the NVIDIA Visual Profiler or the Radeon GPU Profiler)? — Pikalek, Apr 02 '23 at 00:53
Profiling actually helped me figure this out. The draw order differed slightly between the DX12 and Vulkan implementations, and the ordering on the DX12 path happened to be more efficient. After fixing that, I see the same performance on both, with the added benefit that I can use a single (round-robin) set of buffers on Vulkan with host-visible memory and just flush on updates. — Varrak, Apr 02 '23 at 04:15

0 Answers