[Mesa-dev] Time to merge threaded GL dispatch? (aka glthread)
gregory.hainaut at gmail.com
Mon Feb 6 10:10:32 UTC 2017
> A note on synchronizations. Borderlands 2 has 170 thread syncs per
> frame. That means the app thread has to stop and wait 170x per frame.
> Despite that, it still has 70% higher performance in some cases. My
> theory is that if you have a lot of draw calls, you can have a lot of
> syncs, because the sheer amount of draw calls will just make those
> syncs irrelevant. 200 syncs per 4k draw calls is like 1 sync per 20
> draw calls.
Here is some feedback from my quick test.
On PCSX2 (a PS2 emulator), I noticed that synchronization badly hurts
performance. In my case, the syncs are mostly related to texture
transfers (CPU->GPU) and clear-buffer functions. Strangely, I didn't
notice anything related to BufferSubData*, but I guess it behaves the
same.
Those functions trigger a sync because of their pointer parameter.
However, a texture transfer can use a PBO, in which case the argument
is an offset into the buffer rather than a real pointer. And clear
takes a pointer to a color, hence a small payload (worst case is likely
around 16/32B). IMHO, it could surely be inlined/memcpy'd in the GL
dispatcher (for comparison, the old GL2 clear API is already sync-free).
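
To make the idea concrete, here is a minimal sketch in C of a dispatcher
that copies the clear color into the command record and skips the sync
for texture uploads when a PBO is bound. Everything in it (struct cmd,
struct queue, queue_sync, marshal_clear_color, marshal_tex_upload) is a
hypothetical name made up for illustration; it is not Mesa's actual
glthread code, and the real marshalling layout is likely different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command record; NOT Mesa's real glthread layout. */
enum cmd_type { CMD_CLEAR_COLOR, CMD_TEX_UPLOAD };

struct cmd {
    enum cmd_type type;
    union {
        float clear_color[4];      /* 16-byte payload stored inline */
        struct {
            uintptr_t pbo_offset;  /* offset into the bound PBO */
            size_t size;
        } tex;
    } u;
};

#define QUEUE_CAP 64

struct queue {
    struct cmd cmds[QUEUE_CAP];
    size_t count;
    unsigned syncs;                /* how often the app thread had to wait */
};

/* Stand-in for "wait until the worker thread has drained the queue". */
static void queue_sync(struct queue *q)
{
    q->count = 0;
    q->syncs++;
}

static struct cmd *queue_alloc(struct queue *q)
{
    assert(q->count < QUEUE_CAP);  /* sketch only: no overflow handling */
    return &q->cmds[q->count++];
}

/* The color is copied into the command, so the caller's pointer is not
 * needed after this returns and no sync is required. */
static void marshal_clear_color(struct queue *q, const float color[4])
{
    struct cmd *c = queue_alloc(q);
    c->type = CMD_CLEAR_COLOR;
    memcpy(c->u.clear_color, color, sizeof(c->u.clear_color));
}

/* With a PBO bound, 'pixels' is an offset into the buffer object, so it
 * can be queued as a plain integer. Without a PBO it points to client
 * memory of arbitrary size, so this sketch falls back to a sync.
 * Returns true if a sync was needed. */
static bool marshal_tex_upload(struct queue *q, const void *pixels,
                               size_t size, bool pbo_bound)
{
    if (pbo_bound) {
        struct cmd *c = queue_alloc(q);
        c->type = CMD_TEX_UPLOAD;
        c->u.tex.pbo_offset = (uintptr_t)pixels;
        c->u.tex.size = size;
        return false;
    }
    queue_sync(q);
    /* ...the upload would be executed directly here (omitted)... */
    return true;
}
```

In this sketch the app thread can overwrite its color array right after
marshal_clear_color() returns, and only the non-PBO upload path bumps
the sync counter.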
I hacked the code to remove the sync on texture transfers and got a
major speed boost. I counted neither the number of draw calls nor the
sync ratio. But I suspect that the perf impact depends on how the syncs
are distributed across the frame. Unlike my case, I guess that
Borderlands 2 uploads/clears buffers/textures/uniforms at the start of
the frame, which means various small syncs at the start of the frame
(which might be optimized with a spin lock). The hot rendering loop
might therefore be sync-free, hence the speed boost.
To conclude, based on my single test case, the current state of the
code isn't optimal yet, which might explain why few apps see any perf
improvement so far. But the potential is there.