[Mesa-dev] Time to merge threaded GL dispatch? (aka glthread)

Marek Olšák maraeo at gmail.com
Mon Feb 6 13:38:11 UTC 2017


Yes, I'm aware that glthread is far from perfect. However, I don't consider
that an issue. My idea is that the actual work will take place in master. I
have zero faith that any work on that will take place outside of master.

Currently I don't expect it to work with any GL4 apps, because the threaded
dispatch isn't aware of many GL4 functions. Initially we'll have a
community-maintained whitelist of apps benefitting from glthread.

Marek


On Feb 6, 2017 11:10 AM, "Gregory Hainaut" <gregory.hainaut at gmail.com>
wrote:

Hello,

> A note on synchronizations. Borderlands 2 has 170 thread syncs per
> frame. That means the app thread has to stop and wait 170x per frame.
> Despite that, it still has 70% higher performance in some cases. My
> theory is that if you have a lot of draw calls, you can have a lot of
> syncs, because the sheer amount of draw calls will just make those
> syncs irrelevant. 200 syncs per 4k draw calls is like 1 sync per 20
> draw calls.

Here is some feedback from my quick test.

On PCSX2 (PS2 emulator), I noticed that synchronization badly impacts
performance. In my case, the syncs are mostly related to texture
transfers (CPU->GPU) and clear buffer functions. Strangely, I didn't
notice anything related to BufferSubData*, but I guess it is the same.

Those functions trigger a sync because of the pointer parameter.
However, a texture transfer could use a PBO, in which case it isn't a
real pointer. And clear takes a pointer to a color, hence a small
payload (worst case is likely around 16/32B). IMHO, it can surely be
inlined/memcpy'd in the GL dispatcher (the old GL2 clear API, for its
part, is already sync free).
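
To make the idea concrete, here is a minimal sketch in C of what
"inlined/memcpy" could look like. It is not Mesa's glthread code: every
name in it (cmd_queue, marshal_clear, marshal_tex_upload, the 32-byte
limit, the queue size) is hypothetical, and queue_sync() only counts
the stalls instead of really waiting for a worker thread. A small clear
value is copied into the command itself, and a PBO "pointer" is stored
as a plain offset, so neither case needs the app thread to wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not Mesa's glthread code. */
enum { QUEUE_BYTES = 4096, MAX_INLINE_PAYLOAD = 32 };

struct cmd_queue {
    uint8_t  data[QUEUE_BYTES];
    size_t   used;
    unsigned sync_count;            /* times the app thread had to wait */
};

struct clear_cmd {
    uint32_t buffer;                /* buffer object name */
    uint32_t payload_size;          /* bytes of clear value stored inline */
    uint8_t  payload[MAX_INLINE_PAYLOAD];
};

struct tex_upload_cmd {
    uint32_t  texture;              /* texture object name */
    uintptr_t pbo_offset;           /* offset into the bound PBO */
};

/* Stand-in for "block until the worker thread has drained the queue". */
static void queue_sync(struct cmd_queue *q)
{
    q->sync_count++;
    q->used = 0;
}

/* Reserve space for one command; a full queue forces a sync. */
static void *queue_alloc(struct cmd_queue *q, size_t size)
{
    assert(size <= QUEUE_BYTES);
    if (q->used + size > QUEUE_BYTES)
        queue_sync(q);
    void *slot = q->data + q->used;
    q->used += size;
    return slot;
}

/* Returns true if the clear was queued without waiting for the worker. */
static bool marshal_clear(struct cmd_queue *q, uint32_t buffer,
                          const void *value, size_t value_size)
{
    if (value_size > MAX_INLINE_PAYLOAD) {
        queue_sync(q);              /* too large to inline: fall back */
        return false;
    }
    struct clear_cmd cmd = { .buffer = buffer,
                             .payload_size = (uint32_t)value_size };
    memcpy(cmd.payload, value, value_size);   /* caller's memory is free again */
    memcpy(queue_alloc(q, sizeof cmd), &cmd, sizeof cmd);
    return true;
}

/* With a PBO bound, `pixels` is an offset, so nothing in CPU memory
 * has to stay alive and the command can be queued as is. */
static bool marshal_tex_upload(struct cmd_queue *q, uint32_t texture,
                               const void *pixels, bool pbo_bound)
{
    if (!pbo_bound) {
        queue_sync(q);              /* real CPU pointer: fall back */
        return false;
    }
    struct tex_upload_cmd cmd = { .texture = texture,
                                  .pbo_offset = (uintptr_t)pixels };
    memcpy(queue_alloc(q, sizeof cmd), &cmd, sizeof cmd);
    return true;
}
```

In this sketch the sync remains only as a fallback for large clear
values and for uploads from client memory. A 16-byte RGBA float color
fits comfortably under the assumed 32-byte limit.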

I hacked the code to remove the sync on texture transfer and I got a
major speed boost. I counted neither the number of draw calls nor the
sync ratio, but I suspect that the perf impact could depend on how the
syncs are distributed within the frame. Unlike my case, I guess that
Borderlands 2 uploads/clears buffers/textures/uniforms at the start of
the frame, which means various small syncs at the start of the frame
(which might be optimized as a spin lock). The hot rendering loop might
therefore be sync free, hence the speed boost.

To conclude, based on my single test case, the current state of the
code isn't yet optimal, and that might explain why few apps see any
perf improvement so far. But the potential is there.

Cheers,
Gregory