<div dir="auto"><div>Yes, I'm aware that glthread is far from perfect. However, I don't consider that an issue. My idea is that the actual work will take place in master. I have zero faith that any work on that will take place outside of master.<div dir="auto"><br></div><div dir="auto">Currently I don't expect it to work with any GL4 apps, because the threaded dispatch isn't aware of many GL4 functions. Initially we'll have a community-maintained whitelist of apps benefitting from glthread.</div><div dir="auto"><br></div><div dir="auto">Marek</div><br><div class="gmail_extra"><br><div class="gmail_quote">On Feb 6, 2017 11:10 AM, "Gregory Hainaut" <<a href="mailto:gregory.hainaut@gmail.com">gregory.hainaut@gmail.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
> A note on synchronizations. Borderlands 2 has 170 thread syncs per<br>
> frame. That means the app thread has to stop and wait 170x per frame.<br>
> Despite that, it still has 70% higher performance in some cases. My<br>
> theory is that if you have a lot of draw calls, you can have a lot of<br>
> syncs, because the sheer amount of draw calls will just make those<br>
> syncs irrelevant. 200 syncs per 4k draw calls is like 1 sync per 20<br>
> draw calls.<br>
<br>
Here is some feedback from my quick test.<br>
<br>
On PCSX2 (PS2 emulator), I noticed that synchronization badly impacts<br>
the perf. In my case, the syncs are mostly related to texture transfers<br>
(CPU->GPU) and clear buffer functions. Strangely, I didn't notice<br>
anything related to BufferSubData*, but I guess it is the same.<br>
<br>
Those functions trigger a sync because of their pointer parameter.<br>
However, a texture transfer could use a PBO, in which case it isn't a real pointer.<br>
And clear takes a pointer to a color, hence a small payload (worst case<br>
is likely around 16/32B). IMHO, it can surely be inlined/memcpy'd in the<br>
gl dispatcher (otherwise, the old GL2 clear API is sync free).<br>
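<br>
A minimal sketch of that inlining idea, assuming a hypothetical command<br>
record (struct clear_cmd and enqueue_clear are made-up names, not the<br>
actual glthread layout): the application thread copies the small clear<br>
value into the queued command, so the worker thread never dereferences<br>
the caller's pointer and no sync is needed.<br>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command record, for illustration only. */
enum { CLEAR_PAYLOAD_MAX = 16 };          /* 4 components x 32 bits */

struct clear_cmd {
    uint32_t buffer;                      /* which buffer to clear (a GL enum) */
    int32_t  drawbuffer;                  /* draw buffer index */
    uint8_t  payload[CLEAR_PAYLOAD_MAX];  /* private copy of the clear value */
};

/* Runs on the application thread.
 * Returns 0 when the value was copied into the command,
 * -1 when it is too large and the caller would have to fall back to a sync. */
static int enqueue_clear(struct clear_cmd *slot, uint32_t buffer,
                         int32_t drawbuffer, const void *value, size_t size)
{
    if (size > CLEAR_PAYLOAD_MAX)
        return -1;

    slot->buffer = buffer;
    slot->drawbuffer = drawbuffer;
    memset(slot->payload, 0, sizeof slot->payload);
    memcpy(slot->payload, value, size);   /* caller may reuse its memory now */
    return 0;
}
```

Once the copy is made, the caller can overwrite or free its color array<br>
right away; the worker only reads the payload stored in the command.<br>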
<br>
I hacked the code to remove the sync on texture transfer and I got a<br>
major speed boost. I counted neither the number of draw calls nor the sync<br>
ratio. But I suspect that the perf impact could depend on how the syncs<br>
are distributed across the frame. Unlike my case, I guess that Borderlands 2 uploads/clears<br>
buffers/textures/uniforms at the start of the frame, which means various<br>
small syncs at the start of the frame (which might be optimized as a<br>
spin lock). Therefore the hot rendering loop might be sync free, hence<br>
the speed boost.<br>
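<br>
To illustrate the spin-lock remark, here is a rough sketch of a bounded<br>
spin on a completion flag, assuming C11 atomics. struct batch_fence,<br>
fence_try_spin and fence_signal are hypothetical names, not existing<br>
glthread code. When the spin gives up, the caller would fall back to a<br>
regular blocking wait.<br>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion flag, set by the worker thread when a batch is done. */
struct batch_fence {
    atomic_bool done;
};

/* Worker thread: publish completion of the batch. */
static void fence_signal(struct batch_fence *f)
{
    atomic_store_explicit(&f->done, true, memory_order_release);
}

/* Application thread: poll the flag at most max_spins times.
 * Returns true if the batch completed while spinning, false if the
 * caller should fall back to a blocking wait. */
static bool fence_try_spin(struct batch_fence *f, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&f->done, memory_order_acquire))
            return true;
    }
    return false;
}
```

A short spin only pays off when the worker is expected to finish almost<br>
immediately, as with a few small uploads at the start of a frame.<br>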
<br>
To conclude, based on my single test case, the current state of the code<br>
isn't yet optimal, and that might explain why few apps see any perf<br>
improvement so far. But the potential is there.<br>
<br>
Cheers,<br>
Gregory<br>
</blockquote></div><br></div></div></div>