<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 28, 2020 at 10:40 AM Christian König <<a href="mailto:christian.koenig@amd.com">christian.koenig@amd.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 28.05.20 at 12:06, Michel Dänzer wrote:<br>
> On 2020-05-28 11:11 a.m., Christian König wrote:<br>
>> Well we still need implicit sync [...]<br>
> Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu<br>
> doesn't ensure later jobs fully see the effects of previous implicitly<br>
> synced jobs", requiring userspace to do pessimistic flushing.<br>
<br>
Yes, exactly that.<br>
<br>
For background: we also do this flushing for explicit syncs. And <br>
when this was implemented 2-3 years ago, we initially did the flushing for <br>
implicit sync as well.<br>
<br>
That was immediately reverted and then implemented differently because <br>
it caused severe performance problems in some use cases.<br>
<br>
I'm not sure of the root cause of these performance problems. My <br>
assumption was always that we then insert too many pipeline syncs, but <br>
Marek doesn't seem to think it could be that.<br>
<br>
On the one hand I'm rather keen to remove the extra handling and just <br>
always use the explicit handling for everything because it simplifies <br>
the kernel code quite a bit. On the other hand I don't want to run into <br>
this performance problem again.<br>
<br>
In addition to that, what the kernel does is a "full" pipeline sync, i.e. <br>
we busy wait for the full hardware pipeline to drain. That might be <br>
overkill if you just want to do some flushing so that the next shader <br>
sees the stuff written, but I'm not an expert on that.<br></blockquote><div><br></div><div>Do we busy-wait on the CPU or in WAIT_REG_MEM?</div><div><br></div><div>WAIT_REG_MEM is what UMDs do and should be faster.</div><div><br></div><div>Marek</div></div></div>