<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, May 28, 2020 at 10:40 AM Christian König &lt;<a href="mailto:christian.koenig@amd.com">christian.koenig@amd.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 28.05.20 at 12:06, Michel Dänzer wrote:<br> > On 2020-05-28 11:11 a.m., Christian König wrote:<br> >> Well we still need implicit sync [...]<br> > Yeah, this isn't about "we don't want implicit sync", it's about "amdgpu<br> > doesn't ensure later jobs fully see the effects of previous implicitly<br> > synced jobs", requiring userspace to do pessimistic flushing.<br> <br> Yes, exactly that.<br> <br> For background: we also do this flushing for explicit syncs. And <br> when this was implemented 2-3 years ago we first did the flushing for <br> implicit sync as well.<br> <br> That was immediately reverted and then implemented differently because <br> it caused severe performance problems in some use cases.<br> <br> I'm not sure of the root cause of these performance problems. My <br> assumption was always that we then insert too many pipeline syncs, but <br> Marek doesn't seem to think it could be that.<br> <br> On the one hand I'm rather keen to remove the extra handling and just <br> always use the explicit handling for everything because it simplifies <br> the kernel code quite a bit. On the other hand I don't want to run into <br> this performance problem again.<br> <br> In addition to that, what the kernel does is a "full" pipeline sync, i.e. <br> we busy wait for the full hardware pipeline to drain. 
That might be <br> overkill if you just want to do some flushing so that the next shader <br> sees the stuff written, but I'm not an expert on that.<br></blockquote><div><br></div><div>Do we busy-wait on the CPU or in WAIT_REG_MEM?</div><div><br></div><div>WAIT_REG_MEM is what UMDs do and should be faster.</div><div><br></div><div>Marek</div></div></div>