<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 28.05.20 at 21:35, Marek
Olšák wrote:<br>
</div>
<blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, May 28, 2020 at 2:12
PM Christian König &lt;<a href="mailto:christian.koenig@amd.com" moz-do-not-send="true">christian.koenig@amd.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<div>On 28.05.20 at 18:06, Marek Olšák wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, May 28,
2020 at 10:40 AM Christian König &lt;<a href="mailto:christian.koenig@amd.com" target="_blank" moz-do-not-send="true">christian.koenig@amd.com</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px
0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">On 28.05.20 at
12:06, Michel Dänzer wrote:<br>
> On 2020-05-28 11:11 a.m., Christian König
wrote:<br>
>> Well we still need implicit sync [...]<br>
> Yeah, this isn't about "we don't want
implicit sync", it's about "amdgpu<br>
> doesn't ensure later jobs fully see the
effects of previous implicitly<br>
> synced jobs", requiring userspace to do
pessimistic flushing.<br>
<br>
Yes, exactly that.<br>
<br>
For the background: We also do this flushing for
explicit syncs. And <br>
when this was implemented 2-3 years ago we first
did the flushing for <br>
implicit sync as well.<br>
<br>
That was immediately reverted and then implemented
differently because <br>
it caused severe performance problems in some use
cases.<br>
<br>
I'm not sure of the root cause of these performance
problems. My <br>
assumption was always that we then insert too many
pipeline syncs, but <br>
Marek doesn't seem to think it could be that.<br>
<br>
On the one hand I'm rather keen to remove the
extra handling and just <br>
always use the explicit handling for everything
because it simplifies <br>
the kernel code quite a bit. On the other hand I
don't want to run into <br>
this performance problem again.<br>
<br>
In addition, what the kernel does is a
"full" pipeline sync, i.e. <br>
we busy-wait for the full hardware pipeline to
drain. That might be <br>
overkill if you just want to do some flushing so
that the next shader <br>
sees the stuff written, but I'm not an expert on
that.<br>
</blockquote>
<div><br>
</div>
<div>Do we busy-wait on the CPU or in WAIT_REG_MEM?</div>
<div><br>
</div>
<div>WAIT_REG_MEM is what UMDs do and should be
faster.</div>
</div>
</div>
</blockquote>
<br>
We use WAIT_REG_MEM to wait for an EOP fence value to
reach memory.<br>
<br>
We use this for a couple of things, especially to make
sure that the hardware is idle before changing VMID to
page table associations.<br>
<br>
What about your idea of having an extra dw in the shared
BOs indicating that they are flushed?<br>
<br>
As far as I understand it an EOS or other event might be
sufficient for the caches as well. And you could insert
the WAIT_REG_MEM directly before the first draw using the
texture and not before the whole IB.<br>
<br>
Could be that we can optimize this even more than what we
do in the kernel.<br>
<br>
Christian.<br>
</div>
</blockquote>
<div><br>
</div>
Adding fences into BOs would be bad, because all UMDs would
have to handle them.</div>
</div>
</blockquote>
<br>
Yeah, already assumed that this is the biggest blocker.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">Is it possible to do this in the ring
buffer:</div>
<div class="gmail_quote">if (fence_signalled) {</div>
<div class="gmail_quote">
<div class="gmail_quote"> indirect_buffer(dependent_IB);<br>
</div>
indirect_buffer(other_IB);<br>
</div>
<div class="gmail_quote">} else {</div>
<div class="gmail_quote"> indirect_buffer(other_IB);</div>
<div class="gmail_quote"> wait_reg_mem(fence);<br>
</div>
<div class="gmail_quote"> indirect_buffer(dependent_IB);<br>
</div>
}</div>
</blockquote>
<br>
That may be possible, but it is at least not easy to implement.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">Or we might have to wait for a hw
scheduler.<br>
</div>
</div>
</blockquote>
<br>
I'm still fine doing the pipeline sync for implicit sync as well, I
just need somebody to confirm to me that this doesn't backfire in some
case.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote"><br>
</div>
<div class="gmail_quote">
<div class="gmail_quote">Does the kernel sync when the driver
fd is different, or when the context is different?</div>
</div>
</div>
</blockquote>
<br>
Only when the driver fd is different.<br>
<br>
Christian.<br>
<br>
<blockquote type="cite" cite="mid:CAAxE2A7ORPncQnr98Z_N5uG7rPGzEh6yXUqw-=L9QRh1-ne4+w@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div class="gmail_quote"><br>
</div>
</div>
<div class="gmail_quote">Marek<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>