amdgpu doesn't do implicit sync, requires drivers to do it in IBs
Christian König
christian.koenig at amd.com
Fri May 29 09:05:40 UTC 2020
On 28.05.20 at 21:35, Marek Olšák wrote:
> On Thu, May 28, 2020 at 2:12 PM Christian König
> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>
> On 28.05.20 at 18:06, Marek Olšák wrote:
>> On Thu, May 28, 2020 at 10:40 AM Christian König
>> <christian.koenig at amd.com <mailto:christian.koenig at amd.com>> wrote:
>>
>> On 28.05.20 at 12:06, Michel Dänzer wrote:
>> > On 2020-05-28 11:11 a.m., Christian König wrote:
>> >> Well we still need implicit sync [...]
>> > Yeah, this isn't about "we don't want implicit sync", it's about
>> > "amdgpu doesn't ensure later jobs fully see the effects of previous
>> > implicitly synced jobs", requiring userspace to do pessimistic flushing.
>>
>> Yes, exactly that.
>>
>> For the background: We also do this flushing for explicit syncs. And
>> when this was implemented 2-3 years ago we first did the flushing for
>> implicit sync as well.
>>
>> That was immediately reverted and then implemented differently because
>> it caused severe performance problems in some use cases.
>>
>> I'm not sure about the root cause of these performance problems. My
>> assumption was always that we then insert too many pipeline syncs, but
>> Marek doesn't seem to think it could be that.
>>
>> On the one hand I'm rather keen to remove the extra handling and just
>> always use the explicit handling for everything because it simplifies
>> the kernel code quite a bit. On the other hand I don't want to run into
>> this performance problem again.
>>
>> In addition to that, what the kernel does is a "full" pipeline sync,
>> i.e. we busy wait for the full hardware pipeline to drain. That might
>> be overkill if you just want to do some flushing so that the next
>> shader sees the stuff written, but I'm not an expert on that.
>>
>>
>> Do we busy-wait on the CPU or in WAIT_REG_MEM?
>>
>> WAIT_REG_MEM is what UMDs do and should be faster.
>
> We use WAIT_REG_MEM to wait for an EOP fence value to reach memory.
>
> We use this for a couple of things, especially to make sure that
> the hardware is idle before changing VMID to page table associations.
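For illustration, this is roughly what such a wait looks like when it is
emitted on the ring. This is only a sketch assuming the amdgpu ring helpers
and PM4 macros (soc15d.h); the helper name and the exact field values are
assumptions and differ per ASIC, it is not the actual kernel emitter:

    /* Sketch only: make the CP poll an EOP fence value in memory until
     * it reaches the expected sequence number, so later commands don't
     * start before the previous work has drained.  Field encodings are
     * illustrative. */
    static void sketch_emit_fence_wait(struct amdgpu_ring *ring,
                                       uint64_t fence_addr, uint32_t seq)
    {
            amdgpu_ring_write(ring, PACKET3(PACKET3_WAIT_REG_MEM, 5));
            amdgpu_ring_write(ring, WAIT_REG_MEM_MEM_SPACE(1) |  /* poll memory */
                                    WAIT_REG_MEM_OPERATION(0) |  /* wait */
                                    WAIT_REG_MEM_FUNCTION(5) |   /* >= reference */
                                    WAIT_REG_MEM_ENGINE(0));     /* ME */
            amdgpu_ring_write(ring, lower_32_bits(fence_addr));
            amdgpu_ring_write(ring, upper_32_bits(fence_addr));
            amdgpu_ring_write(ring, seq);        /* reference value */
            amdgpu_ring_write(ring, 0xffffffff); /* compare mask */
            amdgpu_ring_write(ring, 4);          /* poll interval */
    }
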
>
> What about your idea of having an extra dw in the shared BOs
> indicating that they are flushed?
>
> As far as I understand it, an EOS or other event might be sufficient
> for the caches as well. And you could insert the WAIT_REG_MEM directly
> before the first draw using the texture, and not before the whole IB.
>
> Could be that we can optimize this even more than what we do in
> the kernel.
>
> Christian.
>
>
> Adding fences into BOs would be bad, because all UMDs would have to
> handle them.
Yeah, I already assumed that this is the biggest blocker.
> Is it possible to do this in the ring buffer:
>
> if (fence_signalled) {
>     indirect_buffer(dependent_IB);
>     indirect_buffer(other_IB);
> } else {
>     indirect_buffer(other_IB);
>     wait_reg_mem(fence);
>     indirect_buffer(dependent_IB);
> }
That might be possible, but at least it's not easy to implement.
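Reading the conditional above as something evaluated on the CPU at
ring-build time, a minimal sketch of the idea could look like the
following. emit_ib() and emit_fence_wait() are hypothetical helpers
(emit_fence_wait() as in the WAIT_REG_MEM sketch further up), not
existing amdgpu code; dma_fence_is_signaled() is the real check:

    /* Sketch of the idea only: if the dependency fence has already
     * signalled when we build the ring contents, emit the IBs without
     * any stall; otherwise let the independent IB run first and make
     * the CP wait (WAIT_REG_MEM) before the dependent one. */
    if (dma_fence_is_signaled(fence)) {
            emit_ib(ring, dependent_ib);
            emit_ib(ring, other_ib);
    } else {
            emit_ib(ring, other_ib);                      /* independent work first */
            emit_fence_wait(ring, fence_addr, fence_seq); /* CP-side wait */
            emit_ib(ring, dependent_ib);
    }
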
> Or we might have to wait for a hw scheduler.
I'm still fine with doing the pipeline sync for implicit sync as well, I
just need somebody to confirm that this doesn't backfire in some cases.
>
> Does the kernel sync when the driver fd is different, or when the
> context is different?
Only when the driver fd is different.
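In other words, roughly the following decision when walking a BO's
reservation object. This is a simplified sketch of the rule, not the
literal amdgpu_sync.c code, and the function name is hypothetical:

    /* Sketch: fences submitted through the same drm fd / VM are
     * skipped (userspace is expected to order its own work); fences
     * from a different fd become implicit sync dependencies and get
     * the pipeline sync before the new job runs. */
    static bool sketch_need_implicit_sync(void *fence_owner, void *submit_owner)
    {
            if (fence_owner == submit_owner)
                    return false;   /* same fd: no kernel sync inserted */

            return true;            /* different fd: wait for the fence */
    }
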
Christian.
>
> Marek