RFC: Add write flag to reservation object fences

Thu Aug 9 14:22:29 UTC 2018

On Thu, Aug 9, 2018 at 3:58 PM, Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
> Am 09.08.2018 um 15:38 schrieb Daniel Vetter:
>>
>> On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
>>>
>>> Hi everyone,
>>>
>>> This set of patches tries to improve read after write hazard handling
>>> for reservation objects.
>>>
>>> It allows us to specify for each shared fence if it represents a write
>>> operation.
>>>
>>> Based on this the i915 driver is modified to always wait for all writes
>>> before pageflip and the previously used workaround is removed from
>>> amdgpu.
>>
>> Hm, I thought after the entire discussions we agreed again that it's _not_
>> the write hazard we want to track, but whether there's an exclusive fence
>> that must be observed for implicit buffer sync. That's why it's called the
>> exclusive fence, not the write fence!
>>
>> If you want multiple of those, I guess we could add those, but that
>> doesn't really make sense - how exactly did you end up with multiple
>> exclusive fences in the first place?
>
>
> Maybe you misunderstood me, we don't have multiple exclusive fences.
>
> What we have are multiple writers which write to the BO. In other words
> multiple engines which compose the content of the BO at the same time.
>
> For page flipping we need to wait for all of them to completed.
>
>> i915 (and fwiw, any other driver) does _not_ want to observe all write
>> fences attached to a dma-buf. We want to _only_ observe the single
>> exclusive fence used for implicit buffer sync, which might or might not
>> exist. Otherwise the entire point of having explicit sync and explicit
>> fences in the atomic ioctl is out of the window and the use case of 2
>> draw/flip loops using a single buffer is defeated.
>
>
> What do you mean with that?
>
> Even for the atomic IOCTL with implicit fencing I strongly suspect that we
> can wait for multiple fences before doing the flip. Otherwise it would not
> really be useful to us.
>
>> Again: How exactly you construct that exclusive fences, and how exactly
>> the kernel and userspace cooperate to figure out when to set the exclusive
>> fences, is 100% up to amdgpu. If you do explicit sync by default, and only
>> switch to implicit sync (and setting the exclusive fence) as needed,
>> that's perfectly fine.  No need at all to leak that into core code and
>> confuse everyone that there's multiple exclusive fences they need to
>> somehow observe.
>
>
> I simply never have a single exclusive fence provided by userspace.
>
> I always have multiple command submissions accessing the buffer at the same
> time.
>
> See to me the explicit fence in the reservation object is not even remotely
> related to implicit or explicit synchronization.

Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.

And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.

Of course any buffer moves the kernel does also must be tracked in the
exclusive fence, because userspace cannot know about these. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: Right now all drivers only observe
one or the other, not both, so will break as soon as we start moving
shared buffers around. At least on Android or anything else using
explicit fencing.

So here's my summary, as I understanding things right now:
- for non-shared buffers at least, amdgpu uses explicit fencing, and
hence all fences caused by userspace end up as shared fences, whether
that's writes or reads. This means you end up with possibly multiple
write fences, but never any exclusive fences.
- for non-shared buffers the only exclusive fences amdgpu sets are for
buffer moves done by the kernel.
- amgpu (kernel + userspace combo here) does not seem to have a
concept/tracking for when a buffer is used with implicit or explicit
fencing. It does however track all writes.
- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffer need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing and hence
need to pessimistically assume there's multiple write fences amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by by the amdgpu stack, not something inherit to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).

If you want to be less pessimistic in amdgpu for shared buffers, you
need to start tracking which shared buffer access need implicit and
which explicit sync. What you can't do is suddenly create more than 1
exclusive fence, that's not how implicit fencing works. Another thing
you cannot do is force everyone else (in non-amdgpu or core code) to
sync against _all_ writes, because that forces implicit syncing. Which
people very much don't want.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch