[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Christian König deathsimple at vodafone.de
Thu Sep 22 13:22:42 UTC 2016


Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
> <deathsimple at vodafone.de> wrote:
>>> - explicit fencing: Userspace passes around distinct fence objects for
>>> any work going on on the gpu. The kernel doesn't insert any stall of
>>> its own (except for moving buffer objects around ofc). This is what
>>> Android does. This also seems to be what amdgpu is doing within one
>>> process/owner.
>>
>> No, that is clearly not my understanding of explicit fencing.
>>
>> Userspace doesn't necessarily need to pass around distinct fence objects
>> with all of its protocols, and the kernel is still responsible for inserting
>> stalls whenever a userspace protocol or application requires these
>> semantics.
>>
>> Otherwise you will never be able to use explicit fencing on the Linux
>> desktop with protocols like DRI2/DRI3.
> This is about mixing them. Explicit fencing still means userspace has
> an explicit piece, separate from buffers, (either sync_file fd, or a
> driver-specific cookie, or similar).
>
>> I would expect that every driver in the system waits for all fences of a
>> reservation object as long as it isn't told otherwise by providing a
>> distinct fence object with the IOCTL in question.
> Yup agreed. This way if your explicitly-fencing driver reads a shared
> buffer passed over a protocol that does implicit fencing (like
> DRI2/3), then it will work.
>
> The other interop direction is explicitly-fencing driver passes a
> buffer to a consumer which expects implicit fencing. In that case you
> must attach the right fence to the exclusive slot, but _only_ in that
> case.

Ok, well, it sounds like you are close to understanding why I can't do
exactly this: there simply is no right fence I could attach.

When amdgpu makes a command submission, it doesn't necessarily know
that the buffer will be exported and shared with another device later on.

So when the buffer is exported and given to the other device, you might
have a whole bunch of fences running concurrently, in no serial order.

> Otherwise you end up stalling your explicitly-fencing userspace,
> since implicit fencing doesn't allow more than 1 writer. For amdgpu
> one possible way to implement this might be to count how many users a
> dma-buf has, and if it's more than just the current context, set the
> exclusive fence. Or do an uabi revision and let userspace decide (or
> at least overwrite it).

I mean I can pick one fence and wait for the rest to finish manually,
but that would certainly defeat the whole effort, wouldn't it?

I completely agree that you have only one writer with implicit fencing,
but when you switch from explicit fencing back to implicit fencing you
can have multiple writers.

> But the current approach in amdgpu_sync.c of declaring a fence as
> exclusive after the fact (if owners don't match) just isn't how
> reservation_object works. You can of course change that, but that
> means you must change all drivers implementing support for implicit
> fencing of dma-buf. Fixing amdgpu will be easier ;-)

Well as far as I can see there is no way I can fix amdgpu in this case.

The handling clearly needs to be changed on the receiving side of the
reservation objects if I don't want to completely disable concurrent
access to BOs in amdgpu.

Regards,
Christian.

> -Daniel
