[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.
Michel Dänzer
michel at daenzer.net
Fri Sep 23 10:00:25 UTC 2016
On 22/09/16 10:22 PM, Christian König wrote:
> Am 22.09.2016 um 15:05 schrieb Daniel Vetter:
>> On Thu, Sep 22, 2016 at 2:44 PM, Christian König
>> <deathsimple at vodafone.de> wrote:
>>>> - explicit fencing: Userspace passes around distinct fence objects for
>>>> any work going on on the gpu. The kernel doesn't insert any stall of
>>>> its own (except for moving buffer objects around ofc). This is what
>>>> Android does. This also seems to be what amdgpu is doing within one
>>>> process/owner.
>>>
>>> No, that is clearly not my understanding of explicit fencing.
>>>
>>> Userspace doesn't necessarily need to pass around distinct fence
>>> objects with all of its protocols, and the kernel is still responsible
>>> for inserting stalls whenever a userspace protocol or application
>>> requires these semantics.
>>>
>>> Otherwise you will never be able to use explicit fencing on the Linux
>>> desktop with protocols like DRI2/DRI3.
>> This is about mixing them. Explicit fencing still means userspace has
>> an explicit piece, separate from the buffers (either a sync_file fd, a
>> driver-specific cookie, or similar).
>>
>>> I would expect that every driver in the system waits for all fences of
>>> a reservation object unless it is told otherwise by userspace providing
>>> a distinct fence object with the IOCTL in question.
>> Yup agreed. This way if your explicitly-fencing driver reads a shared
>> buffer passed over a protocol that does implicit fencing (like
>> DRI2/3), then it will work.
>>
>> The other interop direction is when an explicitly-fencing driver passes
>> a buffer to a consumer which expects implicit fencing. In that case you
>> must attach the right fence to the exclusive slot, but _only_ in that
>> case.
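(For reference, the mechanics Daniel describes look roughly like this
with the current reservation_object API; "is_writer" is a placeholder
for whatever policy answers the question below, and resv locking plus
error handling are omitted:)

	struct reservation_object *resv = dma_buf->resv;

	if (is_writer) {
		/* implicit-fencing consumers will wait on this fence */
		reservation_object_add_excl_fence(resv, fence);
	} else {
		/* readers only ever add a shared fence */
		if (!reservation_object_reserve_shared(resv))
			reservation_object_add_shared_fence(resv, fence);
	}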
>
> Ok, well, it sounds like you are close to understanding why I can't do
> exactly this: there simply is no right fence I could attach.
>
> When amdgpu makes its command submissions, it doesn't necessarily know
> that the buffer will be exported and shared with another device later on.
>
> So when the buffer is exported and given to the other device, you might
> have a whole bunch of fences which run concurrently, not in any serial
> order.
I feel like you're thinking too much of buffers shared between GPUs as
being short-lived and only shared late. In the use-cases I know about,
shared buffers are created separately and shared ahead of time; the
actual rendering is done to non-shared buffers, and the results are then
just copied to the shared buffers for transfer between GPUs. These
copies are always submitted by the same context, so they should always
end up on the same HW engine and thus be implicitly serialized.
Do you have any specific use-cases in mind where buffers are only shared
between GPUs after the rendering operations creating the buffer contents
to be shared have already been submitted?
>> Otherwise you end up stalling your explicitly-fencing userspace, since
>> implicit fencing doesn't allow more than 1 writer. For amdgpu, one
>> possible way to implement this might be to count how many users a
>> dma-buf has, and if it's more than just the current context, set the
>> exclusive fence. Or do an uabi revision and let userspace decide (or
>> at least override it).
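(A crude sketch of that heuristic, against the amdgpu structures of this
era; gobj->dma_buf is only set once the BO has actually been exported,
so it merely approximates "has more users than the current context":)

	struct drm_gem_object *gobj = &bo->gem_base;

	if (gobj->dma_buf)
		/* exported at least once: assume external readers exist,
		 * publish this CS as the one implicit writer */
		reservation_object_add_excl_fence(bo->tbo.resv, fence);
	else
		/* private BO: shared fence as before
		 * (after reservation_object_reserve_shared()) */
		reservation_object_add_shared_fence(bo->tbo.resv, fence);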
>
> I mean I can pick one fence and wait for the rest to finish manually,
> but that would certainly defeat the whole effort, wouldn't it?
I'm afraid it's not clear to me why it would. Can you elaborate?
>> But the current approach in amdgpu_sync.c of declaring a fence as
>> exclusive after the fact (if owners don't match) just isn't how
>> reservation_object works. You can of course change that, but that
>> means you must change all drivers implementing support for implicit
>> fencing of dma-buf. Fixing amdgpu will be easier ;-)
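(For comparison, the interpretation every other implicit-fencing
consumer applies boils down to this, using the RCU wait helper:)

	/* readers need only wait for the one exclusive (writer) fence,
	 * writers have to wait for everything */
	ret = reservation_object_wait_timeout_rcu(resv,
						  write /* wait_all */,
						  true /* intr */,
						  MAX_SCHEDULE_TIMEOUT);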
>
> Well, as far as I can see, there is no way I can fix amdgpu in this case.
>
> The handling clearly needs to be changed on the receiving side of the
> reservation objects if I don't want to completely disable concurrent
> access to BOs in amdgpu.
Anyway, we need a solution for this between radeon and amdgpu, and I
don't think it's a good idea for those two drivers to use reservation
object semantics between themselves which differ from those of all
other drivers.
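(FWIW, for the pageflip case in the subject, those shared semantics
amount to something like the following before the flip is queued; a
sketch only, with the timeout and error handling up to the driver:)

	/* scanout only reads the buffer, but flipping while rendering
	 * from any context is still in flight would show an incomplete
	 * frame, hence wait for the shared fences too (wait_all) */
	long lret = reservation_object_wait_timeout_rcu(dma_buf->resv,
							true, false,
							MAX_SCHEDULE_TIMEOUT);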
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer