[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Wed Sep 21 15:07:24 UTC 2016

On 21/09/16 09:56 PM, Daniel Vetter wrote:
> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Am 21.09.2016 um 13:04 schrieb Daniel Vetter:
>>> On Wed, Sep 21, 2016 at 12:30 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> Am 21.09.2016 um 11:56 schrieb Michel Dänzer:
>>>>>
>>>>>
>>>>> Looks like there are different interpretations of the semantics of
>>>>> exclusive vs. shared fences. Where are these semantics documented?
>>>>
>>>>
>>>> Yeah, I think as well that this is the primary question here.
>>>>
>>>> IIRC the fences were explicitly called exclusive/shared instead of
>>>> writing/reading on purpose.
>>>>
>>>> I absolutely don't mind switching to them to writing/reading semantics,
>>>> but
>>>> amdgpu really needs multiple writers at the same time.
>>>>
>>>> So in this case the writing side of a reservation object needs to be a
>>>> collection of fences as well.
>>>
>>> You can't have multiple writers with implicit syncing. That confusion
>>> is exactly why we called them shared/exclusive. Multiple writers
>>> generally means that you do some form of fencing in userspace
>>> (unsync'ed gl buffer access is the common one). What you do for
>>> private buffers doesn't matter, but when you render into a
>>> shared/winsys buffer you really need to set the exclusive fence (and
>>> there can only ever be one). So probably needs some userspace
>>> adjustments to make sure you don't accidentally set an exclusive write
>>> hazard when you don't really want that implicit sync.
>>
>>
>> Nope, that isn't true.
>>
>> We use multiple writers without implicit syncing between processes in the
>> amdgpu stack perfectly fine.
>>
>> See amdgpu_sync.c for the implementation. What we do there is taking a look
>> at all the fences associated with a reservation object and only sync to
>> those who are from another process.
>>
>> Then we use implicit syncing for command submissions in the form of
>> "dependencies". E.g. for each CS we report back an identifier of that
>> submission to user space and on the next submission you can give this
>> identifier as dependency which needs to be satisfied before the command
>> submission can start running.
> 
> This is called explicit fencing. Implemented with a driver-private
> primitive (and not sync_file fds like on android), but still
> conceptually explicit fencing. Implicit fencing really only can handle
> one writer, at least as currently implemented by struct
> reservation_object.
> 
>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>> buffer while still allow compatibility with protocols like DRI2/DRI3.
> 
> Instead of the current solution you need to stop attaching exclusive
> fences to non-shared buffers (which are coordinated using the
> driver-private explicit fencing you're describing),

Err, the current issue is actually that amdgpu never sets an exclusive
fence, only ever shared ones. :)

> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
> whatever).

Still, it occurred to me in the meantime that amdgpu setting the
exclusive fence for buffers shared via PRIME (no matter if it's a write
or read operation) might be a solution. Christian, what do you think?

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer