[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Wed Sep 21 16:23:35 UTC 2016

On 21.09.2016 at 17:29, Michel Dänzer wrote:
> On 22/09/16 12:15 AM, Christian König wrote:
>> On 21.09.2016 at 17:07, Michel Dänzer wrote:
>>> On 21/09/16 09:56 PM, Daniel Vetter wrote:
>>>> On Wed, Sep 21, 2016 at 1:19 PM, Christian König
>>>> <deathsimple at vodafone.de> wrote:
>>>>> We use multiple writers without implicit syncing between processes
>>>>> in the amdgpu stack perfectly fine.
>>>>>
>>>>> See amdgpu_sync.c for the implementation. What we do there is look at
>>>>> all the fences associated with a reservation object and only sync to
>>>>> those that are from another process.
>>>>>
>>>>> Then we use implicit syncing for command submissions in the form of
>>>>> "dependencies". E.g. for each CS we report back an identifier of that
>>>>> submission to user space and on the next submission you can give this
>>>>> identifier as dependency which needs to be satisfied before the command
>>>>> submission can start running.
>>>> This is called explicit fencing. Implemented with a driver-private
>>>> primitive (and not sync_file fds like on android), but still
>>>> conceptually explicit fencing. Implicit fencing really only can handle
>>>> one writer, at least as currently implemented by struct
>>>> reservation_object.
>>>>
>>>>> This was done to allow multiple engines (3D, DMA, Compute) to compose a
>>>>> buffer while still allowing compatibility with protocols like DRI2/DRI3.
>>>> Instead of the current solution you need to stop attaching exclusive
>>>> fences to non-shared buffers (which are coordinated using the
>>>> driver-private explicit fencing you're describing),
>>> Err, the current issue is actually that amdgpu never sets an exclusive
>>> fence, only ever shared ones. :)
>> Actually amdgpu does set the exclusive fence for buffer migrations,
>> because that is an operation user space doesn't know about and so it
>> needs "exclusive" access to the buffer.
>>
>>
>>>> and only attach exclusive fences to shared buffers (DRI2/3, PRIME,
>>>> whatever).
>>> Still, it occurred to me in the meantime that amdgpu setting the
>>> exclusive fence for buffers shared via PRIME (no matter if it's a write
>>> or read operation) might be a solution. Christian, what do you think?
>> The problem is that we don't have one fence, but many.
>>
>> E.g. there can be many operations on a buffer at the same time and we
>> need to wait for all of them to complete before it can be displayed.
> Maybe in theory, but with the problem we have in practice right now, the
> amdgpu GPU should only ever access the shared BO with the same engine.

That clearly won't work. What both Mesa and the pro stack do with the
BO before it is displayed makes it mandatory to execute things in
parallel (at least for the non-shared case).

> Anyway, this should be solvable by setting some kind of meta-fence as
> the exclusive fence, which can internally be mapped to multiple fences,
> maybe up to one for each ring which can access the BO?

I've thought about that as well, but this approach would only work if
we keep a collection of fences and not just an array, because of the
scheduler.
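
To make the meta-fence idea quoted above a bit more concrete, here is a
rough userspace sketch in plain C. It is not kernel code: struct
ring_fence, struct meta_fence, MAX_RINGS and the meta_fence_*() helpers
are all hypothetical names invented for illustration. It uses the simple
one-slot-per-ring array, which, as noted above, would not be sufficient
once the scheduler is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-ring fence; NOT the kernel's fence type. */
struct ring_fence {
	bool signaled;
};

#define MAX_RINGS 4 /* assumed bound: at most one fence slot per ring */

/* Hypothetical meta-fence: a single exclusive slot that maps to many fences. */
struct meta_fence {
	struct ring_fence *per_ring[MAX_RINGS]; /* NULL: ring has no pending work */
};

static void meta_fence_init(struct meta_fence *m)
{
	for (size_t i = 0; i < MAX_RINGS; i++)
		m->per_ring[i] = NULL;
}

/* Remember the latest fence of a ring, replacing any earlier one. */
static void meta_fence_set(struct meta_fence *m, size_t ring,
			   struct ring_fence *f)
{
	assert(ring < MAX_RINGS);
	m->per_ring[ring] = f;
}

/* The meta-fence is signaled only once every tracked fence has signaled. */
static bool meta_fence_is_signaled(const struct meta_fence *m)
{
	for (size_t i = 0; i < MAX_RINGS; i++) {
		if (m->per_ring[i] && !m->per_ring[i]->signaled)
			return false;
	}
	return true;
}
```

A display driver waiting on such a meta-fence would then only flip once
meta_fence_is_signaled() returns true, i.e. after all rings are done.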

As a quick workaround I suggest just serializing all accesses to BOs
shared with different drivers, but fundamentally I think it is a
perfectly valid requirement to have multiple writers to one BO.
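
The workaround could be pictured roughly as follows. This is only a
sketch in plain C under stated assumptions, not what amdgpu_sync.c
actually does: struct bo_fence, struct bo, the integer owner id, the
shared_with_other_driver flag and count_waits() are hypothetical. The
policy is: for an ordinary BO, wait only on unsignaled fences from other
owners; for a BO shared with a different driver, wait on every
unsignaled fence, which serializes all accesses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence record; owner is an assumed per-process identifier. */
struct bo_fence {
	int owner;
	bool signaled;
};

/* Hypothetical buffer object carrying the fences attached to it. */
struct bo {
	bool shared_with_other_driver; /* assumed flag, e.g. exported via PRIME */
	const struct bo_fence *fences;
	size_t num_fences;
};

/* Count the fences a new submission from `owner` would have to wait on. */
static size_t count_waits(const struct bo *bo, int owner)
{
	size_t n = 0;

	for (size_t i = 0; i < bo->num_fences; i++) {
		const struct bo_fence *f = &bo->fences[i];

		if (f->signaled)
			continue; /* already done, nothing to wait for */

		/* Shared BO: serialize everything. Otherwise skip our own fences. */
		if (bo->shared_with_other_driver || f->owner != owner)
			n++;
	}
	return n;
}
```

With two pending fences, one of them from the submitting process itself,
a non-shared BO yields one wait while the shared case yields two.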

Christian.