[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Wed Sep 21 09:56:29 UTC 2016

On 13/09/16 09:52 PM, Christian König wrote:
> Am 13.09.2016 um 11:39 schrieb Chris Wilson:
>> On Tue, Sep 13, 2016 at 10:44:11AM +0200, Christian König wrote:
>>> Am 09.09.2016 um 03:15 schrieb Michel Dänzer:
>>>> On 09/09/16 01:23 AM, Chris Wilson wrote:
>>>>> On Thu, Sep 08, 2016 at 05:21:42PM +0200, Mario Kleiner wrote:
>>>>>> On 09/08/2016 08:30 AM, Chris Wilson wrote:
>>>>>>> On Thu, Sep 08, 2016 at 02:14:43AM +0200, Mario Kleiner wrote:
>>>>>>>> amdgpu-kms uses shared fences for its prime exported dmabufs,
>>>>>>>> instead of an exclusive fence. Therefore we need to wait for
>>>>>>>> all fences of the dmabuf reservation object to prevent
>>>>>>>> unsynchronized rendering and flipping.
>>>>>>> No. Fix the root cause as this affects not just flips but copies -
>>>>>>> this implies that everybody using the resv object must wait for all
>>>>>>> fences. The resv object is not just used for prime, but all fencing,
>>>>>>> so this breaks the ability to schedule parallel operations across
>>>>>>> engines.
>>>>>>> -Chris
>>>>>>>
>>>>>> Ok. I think I now understand the difference, but let's check: The
>>>>>> exclusive fence is essentially acting a bit like a write lock, and
>>>>>> the shared fences as reader locks? So you can have multiple readers
>>>>>> but only one writer at a time?
>>>>> That's how we (i915.ko and I hope the rest of the world) are using
>>>>> them.
>>>>> In the model where there is just one reservation object on the GEM
>>>>> object, that reservation object is then shared between internal driver
>>>>> scheduling and external. We are reliant on being able to use buffers on
>>>>> multiple engines through the virtue of the shared fences, and to
>>>>> serialise after writes by waiting on the exclusive fence. (So we can
>>>>> have concurrent reads on the display engine, render engines and on the
>>>>> CPU - or alternatively an exclusive writer.)
>>>>>
>>>>> In the near future, i915 flips will wait on the common reservation
>>>>> object not only for dma-bufs, but also its own GEM objects.
>>>>>> Ie.:
>>>>>>
>>>>>> Writer must wait for all fences before starting write access to a
>>>>>> buffer, then attach the exclusive fence and signal it on end of
>>>>>> write access. E.g., write to renderbuffer, write to texture etc.
>>>>> Yes.
>>>>>> Readers must wait for exclusive fence, then attach a shared fence
>>>>>> per reader and signal it on end of read access? E.g., read from
>>>>>> texture, fb, scanout?
>>>>> Yes.
>>>>>> Is that correct? In that case we'd have a missing exclusive fence in
>>>>>> amdgpu for the linear target dmabuf? Probably beyond my level of
>>>>>> knowledge to fix this?
>>>>> i915.ko requires the client to mark which buffers are written to.
>>>>>
>>>>> In ttm, there are ttm_validate_buffer objects which mark whether they
>>>>> should be using shared or exclusive fences. Afaict, in amdgpu they are
>>>>> all set to shared, the relevant user interface seems to be
>>>>> amdgpu_bo_list_set().
>>>> This all makes sense to me.
>>>>
>>>> Christian, why is amdgpu setting only shared fences? Can we fix that?
>>> No, amdgpu relies on the fact that we even allow concurrent write
>>> accesses by userspace.
>>>
>>> E.g. one part of the buffer can be rendered by one engine while
>>> another part could be rendered by another engine.
>>>
>>> Just imagine X which is composing a buffer with both the 3D engine
>>> as well as the DMA engine.
>>>
>>> All engines need to run in parallel and you need to wait for all of
>>> them to finish before scanout.
>>>
>>> Everybody who needs exclusive access to the reservation object
>>> (like scanouts do) needs to wait for all fences, not just the
>>> exclusive one.
>>>
>>> The Intel driver clearly needs to be fixed here.
>> If you are not using implicit fencing, you have to pass explicit fences
>> instead.
> 
> Which is exactly what we do, but only for the driver's internal command
> submissions.
> 
> All command submissions from the same process can run concurrently with
> amdgpu; only when we see a fence from another driver or process do we
> wait for it to complete before starting to run a command submission.
> 
> Other drivers can't make any assumption on what a shared access is
> actually doing (e.g. writing or reading) with a buffer.
> 
> So the model in which i915.ko is using the reservation object and its
> shared fences is certainly not correct and needs to be fixed.

Looks like there are different interpretations of the semantics of
exclusive vs. shared fences. Where are these semantics documented?

FWIW, we seem to have the same issue with radeon vs. amdgpu: radeon only
seems to wait for exclusive fences, so e.g. running Xorg on amdgpu and
using PRIME slave scanout on radeon leaves artifacts.
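
To make the two interpretations concrete, here is a toy model in Python. It
is only a sketch and not kernel code: Fence, ReservationObject and
fences_to_wait() are hypothetical names invented for illustration, and the
scenario assumes (as described above for amdgpu) that rendering attaches
only shared fences. It shows why a consumer that waits for the exclusive
fence alone would not wait for that rendering at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Fence:
    """Toy stand-in for a fence; signaled means the GPU work has finished."""
    name: str
    signaled: bool = False


@dataclass
class ReservationObject:
    """Toy stand-in for a reservation object: one exclusive slot, many shared."""
    exclusive: Optional[Fence] = None
    shared: List[Fence] = field(default_factory=list)


def fences_to_wait(resv: ReservationObject, wait_all: bool) -> List[Fence]:
    """Return the unsignaled fences a consumer (e.g. a flip) would block on.

    wait_all=False: wait for the exclusive fence only (readers in the
                    reader/writer interpretation).
    wait_all=True:  wait for the exclusive fence and every shared fence.
    """
    candidates: List[Fence] = []
    if resv.exclusive is not None:
        candidates.append(resv.exclusive)
    if wait_all:
        candidates.extend(resv.shared)
    return [f for f in candidates if not f.signaled]


# Assumed scenario: two engines render into the same buffer and the
# exporting driver attaches only shared fences, never an exclusive one.
resv = ReservationObject()
resv.shared.append(Fence("3D engine render"))
resv.shared.append(Fence("DMA engine copy"))

exclusive_only = fences_to_wait(resv, wait_all=False)
all_fences = fences_to_wait(resv, wait_all=True)

print([f.name for f in exclusive_only])  # nothing to wait for
print([f.name for f in all_fences])      # both pending render jobs
```

With wait_all=False the consumer sees nothing to wait for and would flip or
scan out while both engines are still rendering, which matches the
artifacts described above. With wait_all=True it blocks on both jobs, at the
cost of also serialising against accesses that were only reads.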

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer