[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Thu Sep 22 12:26:46 UTC 2016

On Thu, Sep 22, 2016 at 12:55 PM, Christian König
<deathsimple at vodafone.de> wrote:
> Am 22.09.2016 um 08:36 schrieb Daniel Vetter:
>>
>> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
>>>
>>> For a quick workaround I suggest to just serialize all accesses to BO
>>> shared
>>> with different drivers, but essentially I think it is a perfectly valid
>>> requirement to have multiple writers to one BO.
>>
>> It is, but it's not possible with implicit sync. If you want parallel
>> write access to the same shared buffer, you _must_ carry around some
>> explicit fences. Within amdgpu you can use driver-specific cookies, for
>> shared buffer we now have sync_file. But multiple writers with implicit
>> sync simply fundamentally doesn't work. Because you have no idea with
>> which
>> writer, touching the same subrange you want to touch.
>
>
> You don't need to separate the BO into subranges which are touched by
> different engines for allowing multiple writers.
>
> AMD hardware and I'm pretty sure others as well are perfectly capable of
> writing to the same memory from multiple engines and even multiple GPUs at
> the same time.
>
> For a good hint of what is possible see the public AMD ISA documentation
> about atomic operations, but that is only the start of it.
>
>
> The crux here is that we need to assume that we will have implicit and
> explicit sync mixed for backward compatibility.
>
> This implies that we need some mechanism like amdgpu uses in it's sync
> implementation where every fence is associated with an owner which denotes
> the domain in which implicit sync happens. If you leave this domain you will
> automatically run into explicit sync.
>
> Currently we define the borders of this domain in amdgpu on process boundary
> to keep things like DRI2/DRI3 working as expected.
>
> I really don't see how you want to solve this with a single explicit fence
> for each reservation object. As long as you have multiple concurrently
> running operations accessing the same buffer you need to keep one fence for
> each operation no matter what.

I can't make sense of what you're saying, and I suspect we put
different meaning to different words. So let me define here:

- implicit fencing: Userspace does not track read/writes to buffers,
but only the kernel does that. This is the assumption DRI2/3 has.
Since synchronization is by necessity on a per-buffer level you can
only have 1 writer. In the kernel the cross-driver interface for this
is struct reservation_object attached to dma-bufs. If you don't fill
out/wait for the exclusive fence in there, you're driver is _not_
doing (cross-device) implicit fencing.

- explicit fencing: Userspace passes around distinct fence objects for
any work going on on the gpu. The kernel doesn't insert any stall of
it's own (except for moving buffer objects around ofc). This is what
Android. This also seems to be what amdgpu is doing within one
process/owner.

Given that I'm not sure what you mean with "one explicit fence per
reservation_object", since explicit fencing should not attach anything
(at least not any exclusive fences) to a reservation_object. It does
sound a bit like you have the meaning the other way round (as in
explicit fencing = the kernel explicitly takes care of fencing, but
it's explicit as in explicit fences visible to userspace).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch