[PATCH] drm/i915: Before pageflip, also wait for shared dmabuf fences.

Thu Sep 22 12:44:45 UTC 2016

Am 22.09.2016 um 14:26 schrieb Daniel Vetter:
> On Thu, Sep 22, 2016 at 12:55 PM, Christian König
> <deathsimple at vodafone.de> wrote:
>> Am 22.09.2016 um 08:36 schrieb Daniel Vetter:
>>> On Wed, Sep 21, 2016 at 06:23:35PM +0200, Christian König wrote:
>>>> For a quick workaround I suggest to just serialize all accesses to BO
>>>> shared
>>>> with different drivers, but essentially I think it is a perfectly valid
>>>> requirement to have multiple writers to one BO.
>>> It is, but it's not possible with implicit sync. If you want parallel
>>> write access to the same shared buffer, you _must_ carry around some
>>> explicit fences. Within amdgpu you can use driver-specific cookies, for
>>> shared buffer we now have sync_file. But multiple writers with implicit
>>> sync simply fundamentally doesn't work. Because you have no idea with
>>> which
>>> writer, touching the same subrange you want to touch.
>>
>> You don't need to separate the BO into subranges which are touched by
>> different engines for allowing multiple writers.
>>
>> AMD hardware and I'm pretty sure others as well are perfectly capable of
>> writing to the same memory from multiple engines and even multiple GPUs at
>> the same time.
>>
>> For a good hint of what is possible see the public AMD ISA documentation
>> about atomic operations, but that is only the start of it.
>>
>>
>> The crux here is that we need to assume that we will have implicit and
>> explicit sync mixed for backward compatibility.
>>
>> This implies that we need some mechanism like amdgpu uses in it's sync
>> implementation where every fence is associated with an owner which denotes
>> the domain in which implicit sync happens. If you leave this domain you will
>> automatically run into explicit sync.
>>
>> Currently we define the borders of this domain in amdgpu on process boundary
>> to keep things like DRI2/DRI3 working as expected.
>>
>> I really don't see how you want to solve this with a single explicit fence
>> for each reservation object. As long as you have multiple concurrently
>> running operations accessing the same buffer you need to keep one fence for
>> each operation no matter what.
> I can't make sense of what you're saying, and I suspect we put
> different meaning to different words. So let me define here:
>
> - implicit fencing: Userspace does not track read/writes to buffers,
> but only the kernel does that. This is the assumption DRI2/3 has.
> Since synchronization is by necessity on a per-buffer level you can
> only have 1 writer. In the kernel the cross-driver interface for this
> is struct reservation_object attached to dma-bufs. If you don't fill
> out/wait for the exclusive fence in there, you're driver is _not_
> doing (cross-device) implicit fencing.

I can confirm that my understanding of implicit fencing is exactly the 
same as yours.

> - explicit fencing: Userspace passes around distinct fence objects for
> any work going on on the gpu. The kernel doesn't insert any stall of
> it's own (except for moving buffer objects around ofc). This is what
> Android. This also seems to be what amdgpu is doing within one
> process/owner.

No, that is clearly not my understanding of explicit fencing.

Userspace doesn't necessarily need to pass around distinct fence objects 
with all of it's protocols and the kernel is still responsible for 
inserting stalls whenever an userspace protocol or application requires 
this semantics.

Otherwise you will never be able to use explicit fencing on the Linux 
desktop with protocols like DRI2/DRI3.

I would expect that every driver in the system waits for all fences of a 
reservation object as long as it isn't told otherwise by providing a 
distinct fence object with the IOCTL in question.

Regards,
Christian.