[PATCH v3] drm/amdgpu: Transfer fences to dmabuf importer

Tue Aug 7 14:04:12 UTC 2018

On Tue, Aug 7, 2018 at 3:54 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 07.08.2018 um 15:37 schrieb Daniel Vetter:
>>
>> On Tue, Aug 07, 2018 at 03:17:06PM +0200, Christian König wrote:
>>>
>>> Am 07.08.2018 um 14:59 schrieb Daniel Vetter:
>>>>
>>>> On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:
>>>>>
>>>>> Am 07.08.2018 um 14:43 schrieb Daniel Vetter:
>>>>>>
>>>>>> On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
>>>>>>>
>>>>>>> Am 07.08.2018 um 13:05 schrieb Chris Wilson:
>>>>>>>>
>>>>>>>> amdgpu only uses shared-fences internally, but dmabuf importers rely
>>>>>>>> on
>>>>>>>> implicit write hazard tracking via the
>>>>>>>> reservation_object.fence_excl.
>>>>>>>
>>>>>>> Well I would rather suggest a solution where we stop doing this.
>>>>>>>
>>>>>>> The problem here is that i915 assumes that a write operation always
>>>>>>> needs
>>>>>>> exclusive access to an object protected by an reservation object.
>>>>>>>
>>>>>>> At least for amdgpu, radeon and nouveau this assumption is incorrect,
>>>>>>> but
>>>>>>> only amdgpu has a design where userspace is not lying to the kernel
>>>>>>> about
>>>>>>> it's read/write access.
>>>>>>>
>>>>>>> What we should do instead is to add a flag to each shared fence to
>>>>>>> note if
>>>>>>> it is a write operation or not. Then we can trivially add a function
>>>>>>> to wait
>>>>>>> on on those in i915.
>>>>>>>
>>>>>>> I should have pushed harder for this solution when the problem came
>>>>>>> up
>>>>>>> initially,
>>>>>>
>>>>>> For shared buffers in an implicit syncing setup like dri2 the
>>>>>> exclusive
>>>>>> fence _is_ your write fence.
>>>>>
>>>>> And exactly that is absolutely not correct if you ask me.
>>>>>
>>>>> The exclusive fence is for two use cases, the first one is for drivers
>>>>> which
>>>>> don't care about concurrent accesses and only use serialized accesses
>>>>> and
>>>>> the second is for kernel uses under the hood of userspace, evictions,
>>>>> buffer
>>>>> moves etc..
>>>>>
>>>>> What i915 does is to abuse that concept for write once read many
>>>>> situations,
>>>>> and that in turn is the bug we need to fix here.
>>>>
>>>> Again, the exclusive fence was added for implicit sync usage like
>>>> dri2/3.
>>>> _Not_ for your own buffer manager. If you want to separate these two
>>>> usages, then I guess we can do that, but claiming that i915 is the odd
>>>> one
>>>> out just aint true. You're arguing against all other kms drivers we have
>>>> here.
>>>
>>> No I'm not. I discussed exactly that use case with Maarten when the
>>> reservation object was introduced.
>>>
>>> I think that Maarten named the explicit fence on purpose "explicit" fence
>>> and not "write" fence to make that distinction clear.
>>>
>>> I have to admit that this wasn't really documented, but it indeed was the
>>> original purpose to get away from the idea that writes needs to be
>>> exclusive.
>>>
>>>>>> That's how this stuff works. Note it's only
>>>>>> for implicit fencing for dri2 shared buffers. i915 lies as much as
>>>>>> everyone else for explicit syncing.
>>>>>
>>>>> That is really really ugly and should be fixed instead.
>>>>
>>>> It works and is uapi now. What's the gain in "fixing" it?
>>>
>>> It allows you to implement explicit fencing in the uapi without breaking
>>> backward compatibility.
>>>
>>> See the situation with amdgpu and intel is the same as with userspace
>>> with
>>> mixed implicit and explicit fencing.
>>>
>>> If that isn't fixed we will run into the same problem when DRI3 gets
>>> implicit fencing and some applications starts to use it while others
>>> still
>>> rely on the explicit fencing.
>>
>> I think we're a bit cross-eyed on exact semantics, but yes this is exactly
>> the use-case. Chris Wilson's use-case tries to emulate exactly what would
>> happen if implicitly fenced amdgpu rendered buffer should be displayed on
>> an i915 output. Then you need to set the exclusive fence correctly.
>
>
> Yes, exactly. That's what I totally agree on.
>
>> And yes we called them exclusive/shared because shared fences could also
>> be write fences. But for the implicitly synced userspace case, exclusive =
>> The write fence.
>>
>> So probably Chris' patch ends up syncing too much (since for explicit
>> syncing you don't want to attach an exclusive fence, because userspace
>> passes the fence around already to the atomic ioctl). But at least it's
>> correct for the implicit case. But that's entirely an optimization problem
>> within amdgpu.
>
>
> Completely agree as well, but Chris patch actually just optimizes things. It
> is not necessary for correct operation.

Duh, I didn't read the patch carefully enough ...

> See previously we just waited for the BO to be idle, now Chris patch
> collects all fences (shared and exclusive) and adds that as single elusive
> fence to avoid blocking.
>
>> Summary: If you have a shared buffer used in implicitly synced buffer
>> sharing, you _must_ set the exclusive fence to cover all write access.
>
>
> And how should command submission know that?
>
> I mean we add the exclusive or shared fence during command submission, but
> that IOCTL has not the slightest idea if the BO is then used for explicit or
> for implicit fencing.

The ioctl doesn't know, but the winsys in userspace does know. Well,
with one exception: Bare metal egl on gbm or similar doesn't, so there
the heuristics is that we assume implicit fencing until userspace
starts using the explicit fence extensions. Then we switch over.

And once your winsys knows whether shared buffers should be implicit
or explicit synced, it can tell the kernel. Either through a new flag,
our by retroshoehorning the semantics into existing flags. The latter
is what we've done for i915 and freedrone afaiui.

>> In any other case (not shared, or not as part of implicitly synced
>> protocol), you can do whatever you want. So we're not conflagrating
>> exclusive with write here at all. And yes this is exactly what i915 and
>> all other drivers do - for explicit fencing we don't set the exclusive
>> fence, but leave that all to userspace. Also, userspace tells us (because
>> it knows how the protocol works, not the kernel) when to set an exclusive
>> fence.
>
>
> To extend that at least the lower layers of the user space driver doesn't
> know if we have explicit or implicit fencing either.
>
> The only component who really does know that is the DDX or more general the
> driver instance inside the display server.
>
> And when that component gets the buffer to display it command submission
> should already be long done.

Why does only the DDX know? If you're running gl on GLX your driver
knows how to sync the shared buffer - it also knows how to hand it
over to the DDX after all. Same for gles on android/surfaceflinger,
you know it's going to be explicit sync for shared buffers.

There's some cases where you can't decide this right away, but a
context flag that enables explicit sync once that's clear seemed to be
all that's needed. Worst case the first frame is artifially limited
due to the implicit fencing hurting a bit. And in general that issue
is only for bare metal buffers, i.e. your compositor takes a small hit
once at startup. If this is a real issue we could add a gbm interface
to select implict/explicit before we start allocating anything.

Again this is only for shared buffers, anything private to a given
context can be handled however you want really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch