[RFC] Implicit vs explicit user fence sync
Christian König
ckoenig.leichtzumerken at gmail.com
Tue May 11 15:32:29 UTC 2021
On 11.05.21 at 16:23, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
>> On 11.05.21 at 09:31, Daniel Vetter wrote:
>>> [SNIP]
>>>>> And that's just the one ioctl I know is big trouble, I'm sure we'll find
>>>>> more funny corner cases when we roll out explicit user fencing.
>>>> I think we can just ignore sync_file. As far as I'm concerned that UAPI is
>>>> pretty much dead.
>>> Uh that's rather bold. Android is built on it. Currently atomic kms is
>>> built on it.
>> To be honest I don't think we care about Android at all.
> we = amd or we = upstream here?
we = amd, for everybody else that is certainly a different topic.
But for now AMD is the only one running into this problem.
Could be that Nouveau sees this as well with the next hw generation, but
who knows?
>>> Why is this not much of a problem if it's just within one driver?
>> Because inside the same driver I can easily add the waits before submitting
>> the MM work as necessary.
> What is MM work here now?
MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.
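To illustrate what I mean with adding the waits in-driver, roughly
something like this (a purely hypothetical sketch with made-up names,
not actual amdgpu code):

#include <linux/delay.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* A user fence is just a 64-bit value in memory: it counts as
 * signaled once the value at the fence address has reached the
 * expected sequence number. */
struct user_fence {
        u64 __user *addr;       /* location the fence producer writes */
        u64 seqno;              /* signaled once *addr >= seqno */
};

struct mm_job {
        struct user_fence in_fence;     /* dependency of this job */
        /* ... engine specific payload ... */
};

int mm_engine_push(struct mm_job *job); /* hypothetical hw submission */

static bool user_fence_signaled(const struct user_fence *fence)
{
        u64 cur;

        /* Treat a faulting read as signaled so a crashed process
         * can't block the engine forever. */
        if (get_user(cur, fence->addr))
                return true;

        return cur >= fence->seqno;
}

static int mm_job_submit(struct mm_job *job)
{
        /* Inside a single driver we know the address/value pair of
         * every dependency, so we can simply wait here before the
         * job touches the hardware. */
        while (!user_fence_signaled(&job->in_fence))
                usleep_range(100, 500);

        return mm_engine_push(job);
}

Cross-driver code never sees that address/value pair, which is exactly
why this is easy within one driver and hard across drivers.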
>>>>>> Adding implicit synchronization on top of that is then rather trivial.
>>>>> Well that's what I disagree with, since I already see some problems that I
>>>>> don't think we can overcome (the atomic ioctl is one). And that's with us
>>>>> only having a fairly theoretical understanding of the overall situation.
>>>> But how should we then ever support user fences with the atomic IOCTL?
>>>>
>>>> We can't wait in user space since that will disable the support for waiting
>>>> in the hardware.
>>> Well, figure it out :-)
>>>
>>> This is exactly why I'm not seeing anything solved with just rolling a
>>> function call to a bunch of places, because it's pretending all things are
>>> solved when clearly that's not the case.
>>>
>>> I really think what we need is to first figure out how to support
>>> userspace fences as explicit entities across the stack, maybe with
>>> something like this order:
>>> 1. enable them purely within a single userspace driver (like vk with
>>> winsys disabled, or something else like that except not amd because
>>> there's this amdkfd split for "real" compute)
>>> 1a. including atomic ioctl, e.g. for vk direct display support this can be
>>> used without cross-process sharing, new winsys protocols and all that fun
>>> 2. figure out how to transport these userspace fences with something like
>>> drm_syncobj
>>> 2a. figure out the compat story for drivers which don't do userspace fences
>>> 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
>>> support explicit sync
>>> 3. maybe figure out how to make this all happen magically with implicit
>>> sync, if we really, really care
>>>
>>> If we do 3 before we've nailed all these problems, we're just guaranteeing
>>> we'll get the wrong solutions and so we'll then have 3 ways of doing
>>> userspace fences
>>> - the butchered implicit one that didn't quite work
>>> - the explicit one
>>> - the not-so-butchered implicit one with the lessons from the properly
>>> done explicit one
>>>
>>> The thing is, if you have no idea how to integrate userspace fences
>>> explicitly into atomic ioctl, then you definitely have no idea how to do
>>> it implicitly :-)
>> Well I agree on that. But the question is still: how would you do explicit
>> sync with atomic?
> If you supply a userspace fence (is that what we call them now?) as
> in-fence, then you're only allowed to get a userspace fence as out-fence.
Yeah, that part makes perfect sense. But I don't see the problem with
that.
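If I understand you correctly the rule boils down to a check like this
in the atomic ioctl (hypothetical sketch again, not existing DRM code):

#include <linux/errno.h>

enum fence_kind {
        FENCE_KIND_DMA,         /* dma_fence, must signal in bounded time */
        FENCE_KIND_USER,        /* user fence, may potentially never signal */
};

static int validate_atomic_fences(enum fence_kind in, enum fence_kind out)
{
        /* A user in-fence taints the commit: handing out a dma_fence
         * would make its signaling depend on something unbounded,
         * which breaks the dma_fence contract. */
        if (in == FENCE_KIND_USER && out != FENCE_KIND_USER)
                return -EINVAL;

        return 0;
}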
> That way we
> - don't block anywhere we shouldn't
> - don't create a dma_fence out of a userspace fence
>
> The problem is this completely breaks your "magically make implicit
> fencing with userspace fences" plan.
Why?
> So I have a plan here, what was yours?
As far as I can see that should still work perfectly fine, so I have the
strong feeling I'm missing something here.
>> Transporting fences between processes is not the fundamental problem here;
>> the question is rather how we represent all this in the kernel.
>>
>> In other words I think what you outlined above is just approaching it from
>> the wrong side again. Instead of looking at what the kernel needs to support
>> this, you take a look at userspace and the requirements there.
> Uh ... that was my idea here? That's why I put "build userspace fences in
> userspace only" as the very first thing. Then extend to winsys and
> atomic/display and all these cases where things get more tricky.
>
> I agree that transporting the fences is easy, which is why it's not
> interesting trying to solve that problem first. Which is kinda what you're
> trying to do here by adding implicit userspace fences (well not even that,
> just a bunch of function calls without any semantics attached to them).
>
> So if there's more here, you need to flesh it out more or I just don't get
> what you're actually trying to demonstrate.
Well I'm trying to figure out why you see it as such a problem to keep
implicit sync around.
As far as I can tell implicit vs. explicit sync is completely
orthogonal to dma_fence vs. user_fence.
It's just a different implementation inside the kernel.
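To make that concrete, a hypothetical sketch of the 2x2 with made-up
names:

#include <linux/jiffies.h>

/*
 *                dma_fence             user fence
 *   implicit     wait via dma_resv     wait via attached (addr, value)
 *   explicit     wait via sync_file    wait via passed in (addr, value)
 *
 * Only the wait implementation differs between the columns; the
 * sync model in the rows stays the same.
 */
struct fence_backend {
        int (*wait)(void *fence, unsigned long timeout_jiffies);
};

extern const struct fence_backend dma_fence_backend;    /* hypothetical */
extern const struct fence_backend user_fence_backend;   /* hypothetical */

/* The sync code only ever talks to the backend, so switching from
 * dma_fence to user fences doesn't change implicit vs. explicit. */
static int wait_dependency(void *fence, const struct fence_backend *backend)
{
        return backend->wait(fence, msecs_to_jiffies(1000));
}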
Christian.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> And "just block" might be good enough for a quick demo, it still breaks
>>> the contract. Same holds for a bunch of the winsys problems we'll have to
>>> deal with here.
>>> -Daniel
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Like here at Intel we have internal code for compute, and we're starting
>>>>> to hit some interesting cases with interop with media already, but that's
>>>>> it. Nothing even close to desktop/winsys/kms, and that's where I expect
>>>>> all the pain will be.
>>>>>
>>>>> Cheers, Daniel