[Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

Tue Apr 20 18:39:25 UTC 2021

On Tue, Apr 20, 2021 at 6:25 PM Marek Olšák <maraeo at gmail.com> wrote:
>
> Daniel, imagine hardware that can only do what Windows does: future fences signalled by userspace whenever userspace wants, and no kernel queues like we have today.
>
> The only reason why current AMD GPUs work is because they have a ring buffer per queue with pointers to userspace command buffers followed by fences. What will we do if that ring buffer is removed?

Well this is an entirely different problem than what you set out to
describe. This is essentially the problem where hw does not have any
support for privileged commands and a separate privileged command
buffer, and direct userspace submit is the only thing that is
available.

I think if this is your problem, then you get to implement some very
interesting compat shim. But that's an entirely different problem from
what you've described in your mail. This pretty much assumes at the hw
level the only thing that works is ATS/pasid, and vram is managed with
HMM exclusively. Once you have that pure driver stack you get to fake
it in the kernel for compat with everything that exists already. How
exactly that will look and how exactly you best construct your
dma_fences for compat will depend highly upon how much is still there
in this hw (e.g. wrt interrupt generation). A lot of the
infrastructure was also done as part of drm_syncobj. I mean we have
entirely fake kernel drivers like vgem/vkms that create dma_fence, so
a hw ringbuffer is really not required.
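
To make the compat idea a bit more concrete, here is a rough userspace
model of what such a shim has to guarantee: a fence that completes either
when the client writes the expected value to memory, or when a deadline
passes and it gets forcibly completed. Everything here (struct
compat_fence, compat_fence_init, compat_fence_poll, the status enum) is
made up for illustration and is not an existing kernel or dma_fence API.
A real shim would presumably be driven by interrupts or a timer rather
than by polling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only, not a kernel API. */
enum compat_fence_status {
    COMPAT_FENCE_PENDING,
    COMPAT_FENCE_SIGNALED,  /* the client wrote the expected value */
    COMPAT_FENCE_TIMED_OUT, /* the shim forcibly completed the fence */
};

struct compat_fence {
    const volatile uint64_t *payload; /* memory location the client writes */
    uint64_t wait_value;              /* fence is done once *payload >= this */
    uint64_t deadline_ns;             /* absolute time of forced completion */
    enum compat_fence_status status;
};

static void compat_fence_init(struct compat_fence *f,
                              const volatile uint64_t *payload,
                              uint64_t wait_value,
                              uint64_t now_ns, uint64_t timeout_ns)
{
    assert(f && payload);
    f->payload = payload;
    f->wait_value = wait_value;
    f->deadline_ns = now_ns + timeout_ns;
    f->status = COMPAT_FENCE_PENDING;
}

/* Re-evaluate the fence at time now_ns. Once the fence has left the
 * pending state the result is sticky, so a late write by the client
 * cannot turn a timeout back into a normal completion. */
static enum compat_fence_status compat_fence_poll(struct compat_fence *f,
                                                  uint64_t now_ns)
{
    assert(f && f->payload);
    if (f->status != COMPAT_FENCE_PENDING)
        return f->status;
    if (*f->payload >= f->wait_value)
        f->status = COMPAT_FENCE_SIGNALED;
    else if (now_ns >= f->deadline_ns)
        f->status = COMPAT_FENCE_TIMED_OUT;
    return f->status;
}

/* True if the caller should penalise the client that owned the fence. */
static bool compat_fence_needs_penalty(const struct compat_fence *f)
{
    assert(f);
    return f->status == COMPAT_FENCE_TIMED_OUT;
}
```

The point of the sticky status is that anything waiting on the fence sees
exactly one completion in bounded time, whatever userspace does with the
memory location afterwards.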

So ... is this your problem underneath it all, or was that more a wild
strawman for the discussion?
-Daniel

> Marek
>
> On Tue, Apr 20, 2021 at 11:50 AM Daniel Stone <daniel at fooishbar.org> wrote:
>>
>> Hi,
>>
>> On Tue, 20 Apr 2021 at 16:16, Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
>>>
>>> On 20.04.21 at 17:07, Daniel Stone wrote:
>>>
>>> If the compositor no longer has a guarantee that the buffer will be ready for composition in a reasonable amount of time (which dma_fence gives us, and this proposal does not appear to give us), then the compositor isn't trying to use the buffer for compositing, it's waiting asynchronously on a notification that the fence has signaled before it attempts to use the buffer.
>>>
>>> Marek's initial suggestion is that the kernel signal the fence, which would unblock composition (and presumably show garbage on screen, or at best jump back to old content).
>>>
>>> My position is that the compositor will know the process has crashed anyway - because its socket has been closed - at which point we destroy all the client's resources including its windows and buffers regardless. Signaling the fence doesn't give us any value here, _unless_ the compositor is just blindly waiting for the fence to signal ... which it can't do because there's no guarantee the fence will ever signal.
>>>
>>>
>>> Yeah, but that assumes that the compositor has changed to not blindly wait for the client to finish rendering, and as Daniel explained that is rather unrealistic.
>>>
>>> What we need is a fallback mechanism which signals the fence after a timeout and gives a penalty to the one causing the timeout.
>>>
>>> That gives us the same functionality we have today with the software scheduler inside the kernel.
>>
>>
>> OK, if that's the case then I think I'm really missing something which isn't explained in this thread, because I don't understand what the additional complexity and API change gains us (see my first reply in this thread).
>>
>> By way of example - say I have a blind-but-explicit compositor that takes a drm_syncobj along with a dmabuf with each client presentation request, but doesn't check syncobj completion, it just imports that into a VkSemaphore + VkImage and schedules work for the next frame.
>>
>> Currently, that generates an execbuf ioctl for the composition (ignore KMS for now) with a sync point to wait on, and the kernel+GPU scheduling guarantees that the composition work will not begin until the client rendering work has retired. We have a further guarantee that this work will complete in reasonable time, for some value of 'reasonable'.
>>
>> My understanding of this current proposal is that:
>> * userspace creates a 'present fence' with this new ioctl
>> * the fence becomes signaled when a value is written to a location in memory, which is visible through both CPU and GPU mappings of that page
>> * this 'present fence' is imported as a VkSemaphore (?) and the userspace Vulkan driver will somehow wait on this value, either before submitting work or as a possibly-hardware-assisted GPU-side wait (?)
>> * the kernel's scheduler is thus eliminated from the equation, and every execbuf is submitted directly to hardware, because either userspace knows that the fence has already been signaled, or it will issue a GPU-side wait (?)
>> * but the kernel is still required to monitor completion of every fence itself, so it can forcibly complete, or penalise the client (?)
>>
>> Lastly, let's say we stop ignoring KMS: what happens for the render-with-GPU-display-on-KMS case? Do we need to do the equivalent of glFinish() in userspace and only submit the KMS atomic request when the GPU work has fully retired?
>>
>> Clarifying those points would be really helpful so this is less of a strawman. I have some further opinions, but I'm going to wait until I understand what I'm actually arguing against before I go too far. :) The last point is very salient though.
>>
>> Cheers,
>> Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch