[Mesa-dev] Linux Graphics Next: Userspace submission update
Michel Dänzer
michel at daenzer.net
Tue Jun 1 13:24:40 UTC 2021
On 2021-06-01 3:01 p.m., Marek Olšák wrote:
>
>
> On Tue., Jun. 1, 2021, 08:51 Christian König, <ckoenig.leichtzumerken at gmail.com <mailto:ckoenig.leichtzumerken at gmail.com>> wrote:
>
> Am 01.06.21 um 14:30 schrieb Daniel Vetter:
> > On Tue, Jun 1, 2021 at 2:10 PM Christian König
> > <ckoenig.leichtzumerken at gmail.com <mailto:ckoenig.leichtzumerken at gmail.com>> wrote:
> >> Am 01.06.21 um 12:49 schrieb Michel Dänzer:
> >>> On 2021-06-01 12:21 p.m., Christian König wrote:
> >>>> Am 01.06.21 um 11:02 schrieb Michel Dänzer:
> >>>>> On 2021-05-27 11:51 p.m., Marek Olšák wrote:
> >>>>>> 3) Compositors (and other privileged processes, and display flipping) can't trust imported/exported fences. They need a timeout recovery mechanism from the beginning, and the following are some possible solutions to timeouts:
> >>>>>>
> >>>>>> a) use a CPU wait with a small absolute timeout, and display the previous content on timeout
> >>>>>> b) use a GPU wait with a small absolute timeout, and conditional rendering will choose between the latest content (if signalled) and previous content (if timed out)
> >>>>>>
> >>>>>> The result would be that the desktop can run close to 60 fps even if an app runs at 1 fps.
> >>>>> FWIW, this is working with
> >>>>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880> , even with implicit sync (on current Intel GPUs; amdgpu/radeonsi would need to provide the same dma-buf poll semantics as other drivers and high priority GFX contexts via EGL_IMG_context_priority which can preempt lower priority ones).
> >>>> Yeah, that is really nice to have.
> >>>>
> >>>> One question is if you wait on the CPU or the GPU for the new surface to become available?
> >>> It's based on polling dma-buf fds, i.e. CPU.
> >>>
> >>>> The former is a bit bad for latency and power management.
> >>> There isn't a choice for Wayland compositors in general, since there can be arbitrary other state which needs to be applied atomically together with the new buffer. (Though in theory, a compositor might get fancy and special-case surface commits which can be handled by waiting on the GPU)
> >>>
> >>> Latency is largely a matter of scheduling in the compositor. The latency incurred by the compositor shouldn't have to be more than single-digit milliseconds. (I've seen total latency from when the client starts processing a (static) frame to when it starts being scanned out as low as ~6 ms with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620 <https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1620>, lower than typical with Xorg)
> >> Well let me describe it like this:
> >>
> >> We have an use cases for 144 Hz guaranteed refresh rate. That
> >> essentially means that the client application needs to be able to spit
> >> out one frame/window content every ~6.9ms. That's tough, but doable.
> >>
> >> When you now add 6ms latency in the compositor that means the client
> >> application has only .9ms left for it's frame which is basically
> >> impossible to do.
> >>
> >> See for the user fences handling the display engine will learn to read
> >> sequence numbers from memory and decide on it's own if the old frame or
> >> the new one is scanned out. To get the latency there as low as possible.
> > This won't work with implicit sync at all.
> >
> > If you want to enable this use-case with driver magic and without the
> > compositor being aware of what's going on, the solution is EGLStreams.
> > Not sure we want to go there, but it's definitely a lot more feasible
> > than trying to stuff eglstreams semantics into dma-buf implicit
> > fencing support in a desperate attempt to not change compositors.
EGLStreams are a red herring here. Wayland has atomic state transactions, similar to the atomic KMS API. These semantics could be achieved even with EGLStreams, at least via additional EGL extensions.
Any fancy technique we're discussing here would have to be completely between the Wayland compositor and the kernel, transparent to anything else.
> > I still think the most reasonable approach here is that we wrap a
> > dma_fence compat layer/mode over new hw for existing
> > userspace/compositors. And then enable userspace memory fences and the
> > fancy new features those allow with a new model that's built for them.
>
> Yeah, that's basically the same direction I'm heading. Question is how
> to fix all those details.
>
> > Also even with dma_fence we could implement your model of staying with
> > the previous buffer (or an older buffer at that's already rendered),
> > but it needs explicit involvement of the compositor. At least without
> > adding eglstreams fd to the kernel and wiring up all the relevant
> > extensions.
>
> Question is do we already have some extension which allows different
> textures to be selected on the fly depending on some state?
>
>
> There is no such extension for sync objects, but it can be done with queries, like occlusion queries. There is also no timeout option and it can only do "if" and "if not", but not "if .. else"
The Wayland compositor would also need to know which texture the shader ended up sampling from.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and X developer
More information about the dri-devel
mailing list