[PATCH 4/4] RFC: dma-buf: Add an API for importing sync files (v6)

Daniel Stone daniel at fooishbar.org
Wed May 26 13:52:46 UTC 2021


Hi,
Just making sure this thread doesn't get too short ...

On Wed, 26 May 2021 at 12:08, Daniel Stone <daniel at fooishbar.org> wrote:
> On Mon, 24 May 2021 at 18:11, Jason Ekstrand <jason at jlekstrand.net> wrote:
> > I'd like to address this one as it's a comment you've made several
> > times.  Once you've fixed raw X11 (not just XWayland) and a new
> > release has been made (hah!) and is shipping in distros with said
> > support, then we can talk.  Sorry if that comes off as overly snarky
> > but that's reality that we (driver devs) are living with.  To make
> > things even worse, when we're in Vulkan land (as opposed to GL), we
> > can't tell up-front whether or not our window-system supports foo
> > fences and adjust accordingly.  We have to start up, begin rendering,
> > and only later figure out "oops, this one goes to X11".  We really
> > can't say things like "when running on modern Wayland, do things the
> > new way" because Vulkan doesn't have a concept of "running on" a
> > window system.
>
> Hey, no offence taken, and even if there was, there's no point denying
> the reality that we don't still have Wayland absolutely everywhere,
> and even when we do, there's still a Streams-sized elephant in the
> room that no-one wants to talk about.
>
> Ultimately though, there will always be two codepaths. Xorg (as
> opposed to Xwayland) is currently unmaintained, and I expect it to
> remain so forever. The last times we tried to plumb synchronisation
> through to native-Xorg, it collapsed in conflicting requests for
> rewriting the presentation mechanism (which very few people understand
> and even fewer people can make work reliably), and I don't see that
> changing any time soon. Plus the single biggest remaining use for
> native-Xorg is that it implements tearflips by blitting to a single
> internal frontbuffer which is currently being scanned out, which is
> going to be ... difficult ... to square with the goal of actually
> synchronising things.
>
> What I mean though is that I don't think it makes sense to have three
> design points.
>
> Previously, the only effort put into explicit synchronisation on the
> winsys side was using it as a means to surface timings, both for
> tracing and for calibrating compositor repaints. Given that we
> need to support interop with the implicit world forever, and given
> that there was no explicit interaction with the implicit resv slots
> (solved by this new uAPI), and given that the sync_file vs.
> drm_syncobj vs. ?? userspace fence story wasn't clear, there didn't
> seem to be any practical benefit to sinking time into it: it just
> ticked a 'yep we're doing things explicitly' box, which wasn't much
> use since EGL doesn't let you opt out of implicit semantics anyway.
>
> Now with the details having come out of AMD/Arm/Intel about future
> gens, and the driver-side pain being properly understood, as well as
> the new uAPI actually giving us clear benefit, there's a good reason
> to work on it. So we'll do that on the winsys side, and support the
> new explicitly-managed-implicit-slot world, and everyone's lives will
> be better. Which gives us two design points: unaware implicit-only
> users (Xorg, old Wayland, old GPU, media), and fully-aware
> extremely-explicit users (Mutter, Weston, wlroots, new GPU).

So I said why I think X11 is totally fine (sync on client, let the
server stay dumb and unsynchronised) for the new world, but not why
I'm so confident that it's totally fine for Wayland and we can do it
quickly, despite previously banging on about how much work it was and
why it was so difficult. You already know this Jason, but for the rest
of the class ...

All Wayland compositors I've ever seen have a strong separation
between the protocol-visible objects, and per-output repaint loops,
not least because this is strongly encoded into the protocol. Creating
a surface, attaching a buffer to it, and committing the surface state,
will only update the compositor's list of the current state of the
protocol-visible objects.

At 'a good time' for each output (often next vblank minus a few
milliseconds), the compositor will effectively snapshot that state,
generate a scene graph from it, and go through repaint for that output
(GPU composition and/or KMS planes and/or media encode and/or RDP etc)
to turn that to light. This is unlike X11 where you can post rendering
commands and then call XGetImage to pull the pixel result back. We
also explicitly chose not to follow the X11 model of a global
co-ordinate space with windows explicitly treed down from a shared
root window: inter-surface operations are very much disconnected from
each other with no ordering guarantee, and since clients are totally
isolated from each other, there are no inter-client operations.
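
A toy model of that split (not the real libwayland API, just the shape of the semantics): client requests only ever touch pending state, commit atomically promotes pending to current, and each output's repaint snapshots current state so that later commits can't affect an in-flight frame.

```c
/* Toy model of Wayland's commit/repaint split; names are invented
 * for illustration and do not correspond to real libwayland calls. */
struct surface_state { int buffer; int x, y; };

struct surface {
        struct surface_state pending;  /* accumulated by attach etc. */
        struct surface_state current;  /* what commit made visible */
};

/* wl_surface.attach analogue: touches only pending state; nothing
 * is protocol-visible yet. */
static void surface_attach(struct surface *s, int buffer)
{
        s->pending.buffer = buffer;
}

/* wl_surface.commit analogue: pending becomes current, atomically. */
static void surface_commit(struct surface *s)
{
        s->current = s->pending;
}

/* Per-output repaint: snapshot current state into a scene graph;
 * GPU composition / KMS planes / encode all work from the snapshot,
 * so later client commits don't affect this frame. */
static void output_repaint(const struct surface *s,
                           struct surface_state *scene)
{
        *scene = s->current;
}
```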

So that already gives us a clean break between clients posting state
and visible effect, with a lot of latitude on timing: you post some
updates, and a compositor which users actually want will make those
updates real at some point, but nothing is guaranteed.

There is one cut-out from inter-surface operations, which is our
subsurface protocol. An example of this is having your browser UI as
the primary window, which embeds a separate rendered window for each
tab, which further embeds a window with your pop-over ad^W^Wmedia
content. To make resizing and scrolling work without being all
Flash-on-Navigator jank, subsurfaces have a 'synchronised' mode, where
clients can make protocol-visible updates to their surfaces, but the
updates will not be recorded into the visible scene graph until
explicitly released by the topmost parent - think of it like a display
controller's 'go bit' or updating the tail pointer on a ring buffer,
where we record updates but they're stashed away until everything has
become coherent.
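
Extending the toy model above to the 'go bit' (again, invented names, not the real wl_subsurface API): a synchronised child's commit is stashed in a cache slot, and only the parent's commit releases it into the state the scene graph sees.

```c
/* Toy model of the subsurface 'synchronised' mode; illustrative
 * only, not the real libwayland API. */
#include <stdbool.h>

struct subsurface {
        int pending;   /* client-recorded state */
        int cached;    /* committed but held back: the 'go bit' ring */
        int current;   /* what the scene graph sees */
        bool synced;
};

static void subsurface_commit(struct subsurface *s)
{
        if (s->synced)
                s->cached = s->pending;   /* stash, don't show */
        else
                s->current = s->pending;  /* desync mode: apply now */
}

/* The topmost parent's commit releases all cached child state, so
 * parent and children become visible coherently in one update. */
static void parent_commit(struct subsurface *children, int n)
{
        for (int i = 0; i < n; i++)
                if (children[i].synced)
                        children[i].current = children[i].cached;
}
```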

Every useful compositor already implements this subsurface protocol as
well, which gives us a _further_ gap between protocol-visible
client-recorded state and the scene graph which is used for painting
outputs. Handling wait-before-sync is still a lot of typing, and
difficult for corner cases, and will need experimentation in the real
world before we can all land on useful common semantics, but we've
already done the most conceptually difficult work, which was to
separate client/protocol-visible object state from the scene graph
that gets generated/snapshotted on every output update.

So that's why I'm bullish on how quickly we can move on the Wayland
side. It's a lot of typing, but we already have two clean breaks from
recorded client surface state -> scene graph -> pixels lit up, with
each break already handling fragmentation between different surface
groups rather than being a global freeze/thaw.

(Having typed all of that, I'm reminded that we never finished up the
EGL extension to opt out of implicit sync on a per-EGLImage basis,
which was mostly blocked on explicit sync being pretty immature in
drivers and none of the uAPI fitting together e.g. per-CS vs. per-BO
... now seems like a pretty apt time to revive that too.)

Cheers,
Daniel

