[Mesa-dev] Plumbing explicit synchronization through the Linux ecosystem
Marek Olšák
maraeo at gmail.com
Sat Mar 14 02:02:43 UTC 2020
There is no synchronization between processes (e.g. 3D app and compositor)
within X on AMD hw. It works because of some hacks in Mesa.
Marek
On Wed, Mar 11, 2020 at 1:31 PM Jason Ekstrand <jason at jlekstrand.net> wrote:
> All,
>
> Sorry for casting such a broad net with this one. I'm sure most people
> who reply will get at least one mailing list rejection. However, this
> is an issue that affects a LOT of components and that's why it's
> thorny to begin with. Please pardon the length of this e-mail as
> well; I promise there's a concrete point/proposal at the end.
>
>
> Explicit synchronization is the future of graphics and media. At
> least, that seems to be the consensus among all the graphics people
> I've talked to. I had a chat with one of the lead Android graphics
> engineers recently who told me that doing explicit sync from the start
> was one of the best engineering decisions Android ever made. It's
> also the direction being taken by more modern APIs such as Vulkan.
>
>
> ## What are implicit and explicit synchronization?
>
> For those that aren't familiar with this space, GPUs, media encoders,
> etc. are massively parallel and synchronization of some form is
> required to ensure that everything happens in the right order and
> avoid data races. Implicit synchronization is when bits of work (3D,
> compute, video encode, etc.) are implicitly based on the absolute
> CPU-time order in which API calls occur. Explicit synchronization is
> when the client (whatever that means in any given context) provides
> the dependency graph explicitly via some sort of synchronization
> primitives. If you're still confused, consider the following
> examples:
>
> With OpenGL and EGL, almost everything is implicit sync. Say you have
> two OpenGL contexts sharing an image where one writes to it and the
> other textures from it. The way the OpenGL spec works, the client has
> to make the API calls to render to the image before (in CPU time) it
> makes the API calls which texture from the image. As long as it does
> this (and maybe inserts a glFlush?), the driver will ensure that the
> rendering completes before the texturing happens and you get correct
> contents.
>
> Implicit synchronization can also happen across processes. Wayland,
> for instance, is currently built on implicit sync where the client
> does their rendering and then does a hand-off (via wl_surface::commit)
> to tell the compositor it's done at which point the compositor can now
> texture from the surface. The hand-off ensures that the client's
> OpenGL API calls happen before the server's OpenGL API calls.
>
> A good example of explicit synchronization is the Vulkan API. There,
> a client (or multiple clients) can simultaneously build command
> buffers in different threads where one of those command buffers
> renders to an image and the other textures from it and then submit
> both of them at the same time with instructions to the driver for
> which order to execute them in. The execution order is described via
> the VkSemaphore primitive. With the new VK_KHR_timeline_semaphore
> extension, you can even submit the work which does the texturing
> BEFORE the work which does the rendering and the driver will sort it
> out.
>
> The #1 problem with implicit synchronization (which explicit solves)
> is that it leads to a lot of over-synchronization both in client space
> and in driver/device space. The client has to synchronize a lot more
> because it has to ensure that the API calls happen in a particular
> order. The driver/device have to synchronize a lot more because they
> never know what is going to end up being a synchronization point as an
> API call on another thread/process may occur at any time. As we move
> to more and more multi-threaded programming this synchronization (on
> the client-side especially) becomes more and more painful.
>
>
> ## Current status in Linux
>
> Implicit synchronization in Linux works via a the kernel's internal
> dma_buf and dma_fence data structures. A dma_fence is a tiny object
> which represents the "done" status for some bit of work. Typically,
> dma_fences are created as a by-product of someone submitting some bit
> of work (say, 3D rendering) to the kernel. The dma_buf object has a
> set of dma_fences on it representing shared (read) and exclusive
> (write) access to the object. When work is submitted which, for
> instance renders to the dma_buf, it's queued waiting on all the fences
> on the dma_buf and and a dma_fence is created representing the end of
> said rendering work and it's installed as the dma_buf's exclusive
> fence. This way, the kernel can manage all its internal queues (3D
> rendering, display, video encode, etc.) and know which things to
> submit in what order.
>
> For the last few years, we've had sync_file in the kernel and it's
> plumbed into some drivers. A sync_file is just a wrapper around a
> single dma_fence. A sync_file is typically created as a by-product of
> submitting work (3D, compute, etc.) to the kernel and is signaled when
> that work completes. When a sync_file is created, it is guaranteed by
> the kernel that it will become signaled in finite time and, once it's
> signaled, it remains signaled for the rest of time. A sync_file is
> represented in UAPIs as a file descriptor and can be used with normal
> file APIs such as dup(). It can be passed into another UAPI which
> does some bit of queue'd work and the submitted work will wait for the
> sync_file to be triggered before executing. A sync_file also supports
> poll() if you want to wait on it manually.
>
> Unfortunately, sync_file is not broadly used and not all kernel GPU
> drivers support it. Here's a very quick overview of my understanding
> of the status of various components (I don't know the status of
> anything in the media world):
>
> - Vulkan: Explicit synchronization all the way but we have to go
> implicit as soon as we interact with a window-system. Vulkan has APIs
> to import/export sync_files to/from it's VkSemaphore and VkFence
> synchronization primitives.
> - OpenGL: Implicit all the way. There are some EGL extensions to
> enable some forms of explicit sync via sync_file but OpenGL itself is
> still implicit.
> - Wayland: Currently depends on implicit sync in the kernel (accessed
> via EGL/OpenGL). There is an unstable extension to allow passing
> sync_files around but it's questionable how useful it is right now
> (more on that later).
> - X11: With present, it has these "explicit" fence objects but
> they're always a shmfence which lets the X server and client do a
> userspace CPU-side hand-off without going over the socket (and
> round-tripping through the kernel). However, the only thing that
> fence does is order the OpenGL API calls in the client and server and
> the real synchronization is still implicit.
> - linux/i915/gem: Fully supports using sync_file or syncobj for explicit
> sync.
> - linux/amdgpu: Supports sync_file and syncobj but it still
> implicitly syncs sometimes due to it's internal memory residency
> handling which can lead to over-synchronization.
> - KMS: Implicit sync all the way. There are no KMS APIs which take
> explicit sync primitives.
> - v4l: ???
> - gstreamer: ???
> - Media APIs such as vaapi etc.: ???
>
>
> ## Chicken and egg problems
>
> Ok, this is where it starts getting depressing. I made the claim
> above that Wayland has an explicit synchronization protocol that's of
> questionable usefulness. I would claim that basically any bit of
> plumbing we do through window systems is currently of questionable
> usefulness. Why?
>
> From my perspective, as a Vulkan driver developer, I have to deal with
> the fact that Vulkan is an explicit sync API but Wayland and X11
> aren't. Unfortunately, the Wayland extension solves zero problems for
> me because I can't really use it unless it's implemented in all of the
> compositors. Until every Wayland compositor I care about my users
> being able to use (which is basically all of them) supports the
> extension, I have to continue carry around my pile of hacks to keep
> implicit sync and Vulkan working nicely together.
>
> From the perspective of a Wayland compositor (I used to play in this
> space), they'd love to implement the new explicit sync extension but
> can't. Sure, they could wire up the extension, but the moment they go
> to flip a client buffer to the screen directly, they discover that KMS
> doesn't support any explicit sync APIs. So, yes, they can technically
> implement the extension assuming the EGL stack they're running on has
> the sync_file extensions but any client buffers which come in using
> the explicit sync Wayland extension have to be composited and can't be
> scanned out directly. As a 3D driver developer, I absolutely don't
> want compositors doing that because my users will complain about
> performance issues due to the extra blit.
>
> Ok, so let's say we get KMS wired up with implicit sync. That solves
> all our problems, right? It does, right up until someone decides that
> they wan to screen capture their Wayland session via some hardware
> media encoder that doesn't support explicit sync. Now we have to
> plumb it all the way through the media stack, gstreamer, etc. Great,
> so let's do that! Oh, but gstreamer won't want to plumb it through
> until they're guaranteed that they can use explicit sync when
> displaying on X11 or Wayland. Are you seeing the problem?
>
> To make matters worse, since most things are doing implicit
> synchronization today, it's really easy to get your explicit
> synchronization wrong and never notice. If you forget to pass a
> sync_file into one place (say you never notice KMS doesn't support
> them), it will probably work anyway thanks to all the implicit sync
> that's going on elsewhere.
>
> So, clearly, we all need to go write piles of code that we can't
> actually properly test until everyone else has written their piece and
> then we use explicit sync if and only if all components support it.
> Really? We're going to do multiple years of development and then just
> hope it works when we finally flip the switch? That doesn't sound
> like a good plan to me.
>
>
> ## A proposal: Implicit and explicit sync together
>
> How to solve all these chicken-and-egg problems is something I've been
> giving quite a bit of thought (and talking with many others about) in
> the last couple of years. One motivation for this is that we have to
> deal with a mismatch in Vulkan. Another motivation is that I'm
> becoming increasingly unhappy with the way that synchronization,
> memory residency, and command submission are inherently intertwined in
> i915 and would like to break things apart. Towards that end, I have
> an actual proposal.
>
> A couple weeks ago, I sent a series of patches to the dri-devel
> mailing list which adds a pair of new ioctls to dma-buf which allow
> userspace to manually import or export a sync_file from a dma-buf.
> The idea is that something like a Wayland compositor can switch to
> 100% explicit sync internally once the ioctl is available. If it gets
> buffers in from a client that doesn't use the explicit sync extension,
> it can pull a sync_file from the dma-buf and use that exactly as it
> would a sync_file passed via the explicit sync extension. When it
> goes to scan out a user buffer and discovers that KMS doesn't accept
> sync_files (or if it tries to use that pesky media encoder no one has
> converted), it can take it's sync_file for display and stuff it into
> the dma-buf before handing it to KMS.
>
> Along with the kernel patches, I've also implemented support for this
> in the Vulkan WSI code used by ANV and RADV. With those patches, the
> only requirement on the Vulkan drivers is that you be able to export
> any VkSemaphore as a sync_file and temporarily import a sync_file into
> any VkFence or VkSemaphore. As long as that works, the core Vulkan
> driver only ever sees explicit synchronization via sync_file. The WSI
> code uses these new ioctls to translate the implicit sync of X11 and
> Wayland to the explicit sync the Vulkan driver wants.
>
> I'm hoping (and here's where I want a sanity check) that a simple API
> like this will allow us to finally start moving the Linux ecosystem
> over to explicit synchronization one piece at a time in a way that's
> actually correct. (No Wayland explicit sync with compositors hoping
> KMS magically works even though it doesn't have a sync_file API.)
> Once some pieces in the ecosystem start moving, there will be
> motivation to start moving others and maybe we can actually build the
> momentum to get most everything converted.
>
> For reference, you can find the kernel RFC patches and mesa MR here:
>
> https://lists.freedesktop.org/archives/dri-devel/2020-March/258833.html
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
>
> At this point, I welcome your thoughts, comments, objections, and
> maybe even help/review. :-)
>
> --Jason Ekstrand
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20200313/6fd88390/attachment-0001.htm>
More information about the dri-devel
mailing list