[Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

Daniel Vetter daniel at ffwll.ch
Tue Apr 20 19:29:53 UTC 2021

On Tue, Apr 20, 2021 at 9:14 PM Daniel Stone <daniel at fooishbar.org> wrote:
> Hi,
> On Tue, 20 Apr 2021 at 19:54, Daniel Vetter <daniel at ffwll.ch> wrote:
>> So I can mostly get behind this, except it's _not_ going to be
>> dma_fence. That thing has horrendous internal ordering constraints
>> within the kernel, and the one thing that doesn't allow you is to make
>> a dma_fence depend upon a userspace fence.
>> But what we can do is use the same currently existing container
>> objects like drm_syncobj or sync_file (timeline syncobj would fit best
>> tbh), and stuff a userspace fence behind it. The only trouble is that
>> currently timeline syncobj implement vulkan's spec, which means if you
>> build a wait-before-signal deadlock, you'll wait forever. Well until
>> the user ragequits and kills your process.
>> So for winsys we'd need to be able to specify the wait timeout
>> somewhere for waiting for that dma_fence to materialize (plus the
>> submit thread, but userspace needs that anyway to support timeline
>> syncobj) if you're importing an untrusted timeline syncobj. And I
>> think that's roughly it.
> Right. The only way you get to materialise a dma_fence from an execbuf is that you take a hard timeout, with a penalty for not meeting that timeout. When I say dma_fence I mean dma_fence, because there is no extant winsys support for drm_symcobj, so this is greenfield: the winsys gets to specify its terms of engagement, and again, we've been the orange/green-site enemies of users for quite some time already, so we're happy to continue doing so. If the actual underlying primitive is not a dma_fence, and compositors/protocol/clients need to eat a bunch of typing to deal with a different primitive which offers the same guarantees, then that's fine, as long as there is some tangible whole-of-system benefit.

So atm sync_file doesn't support future fences, but we could add the
support for those there. And since vulkan doesn't really say anything
about those, we could make the wait time out by default.

> How that timeout is actually realised is an implementation detail. Whether it's a property of the last GPU job itself that the CPU-side driver can observe, or that the kernel driver guarantees that there is a GPU job launched in parallel which monitors the memory-fence status and reports back through a mailbox/doorbell, or the CPU-side driver enqueues kqueue work for $n milliseconds' time to check the value in memory and kill the context if it doesn't meet expectations - whatever. I don't believe any of those choices meaningfully impact on kernel driver complexity relative to the initial proposal, but they do allow us to continue to provide the guarantees we do today when buffers cross security boundaries.

The thing is, you can't do this in drm/scheduler. At least not without
splitting up the dma_fence in the kernel into separate memory fences
and sync fences, and the work to get there is imo just not worth it.
We've bikeshedded this ad nauseaum for vk timeline syncobj, and the
solution was to have the submit thread in the userspace driver.

It won't really change anything wrt what applications can observe from
the egl/gl side of things though.

> There might well be an argument for significantly weakening those security boundaries and shifting the complexity from the DRM scheduler into userspace compositors. So far though, I have yet to see that argument made coherently.

Ah we've had that argument. We have moved that into userspace as part
of vk submit threads. It aint pretty, but it's better than the other
option :-)
Daniel Vetter
Software Engineer, Intel Corporation

More information about the mesa-dev mailing list