[Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

Tue Apr 20 19:16:51 UTC 2021

On Tue, Apr 20, 2021 at 1:54 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Tue, Apr 20, 2021 at 7:45 PM Daniel Stone <daniel at fooishbar.org> wrote:
>
> > And something more concrete:
> >
> > dma_fence.
> >
> > This already has all of the properties described above. Kernel-wise, it already devolves to CPU-side signaling when it crosses device boundaries. We need to support it roughly forever since it's been plumbed so far and so wide. Any primitive which is acceptable for winsys-like usage which crosses so many device/subsystem/process/security boundaries has to meet the same requirements. So why reinvent something which looks so similar, and has the same requirements of the kernel babysitting completion, providing little to no benefit for that difference?
>
> So I can mostly get behind this, except it's _not_ going to be
> dma_fence. That thing has horrendous internal ordering constraints
> within the kernel, and the one thing that doesn't allow you is to make
> a dma_fence depend upon a userspace fence.

Let me elaborate on this a bit.  One of the problems I mentioned
earlier is the conflation of fence types inside the kernel.  dma_fence
is used to solve two related but distinct problems: client command
synchronization and memory residency synchronization.  In the old
implicit GL world, we conflated the two and thought we were doing
ourselves a favor.  Not so much...

It's all well and good to say that we should turn the memory fence
into a dma_fence and throw a timeout on it.  However, these
window-system sync primitives, as you said, have to be shareable
across everything.  In particular, we have to be able to share them
with drivers that don't cleanly separate command synchronization from
memory synchronization.

Let's say we're rendering on ANV with memory fences and presenting on
some USB display adapter whose kernel driver is a bit old-school.
When we pass that fence to the other driver via a sync_file or
similar, that driver may shove that dma_fence into the dma_resv on
some buffer somewhere.  Then our client, completely unaware of
internal kernel dependencies, binds that buffer into its address space
and kicks off another command buffer.  So i915 adds a dependency on
that dma_resv, which contains the previously created dma_fence, and
refuses to execute any more command buffers until it signals.
Unfortunately, unbeknownst to i915, the command buffer the client
kicked off after that bind was required to signal the memory fence on
which our first dma_fence depends.  Deadlock.

Sure, we put a timeout on the dma_fence and it will eventually fire
and unblock everything.  However, there's one very important point
that's easy to miss here:  Neither i915 nor the client did anything
wrong in the above scenario.  The Vulkan footgun approach works
because there is a set of rules and, if you follow those rules,
you're guaranteed that everything works.  In the above scenario, however,
the client followed all of the rules and got a deadlock anyway.  We
can't have that.

> But what we can do is use the same currently existing container
> objects like drm_syncobj or sync_file (timeline syncobj would fit best
> tbh), and stuff a userspace fence behind it. The only trouble is that
> currently timeline syncobj implement vulkan's spec, which means if you
> build a wait-before-signal deadlock, you'll wait forever. Well until
> the user ragequits and kills your process.

Yeah, this approach may be workable.  Instead of reusing dma_fence,
maybe we can reuse syncobj and add another kind of syncobj that
consists of a memory fence, a value to wait on, and a timeout.

--Jason