[PATCH 4/4] RFC: dma-buf: Add an API for importing sync files (v6)

Daniel Stone daniel at fooishbar.org
Wed May 26 15:13:00 UTC 2021


On Wed, 26 May 2021 at 14:44, Daniel Vetter <daniel at ffwll.ch> wrote:
> On Wed, May 26, 2021 at 02:08:19PM +0100, Daniel Stone wrote:
> > Are you saying that if a compositor imports a client-provided dmabuf
> > as an EGLImage to use as a source texture for its rendering, and then
> > provides it to VA-API or V4L2 to use as a media encode source (both
> > purely read-only ops), that these will both serialise against each
> > other? Like, my media decode job won't begin execution until the
> > composition read has fully retired?
> >
> > If so, a) good lord that hurts, and b) what are shared fences actually ... for?
>
> Shared is shared, I just meant to say that we always add the shared fence.
> So an explicit ioctl to add more shared fences is kinda pointless.
>
> So yeah on a good driver this will run in parallel. On a not-so-good
> driver (which currently includes amdgpu and panfrost) this will serialize,
> because those drivers don't have the concept of a non-exclusive fence for
> such shared buffers (amdgpu does not sync internally, but will sync as
> soon as it's cross-drm_file).

When you say 'we always add the shared fence', add it to ... where?
And which shared fence? (I'm going to use 'fence' below to refer to
anything from literal sync_file to timeline-syncobj to userspace
fence.)

I'll admit that I've twice typed out an argument that "always export
from excl+shared, always import to excl" results in oversync, and I
keep tying myself in knots trying to make it. It's arguably slightly
contrived, but here's my third attempt ...

Vulkan Wayland client, full-flying-car-sync Wayland protocol,
Vulkan-based compositor. Part of the contract when the server exposes
that protocol is that it guarantees to do explicit sync in both
directions, so the client provides a fence at QueueSubmit time and the
server provides one back when releasing the image for return to ANI.
Neither side ever records fences into the dma_resv, because they've
both opted out by being fully explicit-aware.

Now add media encode out on the side because you're streaming. The
compositor knows this is a transition between explicit and implicit
worlds, so it imports the client's fence into the exclusive dma_resv
slot, which makes sense: the media encode has to sync against the
client work, but is indifferent to the parallel compositor work. The
shared fence is exported back out so the compositor can union the
encode-finished fence with its composition-finished fence to send back
to the client with release/ANI.
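The explicit-to-implicit bridge described above could be sketched with
the IMPORT/EXPORT_SYNC_FILE ioctls proposed in this series. Struct and
flag names follow the RFC and may not match whatever UAPI finally
lands; the guarded fallback definitions are illustration only, and
error handling is mostly omitted:

```c
/* Sketch of the compositor's explicit<->implicit bridge, using the
 * sync-file import/export ioctls proposed in this patch series.
 * Definitions below are RFC-era assumptions, guarded in case the
 * system header already provides them. */
#include <stdint.h>
#include <sys/ioctl.h>

#ifndef DMA_BUF_IOCTL_IMPORT_SYNC_FILE
struct dma_buf_export_sync_file {
	uint32_t flags;	/* which fences to export */
	int32_t fd;	/* out: sync_file fd */
};
struct dma_buf_import_sync_file {
	uint32_t flags;	/* DMA_BUF_SYNC_* */
	int32_t fd;	/* sync_file to import */
};
#define DMA_BUF_SYNC_READ	(1 << 0)
#define DMA_BUF_SYNC_WRITE	(2 << 0)
#define DMA_BUF_BASE		'b'
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
	_IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE \
	_IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)
#endif

/* Client handed us acquire_fd at QueueSubmit time; publish it as the
 * exclusive fence so implicit-sync consumers (the media encode) wait
 * on the client work.  Returns 0 on success, -1 on failure. */
static int bridge_to_implicit(int dmabuf_fd, int acquire_fd)
{
	struct dma_buf_import_sync_file import = {
		.flags = DMA_BUF_SYNC_WRITE,	/* exclusive slot */
		.fd = acquire_fd,
	};
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import);
}

/* After queueing the encode, pull the shared (read) fences back out
 * as a sync_file the compositor can union with its own
 * composition-done fence.  Returns the sync_file fd, or -1. */
static int collect_read_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file export = {
		.flags = DMA_BUF_SYNC_READ,
		.fd = -1,
	};
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export) < 0)
		return -1;
	return export.fd;
}
```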

Now add a second media encode because you want a higher-quality local
capture to upload to YouTube later on. The compositor can do the exact
same import/export dance, and the two encodes can safely run in
parallel. Which is good.
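The "union the encode-finished fence with the composition-finished
fence" step doesn't need anything new, for what it's worth: the
existing SYNC_IOC_MERGE ioctl from <linux/sync_file.h> already does
it. A sketch (fallback definitions guarded in case of old headers,
error handling mostly omitted):

```c
/* Merge two sync_files into one that signals only when both have.
 * SYNC_IOC_MERGE is existing upstream UAPI; the fallback definitions
 * below just mirror <linux/sync_file.h>. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

#ifndef SYNC_IOC_MERGE
struct sync_merge_data {
	char name[32];	/* name for the merged fence */
	int32_t fd2;	/* other sync_file to merge with */
	int32_t fence;	/* out: merged sync_file fd */
	uint32_t flags;
	uint32_t pad;
};
#define SYNC_IOC_MAGIC '>'
#define SYNC_IOC_MERGE _IOWR(SYNC_IOC_MAGIC, 3, struct sync_merge_data)
#endif

/* Returns a new sync_file fd signalling when both inputs have
 * signalled, or -1 on failure. */
static int merge_fences(int fd_a, int fd_b, const char *name)
{
	struct sync_merge_data data = { .fd2 = fd_b };
	strncpy(data.name, name, sizeof(data.name) - 1);
	if (ioctl(fd_a, SYNC_IOC_MERGE, &data) < 0)
		return -1;
	return data.fence;
}
```

So the release fence handed back with ANI is just the merge of
composition-done with each encode-done fence in turn.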

Where it starts to become complex is: what if your compositor is fully
explicit-aware but your clients aren't, so your compositor has more
import/export points where it has to record into the resv? What if you
aren't actually a compositor but a full-blown media pipeline, with a
bunch of threads all launching reads in parallel, to the extent that
it's not practical to manage implicit/explicit transitions globally,
and each thread instead has to pessimistically import and export
around every access?

I can make the relatively simple usecases work, but it really feels
like in practice we'll end up with massive oversync in some fairly
complex usecases, and we'll regret not having had it from the start;
plus, people will just rely on implicit sync for longer, because it
has better (more parallel) semantics in some usecases.

Cheers,
Daniel
