[PATCH 4/4] RFC: dma-buf: Add an API for importing sync files (v6)

Thu May 27 07:21:17 UTC 2021

On 26.05.21 at 18:52, Daniel Vetter wrote:
> [SNIP]
>> I can make the relatively simple use cases work, but it really feels
>> like in practice we'll end up with massive oversync in some fairly
>> complex use cases, and we'll regret not having had it from the start,
>> plus people will just rely on implicit sync for longer because it has
>> better (more parallel) semantics in some use cases.
> Things fall apart in implicit sync if you have more than one logical
> writer into the same buffer. Trivial example is two images in one
> buffer, but you could also do funky stuff like interleaved/tiled
> rendering with _independent_ consumers. If the consumers are not
> independent, then you can again just stuff the two writer fences into
> the exclusive slot with the new ioctl (they'll get merged without
> additional overhead into one fence array fence).
>
> And the fundamental thing is: This is just not possible with implicit
> sync. There's only one fence slot (even if that resolves to an array
> of fences for all the producers), so anytime you do multiple
> independent things in the same buffer you either:
> - must split the buffers so there's again a clear&unique handoff at
> each stage of the pipeline
> - or use explicit sync

Well, that is exactly the problem we had with amdgpu, and it is why we
came up with the special handling there.

And you don't even need two images in one buffer; special hardware that
handles multiple writers gracefully is sufficient. The simplest example
is a depth buffer, but we also have things like ordered append for ring
buffers.
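The multiple-writer problem can be illustrated with a small userspace C model (not kernel code): a buffer with one exclusive slot that accumulates the fences of two independent writers, compared with a consumer that is handed its own writer's fence explicitly. Every name here (`toy_fence`, `toy_buffer`, `toy_import_write_fence` and so on) is hypothetical and only models the semantics under discussion; it is not the dma_resv API or the proposed import ioctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FENCES 8

/* Toy fence: set to signaled once the (simulated) engine finishes. */
struct toy_fence {
    bool signaled;
};

/* Toy merged fence: signals only once every member has signaled,
 * modelling several writer fences collapsed into one array fence. */
struct toy_fence_array {
    struct toy_fence *fences[MAX_FENCES];
    size_t count;
};

bool toy_array_signaled(const struct toy_fence_array *arr)
{
    for (size_t i = 0; i < arr->count; i++)
        if (!arr->fences[i]->signaled)
            return false;
    return true;
}

/* Toy buffer with a single implicit-sync (exclusive) slot. */
struct toy_buffer {
    struct toy_fence_array exclusive;
};

/* Hypothetical stand-in for importing a writer fence into the exclusive
 * slot: the new fence is merged with whatever is already there. */
void toy_import_write_fence(struct toy_buffer *buf, struct toy_fence *f)
{
    assert(buf->exclusive.count < MAX_FENCES);
    buf->exclusive.fences[buf->exclusive.count++] = f;
}

/* Implicit sync: a reader must wait on everything in the slot. */
bool implicit_reader_may_start(const struct toy_buffer *buf)
{
    return toy_array_signaled(&buf->exclusive);
}

/* Explicit sync: a reader waits only on the fence it was handed. */
bool explicit_reader_may_start(const struct toy_fence *f)
{
    return f->signaled;
}
```

With two writers imported and only the first one signaled, `implicit_reader_may_start()` still returns false while `explicit_reader_may_start()` on the first writer's fence returns true. That gap is the oversync for independent consumers described above.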

> So in your example, options are
> - per-client buffers, which you then blend into a composite buffer to
> handle the N implicit fences from N buffers into a single implicit
> fence for libva conversion. This single buffer then also allows you to
> again fan out to M libva encoders, or whatever it is that you fancy
> - explicit fencing and clients render into a single buffer with no
> copying, and libva encodes from that single buffer (but again needs
> explicit fences or it all comes crashing down)
>
> There's really no option C where you somehow do multiple implicitly
> fenced things into a single buffer and expect it to work out in
> parallel.

You could also fall back to a dummy submission, e.g. compose the image 
with multiple engines in parallel and then make a single dummy 
submission to collect all the shared fences into the single exclusive fence.

But this needs an extra IOCTL, and unfortunately the stack above also
needs to know when to make that dummy submission.
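The dummy-submission workaround can be sketched as a similar toy model. Assumptions: a reservation object with one exclusive slot plus a list of shared fences (roughly the layout discussed in this thread), an implicit-sync reader that only waits on the exclusive slot, and a no-op job whose fence counts as signaled only after all of its dependencies have. All names (`job_fence`, `toy_resv`, `toy_dummy_submit`) are hypothetical; this is neither the amdgpu implementation nor a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 8

/* Toy job fence: signaled once the job's own work is done AND every
 * fence it depends on has signaled. */
struct job_fence {
    bool work_done;
    struct job_fence *deps[MAX_DEPS];
    size_t ndeps;
};

bool job_fence_signaled(const struct job_fence *f)
{
    if (!f->work_done)
        return false;
    for (size_t i = 0; i < f->ndeps; i++)
        if (!job_fence_signaled(f->deps[i]))
            return false;
    return true;
}

/* Toy reservation object: one exclusive slot plus shared fences. */
struct toy_resv {
    struct job_fence *exclusive; /* NULL means no exclusive fence */
    struct job_fence *shared[MAX_DEPS];
    size_t nshared;
};

/* An engine that tolerates concurrent writers adds a shared fence. */
void toy_resv_add_shared(struct toy_resv *resv, struct job_fence *f)
{
    assert(resv->nshared < MAX_DEPS);
    resv->shared[resv->nshared++] = f;
}

/* The dummy submission: a no-op job depending on the old exclusive fence
 * (if any) and all shared fences. Its fence becomes the single exclusive
 * fence, and the shared list is cleared since it is now covered. */
void toy_dummy_submit(struct toy_resv *resv, struct job_fence *dummy)
{
    assert(resv->nshared + (resv->exclusive ? 1 : 0) <= MAX_DEPS);
    dummy->work_done = true; /* no actual work to do */
    dummy->ndeps = 0;
    if (resv->exclusive)
        dummy->deps[dummy->ndeps++] = resv->exclusive;
    for (size_t i = 0; i < resv->nshared; i++)
        dummy->deps[dummy->ndeps++] = resv->shared[i];
    resv->exclusive = dummy;
    resv->nshared = 0;
}

/* An implicit-sync reader (e.g. an encoder) only checks the exclusive slot. */
bool implicit_consumer_may_start(const struct toy_resv *resv)
{
    return resv->exclusive == NULL || job_fence_signaled(resv->exclusive);
}
```

Before the dummy submission, an implicit-sync reader sees an empty exclusive slot and would start immediately, even though both writers are still running; afterwards it waits until both have finished. The catch remains: something in the stack has to know when to make the equivalent of the `toy_dummy_submit()` call.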

Christian.

> -Daniel