[RFC] dma-fence: dma-buf synchronization (v2)

Tom Cooksey tom.cooksey at arm.com
Fri Jul 13 10:35:26 PDT 2012


Hi Rob,

Yes, sorry we've been a bit slack progressing KDS publicly. Your
approach looks interesting and seems like it could enable both implicit
and explicit synchronization. A good compromise.


> From: Rob Clark <rob at ti.com>
> 
> A dma-fence can be attached to a buffer which is being filled or
> consumed by hw, to allow userspace to pass the buffer to another
> device without waiting.  For example, userspace can call the page_flip
> ioctl to
> display the next frame of graphics after kicking the GPU but while the
> GPU is still rendering.  The display device sharing the buffer with the
> GPU would attach a callback to get notified when the GPU's rendering-
> complete IRQ fires, to update the scan-out address of the display,
> without having to wake up userspace.
> 
> A dma-fence is a transient, one-shot deal.  It is allocated and attached
> to a dma-buf's list of fences.  When the one that attached it is done
> with the pending operation, it can signal the fence, removing it from
> the dma-buf's list of fences:
> 
>   + dma_buf_attach_fence()
>   + dma_fence_signal()
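
Just to check my reading of the intended flow: I imagine a driver doing
roughly the below. Only dma_buf_attach_fence() and dma_fence_signal()
are from your patch; dma_fence_create(), submit_job_to_hw() and the
exact signatures are placeholders I've made up.

    struct dma_fence *fence;

    /* allocate a fence and attach it before kicking the hardware */
    fence = dma_fence_create(dev);          /* hypothetical allocator */
    dma_buf_attach_fence(buf, fence);       /* buffer is now "busy"   */
    submit_job_to_hw(dev, buf);             /* kick the GPU/display   */

    /* ... later, from the rendering-complete IRQ handler ... */
    dma_fence_signal(fence);                /* drops it from buf's fence list */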

It would be useful to have two lists of fences: those around writes to
the buffer and those around reads. The idea is that if you only want
to read from a buffer, you don't need to wait for fences around other
read operations; you only need to wait for the "last" writer fence. If
you do want to write to the buffer, however, you need to wait for all
the read fences and the last writer fence. The use-case is when EGL
swap behaviour is EGL_BUFFER_PRESERVED. You have the display controller
reading the buffer, with its fence defined to be signalled when it is
no longer scanning out that buffer. It can only stop scanning out that
buffer when it is given another buffer to scan out. If that next buffer
must be rendered by copying the currently scanned-out buffer into it
(one possible option for implementing EGL_BUFFER_PRESERVED), then you
essentially deadlock if the scan-out job blocks the "render the next
frame" job.

There are probably variations of this idea; perhaps you only need a flag
to indicate whether a fence is around a read-only or read-write access?
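
To make the two-list idea above concrete, something like the below is
what I have in mind. None of these names exist in your patch;
dma_fence_wait() stands in for whatever blocking-wait primitive a fence
ends up with, and the ->node member assumes the fence gains a list_head
for linkage:

    struct dma_buf_sync {
            struct list_head  read_fences;  /* all in-flight readers          */
            struct dma_fence *write_fence;  /* most recent (exclusive) writer */
    };

    /* A new reader only has to wait for the last writer... */
    static void wait_for_read_access(struct dma_buf_sync *s)
    {
            if (s->write_fence)
                    dma_fence_wait(s->write_fence);
    }

    /* ...whereas a new writer has to wait for the last writer
     * and for every outstanding reader. */
    static void wait_for_write_access(struct dma_buf_sync *s)
    {
            struct dma_fence *f;

            if (s->write_fence)
                    dma_fence_wait(s->write_fence);
            list_for_each_entry(f, &s->read_fences, node)
                    dma_fence_wait(f);
    }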


> The intention is to provide a userspace interface (presumably via
> eventfd) later, to be used in conjunction with dma-buf's mmap support
> for sw access to buffers (or for userspace apps that would prefer to
> do their own synchronization).

From our experience with our own KDS, we've come up with an interesting
approach to synchronizing userspace applications which have a buffer
mmap'd. We wanted to avoid userspace being able to block jobs running
on hardware while still allowing userspace to participate. Our original
idea was to have a lock/unlock ioctl interface on a dma_buf, but with
a timeout whereby the application's lock would be broken if held for
too long. That at least bounded how long userspace could potentially
block hardware from making progress, though it was pretty "harsh".
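
In ioctl terms, that original (now abandoned) idea would have looked
something like the below; the struct, magic number and flags are all
made up for illustration:

    #include <linux/ioctl.h>
    #include <linux/types.h>

    struct dma_buf_lock_req {
            __u32 flags;       /* e.g. read-only vs read-write access     */
            __u32 timeout_ms;  /* lock is broken if held longer than this */
    };

    #define DMA_BUF_IOCTL_LOCK    _IOW('z', 0x00, struct dma_buf_lock_req)
    #define DMA_BUF_IOCTL_UNLOCK  _IO('z', 0x01)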

The approach we have now settled on is to instead only allow an
application to wait for all jobs currently pending for a buffer. So
there's no way userspace can prevent anything else from using a
buffer, other than by not issuing jobs which will use that buffer.
Also, the interface we settled on was to add a poll handler to
dma_buf; that way userspace can select() on multiple dma_buf
buffers in one syscall. It can also choose whether it wants to wait
for only the last writer fence, i.e. wait until it can read (POLLIN),
or wait for all fences because it wants to write to the buffer (POLLOUT).
We kinda like this, but it does restrict the utility a little. An idea
worth considering anyway.
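
On the userspace side, that usage would look roughly like this (the
dma-buf fd is whatever fd the buffer was exported as; the 100ms timeout
is arbitrary):

    #include <poll.h>

    /* Wait until the buffer is readable (last writer fence signalled)
     * or writable (all fences signalled). */
    static int wait_for_buffer(int dmabuf_fd, int for_write)
    {
            struct pollfd pfd = {
                    .fd     = dmabuf_fd,
                    .events = for_write ? POLLOUT : POLLIN,
            };

            return poll(&pfd, 1, 100 /* ms */);
    }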


My other thought is around atomicity. Could this be extended to
(safely) allow for hardware devices which might want to access
multiple buffers simultaneously? I think it probably can, with some
tweaks to the interface: something like an atomic function which does
"give me all the fences for all these buffers and add this fence to
each, instead of/as well as them"?
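
Purely as a sketch of what I mean (every name below is made up apart
from dma_buf_attach_fence() from your patch): take a single lock, walk
all the buffers for the job, collect the fences the job has to wait for
and attach the job's own fence to each, so no other job can slip a
fence in between:

    static DEFINE_MUTEX(dma_buf_fence_lock);  /* assumed global/ordered lock */

    static int dma_bufs_attach_fence(struct dma_buf **bufs, unsigned int count,
                                     struct dma_fence *fence,
                                     struct list_head *fences_to_wait_for)
    {
            unsigned int i;

            mutex_lock(&dma_buf_fence_lock);
            for (i = 0; i < count; i++) {
                    collect_fences(bufs[i], fences_to_wait_for); /* hypothetical */
                    dma_buf_attach_fence(bufs[i], fence);
            }
            mutex_unlock(&dma_buf_fence_lock);

            return 0;
    }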


Cheers,

Tom





