[Linaro-mm-sig] [RFC] Synchronizing access to buffers shared with dma-buf between drivers/devices

Tom Cooksey tom.cooksey at arm.com
Thu Jun 7 04:35:16 PDT 2012



> >>> The bigger issue is the previous point about how to deal
> >>> with cases where the CPU doesn't really need to get involved as an
> >>> intermediary.
> >>>
> >>> CPU fallback access to the buffer is the only legit case where we
> >>> need a standardized API to userspace (since CPU access isn't already
> >>> associated w/ some other kernel device file where some extra ioctl
> >>> can be added)
> >>
> >> The CPU case will still need to wait on an arbitrarily backed sync
> >> primitive.  It shouldn't need to know if it's backed by the gpu,
> >> camera, or dsp.
> >
> > Right, this is the one place we definitely need something.. some
> > userspace code would just get passed a dmabuf file descriptor and
> > want to mmap it and do something, without really knowing where it
> > came from.  I *guess* we'll have to add some ioctl's to the dmabuf
> > fd.
> 
> I personally favor having sync primitives have their own anon inode
> vs. strictly coupling them with dma_buf.

I think this is really the crux of the matter - do we associate sync
objects with buffers or not? The approach ARM are suggesting _is_ to
associate the sync objects with the buffer and do this by adding a
kds_resource* as a member of struct dma_buf. The main reason I want
to do this is because it doesn't require changes to existing
interfaces. Specifically, DRM/KMS & v4l2. These user/kernel interfaces
already allow userspace to specify the handle of a buffer the driver
should perform an operation on. What dma_buf has done is allow those
driver-specific buffer handles to be exported from one driver and
imported into another. While new ioctls have been added to the v4l2 &
DRM interfaces for dma_buf, they have only been to allow the import &
export of driver-specific buffer objects. Once imported as a
driver-specific buffer object, existing ioctls are re-used to perform
operations on those buffers (at least this is what PRIME does for DRM,
I'm not so sure about v4l2?). But my point is that no new "page flip
to this dma_buf fd" ioctl has been added to KMS, you use the existing
drm_mode_crtc_page_flip and specify an fb_id which has been imported
from a dma_buf.
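
To illustrate, here's roughly what that path looks like from userspace
with today's libdrm - the only dma_buf-specific step is the PRIME
import, everything after that is the existing KMS path. (This is just a
sketch; width/height/pitch/format/crtc_id are placeholders the real
application would already know.)

    /*
     * Sketch: flip to a scanout buffer that arrived as a dma-buf fd from
     * another driver, using only existing DRM/KMS ioctls via libdrm.
     */
    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>
    #include <drm_fourcc.h>

    int flip_imported_dmabuf(int drm_fd, int dmabuf_fd, uint32_t crtc_id,
                             uint32_t width, uint32_t height, uint32_t pitch)
    {
        uint32_t handle, fb_id;
        uint32_t handles[4] = { 0 }, pitches[4] = { 0 }, offsets[4] = { 0 };
        int ret;

        /* Import the dma-buf: the only PRIME-specific step. */
        ret = drmPrimeFDToHandle(drm_fd, dmabuf_fd, &handle);
        if (ret)
            return ret;

        /* From here on it is just a normal driver-specific handle. */
        handles[0] = handle;
        pitches[0] = pitch;
        ret = drmModeAddFB2(drm_fd, width, height, DRM_FORMAT_XRGB8888,
                            handles, pitches, offsets, &fb_id, 0);
        if (ret)
            return ret;

        /* The existing page-flip ioctl; no dma-buf-specific variant. */
        return drmModePageFlip(drm_fd, crtc_id, fb_id,
                               DRM_MODE_PAGE_FLIP_EVENT, NULL);
    }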

If we associate sync objects with buffers, none of those device
specific ioctls which perform operations on buffer objects need to
be modified. It's just that internally, those drivers use kds or
something similar to make sure they don't tread on each other's
toes.
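
Internally that might look something like the sketch below. To be
clear, struct buffer_sync and the helpers here are just illustrative
stand-ins for whatever per-buffer primitive (kds_resource or similar)
ends up hanging off struct dma_buf - they are not the real KDS API:

    #include <linux/types.h>
    #include <linux/dma-buf.h>

    /* Illustrative stand-ins, not real kernel functions. */
    struct buffer_sync;
    struct buffer_sync *dma_buf_get_sync(struct dma_buf *buf);
    int  buffer_sync_wait(struct buffer_sync *sync, bool exclusive);
    void buffer_sync_release(struct buffer_sync *sync);

    static int driver_begin_dma(struct dma_buf *buf, bool is_write)
    {
        struct buffer_sync *sync = dma_buf_get_sync(buf);
        int ret;

        /* Wait (or queue a callback) until earlier users are done. */
        ret = buffer_sync_wait(sync, is_write);
        if (ret)
            return ret;

        /* ... program the hardware and kick off the DMA ... */
        return 0;
    }

    /* Called from the completion interrupt / bottom half. */
    static void driver_dma_done(struct dma_buf *buf)
    {
        /* Let the next user of this buffer in. */
        buffer_sync_release(dma_buf_get_sync(buf));
    }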

The alternative is to not associate sync objects with buffers and
have them be distinct entities, exposed to userspace. This gives
userspace more power and flexibility and might allow for use-cases
which an implicit synchronization mechanism can't satisfy - I'd
be curious to hear any specifics here. However, every driver which
needs to participate in the synchronization mechanism will need
to have its interface with userspace modified to allow the sync
objects to be passed to the drivers. This seemed like a lot of
work to me, which is why I prefer the implicit approach. However,
I don't actually know what work would be needed and think it should
be explored, i.e. how much work is it to add explicit sync object
support to the DRM & v4l2 interfaces?

E.g. I believe DRM/GEM's job dispatch API is "in-order"
in which case it might be easy to just add "wait for this fence"
and "signal this fence" ioctls. Seems like vmwgfx already has
something similar to this already? Could this work over having
to specify a list of sync objects to wait on and another list
of sync objects to signal for every operation (exec buf/page
flip)? What about for v4l2?
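
Just to make the scale of the change concrete, the explicit scheme
would mean every participating driver growing something along these
lines. These ioctls are entirely hypothetical - they only illustrate
the shape of the userspace interface addition, not any existing API:

    #include <linux/ioctl.h>
    #include <linux/types.h>

    struct drv_fence_wait {
        __s32 fence_fd;     /* sync object to wait on before the job runs */
        __u32 timeout_ms;
    };

    struct drv_fence_signal {
        __s32 fence_fd;     /* sync object to signal when the job completes */
    };

    /* 'X' is a placeholder ioctl magic for this imaginary driver. */
    #define DRV_IOCTL_FENCE_WAIT    _IOW('X', 0x40, struct drv_fence_wait)
    #define DRV_IOCTL_FENCE_SIGNAL  _IOW('X', 0x41, struct drv_fence_signal)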

I guess my other thought is that implicit and explicit sync are not
mutually exclusive, though I'd expect there'd be interesting
deadlocks to have to debug if both were in use _at the same
time_. :-)


Cheers,

Tom





