[RFC 00/29] De-stage android's sync framework

Gustavo Padovan gustavo at padovan.org
Tue Jan 19 07:23:09 PST 2016


Hi Daniel, 

2016-01-19 Daniel Vetter <daniel at ffwll.ch>:

> On Fri, Jan 15, 2016 at 12:55:10PM -0200, Gustavo Padovan wrote:
> > From: Gustavo Padovan <gustavo.padovan at collabora.co.uk>
> > 
> > This patch series de-stages the sync framework, and in order to accomplish that
> > a bunch of cleanups/improvements on the sync and fence code were made.
> > 
> > The sync framework contained some abstractions around struct fence and those
> > were removed in the de-staging process among other changes:
> > 
> > Userspace visible changes
> > -------------------------
> > 
> >  * The sw_sync file was moved from /dev/sw_sync to <debugfs>/sync/sw_sync. No
> >  other change.
> > 
> > Kernel API changes
> > ------------------
> > 
> >  * struct sync_timeline is now struct fence_timeline
> >  * sync_timeline_ops is now fence_timeline_ops and they now carry struct
> >  fence as parameter instead of struct sync_pt
> >  * a .cleanup() fence op was added to allow sync_fence to run a cleanup when
> >  the fence_timeline is destroyed
> >  * added fence_add_used_data() to pass a private pointer to struct fence. This
> >  pointer is sent back on the .cleanup op.
> >  * The sync timeline functions were moved to be fence_timeline functions:
> > 	 - sync_timeline_create()	-> fence_timeline_create()
> > 	 - sync_timeline_get()		-> fence_timeline_get()
> > 	 - sync_timeline_put()		-> fence_timeline_put()
> > 	 - sync_timeline_destroy()	-> fence_timeline_destroy()
> > 	 - sync_timeline_signal()	-> fence_timeline_signal()
> > 
> >   * sync_pt_create() was replaced by fence_create_on_timeline()
> > 
> > Internal changes
> > ----------------
> > 
> >  * fence_timeline_ops was removed in favor of direct use of fence_ops
> >  * fence default functions were created for fence_ops
> >  * removed structs sync_pt, sw_sync_timeline and sw_sync_pt
> 
> Bunch of fairly random comments all over:
> 
> - include/uapi/linux/sw_sync.h imo should be dropped, it's just a private
>   debugfs interface between fence fds and the testsuite. Since the plan is
>   to have the testcases integrated into the kernel tree too we don't need
>   a public header.
> 
> - similar for include/linux/sw_sync.h Imo that should all be moved into
>   sync_debug.c. Same for sw_sync.c, that should all land in sync_debug
>   imo, and made optional with a Kconfig option. At least we should reuse
>   CONFIG_DEBUGFS.

These two items sound reasonable to me.

> 
> - fence_context and fence_timeline are really the same. timeline has some
>   super-basic support for doing sw-only fence timelines, but imo that's
>   not really worth keeping (and if so better to keep separate in a
>   sw-fence.c or similar, like seqno-fence.c). The other main thing
>   timeline provides is support to clean up fences on a timeline. And imo
>   that cleanup should be done by the core fence support, not by the add-on
>   stuff.

Yes, they are. But I currently don't know how best to merge them, so I
decided to go for an RFC instead of trying some crazy solution touching
all fence_context users.

> 
> Interlude about fence cleanup on driver unload:
> 
> Working drivers imo should never call timeline_destroy when there's still
> an unsignalled fence around for that timeline/context. That just means
> they're broken and failed to clean up all the pending work. So the problem
> really is only what to do with fences where the driver disappeared, and
> for that we essentially need a fence_revoke() function (which could be
> called internally from timeline_free). So here's what I think
> timeline_free should do:
> 
> for_each_fence_on_timel() {
> 	WARN_ON(!fence_is_signalled(fence));
> 
> 	fence_revoke(fence);
> }
> 
> Implementing fence_revoke is a bit tricky since we need to make sure the
> memory contained ->ops and similar stuff doesn't disappear. Simplest
> option might be to grab a temporary reference (using
> kref_get_unless_zero), and then exchange ->ops with one that has only a
> release function. We don't need anything else as long as all fence_*
> functions the kernel might call check for signalling correctly first
> (fence_wait is broken at least).
> 
> Or we just give up (for now) and declare module unload as slightly racy.
> dma-buf is similar. An intermediate option might be to at least add a
> THIS_MODULE reference to each fence (but that's a bit expensive ...).

I'd say we just give up for now, as no driver currently uses
timeline_destroy. So we could go for other improvements first.
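
For illustration only, the revoke idea quoted above can be sketched in
userspace C. Everything here is an assumption made for the sketch, not the
kernel API: the struct layouts, fence_init(), fence_get_unless_zero() (a
stand-in for kref_get_unless_zero), the revoked_ops table, the choice to
force-mark a revoked fence as signaled, and reporting unsignaled fences
through a return value where the kernel would use WARN_ON.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct fence;

/* Simplified stand-in for the kernel's fence ops table. */
struct fence_ops {
	bool (*signaled)(struct fence *f);	/* NULL once revoked */
	void (*release)(struct fence *f);
};

struct fence {
	atomic_int refcount;			/* stand-in for struct kref */
	_Atomic(const struct fence_ops *) ops;
	bool signaled;
	bool released;				/* lets a caller observe release */
	struct fence *next;			/* link on the owning timeline */
};

struct fence_timeline {
	struct fence *fences;
};

static int driver_release_calls;

/* Hypothetical driver callbacks that would vanish on module unload. */
static bool driver_signaled(struct fence *f) { (void)f; return false; }
static void driver_release(struct fence *f) { (void)f; driver_release_calls++; }
static const struct fence_ops driver_ops = { driver_signaled, driver_release };

/* Ops installed by revoke: only a release function is left. */
static void revoked_release(struct fence *f) { f->released = true; }
static const struct fence_ops revoked_ops = { NULL, revoked_release };

static void fence_init(struct fence *f, const struct fence_ops *ops,
		       struct fence_timeline *tl)
{
	atomic_init(&f->refcount, 1);
	atomic_init(&f->ops, ops);
	f->signaled = false;
	f->released = false;
	f->next = tl->fences;
	tl->fences = f;
}

/* Take a reference only if the count has not already dropped to zero. */
static bool fence_get_unless_zero(struct fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void fence_put(struct fence *f)
{
	if (atomic_fetch_sub(&f->refcount, 1) == 1) {
		const struct fence_ops *ops = atomic_load(&f->ops);

		assert(ops && ops->release);
		ops->release(f);
	}
}

/* Check the cached flag first so revoked fences never reach driver code. */
static bool fence_is_signaled(struct fence *f)
{
	const struct fence_ops *ops;

	if (f->signaled)
		return true;
	ops = atomic_load(&f->ops);
	if (ops->signaled && ops->signaled(f)) {
		f->signaled = true;
		return true;
	}
	return false;
}

static void fence_revoke(struct fence *f)
{
	if (!fence_get_unless_zero(f))
		return;			/* already being released */
	f->signaled = true;		/* assumption: revoked counts as signaled */
	atomic_store(&f->ops, &revoked_ops);
	fence_put(f);			/* drop the temporary reference */
}

/* Returns how many fences were still unsignaled (the WARN_ON case). */
static int fence_timeline_free(struct fence_timeline *tl)
{
	int unsignaled = 0;
	struct fence *f, *next;

	for (f = tl->fences; f; f = next) {
		next = f->next;
		if (!fence_is_signaled(f)) {
			unsignaled++;
			fprintf(stderr, "WARN: unsignaled fence on timeline\n");
		}
		f->next = NULL;
		fence_revoke(f);
	}
	tl->fences = NULL;
	return unsignaled;
}
```

After fence_timeline_free() returns, a holder of a leftover reference can
still call fence_put(), and the final put lands in revoked_release()
instead of the driver's release callback.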

> - back to timeline vs. context: I have no idea how to best clean up this
>   mess, but the least painful option long-term is probably to switch over all
>   current users of fence_context_alloc to timelines and remove the plain
>   context interface.

Agreed.

> 
> - Imo the interface in include/linux/sync.h is duplicating too much of
>   fence.h. I think the only bits we need are the refcounting, creating,
>   fd-install and that's it. Plus a macro to loop over all the fences in a
>   sync_fence. With that drivers will only ever deal with a pile of
>   struct fence, making implicit fencing (using the fence list in dma-buf)
>   and explicit fencing (using the fence list in sync_fence) much more
>   similar.

Yes, most of the sync_fence waiting should not be exported. Drivers
should only wait on fences imo, not on sync_fences.
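
As a rough illustration of the "macro to loop over all the fences" idea, a
minimal userspace sketch follows. The names sync_file (taken from the
rename proposed later in this thread), sync_file_for_each_fence() and
sync_file_count_context(), as well as the struct layouts, are hypothetical
and only serve the example.

```c
#include <stddef.h>

/* Reduced stand-in for struct fence: just enough to tell fences apart. */
struct fence {
	unsigned int context;
	unsigned int seqno;
};

/* Hypothetical container: a plain array of fence pointers. */
struct sync_file {
	size_t num_fences;
	struct fence **fences;
};

/*
 * Iterate over every fence in a sync_file.
 * pos: struct fence * cursor, i: size_t index, sf: struct sync_file *.
 */
#define sync_file_for_each_fence(pos, i, sf)			\
	for ((i) = 0;						\
	     (i) < (sf)->num_fences &&				\
	     ((pos) = (sf)->fences[(i)], 1);			\
	     (i)++)

/* Example driver-side use: count the fences that belong to one context. */
static size_t sync_file_count_context(struct sync_file *sf,
				      unsigned int context)
{
	struct fence *fence;
	size_t i, count = 0;

	sync_file_for_each_fence(fence, i, sf) {
		if (fence->context == context)
			count++;
	}
	return count;
}
```

With something like this, a driver would only ever touch struct fence
pointers and would not need a sync_fence-level wait interface.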

> 
>   And we can easily do that since no internal users ;-)
> 
> - get_timeline_name and get_driver_name are imo too much indirection, just
>   add ->(drv_)name field to each of these.
> 
> - struct sync_fence is a major confusion imo against struct fence. It
>   made much more sense in the pure-android world where fence == sync_pt.
>   Maybe we can rename sync_fence to sync_fence_fd (a bit long, and fd is a
>   bit inaccurate), sync_file (like this best), fence_file (sounds silly
>   imo), or something else?

sync_file sounds good to me. fence_file feels like it is a file for a
single fence, but we may have many fences in one sync_file.

> 
> - I guess just not yet part of this rfc, but moving the testsuite and
>   adding kerneldoc for this is planned I guess? If you feel like I think
>   it'd be best. We pull the current dma-buf stuff into
>   device-drivers.tmpl, but it's completely lacking overview docs and all
>   that. And I'd like to duplicate at least the dma-buf/fence sections into
>   the gpu.tmpl docbook.

We have converted the testsuite from android's libsync, but we need to wait
for Google to re-license it before we can send it upstream.

kerneldoc is planned for sure, but I'd say it will be better to have
some users first, DRM for example.

> 
> - If we make timelines first class objects I think we could move some of
>   the fields from struct fence to struct fence_timeline. E.g. the ops
>   struct. That also makes it clearer that some of the vfuncs really should
>   be taking a struct fence_timeline *timeline instead of a struct fence
>   *fence as their primary parameter.

I'll keep that as a final goal, work on RFC v2, and see how far we can
get.
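
To make the "timelines as first class objects" suggestion concrete, here is
a small userspace sketch of one possible layout. It assumes the ops table
and the name fields live in struct fence_timeline and that the vfunc takes
the timeline as its primary parameter. All field and function names
(has_signaled, drv_name, value, counter_has_signaled) are hypothetical, and
seqno wraparound is ignored.

```c
#include <stdbool.h>

struct fence_timeline;

/* vfuncs take the timeline, not an individual fence. */
struct fence_timeline_ops {
	bool (*has_signaled)(struct fence_timeline *tl, unsigned int seqno);
};

struct fence_timeline {
	const struct fence_timeline_ops *ops;	/* moved out of struct fence */
	const char *name;			/* replaces get_timeline_name */
	const char *drv_name;			/* replaces get_driver_name */
	unsigned int value;			/* last signaled point */
};

/* A fence shrinks to a point on a timeline. */
struct fence {
	struct fence_timeline *timeline;
	unsigned int seqno;
};

static bool fence_is_signaled(const struct fence *f)
{
	return f->timeline->ops->has_signaled(f->timeline, f->seqno);
}

/* Example implementation: a simple monotonic counter timeline. */
static bool counter_has_signaled(struct fence_timeline *tl,
				 unsigned int seqno)
{
	return tl->value >= seqno;
}

static const struct fence_timeline_ops counter_ops = {
	.has_signaled = counter_has_signaled,
};
```

Every fence on the same timeline then shares one ops pointer and one pair
of names instead of carrying its own copy.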

	Gustavo