[Linaro-mm-sig] thoughts of looking at android fences

Erik Gilling konkers at android.com
Wed Oct 2 11:13:46 PDT 2013


On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
<maarten.lankhorst at canonical.com> wrote:
> The timeline is similar to what I called a fence context. Each command stream on a gpu can have a context. Because
> nvidia hardware can have 4095 separate timelines, I didn't want to keep the bookkeeping for each timeline, although
> I guess that it's already done. Maybe it could be done in a unified way for each driver, which would make it easier
> to transition to timelines that android can use.
>
> I did not have an explicit syncpoint addition, but I think that sync points + sync_fence are similar to what I did with
> my dma-fence stuff, just slightly different.
> In my approach the dma-fence is signaled after all sync_points are done AND the queued commands are executed.
> In effect the dma-fence becomes the next syncpoint, depending on all previous dma-fence syncpoints.

What makes queued command completion different from any other sync point?

> An important thing to note is that dma-fence is kernelspace only, so it might be better to rename it to syncpoint,
> and use fence for the userspace interface.
>
> A big difference is locking, I assume in my code that most fences emitted are not waited on, so the fastpath
> fence_signal is a test_and_set_bit plus test_bit. A single lock is used for the waitqueue and callbacks,
> with the waitqueue being implemented internally as an asynchronous callback.
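
If I'm reading the fastpath right, it's roughly the following (my own
sketch with made-up names, not your actual code): signaling sets a bit
atomically, and the common case where nobody registered a callback or
waiter never takes the lock.

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/spinlock.h>

#define SKETCH_BIT_SIGNALED		0
#define SKETCH_BIT_ENABLE_SIGNALING	1

struct sketch_fence {
	unsigned long	flags;
	spinlock_t	*lock;	/* single lock provided by the driver */
	/* waitqueue + callback list would hang off this lock */
};

static int sketch_fence_signal(struct sketch_fence *fence)
{
	/* fastpath part 1: lockless "am I the first to signal?" */
	if (test_and_set_bit(SKETCH_BIT_SIGNALED, &fence->flags))
		return -EINVAL;		/* already signaled */

	/* fastpath part 2: only take the lock if someone actually
	 * enabled signaling (registered a callback or waiter) */
	if (test_bit(SKETCH_BIT_ENABLE_SIGNALING, &fence->flags)) {
		unsigned long irqflags;

		spin_lock_irqsave(fence->lock, irqflags);
		/* run callbacks / wake the waitqueue here */
		spin_unlock_irqrestore(fence->lock, irqflags);
	}
	return 0;
}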

I assume there's very little lock contention, so the performance
impact is negligible.  Also, because sync_pts on a timeline are
strictly ordered, it's necessary to check all active pts whenever the
timeline signals.  A future optimization could keep the active pts in
a sorted list or another data structure so that only the pts about to
signal need to be iterated over.  So far we haven't seen any
bottlenecks here, so I've kept it simple.

> The lock is provided by the driver, which makes it easier to add support for old hardware that has no reliable way of notifying completion of events.

I'm a bit confused about how it's possible to implement sync on
hardware with "no reliable way of notifying completion of events."
That seems like a non-starter to me.

> I avoided using global locks, but I think for debugfs support I may end up having to add some.

As did I, except for debugfs support.

> One thing though: is it really required to merge fences? It seems to me that if I add a poll callback userspace
> could simply do a poll on a list of fences. This would give userspace all the information it needs about each
> individual fence.

This is very important.  It greatly simplifies the way userspace
deals with fences.  It means userspace only has to track one fd per
buffer, and neither the kernel API nor the userspace RPC APIs have to
take a variable number of fds per buffer.  FWIW the android sync
driver already implements poll.
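
To make the difference concrete, here's a rough userspace sketch
(hypothetical helper names; poll() only, ignoring the driver's
merge/wait ioctls): with a merged fence the consumer waits on one fd;
without merging, every interface has to carry a variable-length fd
list, and waiting for all of them means a poll loop, since poll()
wakes as soon as any single fd is ready.

#include <poll.h>

/* With merging: one fd per buffer already means "everything this
 * buffer depends on", so waiting is a single poll(). */
static int wait_buffer_ready(int merged_fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = merged_fence_fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);
}

/* Without merging: wait until *all* fds have signaled.  Timeout
 * handling is simplified (each loop iteration gets the full timeout). */
static int wait_fence_list(struct pollfd *pfds, int nfds, int timeout_ms)
{
	int i, remaining = nfds;

	for (i = 0; i < nfds; i++)
		pfds[i].events = POLLIN;

	while (remaining > 0) {
		if (poll(pfds, nfds, timeout_ms) <= 0)
			return -1;	/* error or timeout */
		for (i = 0; i < nfds; i++) {
			if (pfds[i].fd >= 0 && (pfds[i].revents & POLLIN)) {
				pfds[i].fd = -1; /* poll() skips negative fds */
				remaining--;
			}
		}
	}
	return 0;
}

The merge path is also what keeps the RPC surface fixed: one fd
travels with the buffer no matter how many producers touched it.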

> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
> syncpoints to dma-fence, which I'll probably rename to syncpoints.

I thought the plan decided at Plumbers was to investigate backing
dma_buf with the android sync solution, not the other way around.  It
doesn't make sense to me to take a working, tested, end-to-end
solution with a released compositing system built around it, throw it
out, and replace it with new, untested code to support a system that
has not yet been built.

Cheers,
   Erik

