[Intel-gfx] [PATCH 7/7] dma-buf/sw-sync: Use an rbtree to sort fences in the timeline

Thu Jun 29 18:29:28 UTC 2017

Quoting Sean Paul (2017-06-29 19:10:11)
> On Thu, Jun 29, 2017 at 01:59:30PM +0100, Chris Wilson wrote:
> > Reduce the list iteration when incrementing the timeline by storing the
> > fences in increasing order.
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Cc: Sean Paul <seanpaul at chromium.org>
> > Cc: Gustavo Padovan <gustavo at padovan.org>
> > ---
> >  drivers/dma-buf/sw_sync.c    | 49 ++++++++++++++++++++++++++++++++++++++------
> >  drivers/dma-buf/sync_debug.h |  5 +++++
> >  2 files changed, 48 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index e51fe11bbbea..8cef5d345316 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -96,6 +96,7 @@ static struct sync_timeline *sync_timeline_create(const char *name)
> >       obj->context = dma_fence_context_alloc(1);
> >       strlcpy(obj->name, name, sizeof(obj->name));
> >  
> > +     obj->pt_tree = RB_ROOT;
> >       INIT_LIST_HEAD(&obj->pt_list);
> >       spin_lock_init(&obj->lock);
> >  
> > @@ -142,9 +143,13 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> >  
> >       obj->value += inc;
> >  
> > -     list_for_each_entry_safe(pt, next, &obj->pt_list, link)
> > -             if (dma_fence_is_signaled_locked(&pt->base))
> > -                     list_del_init(&pt->link);
> > +     list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> > +             if (!dma_fence_is_signaled_locked(&pt->base))
> > +                     break;
> > +
> > +             list_del_init(&pt->link);
> > +             rb_erase(&pt->node, &obj->pt_tree);
> > +     }
> >  
> >       spin_unlock_irq(&obj->lock);
> >  }
> > @@ -174,8 +179,38 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj,
> >       INIT_LIST_HEAD(&pt->link);
> >  
> >       spin_lock_irq(&obj->lock);
> > -     if (!dma_fence_is_signaled_locked(&pt->base))
> > -             list_add_tail(&pt->link, &obj->pt_list);
> > +     if (!dma_fence_is_signaled_locked(&pt->base)) {
> > +             struct rb_node **p = &obj->pt_tree.rb_node;
> > +             struct rb_node *parent = NULL;
> > +
> > +             while (*p) {
> > +                     struct sync_pt *other;
> > +                     int cmp;
> > +
> > +                     parent = *p;
> > +                     other = rb_entry(parent, typeof(*pt), node);
> > +                     cmp = value - other->base.seqno;
> 
> We're imposing an implicit limitation on userspace here that points cannot be
> > INT_MAX apart (since they'll be in the wrong order in the tree). 

That's not a new limitation. It's an artifact of using the u32 timeline/seqno.

> I wonder how much this patch will actually save, given that the number of active points
> on a given timeline should be small. Do we have any evidence that this
> optimization is warranted?

The good news is that for empty/small trees, the overhead is tiny, less
than the cost of the syscall. I honestly don't know who uses sw_sync and
so would benefit. I figured I would throw it out here since it was
trivial.
-Chris