[Intel-gfx] [PATCH 7/7] dma-buf/sw-sync: Use an rbtree to sort fences in the timeline
Chris Wilson
chris at chris-wilson.co.uk
Thu Jun 29 18:29:28 UTC 2017
Quoting Sean Paul (2017-06-29 19:10:11)
> On Thu, Jun 29, 2017 at 01:59:30PM +0100, Chris Wilson wrote:
> > Reduce the list iteration when incrementing the timeline by storing the
> > fences in increasing order.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Cc: Sean Paul <seanpaul at chromium.org>
> > Cc: Gustavo Padovan <gustavo at padovan.org>
> > ---
> > drivers/dma-buf/sw_sync.c | 49 ++++++++++++++++++++++++++++++++++++++------
> > drivers/dma-buf/sync_debug.h | 5 +++++
> > 2 files changed, 48 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> > index e51fe11bbbea..8cef5d345316 100644
> > --- a/drivers/dma-buf/sw_sync.c
> > +++ b/drivers/dma-buf/sw_sync.c
> > @@ -96,6 +96,7 @@ static struct sync_timeline *sync_timeline_create(const char *name)
> > obj->context = dma_fence_context_alloc(1);
> > strlcpy(obj->name, name, sizeof(obj->name));
> >
> > + obj->pt_tree = RB_ROOT;
> > INIT_LIST_HEAD(&obj->pt_list);
> > spin_lock_init(&obj->lock);
> >
> > @@ -142,9 +143,13 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
> >
> > obj->value += inc;
> >
> > - list_for_each_entry_safe(pt, next, &obj->pt_list, link)
> > - if (dma_fence_is_signaled_locked(&pt->base))
> > - list_del_init(&pt->link);
> > + list_for_each_entry_safe(pt, next, &obj->pt_list, link) {
> > + if (!dma_fence_is_signaled_locked(&pt->base))
> > + break;
> > +
> > + list_del_init(&pt->link);
> > + rb_erase(&pt->node, &obj->pt_tree);
> > + }
> >
> > spin_unlock_irq(&obj->lock);
> > }
> > @@ -174,8 +179,38 @@ static struct sync_pt *sync_pt_create(struct sync_timeline *obj,
> > INIT_LIST_HEAD(&pt->link);
> >
> > spin_lock_irq(&obj->lock);
> > - if (!dma_fence_is_signaled_locked(&pt->base))
> > - list_add_tail(&pt->link, &obj->pt_list);
> > + if (!dma_fence_is_signaled_locked(&pt->base)) {
> > + struct rb_node **p = &obj->pt_tree.rb_node;
> > + struct rb_node *parent = NULL;
> > +
> > + while (*p) {
> > + struct sync_pt *other;
> > + int cmp;
> > +
> > + parent = *p;
> > + other = rb_entry(parent, typeof(*pt), node);
> > + cmp = value - other->base.seqno;
>
> We're imposing an implicit limitation on userspace here that points cannot be
> > INT_MAX apart (since they'll be in the wrong order in the tree).
That's not a new limitation. It's an artifact of using the u32 timeline/seqno.
> I wonder how much this patch will actually save, given that the number of active points
> on a given timeline should be small. Do we have any evidence that this
> optimization is warranted?
The good news is that for empty/small trees, the overhead is tiny, less
than the cost of the syscall. I honestly don't know who uses sw_sync and
so would benefit. I figured I would throw it out here since it was
trivial.
-Chris
More information about the Intel-gfx
mailing list