[Intel-gfx] [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object
Chris Wilson
chris at chris-wilson.co.uk
Tue Nov 1 09:56:00 UTC 2016
On Tue, Nov 01, 2016 at 09:43:53AM +0000, Tvrtko Ursulin wrote:
>
> On 31/10/2016 10:26, Chris Wilson wrote:
> >With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >underneath the object. For execbuf there is merit in having a very fast
> >direct lookup of ctx:handle to the vma using a hashtree, but that still
> >leaves a large number of other lookups. One way to speed up the lookup
> >would be to use a rhashtable, but that requires extra allocations and
> >may exhibit poor worse case behaviour. An alternative is to use an
> >embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >but at the slight cost of O(lgN) lookups (instead of O(1) for
> >rhashtable). The major of such tree will be very shallow and so not much
> >slower, and still scales much, much better than the current unsorted
> >list.
> >
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_drv.h | 1 +
> > drivers/gpu/drm/i915/i915_gem_gtt.c | 80 +++++++++++++++++++++++++------------
> > drivers/gpu/drm/i915/i915_gem_gtt.h | 1 +
> > 3 files changed, 57 insertions(+), 25 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 7a18bf66f797..e923d6596cac 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -2230,6 +2230,7 @@ struct drm_i915_gem_object {
> >
> > /** List of VMAs backed by this object */
> > struct list_head vma_list;
> >+ struct rb_root vma_tree;
> >
> > /** Stolen memory for this object, instead of being backed by shmem. */
> > struct drm_mm_node *stolen;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >index e7afad585929..aa2d21c41091 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> >@@ -3399,6 +3399,7 @@ void i915_vma_destroy(struct i915_vma *vma)
> > GEM_BUG_ON(!i915_vma_is_closed(vma));
> > GEM_BUG_ON(vma->fence);
> >
> >+ rb_erase(&vma->obj_node, &vma->obj->vma_tree);
> > list_del(&vma->vm_link);
> > if (!i915_vma_is_ggtt(vma))
> > i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
> >@@ -3416,12 +3417,33 @@ void i915_vma_close(struct i915_vma *vma)
> > WARN_ON(i915_vma_unbind(vma));
> > }
> >
> >+static inline int vma_compare(struct i915_vma *vma,
> >+ struct i915_address_space *vm,
> >+ const struct i915_ggtt_view *view)
> >+{
> >+ GEM_BUG_ON(view && !i915_vma_is_ggtt(vma));
> >+
> >+ if (vma->vm != vm)
> >+ return vma->vm - vm;
>
> This can theoretically overflow so I think long should be the return type.
Yeah, the same thought cross through my mind. gcc should be smart enough
to only use the cc flags, but still...
I miss the spaceship operator.
> >+
> >+ if (!view)
> >+ return vma->ggtt_view.type;
> >+
> >+ if (vma->ggtt_view.type != view->type)
> >+ return vma->ggtt_view.type - view->type;
> >+
> >+ return memcmp(&vma->ggtt_view.params,
> >+ &view->params,
> >+ sizeof(view->params));
> >+}
> >+
> > static struct i915_vma *
> > __i915_vma_create(struct drm_i915_gem_object *obj,
> > struct i915_address_space *vm,
> > const struct i915_ggtt_view *view)
> > {
> > struct i915_vma *vma;
> >+ struct rb_node *rb, **p;
> > int i;
> >
> > GEM_BUG_ON(vm->closed);
> >@@ -3455,33 +3477,28 @@ __i915_vma_create(struct drm_i915_gem_object *obj,
> >
> > if (i915_is_ggtt(vm)) {
> > vma->flags |= I915_VMA_GGTT;
> >+ list_add(&vma->obj_link, &obj->vma_list);
> > } else {
> > i915_ppgtt_get(i915_vm_to_ppgtt(vm));
> >+ list_add_tail(&vma->obj_link, &obj->vma_list);
> > }
>
> Will this make a difference to anything since it is not used for
> lookups? I suppose it makes a nicer debugfs output if nothing else
> so not complaining.
It does. The code assumes GGTT entries are first (some of my patches
came from a time before the regression was introduced by Daniel because
he claimed it made no difference but had not benchmarked it!).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list