[Intel-gfx] [PATCH 6/6] drm/i915: Store the vma in an rbtree under the object

Tue Nov 1 09:45:37 UTC 2016

On Tue, Nov 01, 2016 at 09:20:33AM +0000, Tvrtko Ursulin wrote:
> 
> On 01/11/2016 09:06, Chris Wilson wrote:
> >On Tue, Nov 01, 2016 at 08:54:09AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 01/11/2016 08:50, Chris Wilson wrote:
> >>>On Tue, Nov 01, 2016 at 08:41:01AM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 31/10/2016 10:26, Chris Wilson wrote:
> >>>>>With full-ppgtt one of the main bottlenecks is the lookup of the VMA
> >>>>>underneath the object. For execbuf there is merit in having a very fast
> >>>>>direct lookup of ctx:handle to the vma using a hashtree, but that still
> >>>>>leaves a large number of other lookups. One way to speed up the lookup
> >>>>>would be to use a rhashtable, but that requires extra allocations and
> >>>>>may exhibit poor worse case behaviour. An alternative is to use an
> >>>>>embedded rbtree, i.e. no extra allocations and deterministic behaviour,
> >>>>>but at the slight cost of O(lgN) lookups (instead of O(1) for
> >>>>>rhashtable). The major of such tree will be very shallow and so not much
> >>>>>slower, and still scales much, much better than the current unsorted
> >>>>>list.
> >>>>>
> >>>>>References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
> >>>>>Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>
> >>>>I suggest leaving this out of the mini-series which fixes the
> >>>>recently introduced bugs.
> >>>
> >>>Just review it now, it's a two year old bug that was impacting the test
> >>>cases I was running... :-p Then we won't have to review it next week ;)
> >>
> >>I'll review it, just split it out please so the CI fire can be
> >>extinguished first.
> >
> >Sure, I'm pushing the fixes as soon as they've been baked. There's a
> >regression fix that builds upon this patch for multiple timelines (the
> >linear walk in reservation_object is particularly nasty for scaling).
> 
> Which one is that, the reservation_object_get_fences_rcu in
> i915_gem_object_wait_reservation?

Mostly reservation_object_add_shared_fence()

The reservation object is not autopruning, those fences stay there until
replaced. So as an object is used by more and more contexts (such as
many GL clients sharing the same resource) that shared array keeps on
growing, and we have to walk over it on every execbuf, both to update
the fence and indeed often to compute the await.

We can use a radixtree/idr or a rhashtable here since they share the RCU
share lookup required for reservation_object (and couple in the seqlock
for update tracking), the stumbling point is the that
reservation_object_reserve_shared_fence() guarrantees the next
add_shared_fence() will not fail (ENOMEM). Both idr/radixtree have
preload, but both use per-cpu caches and so require preemption disabling
from the point of reservation to use; i.e. not suitable. Changing them
to preload onto a specific tree seems reasonably easy except for
convincing core of their merit. (There changing idr looks like it would
be easier.) radixtree is easier to use than idr, but iterating the
radixtree for (get_fences_rcu) is not as cheap as I expect. :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre