[Intel-gfx] [PATCH] drm/i915: Move VMAs to inactive as request are retired

Thu Nov 26 06:41:16 PST 2015

On Thu, Nov 26, 2015 at 02:07:57PM +0000, Tvrtko Ursulin wrote:
> 
> On 26/11/15 10:01, Daniel Vetter wrote:
> >On Wed, Nov 25, 2015 at 10:16:37AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 24/11/15 17:47, Daniel Vetter wrote:
> >>>On Mon, Nov 23, 2015 at 03:12:35PM +0000, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>
> >>>>Current code moves _any_ VMA to the inactive list only when
> >>>>_all_ rendering on an object (so from any context or VM) has
> >>>>been completed.
> >>>>
> >>>>This creates an un-natural situation where the context (and
> >>>>VM) destructors can run with VMAs still on the respective
> >>>>active list.
> >>>>
> >>>>Change here is to move VMAs to the inactive list as the
> >>>>requests are getting retired.
> >>>>
> >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92638
> >>>>Testcase: igt/gem_request_retire/retire-vma-not-inactive
> >>>>Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> >>>>Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>---
> >>>>  drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
> >>>>  1 file changed, 16 insertions(+), 1 deletion(-)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>>index cd7e102720f4..47a743246d2c 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>>@@ -2413,17 +2413,32 @@ static void
> >>>>  i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
> >>>>  {
> >>>>  	struct i915_vma *vma;
> >>>>+	struct i915_address_space *vm;
> >>>>
> >>>>  	RQ_BUG_ON(obj->last_read_req[ring] == NULL);
> >>>>  	RQ_BUG_ON(!(obj->active & (1 << ring)));
> >>>>
> >>>>  	list_del_init(&obj->ring_list[ring]);
> >>>>-	i915_gem_request_assign(&obj->last_read_req[ring], NULL);
> >>>>
> >>>>  	if (obj->last_write_req && obj->last_write_req->ring->id == ring)
> >>>>  		i915_gem_object_retire__write(obj);
> >>>>
> >>>>  	obj->active &= ~(1 << ring);
> >>>>+
> >>>>+	if (obj->last_read_req[ring]->ctx->ppgtt)
> >>>>+		vm = &obj->last_read_req[ring]->ctx->ppgtt->base;
> >>>>+	else
> >>>>+		vm = &obj->last_read_req[ring]->i915->gtt.base;
> >>>>+
> >>>>+	list_for_each_entry(vma, &obj->vma_list, vma_link) {
> >>>>+		if (vma->vm == vm &&
> >>>>+		    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
> >>>>+		    !list_empty(&vma->mm_list))
> >>>>+			list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
> >>>>+	}
> >>>
> >>>This is only a partial solution since with schedulers and semaphores and a
> >>>few depencies on a given object, but in different vm you can still end up
> >>>with an object that is idle in a vm, but slipped through here.
> >>
> >>Could you describe the exact scenario you had in mind? I won't pretend it
> >>this is simple code and I have it all figured out.
> >
> >Well tbh I don't fully understand what exactly your code will do in all
> >corner-cases since there's a lot of ifs and special cases. But
> >fundamentally the problem is that an object can be active in a given vm
> >and there's no request pointing at the corresponding context in either
> >last_read_req or last_write_req. It works like this:
> >
> >- ctx A uses obj 1
> >- ctx B uses obj 1 on the same engine
> >
> >Your code above will miss to retire obj 1 on ctx 1's vm, so if you then
> >destroy ctx A you'll hit the nice warn above (presuming ctx B keeps obj 1
> >busy for all that time). So not even scheduler needed.
> 
> Ah yes... I was even close to figuring this one out but got too confused and
> decided I am imagining things...
> 
> >Fundamentally if you really want to accurately keep track of vma instaed
> >of obj activeness, then you need to have per-vma tracking of all of it,
> >i.e. all the last_*_req stuff.
> >
> >Without that we essentially only keep track of an lru on each vm, and the
> >active/inactive split is totally bogus (well not quite, but since it
> >reflects obj->active there's not really any value, but it does create tons
> >of confusion).
> >
> >To unconfuse this we'd need to have proper vma active tracking (not sure
> >whether that's worth it) or just merge the per-vm active/inactive lists
> >since they're really just one big list.
> 
> ... so all in all best to drop this for now. Since it is a half solution,
> plus Chris says Synmark, if I read it right, likes to use an object from a
> zillion of contexts simultaneously so would be hurt by traversing the
> obj->vma_list here.

Yeah, walking obj->vma_list pretty much means unclean design.
Unfortunately we still have a pile of those left from the original ppgtt
enabling, and the fixes for those are burried in Chris' tree somewhere.

> >>>Also, checking for the view type is some strange layering. Why that?
> >>
> >>!ppgtt to skip potential other views I thought, no?
> >
> >The vma lrus should be irrespective of ggtt vs. ppgtt or exactly what kind
> >of view it is. If we exclude special case everywhere the might pop up but
> >should the code will become unreadable (and the abstraction useless). What
> >do you fear could blow up without this check?
> 
> Nothing really, since other view types will never end up on the active list.
> Just a never ending dilemma of whether to be generic or optimal. :)

Well in this case there shouldn't be a dilemma: Since the view special
case can't happen and doesn't matter, being generic also means less code
and hence more optimal ;-)

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch