[Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy

Ben Widawsky ben at bwidawsk.net
Thu Aug 22 22:37:54 CEST 2013


On Thu, Aug 22, 2013 at 09:41:07PM +0200, Daniel Vetter wrote:

[snip]

> >
> > It's not clear from the commit message if this actually fixes the bug
> > for you (which you seemed to be able to reproduce). Did it?
> 
> Nope, just hard looking at the Oopses, haven't yet reproduced the bug
> here. But now that I have a theory what's busted that should help a
> bit with writing igts ...
> 

In that case, I think you need the same check in the failure path in
bind_to_vm. I have that fixed locally, but haven't actually tested it.

> >> ---
> >>  drivers/gpu/drm/i915/i915_gem.c | 15 +++++++--------
> >>  1 file changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index ef92c69..3e855ff 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>       drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
> >>       int ret;
> >>
> >> +     /* For now we only ever use 1 vma per object */
> >> +     WARN_ON(!list_is_singular(&obj->vma_list));
> >> +
> >>       if (list_empty(&vma->vma_link))
> >>               return 0;
> >>
> >> @@ -2661,12 +2664,12 @@ int i915_vma_unbind(struct i915_vma *vma)
> >>       drm_mm_remove_node(&vma->node);
> >>
> >>  destroy:
> >> -     i915_gem_vma_destroy(vma);
> >> +     /* Keep the vma as a placeholder in the execbuffer reservation lists */
> >> +     if (!list_empty(&vma->exec_list))
> >> +             i915_gem_vma_destroy(vma);
> >
> > Please see my vma->busy question at the top. To me this looks like the
> > behavior I had with vma->busy where you don't unlink from the vma_list
> > (which actually defies the original convention I had about the
> > vma_list, whereby a vma exists on a vma list IFF it has space allocated
> > in some address space). I think this is necessary though, as there are
> > really three states:
> > 1. Bound
> > 2. Temporarily unbound <-- this is the problematic one
> > 3. Unbound
> 
> Yes.
> 
> >
> >>
> >>       /* Since the unbound list is global, only move to that list if
> >> -      * no more VMAs exist.
> >> -      * NB: Until we have real VMAs there will only ever be one */
> >> -     WARN_ON(!list_empty(&obj->vma_list));
> >> +      * no more VMAs exist. */
> >>       if (list_empty(&obj->vma_list))
> >>               list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> >>
> >> @@ -4171,10 +4174,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
> >>       WARN_ON(vma->node.allocated);
> >>       list_del(&vma->vma_link);
> >>
> >> -     /* Keep the vma as a placeholder in the execbuffer reservation lists */
> >> -     if (!list_empty(&vma->exec_list))
> >> -             return;
> >> -
> >>       kfree(vma);
> >>  }
> >
> > If you really wan to remove this hunk, I think you should do it as a
> > revert (or remove it from the history completely if it isn't too late).
> > Instead though I think turning this into a WARN is still a valid thing
> > to do. Unless I've missed something. It shouldn't ever be safe to
> > destroy a VMA on an exec_list because we use the same list element in
> > eviction and if we accidentally destroy in eviction (called from
> > execbuf) we're in pretty big trouble.
> 
> I think I'll squash this in when merging with Chris' patch and fixup
> the commit message a bit. For more paranoia in the evict code Chris
> suggested to add a check in mark_free for list_empty(vma->exec_list).
> Would be good in a follow-up patch imo.

I have this exact same check in my current code to debug this, I think
it makes a lot of sense.

> 
> > Finally, I still think Chris' original suggestion of creating a new
> > member for eviction would make a lot of these bugs at least a little bit
> > easier to find. It's always hard to say though until you actually do it.
> 
> Yeah, we could just go back to the old ways of using the lru lists to
> unwind the eviction roaster (i.e. active/inactive vm lists now).
> That's also why I think we could just merge the active/inactive list
> together into one, since without that merge it'll be trouble. But when
> Chris merged the fair lru eviction code he frobbed it a bit to use the
> current temporary list.
> 
> Adding yet another list is imo wasteful ;-)

I guess we'll see how difficult the bugs are to fix. It uses extra
memory, that much is indisputable.


-- 
Ben Widawsky, Intel Open Source Technology Center



More information about the Intel-gfx mailing list