[Intel-gfx] [PATCH] drm/i915: More vma fixups around unbind/destroy
Ben Widawsky
ben at bwidawsk.net
Thu Aug 22 22:37:54 CEST 2013
On Thu, Aug 22, 2013 at 09:41:07PM +0200, Daniel Vetter wrote:
[snip]
> >
> > It's not clear from the commit message if this actually fixes the bug
> > for you (which you seemed to be able to reproduce). Did it?
>
> Nope, just hard looking at the Oopses, haven't yet reproduced the bug
> here. But now that I have a theory what's busted that should help a
> bit with writing igts ...
>
In that case, I think you need the same check in the failure path in
bind_to_vm. I have that fixed locally, but haven't actually tested it.
> >> ---
> >> drivers/gpu/drm/i915/i915_gem.c | 15 +++++++--------
> >> 1 file changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >> index ef92c69..3e855ff 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -2616,6 +2616,9 @@ int i915_vma_unbind(struct i915_vma *vma)
> >> drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
> >> int ret;
> >>
> >> + /* For now we only ever use 1 vma per object */
> >> + WARN_ON(!list_is_singular(&obj->vma_list));
> >> +
> >> if (list_empty(&vma->vma_link))
> >> return 0;
> >>
> >> @@ -2661,12 +2664,12 @@ int i915_vma_unbind(struct i915_vma *vma)
> >> drm_mm_remove_node(&vma->node);
> >>
> >> destroy:
> >> - i915_gem_vma_destroy(vma);
> >> + /* Keep the vma as a placeholder in the execbuffer reservation lists */
> >> + if (!list_empty(&vma->exec_list))
> >> + i915_gem_vma_destroy(vma);
> >
> > Please see my vma->busy question at the top. To me this looks like the
> > behavior I had with vma->busy where you don't unlink from the vma_list
> > (which actually defies the original convention I had about the
> > vma_list, whereby a vma exists on a vma list IFF it has space allocated
> > in some address space). I think this is necessary though, as there are
> > really three states:
> > 1. Bound
> > 2. Temporarily unbound <-- this is the problematic one
> > 3. Unbound
>
> Yes.
>
> >
> >>
> >> /* Since the unbound list is global, only move to that list if
> >> - * no more VMAs exist.
> >> - * NB: Until we have real VMAs there will only ever be one */
> >> - WARN_ON(!list_empty(&obj->vma_list));
> >> + * no more VMAs exist. */
> >> if (list_empty(&obj->vma_list))
> >> list_move_tail(&obj->global_list, &dev_priv->mm.unbound_list);
> >>
> >> @@ -4171,10 +4174,6 @@ void i915_gem_vma_destroy(struct i915_vma *vma)
> >> WARN_ON(vma->node.allocated);
> >> list_del(&vma->vma_link);
> >>
> >> - /* Keep the vma as a placeholder in the execbuffer reservation lists */
> >> - if (!list_empty(&vma->exec_list))
> >> - return;
> >> -
> >> kfree(vma);
> >> }
> >
> > If you really wan to remove this hunk, I think you should do it as a
> > revert (or remove it from the history completely if it isn't too late).
> > Instead though I think turning this into a WARN is still a valid thing
> > to do. Unless I've missed something. It shouldn't ever be safe to
> > destroy a VMA on an exec_list because we use the same list element in
> > eviction and if we accidentally destroy in eviction (called from
> > execbuf) we're in pretty big trouble.
>
> I think I'll squash this in when merging with Chris' patch and fixup
> the commit message a bit. For more paranoia in the evict code Chris
> suggested to add a check in mark_free for list_empty(vma->exec_list).
> Would be good in a follow-up patch imo.
I have this exact same check in my current code to debug this, I think
it makes a lot of sense.
>
> > Finally, I still think Chris' original suggestion of creating a new
> > member for eviction would make a lot of these bugs at least a little bit
> > easier to find. It's always hard to say though until you actually do it.
>
> Yeah, we could just go back to the old ways of using the lru lists to
> unwind the eviction roaster (i.e. active/inactive vm lists now).
> That's also why I think we could just merge the active/inactive list
> together into one, since without that merge it'll be trouble. But when
> Chris merged the fair lru eviction code he frobbed it a bit to use the
> current temporary list.
>
> Adding yet another list is imo wasteful ;-)
I guess we'll see how difficult the bugs are to fix. It uses extra
memory, that much is indisputable.
--
Ben Widawsky, Intel Open Source Technology Center
More information about the Intel-gfx
mailing list