[Intel-gfx] [PATCH] drm/i915/userptr: Avoid struct_mutex recursion for mmu_invalidate_range_start

Daniel Vetter daniel at ffwll.ch
Fri Oct 26 13:48:07 UTC 2018


On Fri, Oct 26, 2018 at 11:02 AM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Thu, Oct 25, 2018 at 9:39 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >
> > Quoting Daniel Vetter (2018-10-25 20:16:50)
> > > On Thu, Oct 25, 2018 at 01:45:42PM +0100, Chris Wilson wrote:
> > > > Since commit 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu
> > > > notifiers") we have been able to report failure from
> > > > mmu_invalidate_range_start which allows us to use a trylock on the
> > > > struct_mutex to avoid potential recursion and report -EBUSY instead.
> > > > Furthermore, this allows us to pull the work into the main callback and
> > > > avoid the sleight-of-hand in using a workqueue to avoid lockdep.
> > > >
> > > > However, not all paths to mmu_invalidate_range_start are prepared to
> > > > handle failure, so instead of reporting the recursion, deal with it.
> > > >
> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108375
> > > > References: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h         |   4 +-
> > > >  drivers/gpu/drm/i915/i915_gem.c         |  12 +-
> > > >  drivers/gpu/drm/i915/i915_gem_userptr.c | 219 ++++++++++++------------
> > > >  3 files changed, 115 insertions(+), 120 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index 2d7761b8ac07..c3b94c3f930f 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -3076,8 +3076,8 @@ enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock */
> > > >       I915_MM_SHRINKER
> > > >  };
> > > >
> > > > -void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> > > > -                              enum i915_mm_subclass subclass);
> > > > +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> > > > +                             enum i915_mm_subclass subclass);
> > > >  void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj);
> > > >
> > > >  enum i915_map_type {
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > > index 93d09282710d..de7ccb3eb7b8 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > > @@ -2454,17 +2454,18 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
> > > >       return pages;
> > > >  }
> > > >
> > > > -void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> > > > -                              enum i915_mm_subclass subclass)
> > > > +int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> > > > +                             enum i915_mm_subclass subclass)
> > > >  {
> > > >       struct sg_table *pages;
> > > > +     int ret = -EBUSY;
> > > >
> > > >       if (i915_gem_object_has_pinned_pages(obj))
> > > > -             return;
> > > > +             return -EBUSY;
> > > >
> > > >       GEM_BUG_ON(obj->bind_count);
> > > >       if (!i915_gem_object_has_pages(obj))
> > > > -             return;
> > > > +             return 0;
> > > >
> > > >       /* May be called by shrinker from within get_pages() (on another bo) */
> > > >       mutex_lock_nested(&obj->mm.lock, subclass);
> > > > @@ -2480,8 +2481,11 @@ void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
> > > >       if (!IS_ERR(pages))
> > > >               obj->ops->put_pages(obj, pages);
> > > >
> > > > +     ret = 0;
> > > >  unlock:
> > > >       mutex_unlock(&obj->mm.lock);
> > > > +
> > > > +     return ret;
> > > >  }
> > > >
> > > >  bool i915_sg_trim(struct sg_table *orig_st)
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > index 2c9b284036d1..a8f3c304db55 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > @@ -50,79 +50,84 @@ struct i915_mmu_notifier {
> > > >       struct hlist_node node;
> > > >       struct mmu_notifier mn;
> > > >       struct rb_root_cached objects;
> > > > -     struct workqueue_struct *wq;
> > > > +     struct i915_mm_struct *mm;
> > > >  };
> > > >
> > > >  struct i915_mmu_object {
> > > >       struct i915_mmu_notifier *mn;
> > > >       struct drm_i915_gem_object *obj;
> > > >       struct interval_tree_node it;
> > > > -     struct list_head link;
> > > > -     struct work_struct work;
> > > > -     bool attached;
> > > >  };
> > > >
> > > > -static void cancel_userptr(struct work_struct *work)
> > > > -{
> > > > -     struct i915_mmu_object *mo = container_of(work, typeof(*mo), work);
> > > > -     struct drm_i915_gem_object *obj = mo->obj;
> > > > -     struct work_struct *active;
> > > > -
> > > > -     /* Cancel any active worker and force us to re-evaluate gup */
> > > > -     mutex_lock(&obj->mm.lock);
> > > > -     active = fetch_and_zero(&obj->userptr.work);
> > > > -     mutex_unlock(&obj->mm.lock);
> > > > -     if (active)
> > > > -             goto out;
> > > > -
> > > > -     i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT, NULL);
> > > > -
> > > > -     mutex_lock(&obj->base.dev->struct_mutex);
> > > > -
> > > > -     /* We are inside a kthread context and can't be interrupted */
> > > > -     if (i915_gem_object_unbind(obj) == 0)
> > > > -             __i915_gem_object_put_pages(obj, I915_MM_NORMAL);
> > > > -     WARN_ONCE(i915_gem_object_has_pages(obj),
> > > > -               "Failed to release pages: bind_count=%d, pages_pin_count=%d, pin_global=%d\n",
> > > > -               obj->bind_count,
> > > > -               atomic_read(&obj->mm.pages_pin_count),
> > > > -               obj->pin_global);
> > > > -
> > > > -     mutex_unlock(&obj->base.dev->struct_mutex);
> > > > -
> > > > -out:
> > > > -     i915_gem_object_put(obj);
> > > > -}
> > > > -
> > > >  static void add_object(struct i915_mmu_object *mo)
> > > >  {
> > > > -     if (mo->attached)
> > > > +     if (!RB_EMPTY_NODE(&mo->it.rb))
> > > >               return;
> > > >
> > > >       interval_tree_insert(&mo->it, &mo->mn->objects);
> > > > -     mo->attached = true;
> > > >  }
> > > >
> > > >  static void del_object(struct i915_mmu_object *mo)
> > > >  {
> > > > -     if (!mo->attached)
> > > > +     if (RB_EMPTY_NODE(&mo->it.rb))
> > > >               return;
> > > >
> > > >       interval_tree_remove(&mo->it, &mo->mn->objects);
> > > > -     mo->attached = false;
> > > > +     RB_CLEAR_NODE(&mo->it.rb);
> > > > +}
> > > > +
> > > > +static void
> > > > +__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj, bool value)
> > > > +{
> > > > +     struct i915_mmu_object *mo = obj->userptr.mmu_object;
> > > > +
> > > > +     /*
> > > > +      * During mm_invalidate_range we need to cancel any userptr that
> > > > +      * overlaps the range being invalidated. Doing so requires the
> > > > +      * struct_mutex, and that risks recursion. In order to cause
> > > > +      * recursion, the user must alias the userptr address space with
> > > > +      * a GTT mmapping (possible with a MAP_FIXED) - then when we have
> > > > +      * to invalidate that mmaping, mm_invalidate_range is called with
> > > > +      * the userptr address *and* the struct_mutex held.  To prevent that
> > > > +      * we set a flag under the i915_mmu_notifier spinlock to indicate
> > > > +      * whether this object is valid.
> > > > +      */
> > > > +     if (!mo)
> > > > +             return;
> > > > +
> > > > +     spin_lock(&mo->mn->lock);
> > > > +     if (value)
> > > > +             add_object(mo);
> > > > +     else
> > > > +             del_object(mo);
> > > > +     spin_unlock(&mo->mn->lock);
> > > > +}
> > > > +
> > > > +static struct mutex *__i915_mutex_lock_recursive(struct mutex *m)
> > > > +{
> > > > +     switch (mutex_trylock_recursive(m)) {
> > > > +     default:
> > > > +     case MUTEX_TRYLOCK_FAILED:
> > > > +             mutex_lock(m);
> > > > +     case MUTEX_TRYLOCK_SUCCESS:
> > > > +             return m;
> > > > +
> > > > +     case MUTEX_TRYLOCK_RECURSIVE:
> > > > +             return ERR_PTR(-EEXIST);
> > > > +     }
> > > >  }
> > > >
> > > >  static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > > -                                                    struct mm_struct *mm,
> > > > -                                                    unsigned long start,
> > > > -                                                    unsigned long end,
> > > > -                                                    bool blockable)
> > > > +                                                   struct mm_struct *mm,
> > > > +                                                   unsigned long start,
> > > > +                                                   unsigned long end,
> > > > +                                                   bool blockable)
> > > >  {
> > > >       struct i915_mmu_notifier *mn =
> > > >               container_of(_mn, struct i915_mmu_notifier, mn);
> > > > -     struct i915_mmu_object *mo;
> > > >       struct interval_tree_node *it;
> > > > -     LIST_HEAD(cancelled);
> > > > +     struct mutex *unlock = NULL;
> > > > +     int ret = 0;
> > > >
> > > >       if (RB_EMPTY_ROOT(&mn->objects.rb_root))
> > > >               return 0;
> > > > @@ -133,11 +138,16 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > >       spin_lock(&mn->lock);
> > > >       it = interval_tree_iter_first(&mn->objects, start, end);
> > > >       while (it) {
> > > > +             struct drm_i915_gem_object *obj;
> > > > +             bool has_pages = false;
> > > > +
> > > >               if (!blockable) {
> > > > -                     spin_unlock(&mn->lock);
> > > > -                     return -EAGAIN;
> > > > +                     ret = -EAGAIN;
> > > > +                     break;
> > > >               }
> > > > -             /* The mmu_object is released late when destroying the
> > > > +
> > > > +             /*
> > > > +              * The mmu_object is released late when destroying the
> > > >                * GEM object so it is entirely possible to gain a
> > > >                * reference on an object in the process of being freed
> > > >                * since our serialisation is via the spinlock and not
> > > > @@ -146,21 +156,44 @@ static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > >                * use-after-free we only acquire a reference on the
> > > >                * object if it is not in the process of being destroyed.
> > > >                */
> > > > -             mo = container_of(it, struct i915_mmu_object, it);
> > > > -             if (kref_get_unless_zero(&mo->obj->base.refcount))
> > > > -                     queue_work(mn->wq, &mo->work);
> > > > +             obj = container_of(it, struct i915_mmu_object, it)->obj;
> > > > +             if (!kref_get_unless_zero(&obj->base.refcount)) {
> > > > +                     it = interval_tree_iter_next(it, start, end);
> > > > +                     continue;
> > > > +             }
> > > > +             spin_unlock(&mn->lock);
> > > >
> > > > -             list_add(&mo->link, &cancelled);
> > > > -             it = interval_tree_iter_next(it, start, end);
> > > > +             /* Cancel any pending worker and force us to re-evaluate gup */
> > > > +             mutex_lock_nested(&obj->mm.lock, I915_MM_SHRINKER);
> > >
> > > mutex_lock_nested is meant for static nesting. This here is rather
> > > dynamic, and I don't really see where the guarantee is that the classic
> > > deadlock isn't possible:
> >
> > It's meant to be after put_pages == 0 to ensure it only applied to a
> > different object. A bit more involved to do it exactly how I want, that
> > is to basically incorporate it into put_pages itself. Actually, if it is
> > in the put_pages it should just work...
>
> So for the shrinker I guess you can make this work. For my taste
> there's not enough GEM_WARN_ON to check the invariants of the nesting
> scheme (I assume it is list_empty(obj->mm.link) -> use I915_MM_NORMAL,
> and I915_MM_SHRINKER for anyone who discovers an object through the
> obj->mm.link lists, but would be nice to double-check that every time
> we take the obj->mm.lock with a GEM_WARN_ON).

After a bit more reading around I think this isn't enough, and we also
rely on list_del(obj->mm.link) being serialized by dev->struct_mutex.
But I'm not entirely sure. We definitely seem to rely on the magic
"holding dev->struct_mutex extends the lifetime of anything with a
non-empty obj->mm.link" on the shrinker side still.
-Daniel

> But for the mmu notifier I'm not seeing that same guarantee, and
> didn't spot any other, so not clear why this works. And assuming I'm
> reading the code correctly, we do insert the object into the rb tree
> already when gup worker is still doing it's thing.
>
> Or maybe there's another invariant that guarnatees the nesting, but I
> checked a few and they all don't seem to work (like pages_pin_count).
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the Intel-gfx mailing list