[Intel-gfx] [RFC PATCH] mm, oom: distinguish blockable mode for mmu notifiers

Michal Hocko mhocko at kernel.org
Fri Jun 22 16:19:05 UTC 2018


[Hmm, the cc list got mangled somehow - you have just made many people
to work for suse ;) and to kvack.org in the preious one - fixed up
hopefully]

On Fri 22-06-18 17:07:21, Chris Wilson wrote:
> Quoting Michal Hocko (2018-06-22 16:57:16)
> > On Fri 22-06-18 16:36:49, Chris Wilson wrote:
> > > Quoting Michal Hocko (2018-06-22 16:02:42)
> > > > Hi,
> > > > this is an RFC and not tested at all. I am not very familiar with the
> > > > mmu notifiers semantics very much so this is a crude attempt to achieve
> > > > what I need basically. It might be completely wrong but I would like
> > > > to discuss what would be a better way if that is the case.
> > > > 
> > > > get_maintainers gave me quite large list of people to CC so I had to trim
> > > > it down. If you think I have forgot somebody, please let me know
> > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > index 854bd51b9478..5285df9331fa 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> > > > @@ -112,10 +112,11 @@ static void del_object(struct i915_mmu_object *mo)
> > > >         mo->attached = false;
> > > >  }
> > > >  
> > > > -static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > > +static int i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > >                                                        struct mm_struct *mm,
> > > >                                                        unsigned long start,
> > > > -                                                      unsigned long end)
> > > > +                                                      unsigned long end,
> > > > +                                                      bool blockable)
> > > >  {
> > > >         struct i915_mmu_notifier *mn =
> > > >                 container_of(_mn, struct i915_mmu_notifier, mn);
> > > > @@ -124,7 +125,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> > > >         LIST_HEAD(cancelled);
> > > >  
> > > >         if (RB_EMPTY_ROOT(&mn->objects.rb_root))
> > > > -               return;
> > > > +               return 0;
> > > 
> > > The principle wait here is for the HW (even after fixing all the locks
> > > to be not so coarse, we still have to wait for the HW to finish its
> > > access).
> > 
> > Is this wait bound or it can take basically arbitrary amount of time?
> 
> Arbitrary. It waits for the last operation in the queue that needs that
> set of backing pages, and that queue is unbounded and not even confined
> to the local driver. (Though each operation should be bounded to be
> completed within an interval or be cancelled, that interval is on the
> order of 10s!)

OK, I see. We should rather not wait that long so backoff is just
better. The whole point of the oom_reaper is to tear down and free some
memory. We do not really need to reclaim all of it.

It would be great if we could do something like - kick the tear down of
the device memory but have it done in the background. We wouldn't tear
the vma down in that case but the whole process would start at least.
I am not sure something like that is possible.
 
> > > The first pass would be then to not do anything here if
> > > !blockable.
> > 
> > something like this? (incremental diff)
> 
> Yup.

Cool, I will start with that because even that is an improvement from
the oom_reaper POV.

Thanks!
-- 
Michal Hocko
SUSE Labs


More information about the amd-gfx mailing list