[Intel-gfx] [PATCH 4/6] drm/i915: Set our shrinker->batch to 4096 (~16MiB)

Wed Aug 16 13:55:04 UTC 2017

Quoting Joonas Lahtinen (2017-08-16 14:39:00)
> On Sat, 2017-08-12 at 12:51 +0100, Chris Wilson wrote:
> > Prefer to defer activating our GEM shrinker until we have a few
> > megabytes to free; or we have accumulated sufficient mempressure by
> > deferring the reclaim to force a shrink. The intent is that because our
> > objects may typically be large, we are too effective at shrinking and
> > are not rewarded for freeing more pages than the batch. It will also
> > defer the initial shrinking to hopefully put it at a lower priority than
> > say the buffer cache (although it will balance out over a number of
> > reclaims, with GEM being more bursty).
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index 5b8bc0e4f336..8bb17e9a52de 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -461,6 +461,7 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
> >       dev_priv->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
> >       dev_priv->mm.shrinker.count_objects = i915_gem_shrinker_count;
> >       dev_priv->mm.shrinker.seeks = DEFAULT_SEEKS;
> > +     dev_priv->mm.shrinker.batch = 4096;
> 
> Did you try how this alone effects the runtime of two consequtive
> gem.testlist runs? Is there some specific test/usecase that benefits
> from this. We'd be the first one to set this, md/raid5.c sets it to 128
> which is the default (0).

My testing was trying to play a game that was hitting swap on an old
hdd. So not very quantifiable, and vmscan is very unintuitive. 

Note also that we are special in that we don't report objects but pages.
Not that it makes any difference, upon reclaim every slab is basically
asked to give up some %% of what it reports, with some hysteresis thrown
in on top.

The only way we can do anything here is to throw it at lots of systems
and see how that helps. My gut feeling says that the batch size should
be approximately the typical object size in the freeable list, to try to
reduce the amount of inefficient work. Now, the value is read before
scan->count is called, but we can always improve the estimate for the
next pass.
-Chris