[Intel-gfx] [PATCH] drm/i915: Ratelimit i915_globals_park
Chris Wilson
chris at chris-wilson.co.uk
Wed Dec 18 12:57:00 UTC 2019
Quoting Tvrtko Ursulin (2019-12-18 12:49:26)
>
> On 18/12/2019 09:40, Chris Wilson wrote:
> > When doing our global park, we like to be a good citizen and shrink our
> > slab caches (of which we have quite a few now), but each
> > kmem_cache_shrink() incurs a stop_machine() and so ends up being quite
> > expensive, causing machine-wide stalls. While ideally we would like to
> > throw away unused pages in our slab caches whenever it appears that we
> > are idling, doing so will require a much cheaper mechanism. In the
> > meantime use a delayed worked to impose a rate-limit that means we have
> > to have been idle for more than 2 seconds before we start shrinking.
>
> I was thinking about this yesterday, while looking at rapid park-unpark
> cycles!
>
> > References: https://gitlab.freedesktop.org/drm/intel/issues/848
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_globals.c | 53 ++++++++++++++++++++++++-----
> > 1 file changed, 44 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> > index be127cd28931..3aa213684293 100644
> > --- a/drivers/gpu/drm/i915/i915_globals.c
> > +++ b/drivers/gpu/drm/i915/i915_globals.c
> > @@ -20,7 +20,10 @@ static LIST_HEAD(globals);
> > static atomic_t active;
> > static atomic_t epoch;
> > static struct park_work {
> > - struct rcu_work work;
> > + struct delayed_work work;
> > + struct rcu_head rcu;
> > + unsigned long flags;
> > +#define PENDING 0
> > int epoch;
> > } park;
> >
> > @@ -37,11 +40,33 @@ static void i915_globals_shrink(void)
> > global->shrink();
> > }
> >
> > +static void __i915_globals_grace(struct rcu_head *rcu)
> > +{
> > + /* Ratelimit parking as shrinking is quite slow */
> > + schedule_delayed_work(&park.work, round_jiffies_up_relative(2 * HZ));
> > +}
> > +
> > +static void __i915_globals_queue_rcu(void)
> > +{
> > + park.epoch = atomic_inc_return(&epoch);
> > + if (!atomic_read(&active)) {
> > + init_rcu_head(&park.rcu);
>
> Do we need to do init/destroy more than once? I think once on driver
> load/exit would be more correct since the head is statically allocated.
I did on each fresh invocation to have a better chance of detecting
wrong lifetimes. Because the first patch did call call_rcu() while
still active, I thought by having the debugobject init it would give us
the early warning that it was still alive. Similarly by calling
destroy_rcu as we transition to the delayed_work should catch if we
happen to be still active on rcu lists.
That was all I was thinking, it would give us better debug information.
-Chris
More information about the Intel-gfx
mailing list