[Intel-gfx] [PATCH 3/3] drm/i915: Defer final intel_wakeref_put to process context
Chris Wilson
chris at chris-wilson.co.uk
Wed Aug 7 15:24:03 UTC 2019
Quoting Mika Kuoppala (2019-08-07 16:04:56)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> > -static int intel_gt_park(struct intel_wakeref *wf)
> > +static int __gt_park(struct intel_wakeref *wf)
> > {
> > struct drm_i915_private *i915 =
> > container_of(wf, typeof(*i915), gt.wakeref);
> > @@ -74,22 +67,25 @@ static int intel_gt_park(struct intel_wakeref *wf)
> > if (INTEL_GEN(i915) >= 6)
> > gen6_rps_idle(i915);
> >
> > + /* Everything switched off, flush any residual interrupt just in case */
> > + intel_synchronize_irq(i915);
>
> Do we detect it gracefully if we get one extra after this flush?
If we make a mistake, we get unclaimed mmio from the interrupt handler.
Given the structure of engine_pm/gt_pm, we should not be generating
either user interrupts or CS interrupts by this point. The remaining
potential is the guc/pm interrupts, but I felt it was more prudent to
document that this is the final point for GT interrupts of any type.
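To spell out why the flush sits last (just a sketch annotating the hunk
above; the comments are mine, not from the patch):

	if (INTEL_GEN(i915) >= 6)
		gen6_rps_idle(i915);	/* last interrupt source quiesced */
	intel_synchronize_irq(i915);	/* then wait out handlers in flight */
	/*
	 * From here on any GT interrupt is a bug: its handler will touch
	 * powered-down registers and trip the unclaimed mmio detection.
	 */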
> > + GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> > + for_each_engine(engine, gt->i915, id) {
> > + const typeof(*igt_atomic_phases) *p;
> > +
> > + for (p = igt_atomic_phases; p->name; p++) {
> > + /*
> > + * Acquisition is always synchronous, except if we
> > + * know that the engine is already awale, in which
>
> s/awale/awake
a whale
> in which case?
Avast ye blows!
Moby Dick!
> > static long
> > wait_for_timelines(struct drm_i915_private *i915,
> > unsigned int flags, long timeout)
> > @@ -954,27 +941,20 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
> > unsigned int flags, long timeout)
> > {
> > /* If the device is asleep, we have no requests outstanding */
> > - if (!READ_ONCE(i915->gt.awake))
> > + if (!intel_gt_pm_is_awake(&i915->gt))
> > return 0;
> >
> > - GEM_TRACE("flags=%x (%s), timeout=%ld%s, awake?=%s\n",
> > + GEM_TRACE("flags=%x (%s), timeout=%ld%s\n",
> > flags, flags & I915_WAIT_LOCKED ? "locked" : "unlocked",
> > - timeout, timeout == MAX_SCHEDULE_TIMEOUT ? " (forever)" : "",
> > - yesno(i915->gt.awake));
> > + timeout, timeout == MAX_SCHEDULE_TIMEOUT ? " (forever)" : "");
> >
> > timeout = wait_for_timelines(i915, flags, timeout);
> > if (timeout < 0)
> > return timeout;
> >
> > if (flags & I915_WAIT_LOCKED) {
> > - int err;
> > -
> > lockdep_assert_held(&i915->drm.struct_mutex);
> >
> > - err = wait_for_engines(&i915->gt);
> > - if (err)
> > - return err;
> > -
>
> Hmm where does the implicit wait for idle come from now?
Instead of having the wait here, we moved it into the engine/gt pm
tracking and added an intel_gt_pm_wait_for_idle().
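Roughly (a sketch of the shape, not the verbatim code;
wait_var_event_interruptible() is the stock helper from linux/wait_bit.h
and pairs with the wake_up_var() in the put path quoted below):

static inline bool intel_wakeref_is_active(const struct intel_wakeref *wf)
{
	return atomic_read(&wf->count);
}

int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
{
	/* woken by wake_up_var(&wf->wakeref) when the final put lands */
	return wait_var_event_interruptible(&wf->wakeref,
					    !intel_wakeref_is_active(wf));
}

int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
{
	return intel_wakeref_wait_for_idle(&gt->wakeref);
}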
> > -int __intel_wakeref_put_last(struct intel_runtime_pm *rpm,
> > - struct intel_wakeref *wf,
> > - int (*fn)(struct intel_wakeref *wf))
> > +static void ____intel_wakeref_put_last(struct intel_wakeref *wf)
> > {
> > - int err;
> > + if (!atomic_dec_and_test(&wf->count))
> > + goto unlock;
> > +
> > + if (likely(!wf->ops->put(wf))) {
> > + rpm_put(wf);
> > + wake_up_var(&wf->wakeref);
> > + } else {
> > + /* ops->put() must schedule its own release on deferral */
> > + atomic_set_release(&wf->count, 1);
>
> I lose track here. On an async call to gt_park we end up leaving
> with a count of 1.
So we know wf->count is 0 and that we hold the lock preventing anyone
else from raising wf->count back to 1. If we fail in our cleanup task,
knowing that we have exclusive control over the counter, we simply mark
it as 1 again (using _release so that anyone concurrently trying to
acquire the wakeref for themselves is unblocked).
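For reference, the acquire side that the _release pairs with looks
roughly like this (simplified sketch, not the exact code from the
series):

static inline int intel_wakeref_get(struct intel_wakeref *wf)
{
	/*
	 * Fast path: a non-zero count means no 0 <-> 1 transition can
	 * be in flight, so we can take a reference without the mutex.
	 * The ordering of the successful inc pairs with the
	 * atomic_set_release() in the deferred-put path above.
	 */
	if (atomic_inc_not_zero(&wf->count))
		return 0;

	/* Slow path: serialize the 0 -> 1 transition against the put */
	mutex_lock(&wf->mutex);
	if (!atomic_inc_not_zero(&wf->count)) {
		int err = wf->ops->get(wf); /* e.g. unpark the gt */

		if (err) {
			mutex_unlock(&wf->mutex);
			return err;
		}

		atomic_set_release(&wf->count, 1);
	}
	mutex_unlock(&wf->mutex);

	return 0;
}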
> > + /* Assume we are not in process context and so cannot sleep. */
> > + if (wf->ops->flags & INTEL_WAKEREF_PUT_ASYNC ||
> > + !mutex_trylock(&wf->mutex)) {
>
> Didn't find anything in trylock that would prevent you
> from shooting yourself in the foot.
Yup. It's immorally equivalent to spin_trylock.
> So it is up to caller (and eventually lockdep spam) if
> the async usage fails to follow the rules?
Not the caller; the fault lies in the wakeref. We either mark it as
having to use the worker (process context) or fix it so that it is quick
and can run in atomic context. engine_pm was simple enough to make
atomic; gt_pm is promising, but I needed to make rps waitfree (done) and
make the display pm truly async (which I baulked at!).
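For engine_pm the put can therefore run inline; gt_pm (for now) opts
into the worker via the flag, along the lines of (sketch; __gt_unpark
as the matching get is my naming, mirroring __gt_park above):

static const struct intel_wakeref_ops wf_ops = {
	.get = __gt_unpark,
	.put = __gt_park,
	.flags = INTEL_WAKEREF_PUT_ASYNC, /* ->put may run from atomic context */
};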
>
> > + schedule_work(&wf->work);
> > + return;
> > + }
> > +
> > + ____intel_wakeref_put_last(wf);
> > }
> >
> > -void __intel_wakeref_init(struct intel_wakeref *wf, struct lock_class_key *key)
> > +static void __intel_wakeref_put_work(struct work_struct *wrk)
>
> No need for underscore before the name here?
I was just trying to keep it consistent with the 3 functions working
together.
__intel_wakeref_put_last
__intel_wakeref_put_work
____intel_wakeref_put_last
Now, could use
__intel_wakeref_put_last
__wakeref_put_work
____wakeref_put_last
keeping the underscores to underline that the games being played with
locking are not for the faint of heart.
-Chris