[Intel-gfx] [PATCH 04/43] drm/i915: Do a synchronous switch-to-kernel-context on idling
Chris Wilson
chris at chris-wilson.co.uk
Thu Mar 7 13:29:38 UTC 2019
Quoting Tvrtko Ursulin (2019-03-07 13:07:18)
>
> On 06/03/2019 14:24, Chris Wilson wrote:
> > +static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
> > +{
> > +	if (i915_gem_switch_to_kernel_context(i915))
> > +		return false;
>
> Is it worth still trying to idle if this fails? Since the timeout is
> short, maybe a reset in the idle state brings less havoc than not
> idling at all. It can only fail on memory allocations, I think, and
> when terminally wedged, in which case it is still okay.
Terminally wedged is hard-wired to return 0 (in this patch, or the next?) so
that we don't bail out of i915_gem_suspend() for that reason.
We do still idle if this fails: we mark the driver/GPU as wedged, perform a
GPU reset so that it hopefully isn't shooting kittens any more, and pull a
pillow over our heads.
That's pretty much our answer in every case: wedge and hope the users go
away.
> > +
> > +	if (i915_gem_wait_for_idle(i915,
> > +				   I915_WAIT_LOCKED |
> > +				   I915_WAIT_FOR_IDLE_BOOST,
> > +				   HZ / 10))
>
> You'll presumably change HZ / 10 to that new macro.
Yup.
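For reference, stitching the hunks above together, the helper as a whole
reads roughly like this (the trailing return true and the eventual name of
the timeout macro are not quoted above, so treat this as a sketch):

static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
{
	/* Queue a request on each busy engine to switch to the kernel context. */
	if (i915_gem_switch_to_kernel_context(i915))
		return false;

	/* Then wait, briefly, for those context switches to complete. */
	if (i915_gem_wait_for_idle(i915,
				   I915_WAIT_LOCKED |
				   I915_WAIT_FOR_IDLE_BOOST,
				   HZ / 10)) /* to become a named timeout macro */
		return false;

	return true;
}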
> >  static void
> >  i915_gem_idle_work_handler(struct work_struct *work)
> >  {
> > -	struct drm_i915_private *dev_priv =
> > -		container_of(work, typeof(*dev_priv), gt.idle_work.work);
> > +	struct drm_i915_private *i915 =
> > +		container_of(work, typeof(*i915), gt.idle_work.work);
> > +	typeof(i915->gt) *gt = &i915->gt;
>
> I am really not sure about the typeof idiom in normal C code. :( It
> saves a little bit of typing, and a little bit of churn if the type
> name changes, but it just feels weird to use it in some places and
> not others.
But then we have to name things! We're not sold on gt; it means quite a
few different things around the bspec. This bit is actually a part that
I'm earmarking for i915_gem itself (high-level idle/power/user management),
and I'm contemplating i915_gpu for the bits that sit beneath us but are
still a management layer over hardware (and intel_foo for the bits that
talk to hardware). Maybe that too will change if we completely split out
into different modules.
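If it helps, the idiom is nothing more than the GNU C typeof alias. A
stand-alone illustration (all the names here are made up, not from the
patch):

struct gt_state { int active_requests; };	/* hypothetical */
struct device_priv { struct gt_state gt; };	/* hypothetical */

static int count_active(struct device_priv *priv)
{
	/* Alias priv->gt without having to spell out its type name. */
	typeof(priv->gt) *gt = &priv->gt;

	return gt->active_requests;
}

If the type of the gt member is ever renamed, only the member itself has
to change, not every local declaration.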
> >  	/*
> > -	 * New request retired after this work handler started, extend active
> > -	 * period until next instance of the work.
> > +	 * Flush out the last user context, leaving only the pinned
> > +	 * kernel context resident. Should anything unfortunate happen
> > +	 * while we are idle (such as the GPU being power cycled), no users
> > +	 * will be harmed.
> >  	 */
> > -	if (new_requests_since_last_retire(dev_priv))
> > -		goto out_unlock;
> > -
> > -	__i915_gem_park(dev_priv);
> > +	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
>
> Could have left the new_requests_since_last_retire for readability.
As always, it changed several times. I actually think that, considering we
play games with active_requests, keeping the check in the open is
important. ~o~
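For comparison, the helper the open-coded check replaces is, from memory,
an approximation of:

static bool new_requests_since_last_retire(const struct drm_i915_private *i915)
{
	return (READ_ONCE(i915->gt.active_requests) ||
		work_pending(&i915->gt.idle_work.work));
}

so the new guard is just its negation, written against the local gt alias.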
>
> > +		++gt->active_requests; /* don't requeue idle */
> > +
> > +		if (!switch_to_kernel_context_sync(i915)) {
> > +			dev_err(i915->drm.dev,
> > +				"Failed to idle engines, declaring wedged!\n");
> > +			GEM_TRACE_DUMP();
> > +			i915_gem_set_wedged(i915);
> > +		}
> > +		i915_retire_requests(i915);
> > @@ -3128,7 +3120,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
> >  		return err;
> > 
> >  	i915_retire_requests(i915);
> > -	GEM_BUG_ON(i915->gt.active_requests);
>
> Belongs to this patch?
Yes. See the above chunk.
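To spell it out, the worker now holds its own active_requests reference
across the wait, so the count is no longer guaranteed to be zero inside
i915_gem_wait_for_idle(). Condensed from the hunks above (the error
message, trace dump, balancing decrement and final parking are in the
full patch, not quoted here):

	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
		++gt->active_requests; /* don't requeue idle; count stays nonzero */

		/* Calls i915_gem_wait_for_idle(), hence dropping the BUG_ON. */
		if (!switch_to_kernel_context_sync(i915))
			i915_gem_set_wedged(i915);

		i915_retire_requests(i915);
	}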
-Chris