[Intel-gfx] [PATCH 04/43] drm/i915: Do a synchronous switch-to-kernel-context on idling

Thu Mar 7 17:06:58 UTC 2019

On 07/03/2019 13:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-07 13:07:18)
>>
>> On 06/03/2019 14:24, Chris Wilson wrote:
>>> +static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
>>> +{
>>> +     if (i915_gem_switch_to_kernel_context(i915))
>>> +             return false;
>>
>> Is it worth still trying to idle if this fails? Since the timeout is
>> short, maybe reset in idle state bring less havoc than not. It can only
>> fail on memory allocations I think, okay and terminally wedged. In which
>> case it is still okay.
> 
> Terminally wedged is hard wired to return 0 (this, the next patch?) so
> that we don't bail during i915_gem_suspend() for this reason.
> 
> We do still idle if this fails, as we mark the driver/GPU as wedged.
> Perform a GPU reset so that it hopefully isn't shooting kittens any
> more, and pull a pillow over our heads.

I didn't find a path which idles before wedging if 
switch_to_kernel_context_sync fails due failing 
i915_gem_switch_to_kernel_context. Where is it?

It is a minor concern don't get me wrong. It is unlikely to fail like 
this. I was simply thinking why not try and wait for the current work to 
finish before suspending in this case. Might be a better experience 
after resume.

> That's pretty much our answer in every case, wedge and hope the users go
> away.
> 
>>> +
>>> +     if (i915_gem_wait_for_idle(i915,
>>> +                                I915_WAIT_LOCKED |
>>> +                                I915_WAIT_FOR_IDLE_BOOST,
>>> +                                HZ / 10))
>>
>> You'll presumably change HZ / 10 to that new macro.
> 
> Yup.
> 
>>>    static void
>>>    i915_gem_idle_work_handler(struct work_struct *work)
>>>    {
>>> -     struct drm_i915_private *dev_priv =
>>> -             container_of(work, typeof(*dev_priv), gt.idle_work.work);
>>> +     struct drm_i915_private *i915 =
>>> +             container_of(work, typeof(*i915), gt.idle_work.work);
>>> +     typeof(i915->gt) *gt = &i915->gt;
>>
>> I am really not sure about the typeof idiom in normal C code. :( It
>> saves a little bit of typing, and a little bit of churn if type name
>> changes, but just feels weird to use it somewhere and somewhere not.
> 
> But then we have to name thing! We're not sold on gt; it means quite a
> few different things around the bspec. This bit is actually part that
> I'm earmarking for i915_gem itself (high level idle/power/user management)
> and I'm contemplating i915_gpu for the bits that beneath us but still a
> management layer over hardware (and intel_foo for the bits that talk to
> hardware. Maybe that too will change if we completely split out into
> different modules.)

So you could have left it as is for now and have a smaller diff. But 
okay.. have it if you insist.

>   
>>>        /*
>>> -      * New request retired after this work handler started, extend active
>>> -      * period until next instance of the work.
>>> +      * Flush out the last user context, leaving only the pinned
>>> +      * kernel context resident. Should anything unfortunate happen
>>> +      * while we are idle (such as the GPU being power cycled), no users
>>> +      * will be harmed.
>>>         */
>>> -     if (new_requests_since_last_retire(dev_priv))
>>> -             goto out_unlock;
>>> -
>>> -     __i915_gem_park(dev_priv);
>>> +     if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
>>
>> Could have left the new_requests_since_last_retire for readability.
> 
> As always, it changed several times. I actually think considering that
> we play games with active_requests, keeping that in the open is
> important. ~o~

Ok.

>>
>>> +             ++gt->active_requests; /* don't requeue idle */
>>> +
>>> +             if (!switch_to_kernel_context_sync(i915)) {
>>> +                     dev_err(i915->drm.dev,
>>> +                             "Failed to idle engines, declaring wedged!\n");
>>> +                     GEM_TRACE_DUMP();
>>> +                     i915_gem_set_wedged(i915);
>>> +             }
>>> +             i915_retire_requests(i915);
> 
>>> @@ -3128,7 +3120,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
>>>                        return err;
>>>    
>>>                i915_retire_requests(i915);
>>> -             GEM_BUG_ON(i915->gt.active_requests);
>>
>> Belongs to this patch?
> 
> Yes. See the above chunk.

The idle work handler? More hand holding required I'm afraid.

Regards,

Tvrtko