[Intel-gfx] [PATCH] drm/i915: Clear pending reset requests during suspend

Tue Jan 19 09:01:00 PST 2016

On 19/01/2016 16:42, Daniel Vetter wrote:
> On Tue, Jan 19, 2016 at 03:04:09PM +0000, Arun Siluvery wrote:
>> On 19/01/2016 14:13, Chris Wilson wrote:
>>> On Tue, Jan 19, 2016 at 03:04:40PM +0100, Daniel Vetter wrote:
>>>> On Tue, Jan 19, 2016 at 01:48:05PM +0000, Chris Wilson wrote:
>>>>> On Tue, Jan 19, 2016 at 01:09:28PM +0100, Daniel Vetter wrote:
>>>>>> On Thu, Jan 14, 2016 at 10:49:45AM +0000, Arun Siluvery wrote:
>>>>>>> Pending reset requests are cleared before suspending, they should be picked up
>>>>>>> after resume when new work is submitted.
>>>>>>>
>>>>>>> This is originally added as part of TDR patches for Gen8 from Tomas Elf which
>>>>>>> are under review, as suggested by Chris this is extracted as a separate patch
>>>>>>> as it can be useful now.
>>>>>>>
>>>>>>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>>>>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>>>>>>
>>>>>> Pulling in the discussion we had from irc: Imo the right approach is to
>>>>>> simply wait for gpu reset to finish it's job. Since that could in turn
>>>>>> lead to a dead gpu (if we're unlucky and init_hw failed) we'd need to do
>>>>>> that in a loop around gem_idle. And drop dev->struct_mutex in-between.
>>>>>> E.g.
>>>>>>
>>>>>> while (busy) {
>>>>>> 	mutex_lock();
>>>>>> 	gpu_idle();
>>>>>> 	mutex_unlock();
>>>>>>
>>>>>> 	flush_work(reset_work);
>>>>>> }
>>>>>
>>>>> Where does the requirement for gpu_idle come from? If there is a global
>>>>> reset in progress, it cannot queue a request to flush the work and
>>>>> waiting on the old results will be skipped. So just wait for the global
>>>>> reset to complete, i.e. flush_work().
>>>>
>>>> Yes, but the global reset might in turn leave a wrecked gpu behind, or at
>>>> least a non-idle one. Hence another gpu_idle on top, to make sure. If we
>>>> change init_hw() of engines to be synchronous then we should have at least
>>>> a WARN_ON(not_idle_but_i_expected_so()); in there ...
>>
>> gpu_error.work is removed in b8d24a06568368076ebd5a858a011699a97bfa42, we
>
> git sha1 from your private tree are meaningless in the public. Either link
> to some git weburl or mailing lists archive link.

It is from drm-intel repo,
http://cgit.freedesktop.org/drm-intel/commit/?id=b8d24a06568368076ebd5a858a011699a97bfa42

http://lists.freedesktop.org/archives/intel-gfx/2015-January/059154.html

regards
Arun

>
> Thanks, Daniel
>
>> are doing reset in hangcheck work itself so I think there is no need to
>> flush work.
>>
>> while (i915_reset_in_progress(gpu_error) &&
>>         !i915_terminally_wedged(gpu_error)) {
>>          int ret;
>>
>>          mutex_lock(&dev->struct_mutex);
>>          ret = i915_gpu_idle(dev);
>>          if (ret)
>>                  DRM_ERROR("GPU is in inconsistent state after reset\n");
>>          mutex_unlock(&dev->struct_mutex);
>> }
>>
>> If the reset is successful we are idle before suspend otherwise in a wedged
>> state. is this ok?
>>
>> regards
>> Arun
>>
>>>
>>> Does it matter on suspend? We test on resume if the GPU is usable, but
>>> if we wanted to test on suspend then we should do
>>>
>>> flush_work();
>>> if (i915_terminally_wedged())
>>>     /* oh noes */;
>>> -Chris
>>>
>>
>