[Intel-gfx] [PATCH] drm/i915/gt: Clear wedged status upon suspend

Das, Nirmoy nirmoy.das at linux.intel.com
Wed Jan 25 13:28:23 UTC 2023


Hi Rodrigo,

On 1/24/2023 8:26 PM, Rodrigo Vivi wrote:
> On Tue, Jan 24, 2023 at 12:07:19PM +0100, Das, Nirmoy wrote:
>> Forgot to add the drm issue as a reference.
>>
>> On 1/24/2023 12:05 PM, Nirmoy Das wrote:
>>> From: Chris Wilson <chris.p.wilson at linux.intel.com>
>>>
>>> Currently we use set-wedged on suspend if the workload is not responding
>>> in order to allow a fast suspend (albeit at the cost of discarding the
>>> current userspace). This may leave the device wedged during suspend,
>>> where we may require the device available in order to swapout CPU
>>> inaccessible device memory. Clear any temporary wedged-status after
>>> flushing userspace off the device so we can use the blitter ourselves
>>> inside suspend.
> This seems like a very good move. But this explains the unset_wedged part,
> not the removal of the retire_requests. Why don't we need to retire them
> anymore?


Thanks for noticing that. This is on me: I missed another patch which moved
the intel_gt_retire_requests() call inside intel_gt_set_wedged().
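
To illustrate why the explicit retire call becomes redundant once
set-wedged retires requests itself, here is a minimal toy model. It is
only a sketch, not driver code: struct toy_gt and every toy_* function
are hypothetical stand-ins, and the "timeout" is modelled simply as
"requests are still in flight".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the GT state; NOT the real struct intel_gt. */
struct toy_gt {
	bool wedged;    /* submission is blocked while set */
	int in_flight;  /* requests that have not been retired yet */
};

/* Hypothetical model of retiring all outstanding requests. */
static void toy_retire_requests(struct toy_gt *gt)
{
	gt->in_flight = 0;
}

/*
 * Models set-wedged under the assumption stated in the mail:
 * the retire call now lives inside it.
 */
static void toy_set_wedged(struct toy_gt *gt)
{
	gt->wedged = true;
	toy_retire_requests(gt);
}

/* Models clearing the wedged status so the GPU is usable again. */
static void toy_unset_wedged(struct toy_gt *gt)
{
	/* Nothing from userspace should be left on the device here. */
	assert(gt->in_flight == 0);
	gt->wedged = false;
}

/* Toy wait: "idle in time" simply means nothing is in flight. */
static bool toy_wait_for_idle(const struct toy_gt *gt)
{
	return gt->in_flight == 0;
}

/* Mirrors the shape of the patched wait_for_suspend(). */
static void toy_wait_for_suspend(struct toy_gt *gt)
{
	if (!toy_wait_for_idle(gt))
		toy_set_wedged(gt); /* cancels and retires in one step */

	toy_unset_wedged(gt);   /* make the GPU available for swapout */
}
```

Calling toy_wait_for_suspend() on a busy toy_gt leaves it with no
requests in flight and the wedged flag cleared, without any separate
retire step in the caller.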

>
> Also, what are the chances of races here? I mean, we are marking
> the gpu as not wedged anymore. Do we have any guarantee at this point
> that no further requests will arrive?


The assumption was that this runs in a single-threaded suspend "context", so we
should be fine, but we just realized that this is getting called at pm prepare
time. Thanks for raising this. It seems I need to refactor
i915_gem_backup_suspend() as well, which should be called much later on.
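
The concern can be replayed with a small single-threaded toy model. This
is a sketch under assumptions, not driver code: struct toy_gt and the
toy_* functions are hypothetical, and the only premise taken from this
thread is that the wedged status is cleared at prepare time, while
userspace may still be able to submit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the GT state; NOT the real struct intel_gt. */
struct toy_gt {
	bool wedged;    /* submission is rejected while set */
	int in_flight;  /* requests currently on the device */
};

/* Hypothetical submission path: accepted only if not wedged. */
static bool toy_submit(struct toy_gt *gt)
{
	if (gt->wedged)
		return false;
	gt->in_flight++;
	return true;
}

/* Models the prepare-time sequence: wedge, flush, then unwedge. */
static void toy_suspend_prepare(struct toy_gt *gt)
{
	gt->wedged = true;   /* cancel outstanding work */
	gt->in_flight = 0;   /* requests retired as part of wedging */
	gt->wedged = false;  /* GPU made available again for swapout */
}

/*
 * Replays the problematic ordering: prepare finishes, and only then
 * does a late userspace submission arrive. Returns the number of
 * requests in flight when the later backup step would run.
 */
static int toy_replay_late_submit(void)
{
	struct toy_gt gt = { .wedged = false, .in_flight = 2 };

	toy_suspend_prepare(&gt);
	assert(gt.in_flight == 0);  /* device was quiet after prepare */

	(void)toy_submit(&gt);      /* nothing blocks this any more */
	return gt.in_flight;
}
```

In this model the late submission is accepted because nothing
distinguishes "unwedged for our own blitter use" from "open for user
submission", which is the gap the questions in this thread point at.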


Regards,

Nirmoy

>
> Shouldn't we have a way to differentiate between totally wedged
> and blocked for user submission?
>
>>> Testcase: igt/gem_eio/in-flight-suspend
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7896
>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Signed-off-by: Chris Wilson <chris.p.wilson at linux.intel.com>
>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_gt_pm.c | 10 ++++------
>>>    1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>> index cef3d6f5c34e..74d1dd3793f9 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>>> @@ -317,19 +317,17 @@ int intel_gt_resume(struct intel_gt *gt)
>>>    static void wait_for_suspend(struct intel_gt *gt)
>>>    {
>>> -	if (!intel_gt_pm_is_awake(gt))
>>> -		return;
>>> -
>>> -	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME) {
>>> +	if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == -ETIME)
>>>    		/*
>>>    		 * Forcibly cancel outstanding work and leave
>>>    		 * the gpu quiet.
>>>    		 */
>>>    		intel_gt_set_wedged(gt);
>>> -		intel_gt_retire_requests(gt);
>>> -	}
>>>    	intel_gt_pm_wait_for_idle(gt);
>>> +
>>> +	/* Make the GPU available again for swapout */
>>> +	intel_gt_unset_wedged(gt);
>>>    }
>>>    void intel_gt_suspend_prepare(struct intel_gt *gt)

