[Intel-gfx] [PATCH 1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend
Chris Wilson
chris at chris-wilson.co.uk
Thu Feb 11 08:57:45 UTC 2021
Quoting Rodrigo Vivi (2021-02-11 04:25:17)
> On Wed, Feb 10, 2021 at 10:19:50PM +0000, Chris Wilson wrote:
> > After calling intel_gt_suspend_prepare(), the driver starts to turn off
> > various subsystems, such as clearing the GGTT, before calling
> > intel_gt_suspend_late() to relinquish control over the GT. However, if
> > we still have internal GPU state active as we clear the GGTT, the GPU
> > may write back its internal state to the residual GGTT addresses that
> > are now pointing into scratch. Let's reset the GPU to clear that
> > internal state as soon as we have idled the GPU in prepare-to-suspend.
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index 0bd303d2823e..f41612faa269 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
> > wait_for_suspend(gt);
>
> you just wedged the gpu here...
Potentially. As a means to clear a stuck GPU and force it to idle.
> > intel_uc_suspend(&gt->uc);
> > +
> > + /* Flush all the contexts and internal state before turning off GGTT */
> > + gt_sanitize(gt, false);
>
> and now we are unsetting wedge here...
>
> is this right?
But irrelevant, since it is undone on any of the resume pathways which
must be taken by this point.
Resume has been for many years the method to unwedge a GPU; with the
presumption being that the intervening PCI level reset would be enough
to recover the GPU. Otherwise, it would presumably quite quickly go back
into the wedged state.
The wedging on suspend is just there to cancel outstanding work. That
is not quite what we want (we just want to remove the work), but it is
what we have for the moment. The sanitize is to make sure we don't leak
our state beyond our control of the HW.
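
As a rough illustration of the ordering argument, here is a toy model in C.
It is not i915 code: struct toy_gt and every toy_* function are hypothetical
stand-ins, and the model only tracks whether the GPU still holds internal
state at the moment the GGTT is cleared. It assumes, as discussed above, that
the sanitize both resets the hardware and drops the wedged flag, and that
resume runs the same sanitize again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the i915 driver. */
struct toy_gt {
	bool wedged;        /* outstanding work was cancelled by wedging */
	int pending;        /* requests still queued on the GPU */
	bool hw_state_live; /* GPU holds internal state it may write back */
	bool ggtt_mapped;   /* GGTT still points at real backing pages */
	int stray_writes;   /* write-backs that landed in scratch */
};

static struct toy_gt toy_gt_init(int pending)
{
	struct toy_gt gt = {
		.wedged = false,
		.pending = pending,
		.hw_state_live = true,
		.ggtt_mapped = true,
		.stray_writes = 0,
	};
	return gt;
}

/* Stand-in for wedging: cancel everything that is still queued. */
static void toy_wedge(struct toy_gt *gt)
{
	gt->pending = 0;
	gt->wedged = true;
}

/* Stand-in for the sanitize: reset the GPU, which also clears the wedge. */
static void toy_sanitize(struct toy_gt *gt)
{
	gt->hw_state_live = false;
	gt->wedged = false;
}

/* Clearing the GGTT while state is live lets the GPU write into scratch. */
static void toy_clear_ggtt(struct toy_gt *gt)
{
	gt->ggtt_mapped = false;
	if (gt->hw_state_live)
		gt->stray_writes++;
}

/* Suspend path, with or without the sanitize added in prepare. */
static void toy_suspend(struct toy_gt *gt, bool sanitize_in_prepare)
{
	if (gt->pending)
		toy_wedge(gt); /* only if the GPU did not idle by itself */
	assert(gt->pending == 0);

	if (sanitize_in_prepare)
		toy_sanitize(gt);

	toy_clear_ggtt(gt);
}

/* Every resume path sanitizes again, so the wedge never survives resume. */
static void toy_resume(struct toy_gt *gt)
{
	toy_sanitize(gt);
	gt->ggtt_mapped = true;
}
```

In this model, toy_suspend() without the sanitize leaves stray_writes at 1 and
the wedged flag set until toy_resume() clears it. With the sanitize,
stray_writes stays at 0, and the wedged flag is cleared earlier than before,
which changes nothing that resume would not have done anyway.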
-Chris