[Intel-gfx] [PATCH 1/2] drm/i915/gt: obey "reset" module parameter

Chris Wilson chris at chris-wilson.co.uk
Tue Aug 18 18:12:46 UTC 2020


Quoting Rodrigo Vivi (2020-08-18 18:49:19)
> On Tue, Aug 18, 2020 at 12:58:00PM +0100, Chris Wilson wrote:
> > Quoting Marcin Ślusarz (2020-08-18 12:36:07)
> > > From: Marcin Ślusarz <marcin.slusarz at intel.com>
> > > 
> > > For some reason intel_gt_reset attempts to reset the GPU twice.
> > > On one code path (do_reset) "reset" parameter is obeyed, but is
> > > not on the other one (__intel_gt_set_wedged).
> > 
> > It's not that simple; we do want to force __intel_gt_set_wedged() to
> > cancel whatever is running on the GPU as it is used for more than just
> > failing resets (e.g. around control boundaries) regardless of what the
> > user may want.
> > 
> > I'm loath to add a parameter just to enable unsafe behaviour, but that
> > may be the compromise.
> 
> we probably need this compromise for these cases Marcin faced...

You could always argue that those who risk unsafe parameters are equally
capable of patching the kernel to break it.

> what about moving this to intel_get_gpu_reset()?

When it was there, we didn't have the reason why, so we ended up
duplicating the tests anyway to suppress the error messages for CI.

And it breaks the control boundary cases where we have to reset the GPU,
or when we need the wedge to undeadlock modesetting, which will also
lock up the machine. In short, we should remove the parameter; we'll
still end up having to bisect through the GPU features [atomic ops, it's
always atomic ops] to find which one is killing the machine.
-Chris
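
For readers following the thread, the two paths under discussion can be
modelled with a toy sketch in C. This is not the i915 code: every name
here (toy_gt, param_reset, toy_do_reset, toy_set_wedged, toy_gt_reset) is
hypothetical, and the hardware reset is simulated by a counter. It only
illustrates the behaviour described above, assuming that the normal reset
path checks the "reset" parameter while the wedge path forces a reset to
cancel whatever is running, regardless of the parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "reset" module parameter (0 = disabled). */
static int param_reset = 0;

/* Counts simulated hardware resets; no real hardware is touched. */
static int hw_resets = 0;

/* Hypothetical, heavily reduced model of the GT state. */
struct toy_gt {
	bool wedged;
};

/* Normal reset path: obeys the parameter and fails if resets are disabled. */
static int toy_do_reset(struct toy_gt *gt)
{
	(void)gt;
	if (!param_reset)
		return -1;
	hw_resets++;
	return 0;
}

/*
 * Wedge path: always resets in order to cancel in-flight work,
 * whatever the user asked for, then marks the GT as wedged.
 */
static void toy_set_wedged(struct toy_gt *gt)
{
	hw_resets++;
	gt->wedged = true;
}

/* Top-level reset: try the normal path and wedge on failure. */
static void toy_gt_reset(struct toy_gt *gt)
{
	if (toy_do_reset(gt) != 0)
		toy_set_wedged(gt);
}
```

In this model, calling toy_gt_reset() with param_reset set to 0 still
performs one simulated reset through the wedge path, which is the
behaviour the original patch set out to change.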

