[Intel-gfx] [RFC PATCH] drm/i915/debugfs: Only wedge if we have reset available

Chris Wilson chris at chris-wilson.co.uk
Wed Oct 2 15:59:02 UTC 2019


Quoting Tvrtko Ursulin (2019-10-02 16:45:18)
> 
> On 02/10/2019 13:48, Janusz Krzysztofik wrote:
> > If we process DROP_RESET_ACTIVE and cancel all outstanding requests by
> > forcing a GPU reset on hardware with reset capabilities disabled or
> > not supported, we certainly end up with a terminally wedged GPU,
> > impossible to recover.  That's probably not what we want.
> 
> I forgot the whole background story here I'm afraid. Is the concern here 
> the IGT exit handler calling DROP_RESET_ACTIVE? If so with this patch it 
> will fail with -EBUSY, which could be fine, but what happens from the 
> perspective of next test which gets to run? It won't find a wedged GPU, 
> but will encounter a possibly nondeterministic amount of GPU work 
> scheduled before it, no?

Yes, that is the conundrum. If the test left work outstanding (and in a
few cases we explicitly rely on the reset here to cancel persistent
work, i.e. unbound non-preemptible spinners), then the next test will
stall: its drm_open_driver(DRIVER_INTEL) calls gem_quiescent_gpu(),
which waits until the GPU is eventually wedged. There's a good chance
that the next test will then fail because it doesn't handle the wedged
GPU.

The alternative would be to wedge here, taint and reboot, and then
hopefully resume testing at the next test with vanilla state.
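
To make the two options concrete, here is a rough userspace model of the
decision, not the actual i915 code. The struct, the enum and the function
name are all made up for illustration; only the -EBUSY result for the
patched behaviour comes from the discussion above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the GPU state; not the i915 structures. */
struct gpu_model {
	bool has_reset;      /* reset supported and not disabled */
	bool wedged;         /* terminally wedged, cannot recover */
	unsigned int active; /* outstanding requests left by the test */
};

/* Hypothetical names for the two policies being weighed in this thread. */
enum drop_policy {
	DROP_FAIL_IF_NO_RESET,  /* the patch: refuse, leave the work running */
	DROP_WEDGE_IF_NO_RESET, /* the alternative: wedge (then taint, reboot) */
};

/* Model of handling DROP_RESET_ACTIVE under either policy. */
static int drop_reset_active(struct gpu_model *gpu, enum drop_policy policy)
{
	if (!gpu->active)
		return 0; /* nothing outstanding, nothing to cancel */

	if (gpu->has_reset) {
		gpu->active = 0; /* reset cancels the work, GPU stays usable */
		return 0;
	}

	if (policy == DROP_FAIL_IF_NO_RESET)
		return -EBUSY; /* work stays queued for the next test to find */

	gpu->wedged = true; /* requests cancelled, but GPU is now unusable */
	gpu->active = 0;
	return 0;
}
```

With has_reset false, the first policy leaves active untouched, so the
next test's quiesce has to wait on it; the second leaves active at zero
but wedged set, which is why it only makes sense together with a reboot.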
-Chris

