[Intel-gfx] [PATCH] drm/i915: Always run hangcheck while the GPU is busy

Tue Jan 30 12:24:08 UTC 2018

Quoting Mika Kuoppala (2018-01-30 12:18:17)
> Chris Wilson <chris at chris-wilson.co.uk> writes:
> 
> > Previously, we relied on only running the hangcheck while somebody was
> > waiting on the GPU, in order to minimise the amount of time hangcheck
> > had to run. (If nobody was watching the GPU, nobody would notice if the
> > GPU wasn't responding -- eventually somebody would care and so kick
> > hangcheck into action.) However, this falls apart from around commit
> > 4680816be336 ("drm/i915: Wait first for submission, before waiting for
> > request completion"), as not all waiters declare themselves to hangcheck
> > and so we could switch off hangcheck and miss GPU hangs even when
> > waiting under the struct_mutex.
> >
> > If we enable hangcheck from the first request submission, and let it run
> > until the GPU is idle again, we forgo all the complexity involved with
> > only enabling around waiters. Instead we have to be careful that we do
> > not declare a GPU hang when idly waiting for the next request to become
> > ready.
> 
> For the complexity part I agree that this is simple and elegant. But
> I think I have not understood it fully, as I don't see the part where
> we need to be careful when idly waiting for the next request.
> Could you elaborate and point to the relevant portion of the patch?

It's not in this patch; it just relates to the experience we've had
previously in compensating for an engine whose scheduled requests are
waiting for a signal, making sure we treat those engines as idle rather
than stuck.
-Chris
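
For illustration only, the distinction discussed above (an engine that
has requests queued but is waiting for a signal should not be reported
as hung) can be sketched as a small standalone model. This is not code
from the patch or from i915: every name here (engine_sample,
hangcheck_sample, hangcheck_should_rearm, STALL_LIMIT) is hypothetical,
and the threshold of three ticks is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from i915. */
enum engine_verdict {
	ENGINE_IDLE,    /* nothing outstanding */
	ENGINE_ACTIVE,  /* outstanding work, not yet declared hung */
	ENGINE_WAITING, /* outstanding work, blocked on an external signal */
	ENGINE_STUCK    /* no progress for STALL_LIMIT consecutive ticks */
};

struct engine_sample {
	uint32_t submitted;        /* seqno of the last submitted request */
	uint32_t completed;        /* seqno of the last completed request */
	uint32_t last_seen;        /* completed seqno at the previous tick */
	bool waiting_on_signal;    /* head request is not yet ready to run */
	unsigned int stalled_ticks;
};

#define STALL_LIMIT 3 /* assumed value, chosen only for the example */

/* Classify one engine at a hangcheck tick. */
static enum engine_verdict hangcheck_sample(struct engine_sample *e)
{
	if (e->completed == e->submitted) {
		/* Everything submitted has completed. */
		e->last_seen = e->completed;
		e->stalled_ticks = 0;
		return ENGINE_IDLE;
	}

	if (e->completed != e->last_seen) {
		/* Progress since the last tick: reset the stall counter. */
		e->last_seen = e->completed;
		e->stalled_ticks = 0;
		return ENGINE_ACTIVE;
	}

	if (e->waiting_on_signal) {
		/* Idly waiting for the next request to become ready:
		 * do not count this tick towards a hang. */
		e->stalled_ticks = 0;
		return ENGINE_WAITING;
	}

	if (++e->stalled_ticks >= STALL_LIMIT)
		return ENGINE_STUCK;

	return ENGINE_ACTIVE;
}

/* Keep the periodic check armed while any engine is not idle. */
static bool hangcheck_should_rearm(const enum engine_verdict *v, size_t n)
{
	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (v[i] != ENGINE_IDLE)
			return true;
	return false;
}
```

In this model the check is started at first submission and re-armed by
hangcheck_should_rearm() until every engine reports ENGINE_IDLE, so no
waiter has to declare itself; the waiting_on_signal test is the only
place where "busy but not hung" is special-cased.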