[Intel-gfx] [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request

Wed Jun 8 09:13:51 UTC 2016

On Wed, Jun 08, 2016 at 10:42:58AM +0200, Daniel Vetter wrote:
> On Fri, Jun 03, 2016 at 05:08:34PM +0100, Chris Wilson wrote:
> > We can defer queuing the hangcheck from the start of every request
> > until we wait upon a request. This reduces the overhead of every
> > request, but may increase the latency of detecting a hang. However, if
> > nothing ever waits upon a hang, did it ever hang? It also improves the
> > robustness of the wait-request by ensuring that the hangchecker is
> > indeed running before we sleep indefinitely (and thereby ensuring that
> > we never actually sleep forever waiting for a dead GPU).
> > 
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> I think this will run into TDR patches, where we want a super-low-latency
> hangcheck in some cases. But then I think that's implemented by wrapping
> the batch in some special cs commands to insta-kill the engine if the
> timeout expired, so probably not a big problem. Still worth it to
> double-check with Mika I'd say.

Exactly. With TDR, hangcheck is relegated to denial-of-service
protection. This does not conflict with TDR; the two are complementary.
With timelines, we probably want to go even further and completely
divorce checking GPU state for hangcheck from checking for timeline
advancement. There, simply asking whether the waiter has been stuck
dramatically simplifies everything. TDR is again complementary, but
hangcheck still functions in case TDR fails or is disabled.
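
For illustration only, here is a rough user-space sketch of the idea in the
patch description: submitting a request does not arm the hangcheck, and the
waiter arms it just before it would sleep. Every name here (toy_engine,
toy_submit_request, toy_wait_request, ...) is hypothetical and is not the
i915 code; seqno wraparound and the real timer are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in i915. */
struct toy_engine {
	unsigned int completed;             /* last seqno the "GPU" finished */
	unsigned int submitted;             /* last seqno handed to the "GPU" */
	bool hangcheck_queued;              /* is the hangcheck timer armed? */
	unsigned int hangcheck_queue_count; /* how many times it was armed */
};

/* Submission path: no hangcheck work at all, only a new seqno. */
static unsigned int toy_submit_request(struct toy_engine *e)
{
	return ++e->submitted;
}

/* Arm the hangcheck once; repeated calls while armed are no-ops. */
static void toy_queue_hangcheck(struct toy_engine *e)
{
	if (!e->hangcheck_queued) {
		e->hangcheck_queued = true;
		e->hangcheck_queue_count++;
	}
}

/*
 * Wait path: returns true if the request has already completed.
 * Otherwise it arms the hangcheck and returns false, which stands in
 * for "the caller would now go to sleep" (wraparound is ignored).
 */
static bool toy_wait_request(struct toy_engine *e, unsigned int seqno)
{
	if (e->completed >= seqno)
		return true;

	toy_queue_hangcheck(e);
	/* Invariant from the patch description: never sleep unguarded. */
	assert(e->hangcheck_queued);
	return false;
}
```

In this model a stream of submissions with no waiter never touches the
hangcheck, which is the overhead saving, while any waiter that has to block
is guaranteed to have it armed first.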
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre