[Intel-gfx] [PATCH 1/6] drm/i915: hangcheck robustification
chris at chris-wilson.co.uk
Wed Oct 19 04:32:25 PDT 2011
On Tue, 11 Oct 2011 16:39:09 +0200, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> From: Ben Widawsky <ben at bwidawsk.net>
> This was pulled out of the per ring error handling patch series as it
> actually fixes two issues, and bikeshedding appears to be going on
> First, remove setting hangcheck_count when we do notify ring. While it
> seems counterintuitive to be setting up a timer to catch hangcheck_count
> greater than 0 with hangcheck_count already greater than 0, actually
> when we go to check if the GPU is hung we clear that value if the gpu is
> still alive. Leaving this is actually harmful as submitting work could
> falsely clear the count while the hangcheck code is checking the count.
> I can't think of a case where this doesn't just delay the inevitable
> reset... but I didn't spend too much time thinking about it.
> Second, for Gen5+ we have more information to be considered when
> determining if the GPU is stuck, primarily the media ring (and blitter
> ring in gen6). This patch will check all available rings, and also updates
> error state with the new information. It theoretically can fix false
> positives, but I haven't actually come across such a case.
> Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> [danvet: remove remnants of an unrelated cleanup patch]
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
NAK: This failed to detect a hang, leaving my box frozen. I suspect that
the value of INSTDONE was fluctuating on the render ring even though we
had no requests pending and so could assume that it was idle.
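
To illustrate the suspected failure mode, here is a minimal standalone
sketch, not the i915 code. It assumes a hangcheck that samples INSTDONE on
every ring and clears its counter whenever any sample changed. All names
(struct ring_sample, struct hangcheck, hangcheck_tick, simulate, NUM_RINGS,
HANG_THRESHOLD) are hypothetical and exist only for this example. With
skip_idle_rings false, a fluctuating INSTDONE on an idle ring keeps
clearing the counter and a stuck ring is never reported; with it true,
rings without pending requests are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS      3  /* hypothetical: render, media, blitter */
#define HANG_THRESHOLD 2  /* hypothetical: stalled samples before declaring a hang */

/* One hangcheck sample of a ring (hypothetical layout). */
struct ring_sample {
    uint32_t instdone;    /* value read from the ring's INSTDONE register */
    bool has_pending;     /* true if the ring still has outstanding requests */
};

struct hangcheck {
    uint32_t last_instdone[NUM_RINGS];
    int count;            /* consecutive samples without progress */
};

/* Process one timer tick; returns true once a hang is declared. */
static bool hangcheck_tick(struct hangcheck *hc,
                           const struct ring_sample s[NUM_RINGS],
                           bool skip_idle_rings)
{
    bool progress = false, any_pending = false;

    assert(hc && s);

    for (int i = 0; i < NUM_RINGS; i++) {
        bool changed = s[i].instdone != hc->last_instdone[i];

        hc->last_instdone[i] = s[i].instdone;
        any_pending |= s[i].has_pending;

        /* Optionally distrust INSTDONE on rings with nothing to do. */
        if (skip_idle_rings && !s[i].has_pending)
            continue;
        if (changed)
            progress = true;
    }

    /* Nothing outstanding anywhere, or something moved: not hung. */
    if (!any_pending || progress) {
        hc->count = 0;
        return false;
    }
    return ++hc->count >= HANG_THRESHOLD;
}

/* Idle render ring with fluctuating INSTDONE, stuck blitter ring. */
static bool simulate(bool skip_idle_rings, int ticks)
{
    struct hangcheck hc = { {0}, 0 };

    for (int t = 0; t < ticks; t++) {
        struct ring_sample s[NUM_RINGS] = {
            { (uint32_t)t + 1, false },  /* render: idle but fluctuating */
            { 0, false },                /* media: idle and quiet */
            { 0xbeef, true },            /* blitter: pending work, stuck */
        };
        if (hangcheck_tick(&hc, s, skip_idle_rings))
            return true;
    }
    return false;
}
```

In this toy model simulate(false, 10) never reports the hang, while
simulate(true, 10) does. Whether ignoring idle rings is the right fix for
the real driver is a separate question; the sketch only shows how one
noisy register can mask a stuck ring.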
Chris Wilson, Intel Open Source Technology Centre