[Intel-gfx] [PATCH] drm/i915: don't bail out of intel_wait_ring_buffer too early

Tue Oct 18 17:24:40 CEST 2011

On Tue, 11 Oct 2011 19:25:57 +0200, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> In the pre-gem days with non-existing hangcheck and gpu reset code,
> this timeout of 3 seconds was pretty important to avoid stuck
> processes.
> 
> But now we have the hangcheck code in gem that goes to great length
> to ensure that the gpu is really dead before declaring it wedged.
> 
> So there's no need for this timeout anymore. Actually it's even harmful
> because we can bail out too early (e.g. with xscreensaver slip)
> when running giant batchbuffers. And our code isn't robust enough
> to properly unroll any state-changes, we pretty much rely on the gpu
> reset code cleaning up the mess (like cache tracking, fencing state,
> active list/request tracking, ...).
> 
> With this change intel_begin_ring can only fail when the gpu is
> wedged, and it will return -EAGAIN (like wait_request in case the
> gpu reset is still outstanding).
> 
> v2: Chris Wilson noted that on resume timers aren't running and hence
> we won't ever get kicked out of this loop by the hangcheck code. Use
> an insanely large timeout instead for the HAS_GEM case to prevent
> resume bugs from totally hanging the machine.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>

Personally I would have done:
 end = jiffies;
 if (has_gem)
	end += infinity;
 else
        end += 3s;
but that is one nitpick too far ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre