[Intel-gfx] [PATCH 02/11] drm/i915: Only reset hangcheck at the start of an activity cycle

Mon Jul 9 14:35:16 UTC 2018

Chris Wilson <chris at chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-07-09 15:13:44)
>> Chris Wilson <chris at chris-wilson.co.uk> writes:
>> 
>> > Across a reset, the seqno (and thus hangcheck) should restart and the
>> > hangcheck naturally progress, for when it does not, we want to declare an
>> > emergency. Currently, we only detect if reset and reinit fail, but we
>> > do not detect if the call to reinit succeeds but the HW is fried - as we
>> > are resetting hangcheck on initialising the engine. Remove that and
>> > rely on the natural progress to reset the hangcheck timer.
>> 
>> I take it that the intention is not to give reset
>> any special leeway wrt request completion. So
>> we now assume that reset/recovery must fit inside
>> one hangcheck tick?
>
> We call the synchronous i915_handle_error() from inside hangcheck, so we
> know the reset is completed before we schedule the next tick. So yes it
> seems fair that the recovery should always be expected to complete
> within that tick as we would expect any other batch to complete (and the
> recovery request is just to advance the breadcrumb, no batch).
>
> So yes, reset/recovery must fit inside the tick.

Worthy goal. And yes, it explains the natural progression in
the commit message.
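
For the record, the behaviour as I understand it can be sketched with a toy
model. This is not the i915 code: every name below (toy_engine, toy_reset,
toy_hangcheck_tick, toy_simulate, STALL_LIMIT) is hypothetical, and the
two-tick threshold is an assumption made only for illustration. The point it
tries to show is that the reset does not clear the stall counter; only an
advancing breadcrumb does, so a reset that "succeeds" on dead HW is caught on
the following tick.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not the i915 structures. */
struct toy_engine {
	unsigned int seqno;      /* breadcrumb last written by the "HW" */
	unsigned int seen_seqno; /* breadcrumb observed at the previous tick */
	unsigned int stalled;    /* consecutive ticks without progress */
	bool hw_fried;           /* HW no longer advances the breadcrumb */
	bool wedged;             /* emergency has been declared */
};

#define STALL_LIMIT 2 /* assumed threshold, for illustration only */

/*
 * Synchronous reset. Reinit "succeeds" either way, and the stall counter
 * is deliberately left alone. The recovery request only advances the
 * breadcrumb (no batch), which working HW completes at once.
 */
static void toy_reset(struct toy_engine *e)
{
	if (!e->hw_fried)
		e->seqno++;
}

static void toy_hangcheck_tick(struct toy_engine *e)
{
	/* Natural progress is the only thing that resets hangcheck. */
	if (e->seqno != e->seen_seqno) {
		e->seen_seqno = e->seqno;
		e->stalled = 0;
		return;
	}

	if (++e->stalled < STALL_LIMIT)
		return;

	/* Still stalled one tick after the reset: the HW is fried. */
	if (e->stalled > STALL_LIMIT) {
		e->wedged = true;
		return;
	}

	/* The reset completes before the next tick is scheduled. */
	toy_reset(e);
}

/* Start from a hung engine, run a number of ticks, report if wedged. */
static bool toy_simulate(bool hw_fried, int ticks)
{
	struct toy_engine e = {
		.seqno = 1,
		.seen_seqno = 1,
		.hw_fried = hw_fried,
	};

	for (int i = 0; i < ticks; i++)
		toy_hangcheck_tick(&e);

	return e.wedged;
}
```

With working HW, toy_simulate(false, 3) sees the breadcrumb move on the tick
after the reset and clears the counter. With fried HW, toy_simulate(true, 3)
sees no movement on that same tick and declares the emergency.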

Reviewed-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>