[Intel-gfx] [PATCH 05/11] drm/i915/tdr: Identify hung request and drop it

Arun Siluvery arun.siluvery at linux.intel.com
Wed Jul 27 11:54:44 UTC 2016


On 26/07/2016 22:37, Chris Wilson wrote:
> On Tue, Jul 26, 2016 at 05:40:51PM +0100, Arun Siluvery wrote:
>> The currently active request is the one that caused the hang, so it is
>> retrieved and removed from the ELSP queue; otherwise we cannot submit other
>> workloads to be processed by the GPU.
>>
>> A consistency check between HW and driver is performed to ensure that we
>> are dropping the correct request. Since this request doesn't get executed
>> anymore, we also need to advance the seqno to mark it as complete. Head
>> pointer is advanced to skip the offending batch so that HW resumes
>> execution of other workloads. If HW and SW don't agree, we don't proceed
>> with the engine reset; this is treated as an error condition and we fall
>> back to a full GPU reset.
>>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery at linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 116 +++++++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.h |   2 +
>>   2 files changed, 118 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index daf1279..8fc5a3b 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1026,6 +1026,122 @@ void intel_lr_context_unpin(struct i915_gem_context *ctx,
>>   	i915_gem_context_put(ctx);
>>   }
>>
>> +static void intel_lr_context_resync(struct i915_gem_context *ctx,
>> +				    struct intel_engine_cs *engine)
>> +{
>> +	u32 head;
>> +	u32 head_addr, tail_addr;
>> +	u32 *reg_state;
>> +	struct intel_ringbuffer *ringbuf;
>> +	struct drm_i915_private *dev_priv = engine->i915;
>> +
>> +	ringbuf = ctx->engine[engine->id].ringbuf;
>> +	reg_state = ctx->engine[engine->id].lrc_reg_state;
>> +
>> +	head = I915_READ_HEAD(engine);
>> +	head_addr = head & HEAD_ADDR;
>> +	tail_addr = reg_state[CTX_RING_TAIL+1] & TAIL_ADDR;
>
> ?
>
> We know where we want the head to be to emit the breadcrumb and complete
> the request since we can record that when constructing the request. That
> also neatly solves the riddle of how to update the hw state.

We only want to skip the MI_BATCH_BUFFER_START and then continue as usual,
so this just uses the existing info (the current head and tail).
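
To illustrate the idea of skipping a single command by advancing the ring
head, here is a minimal standalone sketch. It is not the code from the
patch: the ring size, the mask, and the helper name skip_command() are
hypothetical, and the command length is passed in by the caller rather
than taken from hardware documentation. The check that enough bytes remain
between head and tail is a simple stand-in for a HW/SW consistency check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not the i915 definitions. */
#define RING_SIZE      4096u              /* ring size in bytes, power of two */
#define RING_ADDR_MASK (RING_SIZE - 1u)   /* wraps byte offsets into the ring */
#define DWORD_BYTES    4u

/*
 * Advance 'head' past one command of 'len_dwords' dwords, wrapping at the
 * end of the ring. Returns false, leaving *new_head untouched, if head is
 * not dword aligned or if fewer than len_dwords dwords remain between head
 * and tail. A caller could treat that as "state does not agree" and fall
 * back to a heavier recovery path.
 */
static bool skip_command(uint32_t head, uint32_t tail,
			 uint32_t len_dwords, uint32_t *new_head)
{
	uint32_t len_bytes = len_dwords * DWORD_BYTES;
	/* Bytes queued between head and tail, accounting for wrap. */
	uint32_t available = (tail - head) & RING_ADDR_MASK;

	if (head & (DWORD_BYTES - 1u))
		return false;
	if (available < len_bytes)
		return false;

	*new_head = (head + len_bytes) & RING_ADDR_MASK;
	return true;
}
```

With these assumed values, skipping a 3-dword command at head 4088 with
tail 16 wraps around the end of the ring, while head == tail is rejected
because there is nothing queued to skip.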
>
> resync?  intel_lr_context_reset_ring may be more apt, or maybe
> intel_execlists_reset_request?

I called it resync because we read the current state and update it.
intel_execlists_reset_request() sounds better; I will change it as
suggested. Thanks.

regards
Arun


> -Chris
>