[Intel-gfx] [PATCH 4/5] drm/i915: Fixup preempt-to-busy vs reset of a virtual request

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon Sep 23 13:46:27 UTC 2019


On 21/09/2019 10:55, Chris Wilson wrote:
> Due to the nature of preempt-to-busy the execlists active tracking and
> the schedule queue may become temporarily desync'ed (between resubmission
> to HW and its ack from HW). This means that we may have unwound a
> request and passed it back to the virtual engine, but it is still
> inflight on the HW and may even result in a GPU hang. If we detect that
> GPU hang and try to reset, the hanging request->engine will no longer
> match the current engine, which means that the request is not on the
> execlists active list and we should not try to find an older incomplete
> request. Given that we have deduced this must be a request on a virtual
> engine, it is the single active request in the context and so must be
> guilty (as the context is still inflight, it is prevented from being
> executed on another engine as we process the reset).
> 
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c   | 7 +++++--
>   drivers/gpu/drm/i915/gt/intel_reset.c | 4 +---
>   2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 1b2bacc60300..3eadd294bcd7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2326,11 +2326,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
>   
>   static struct i915_request *active_request(struct i915_request *rq)
>   {
> -	const struct list_head * const list =
> -		&i915_request_active_timeline(rq)->requests;
>   	const struct intel_context * const ce = rq->hw_context;
>   	struct i915_request *active = NULL;
> +	struct list_head *list;
>   
> +	if (!i915_request_is_active(rq)) /* unwound, but incomplete! */

Is it time to store the veng pointer separately in the request so we can 
add the assert here? Like 
GEM_BUG_ON(!...engine_is_virtual(rq->orig_engine)) ?

> +		return rq;
> +
> +	list = &i915_request_active_timeline(rq)->requests;
>   	list_for_each_entry_from_reverse(rq, list, link) {
>   		if (i915_request_completed(rq))
>   			break;
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 89048ca8924c..ae68c3786f63 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -42,11 +42,10 @@ static void engine_skip_context(struct i915_request *rq)
>   	struct intel_engine_cs *engine = rq->engine;
>   	struct i915_gem_context *hung_ctx = rq->gem_context;
>   
> -	lockdep_assert_held(&engine->active.lock);
> -
>   	if (!i915_request_is_active(rq))
>   		return;
>   
> +	lockdep_assert_held(&engine->active.lock);
>   	list_for_each_entry_continue(rq, &engine->active.requests, sched.link)
>   		if (rq->gem_context == hung_ctx)
>   			i915_request_skip(rq, -EIO);
> @@ -123,7 +122,6 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
>   		  rq->fence.seqno,
>   		  yesno(guilty));
>   
> -	lockdep_assert_held(&rq->engine->active.lock);
>   	GEM_BUG_ON(i915_request_completed(rq));
>   
>   	if (guilty) {
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Regards,

Tvrtko


More information about the Intel-gfx mailing list