[Intel-gfx] [PATCH 4/5] drm/i915: Fixup preempt-to-busy vs reset of a virtual request
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Sep 23 13:46:27 UTC 2019
On 21/09/2019 10:55, Chris Wilson wrote:
> Due to the nature of preempt-to-busy the execlists active tracking and
> the schedule queue may become temporarily desync'ed (between resubmission
> to HW and its ack from HW). This means that we may have unwound a
> request and passed it back to the virtual engine, but it is still
> inflight on the HW and may even result in a GPU hang. If we detect that
> GPU hang and try to reset, the hanging request->engine will no longer
> match the current engine, which means that the request is not on the
> execlists active list and we should not try to find an older incomplete
> request. Given that we have deduced this must be a request on a virtual
> engine, it is the single active request in the context and so must be
> guilty (as the context is still inflight, it is prevented from being
> executed on another engine as we process the reset).
>
> Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++--
> drivers/gpu/drm/i915/gt/intel_reset.c | 4 +---
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 1b2bacc60300..3eadd294bcd7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2326,11 +2326,14 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
>
> static struct i915_request *active_request(struct i915_request *rq)
> {
> - const struct list_head * const list =
> - &i915_request_active_timeline(rq)->requests;
> const struct intel_context * const ce = rq->hw_context;
> struct i915_request *active = NULL;
> + struct list_head *list;
>
> + if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
Is it time to store the veng pointer separately in the request so we can
add the assert here? Like
GEM_BUG_ON(!...engine_is_virtual(rq->orig_engine)) ?
> + return rq;
> +
> + list = &i915_request_active_timeline(rq)->requests;
> list_for_each_entry_from_reverse(rq, list, link) {
> if (i915_request_completed(rq))
> break;
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 89048ca8924c..ae68c3786f63 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -42,11 +42,10 @@ static void engine_skip_context(struct i915_request *rq)
> struct intel_engine_cs *engine = rq->engine;
> struct i915_gem_context *hung_ctx = rq->gem_context;
>
> - lockdep_assert_held(&engine->active.lock);
> -
> if (!i915_request_is_active(rq))
> return;
>
> + lockdep_assert_held(&engine->active.lock);
> list_for_each_entry_continue(rq, &engine->active.requests, sched.link)
> if (rq->gem_context == hung_ctx)
> i915_request_skip(rq, -EIO);
> @@ -123,7 +122,6 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
> rq->fence.seqno,
> yesno(guilty));
>
> - lockdep_assert_held(&rq->engine->active.lock);
> GEM_BUG_ON(i915_request_completed(rq));
>
> if (guilty) {
>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
Regards,
Tvrtko
More information about the Intel-gfx
mailing list