[Intel-gfx] [PATCH] drm/i915/guc: Fix request re-submission after reset

Chris Wilson chris at chris-wilson.co.uk
Thu Mar 9 20:54:53 UTC 2017


On Thu, Mar 09, 2017 at 05:02:16PM +0000, Tvrtko Ursulin wrote:
> 
> On 09/03/2017 08:55, Oscar Mateo wrote:
> >On 03/09/2017 08:50 AM, Tvrtko Ursulin wrote:
> >>
> >>On 09/03/2017 08:42, Oscar Mateo wrote:
> >>>On 03/09/2017 02:05 AM, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>
> >>>>In order to ensure no missed interrupts we must first re-direct
> >>>>the interrupts to GuC, and only then re-submit the requests to
> >>>>be replayed after a GPU reset. Otherwise context switch can fire
> >>>>before GuC has been set up to receive it triggering more hangs.
> >>>>
> >>>>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>>>Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >>>>Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> >>>>Cc: Oscar Mateo <oscar.mateo at intel.com>
> >>>>Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
> >>>>Cc: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> >>>>---
> >>>>  drivers/gpu/drm/i915/i915_guc_submission.c | 58
> >>>>+++++++++++++++++++++++++++---
> >>>>  drivers/gpu/drm/i915/intel_guc_loader.c    | 47
> >>>>------------------------
> >>>>  2 files changed, 54 insertions(+), 51 deletions(-)
> >>>>
> >>>>diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>>b/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>>index beb38e30d0e9..5b8ec0fab563 100644
> >>>>--- a/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>>+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> >>>>@@ -936,6 +936,52 @@ static void guc_reset_wq(struct i915_guc_client
> >>>>*client)
> >>>>      client->wq_tail = 0;
> >>>>  }
> >>>>
> >>><SNIP>
> >>>>  int i915_guc_submission_enable(struct drm_i915_private *dev_priv)
> >>>>  {
> >>>>      struct intel_guc *guc = &dev_priv->guc;
> >>>>@@ -953,13 +999,17 @@ int i915_guc_submission_enable(struct
> >>>>drm_i915_private *dev_priv)
> >>>>        /* Take over from manual control of ELSP (execlists) */
> >>>>      for_each_engine(engine, dev_priv, id) {
> >>>>-        const int wqi_size = sizeof(struct guc_wq_item);
> >>>>-        struct drm_i915_gem_request *rq;
> >>>>-
> >>>>          engine->submit_request = i915_guc_submit;
> >>>>          engine->schedule = NULL;
> >>>>+    }
> >>>>+
> >>>>+    guc_interrupts_capture(dev_priv);
> >>>>+
> >>>>+    /* Replay the current set of previously submitted requests */
> >>>>+    for_each_engine(engine, dev_priv, id) {
> >>>>+        const int wqi_size = sizeof(struct guc_wq_item);
> >>>>+        struct drm_i915_gem_request *rq;
> >>>>
> >>>Don't you want to move the guc_interrupts_release to
> >>>i915_guc_submission_disable as well?
> >>
> >>I can't spot anything broken in that path. We never go in that
> >>direction with the live submission so why do you think it is needed?
> >>
> >>Regards,
> >>
> >>Tvrtko
> >Just code symmetry: if we are leaving i915_guc_submission_enable to
> >capture the interrupts, why doesn't the disable also releases them?
> >Maybe it's not very important now, but it makes a lot more sense with my
> >series to do proper unwinding of the whole path (I can tackle it there
> >if you prefer).
> 
> I think so. There is a multitude of people trying to refactor the
> GuC code at the moment so I would prefer just to fix the reset fail
> quickly and not interfere with that wider refactoring too much.
> Because I think it is not just a quick job of moving the interrupt
> release to get to full symmetry. Ack to merge then?

Exactly, I don't disagree with the desire to make/keep the code
symmetrical, but I also think push the fix and wait for the dust to
settle to fix the otherside, or volunteer somebody...

Just so long as we remember to do it in the short term and not leave
nits to build up.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list