[PATCH 1/6] drm/i915/guc: Handle GuC interaction in reset/suspend/unload scenarios

Chris Wilson chris at chris-wilson.co.uk
Thu Aug 3 10:01:44 UTC 2017


Quoting Sagar Arun Kamble (2017-08-03 09:20:19)
> Tearing down of guc_ggtt_invalidate/guc_interrupts/guc_communication
> setup should happen towards end of reset/suspend/unload as these are
> setup back again during recovery/resume/load.
> 
> Prepared helper intel_guc_finish that will do teardown of this setup
> along with optional notification to GuC about suspension.
> Then, during unload we can rely on intel_guc_finish being done as part
> of gem_suspend for disabling these interfaces.
> But will have to disable GuC submission prior to suspend as that involves
> communication with GuC to destroy doorbell.
> Another helper intel_guc_prepare will be useful for rebuilding the setup
> which optionally resume GuC in case of RPM resume.
> 
> drv_hangman at error-state-basic and gem_exec_suspend at basic-s3 caught
> below kernel bugs with GuC.
> kernel BUG at drivers/gpu/drm/i915/i915_gem_gtt.c:3086!
> GEM_BUG_ON(i915->ggtt.invalidate != gen6_ggtt_invalidate);
> 
> This was happening since ggtt_disable_guc was not done prior to reset
> and suspend.
> 
> drv_module_reload at basic-reload failure due to attempt to release doorbell
> post gem_suspend that resets GuC too.
> [drm] INTEL_GUC_SEND: Action 0x20 failed; ret=-110 status=0x00000020 response=0x00000000
> 
> Also moving intel_guc_suspend and intel_guc_resume to intel_uc.c.

> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Tomi Sarvela <tomi.p.sarvela at intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c            |   4 +-
>  drivers/gpu/drm/i915/i915_gem.c            |  11 ++-
>  drivers/gpu/drm/i915/i915_guc_submission.c |  58 ++--------------
>  drivers/gpu/drm/i915/intel_guc_log.c       |   3 +
>  drivers/gpu/drm/i915/intel_pm.c            |   2 +
>  drivers/gpu/drm/i915/intel_uc.c            | 106 +++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_uc.h            |   4 ++
>  7 files changed, 118 insertions(+), 70 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 214555e..9eee99e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2485,7 +2485,7 @@ static int intel_runtime_suspend(struct device *kdev)
>          */
>         i915_gem_runtime_suspend(dev_priv);
>  
> -       intel_guc_suspend(dev_priv);
> +       intel_guc_finish(dev_priv);

I would personally like to have intel_guc_runtime_suspend(). (The goal
is has a consistent vocabulary for certain phases, and at the lowest
level have those break out into the steps such as intel_guc_finish.)
  
>         intel_runtime_pm_disable_interrupts(dev_priv);
>  
> @@ -2570,7 +2570,7 @@ static int intel_runtime_resume(struct device *kdev)
>         if (intel_uncore_unclaimed_mmio(dev_priv))
>                 DRM_DEBUG_DRIVER("Unclaimed access during suspend, bios?\n");
>  
> -       intel_guc_resume(dev_priv);
> +       intel_guc_prepare(dev_priv);
>  
>         if (IS_GEN9_LP(dev_priv)) {
>                 bxt_disable_dc9(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 000a764..89ff702 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2840,6 +2840,11 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  
>         i915_gem_revoke_fences(dev_priv);
>  
> +       if (i915.enable_guc_loading) {
> +               intel_guc_finish(dev_priv);
> +               dev_priv->guc.fw.load_status = INTEL_UC_FIRMWARE_PENDING;

Keep the guc internal logic out of i915_gem.c, just add a
intel_guc_reset_prepare() for us to call.

> +       }
> +
>         return err;
>  }
>  
> @@ -4553,9 +4558,13 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>  
>         assert_kernel_context_is_current(dev_priv);
>         i915_gem_contexts_lost(dev_priv);
> +       i915_guc_submission_disable(dev_priv);
>         mutex_unlock(&dev->struct_mutex);
>  
> -       intel_guc_suspend(dev_priv);
> +       if (i915.enable_guc_loading) {
> +               intel_guc_finish(dev_priv);
> +               dev_priv->guc.fw.load_status = INTEL_UC_FIRMWARE_PENDING;
> +       }

Same as above; this time intel_guc_suspend().

>  
>         cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>         cancel_delayed_work_sync(&dev_priv->gt.retire_work);
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 48a1e93..48250c4 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1280,63 +1280,15 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
>  {
>         struct intel_guc *guc = &dev_priv->guc;
>  
> +       if (!i915.enable_guc_submission)
> +               return;
> +
>         guc_interrupts_release(dev_priv);
>  
>         /* Revert back to manual ELSP submission */
>         intel_engines_reset_default_submission(dev_priv);
>  
> -       guc_client_free(guc->execbuf_client);
> +       if (guc->execbuf_client)
> +               guc_client_free(guc->execbuf_client);
>         guc->execbuf_client = NULL;
>  }

Code motion looks fine, but people get uppity if you mix purely
mechanical changes in with functional changes.
-Chris


More information about the Intel-gfx-trybot mailing list