[PATCH 1/6] drm/i915/guc: Handle GuC interaction in reset/suspend/unload scenarios
Chris Wilson
chris at chris-wilson.co.uk
Thu Aug 3 10:01:44 UTC 2017
Quoting Sagar Arun Kamble (2017-08-03 09:20:19)
> Tearing down of guc_ggtt_invalidate/guc_interrupts/guc_communication
> setup should happen towards end of reset/suspend/unload as these are
> setup back again during recovery/resume/load.
>
> Prepared helper intel_guc_finish that will do teardown of this setup
> along with optional notification to GuC about suspension.
> Then, during unload we can rely on intel_guc_finish being done as part
> of gem_suspend for disabling these interfaces.
> But will have to disable GuC submission prior to suspend as that involves
> communication with GuC to destroy doorbell.
> Another helper intel_guc_prepare will be useful for rebuilding the setup
> which optionally resume GuC in case of RPM resume.
>
> drv_hangman at error-state-basic and gem_exec_suspend at basic-s3 caught
> below kernel bugs with GuC.
> kernel BUG at drivers/gpu/drm/i915/i915_gem_gtt.c:3086!
> GEM_BUG_ON(i915->ggtt.invalidate != gen6_ggtt_invalidate);
>
> This was happening since ggtt_disable_guc was not done prior to reset
> and suspend.
>
> drv_module_reload at basic-reload failure due to attempt to release doorbell
> post gem_suspend that resets GuC too.
> [drm] INTEL_GUC_SEND: Action 0x20 failed; ret=-110 status=0x00000020 response=0x00000000
>
> Also moving intel_guc_suspend and intel_guc_resume to intel_uc.c.
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Cc: Tomi Sarvela <tomi.p.sarvela at intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
> ---
> drivers/gpu/drm/i915/i915_drv.c | 4 +-
> drivers/gpu/drm/i915/i915_gem.c | 11 ++-
> drivers/gpu/drm/i915/i915_guc_submission.c | 58 ++--------------
> drivers/gpu/drm/i915/intel_guc_log.c | 3 +
> drivers/gpu/drm/i915/intel_pm.c | 2 +
> drivers/gpu/drm/i915/intel_uc.c | 106 +++++++++++++++++++++++++----
> drivers/gpu/drm/i915/intel_uc.h | 4 ++
> 7 files changed, 118 insertions(+), 70 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 214555e..9eee99e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2485,7 +2485,7 @@ static int intel_runtime_suspend(struct device *kdev)
> */
> i915_gem_runtime_suspend(dev_priv);
>
> - intel_guc_suspend(dev_priv);
> + intel_guc_finish(dev_priv);
I would personally like to have intel_guc_runtime_suspend(). (The goal
is has a consistent vocabulary for certain phases, and at the lowest
level have those break out into the steps such as intel_guc_finish.)
> intel_runtime_pm_disable_interrupts(dev_priv);
>
> @@ -2570,7 +2570,7 @@ static int intel_runtime_resume(struct device *kdev)
> if (intel_uncore_unclaimed_mmio(dev_priv))
> DRM_DEBUG_DRIVER("Unclaimed access during suspend, bios?\n");
>
> - intel_guc_resume(dev_priv);
> + intel_guc_prepare(dev_priv);
>
> if (IS_GEN9_LP(dev_priv)) {
> bxt_disable_dc9(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 000a764..89ff702 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2840,6 +2840,11 @@ int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>
> i915_gem_revoke_fences(dev_priv);
>
> + if (i915.enable_guc_loading) {
> + intel_guc_finish(dev_priv);
> + dev_priv->guc.fw.load_status = INTEL_UC_FIRMWARE_PENDING;
Keep the guc internal logic out of i915_gem.c, just add a
intel_guc_reset_prepare() for us to call.
> + }
> +
> return err;
> }
>
> @@ -4553,9 +4558,13 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>
> assert_kernel_context_is_current(dev_priv);
> i915_gem_contexts_lost(dev_priv);
> + i915_guc_submission_disable(dev_priv);
> mutex_unlock(&dev->struct_mutex);
>
> - intel_guc_suspend(dev_priv);
> + if (i915.enable_guc_loading) {
> + intel_guc_finish(dev_priv);
> + dev_priv->guc.fw.load_status = INTEL_UC_FIRMWARE_PENDING;
> + }
Same as above; this time intel_guc_suspend().
>
> cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
> cancel_delayed_work_sync(&dev_priv->gt.retire_work);
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 48a1e93..48250c4 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -1280,63 +1280,15 @@ void i915_guc_submission_disable(struct drm_i915_private *dev_priv)
> {
> struct intel_guc *guc = &dev_priv->guc;
>
> + if (!i915.enable_guc_submission)
> + return;
> +
> guc_interrupts_release(dev_priv);
>
> /* Revert back to manual ELSP submission */
> intel_engines_reset_default_submission(dev_priv);
>
> - guc_client_free(guc->execbuf_client);
> + if (guc->execbuf_client)
> + guc_client_free(guc->execbuf_client);
> guc->execbuf_client = NULL;
> }
Code motion looks fine, but people get uppity if you mix purely
mechanical changes in with functional changes.
-Chris
More information about the Intel-gfx-trybot
mailing list