[Intel-gfx] [PATCH 3/4] drm/i915/guc: Fix GuC HW/SW state cleanup in unload path
Kamble, Sagar A
sagar.a.kamble at intel.com
Thu Sep 7 10:41:58 UTC 2017
On 9/5/2017 8:24 PM, Michał Winiarski wrote:
> On Fri, Sep 01, 2017 at 11:02:11AM +0530, Sagar Arun Kamble wrote:
>> Teardown of GuC HW/SW state was not properly done in unload path.
>> During unload, we can rely on intel_guc_reset_prepare being done
>> as part of i915_gem_suspend for disabling GuC interfaces.
>> We will have to disable GuC submission prior to suspend as that involves
>> communication with GuC to destroy doorbell. So intel_uc_fini_hw has to
>> be called as part of i915_gem_suspend during unload as that really
>> takes care of finishing the GuC operations. Created new parameter for
>> i915_gem_suspend to handle unload/suspend path w.r.t gem and GuC suspend.
>> GuC related allocations are cleaned up as part of intel_uc_cleanup_hw.
> Is this still an accurate description? After the changes from v2?
No. Sorry. Will update this in the new version.
>
>> v2: Prepared i915_gem_unload. (Michal)
>>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.c | 6 +--
>> drivers/gpu/drm/i915/i915_drv.h | 1 +
>> drivers/gpu/drm/i915/i915_gem.c | 79 ++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/i915/intel_guc.c | 13 ++++++
>> drivers/gpu/drm/i915/intel_guc.h | 1 +
>> drivers/gpu/drm/i915/intel_uc.c | 14 +++---
>> drivers/gpu/drm/i915/intel_uc_common.h | 1 +
>> 7 files changed, 103 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index b2e8f95..b6cc2fe 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -601,7 +601,7 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
>> i915_gem_drain_workqueue(dev_priv);
>>
>> mutex_lock(&dev_priv->drm.struct_mutex);
>> - intel_uc_fini_hw(dev_priv);
>> + intel_uc_cleanup_hw(dev_priv);
>> i915_gem_cleanup_engines(dev_priv);
>> i915_gem_contexts_fini(dev_priv);
>> i915_gem_cleanup_userptr(dev_priv);
>> @@ -682,7 +682,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
>> return 0;
>>
>> cleanup_gem:
>> - if (i915_gem_suspend(dev_priv))
>> + if (i915_gem_unload(dev_priv))
>> DRM_ERROR("failed to idle hardware; continuing to unload!\n");
>> i915_gem_fini(dev_priv);
>> cleanup_uc:
>> @@ -1375,7 +1375,7 @@ void i915_driver_unload(struct drm_device *dev)
>>
>> i915_driver_unregister(dev_priv);
>>
>> - if (i915_gem_suspend(dev_priv))
>> + if (i915_gem_unload(dev_priv))
>> DRM_ERROR("failed to idle hardware; continuing to unload!\n");
>>
>> intel_display_power_get(dev_priv, POWER_DOMAIN_INIT);
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 064bf0f..570e71e 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3628,6 +3628,7 @@ void i915_gem_reset_engine(struct intel_engine_cs *engine,
>> int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
>> unsigned int flags);
>> int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv);
>> +int __must_check i915_gem_unload(struct drm_i915_private *dev_priv);
>> void i915_gem_resume(struct drm_i915_private *dev_priv);
>> int i915_gem_fault(struct vm_fault *vmf);
>> int i915_gem_object_wait(struct drm_i915_gem_object *obj,
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 977500f..24cefe9 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -4624,6 +4624,85 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
>> return ret;
>> }
>>
>> +int i915_gem_unload(struct drm_i915_private *dev_priv)
>> +{
>> + struct drm_device *dev = &dev_priv->drm;
>> + int ret;
>> +
>> + intel_runtime_pm_get(dev_priv);
>> + intel_suspend_gt_powersave(dev_priv);
>> +
>> + mutex_lock(&dev->struct_mutex);
>> +
>> + /* We have to flush all the executing contexts to main memory so
>> + * that they can be saved in the hibernation image. To ensure the last
>> + * context image is coherent, we have to switch away from it. That
>> + * leaves the dev_priv->kernel_context still active when
>> + * we actually suspend, and its image in memory may not match the GPU
>> + * state. Fortunately, the kernel_context is disposable and we do
>> + * not rely on its state.
>> + */
>> + ret = i915_gem_switch_to_kernel_context(dev_priv);
>> + if (ret)
>> + goto err_unlock;
>> +
>> + ret = i915_gem_wait_for_idle(dev_priv,
>> + I915_WAIT_INTERRUPTIBLE |
>> + I915_WAIT_LOCKED);
>> + if (ret)
>> + goto err_unlock;
>> +
>> + assert_kernel_context_is_current(dev_priv);
>> + i915_gem_contexts_lost(dev_priv);
>> + mutex_unlock(&dev->struct_mutex);
>> +
>> + cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
>> + cancel_delayed_work_sync(&dev_priv->gt.retire_work);
>> +
>> + /* As the idle_work is rearming if it detects a race, play safe and
>> + * repeat the flush until it is definitely idle.
>> + */
>> + while (flush_delayed_work(&dev_priv->gt.idle_work))
>> + ;
>> +
>> + /* Assert that we successfully flushed all the work and
>> + * reset the GPU back to its idle, low power state.
>> + */
>> + WARN_ON(dev_priv->gt.awake);
>> + WARN_ON(!intel_engines_are_idle(dev_priv));
>> +
>> + /* Handle GuC suspension in case of unloading/system suspend */
>> + intel_uc_fini_hw(dev_priv);
>> +
>> + /*
>> + * Neither the BIOS, ourselves or any other kernel
>> + * expects the system to be in execlists mode on startup,
>> + * so we need to reset the GPU back to legacy mode. And the only
>> + * known way to disable logical contexts is through a GPU reset.
>> + *
>> + * So in order to leave the system in a known default configuration,
>> + * always reset the GPU upon unload and suspend. Afterwards we then
>> + * clean up the GEM state tracking, flushing off the requests and
>> + * leaving the system in a known idle state.
>> + *
>> + * Note that it is of the utmost importance that the GPU is idle and
>> + * all stray writes are flushed *before* we dismantle the backing
>> + * storage for the pinned objects.
>> + *
>> + * However, since we are uncertain that resetting the GPU on older
>> + * machines is a good idea, we don't - just in case it leaves the
>> + * machine in an unusable condition.
>> + */
>> + i915_gem_sanitize(dev_priv);
>> + goto out_rpm_put;
>> +
>> +err_unlock:
>> + mutex_unlock(&dev->struct_mutex);
>> +out_rpm_put:
>> + intel_runtime_pm_put(dev_priv);
>> + return ret;
>> +}
>> +
> ^^^ This is just copypasta of i915_gem_suspend
> One line has changed... (well, I think it's no longer up-to-date with drm-tip,
> so there's more, but s/intel_guc_system_suspend/intel_uc_fini_hw was the
> intention, right?)
> We can do better.
Yes, this is just a one-line change. Earlier I had parameterised
i915_gem_suspend to differentiate suspend vs unload;
Michal suggested we could separate these paths for gem instead. I am okay
with either approach. Let me know your thoughts.
>
>> void i915_gem_resume(struct drm_i915_private *dev_priv)
>> {
>> struct drm_device *dev = &dev_priv->drm;
>> diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
>> index 1fd8599..da81c37 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/intel_guc.c
>> @@ -343,3 +343,16 @@ int intel_guc_system_resume(struct intel_guc *guc)
>> */
>> return ret;
>> }
>> +
>> +void intel_guc_fini(struct intel_guc *guc)
>> +{
>> + struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> + struct drm_device *dev = &dev_priv->drm;
>> +
>> + if (i915.enable_guc_submission) {
>> + mutex_lock(&dev->struct_mutex);
>> + i915_guc_submission_disable(dev_priv);
>> + mutex_unlock(&dev->struct_mutex);
>> + intel_guc_reset_prepare(guc);
>> + }
>> +}
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>> index bf4dda0..1b3305f 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -169,6 +169,7 @@ static inline u32 guc_ggtt_offset(struct i915_vma *vma)
>> int intel_guc_runtime_resume(struct intel_guc *guc);
>> int intel_guc_system_suspend(struct intel_guc *guc);
>> int intel_guc_system_resume(struct intel_guc *guc);
>> +void intel_guc_fini(struct intel_guc *guc);
>>
>> static inline int intel_guc_send(struct intel_guc *guc, const u32 *action,
>> u32 len)
>> diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
>> index 30c004c..9216c73 100644
>> --- a/drivers/gpu/drm/i915/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/intel_uc.c
>> @@ -381,20 +381,16 @@ int intel_uc_init_hw(struct drm_i915_private *dev_priv)
>>
>> void intel_uc_fini_hw(struct drm_i915_private *dev_priv)
>> {
>> + intel_guc_fini(&dev_priv->guc);
>> +}
>> +
>> +void intel_uc_cleanup_hw(struct drm_i915_private *dev_priv)
>> +{
>> guc_free_load_err_log(&dev_priv->guc);
> Not really part of the patch... But still - why are we touching this here?
> Isn't guc->log->vma is created only if GuC is loaded?
This is to differentiate fini from cleanup. guc->log->vma is created
only if GuC is loaded.
I will move this past the guc_loading check below.
>>
>> if (!i915.enable_guc_loading)
>> return;
>>
>> if (i915.enable_guc_submission)
>> - i915_guc_submission_disable(dev_priv);
>> -
>> - intel_guc_disable_communication(&dev_priv->guc);
>> -
>> - if (i915.enable_guc_submission) {
>> - gen9_disable_guc_interrupts(dev_priv);
>> i915_guc_submission_fini(dev_priv);
>> - }
>> -
>> - i915_ggtt_disable_guc(dev_priv);
>> }
> We now have intel_uc_fini_hw and intel_uc_cleanup_hw.
>
> intel_uc_fini_hw is called by i915_gem_unload, intel_uc_cleanup_hw is called by
> i915_gem_fini - so, it's in that order, we're first calling fini, then calling
> cleanup.
> Both are only used on the unload path, while intel_uc_init_hw is called on
> load, resume, and reset (by i915_gem_init_hw).
> We should probably redo the naming at this point... Perhaps stick with
> unload rather than fini_hw? Or at least comment on the ordering.
Sure. I will make this unload then.
>
> -Michał
>
>> diff --git a/drivers/gpu/drm/i915/intel_uc_common.h b/drivers/gpu/drm/i915/intel_uc_common.h
>> index f454f73..c6e873d 100644
>> --- a/drivers/gpu/drm/i915/intel_uc_common.h
>> +++ b/drivers/gpu/drm/i915/intel_uc_common.h
>> @@ -97,5 +97,6 @@ struct intel_uc_fw {
>> void intel_uc_fini_fw(struct drm_i915_private *dev_priv);
>> int intel_uc_init_hw(struct drm_i915_private *dev_priv);
>> void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
>> +void intel_uc_cleanup_hw(struct drm_i915_private *dev_priv);
>>
>> #endif
>> --
>> 1.9.1
>>