[Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

Daniele Ceraolo Spurio daniele.ceraolospurio at intel.com
Mon Feb 26 16:57:11 UTC 2018



On 25/02/18 22:17, Sagar Arun Kamble wrote:
> 
> 
> On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:
>>
>>
>> On 23/02/18 06:04, Michal Wajdeczko wrote:
>>> Right after GPU reset there will be a small window of time during which
>>> some of GuC/HuC fields will still show state before reset. Let's start
>>> to fix that by sanitizing firmware status as we will use it shortly.
>>>
>>> v2: s/reset_prepare/prepare_to_reset (Michel)
>>>      don't forget about gem_sanitize path (Daniele)
>>>
>>> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>> Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c    |  5 ++++-
>>>   drivers/gpu/drm/i915/intel_guc.h   |  5 +++++
>>>   drivers/gpu/drm/i915/intel_huc.h   |  5 +++++
>>>   drivers/gpu/drm/i915/intel_uc.c    | 14 ++++++++++++++
>>>   drivers/gpu/drm/i915/intel_uc.h    |  1 +
>>>   drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++++++
>>>   6 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index 14c855b..ae2c4ba 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
>>> drm_i915_private *dev_priv)
>>>       }
>>>         i915_gem_revoke_fences(dev_priv);
>>> +    intel_uc_prepare_to_reset(dev_priv);
>>>         return err;
>>>   }
>>> @@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private 
>>> *i915)
>>>        * it may impact the display and we are uncertain about the 
>>> stability
>>>        * of the reset, so this could be applied to even earlier gen.
>>>        */
>>> -    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>>> +    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
>>> +        intel_uc_prepare_to_reset(i915);
>>
>> This leaves the status with an incorrect value if we boot with 
>> i915.reset=0, 
> It depends on whether WOPCM is locked (In case of resume from S3 I have 
> seen it to be the case often).
> Then we need not reload GuC also unless we are not doing full GPU reset.
>> but I think this is still the right place to add this in. 
> Yes
>> There are several things with GuC that are going to break if we use 
>> reset=0 (e.g. doorbell cleanup) 
> Can you elaborate how it might break.
> i915 isn't currently communicating to GuC (destroy_doorbell) during 
> doorbell cleanup and if we start communicating then it should
> not fail as GuC will be available with reset=0.  Also 
> __intel_uc_reset_hw isn't gated by reset modparam.

As you said we do always reset GuC no matter the value of the modparam, 
but that does not reset the doorbell HW. If we're coming out of S3 and 
the state as been preserved this will cause the doorbell HW to be left 
in an unclean state, which could cause spurious doorbell interrupts to 
be sent to GuC, not sure how the firmware handles those. The code as 
moved since last time I looked at this in detail and I think we're now 
most likely going to overwrite those unclean doorbells, but there are 
unlikely corner cases (preempt context failing to be created) where we 
might not do so.
More generally, my concern was that in the code flow we assume GuC and 
related HW to be reset and in need of a re-init when we come out of 
suspend when actually as you reported that might not be the case if we 
have reset=0. Even if we have no major concerns now, issues might arise 
in the future after code reworks or new feature additions if we start 
from a wrong assumption. Instead of changing the flow to consider the 
reset=0 (which isn't really a supported scenario) I think it'd be more 
useful to just enforce the fact that we don't support that use-case with 
GuC, hence my suggestion. And yes, I'm probably just being uber-paranoid :P

Daniele

>> so I wouldn't consider this a regression, but we might want to start 
>> sanitizing the modparams to not allow reset=0 with GuC.
>>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>
>> Daniele
>>
>>>           WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>>> +    }
>>>   }
>>>     int i915_gem_suspend(struct drm_i915_private *dev_priv)
>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h 
>>> b/drivers/gpu/drm/i915/intel_guc.h
>>> index 52856a9..0f6adb1 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>> @@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct i915_vma 
>>> *vma)
>>>   struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 
>>> size);
>>>   u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
>>>   +static inline void intel_guc_prepare_to_reset(struct intel_guc *guc)
>>> +{
>>> +    intel_uc_fw_prepare_to_reset(&guc->fw);
>>> +}
>>> +
>>>   #endif
>>> diff --git a/drivers/gpu/drm/i915/intel_huc.h 
>>> b/drivers/gpu/drm/i915/intel_huc.h
>>> index 40039db..96e24f9 100644
>>> --- a/drivers/gpu/drm/i915/intel_huc.h
>>> +++ b/drivers/gpu/drm/i915/intel_huc.h
>>> @@ -38,4 +38,9 @@ struct intel_huc {
>>>   int intel_huc_init_hw(struct intel_huc *huc);
>>>   int intel_huc_auth(struct intel_huc *huc);
>>>   +static inline void intel_huc_prepare_to_reset(struct intel_huc *huc)
>>> +{
>>> +    intel_uc_fw_prepare_to_reset(&huc->fw);
>>> +}
>>> +
>>>   #endif
>>> diff --git a/drivers/gpu/drm/i915/intel_uc.c 
>>> b/drivers/gpu/drm/i915/intel_uc.c
>>> index 9f1bac6..8042d4b 100644
>>> --- a/drivers/gpu/drm/i915/intel_uc.c
>>> +++ b/drivers/gpu/drm/i915/intel_uc.c
>>> @@ -445,3 +445,17 @@ void intel_uc_fini_hw(struct drm_i915_private 
>>> *dev_priv)
>>>       if (USES_GUC_SUBMISSION(dev_priv))
>>>           gen9_disable_guc_interrupts(dev_priv);
>>>   }
>>> +
>>> +void intel_uc_prepare_to_reset(struct drm_i915_private *i915)
>>> +{
>>> +    struct intel_huc *huc = &i915->huc;
>>> +    struct intel_guc *guc = &i915->guc;
>>> +
>>> +    if (!USES_GUC(i915))
>>> +        return;
>>> +
>>> +    GEM_BUG_ON(!HAS_GUC(i915));
>>> +
>>> +    intel_huc_prepare_to_reset(huc);
>>> +    intel_guc_prepare_to_reset(guc);
>>> +}
>>> diff --git a/drivers/gpu/drm/i915/intel_uc.h 
>>> b/drivers/gpu/drm/i915/intel_uc.h
>>> index f2984e0..7a8ae58 100644
>>> --- a/drivers/gpu/drm/i915/intel_uc.h
>>> +++ b/drivers/gpu/drm/i915/intel_uc.h
>>> @@ -39,6 +39,7 @@
>>>   void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
>>>   int intel_uc_init(struct drm_i915_private *dev_priv);
>>>   void intel_uc_fini(struct drm_i915_private *dev_priv);
>>> +void intel_uc_prepare_to_reset(struct drm_i915_private *dev_priv);
>>>     static inline bool intel_uc_is_using_guc(void)
>>>   {
>>> diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h 
>>> b/drivers/gpu/drm/i915/intel_uc_fw.h
>>> index d5fd460..f1ee653 100644
>>> --- a/drivers/gpu/drm/i915/intel_uc_fw.h
>>> +++ b/drivers/gpu/drm/i915/intel_uc_fw.h
>>> @@ -115,6 +115,12 @@ static inline bool 
>>> intel_uc_fw_is_selected(struct intel_uc_fw *uc_fw)
>>>       return uc_fw->path != NULL;
>>>   }
>>>   +static inline void intel_uc_fw_prepare_to_reset(struct intel_uc_fw 
>>> *uc_fw)
>>> +{
>>> +    if (uc_fw->load_status == INTEL_UC_FIRMWARE_SUCCESS)
>>> +        uc_fw->load_status = INTEL_UC_FIRMWARE_PENDING;
>>> +}
>>> +
>>>   void intel_uc_fw_fetch(struct drm_i915_private *dev_priv,
>>>                  struct intel_uc_fw *uc_fw);
>>>   int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
>>>
> 


More information about the Intel-gfx mailing list