[Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

Sagar Arun Kamble sagar.a.kamble at intel.com
Tue Feb 27 07:07:48 UTC 2018



On 2/26/2018 10:27 PM, Daniele Ceraolo Spurio wrote:
>
>
> On 25/02/18 22:17, Sagar Arun Kamble wrote:
>>
>>
>> On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:
>>>
>>>
>>> On 23/02/18 06:04, Michal Wajdeczko wrote:
>>>> Right after GPU reset there will be a small window of time during 
>>>> which
>>>> some of GuC/HuC fields will still show state before reset. Let's start
>>>> to fix that by sanitizing firmware status as we will use it shortly.
>>>>
>>>> v2: s/reset_prepare/prepare_to_reset (Michel)
>>>>      don't forget about gem_sanitize path (Daniele)
>>>>
>>>> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>> Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem.c    |  5 ++++-
>>>>   drivers/gpu/drm/i915/intel_guc.h   |  5 +++++
>>>>   drivers/gpu/drm/i915/intel_huc.h   |  5 +++++
>>>>   drivers/gpu/drm/i915/intel_uc.c    | 14 ++++++++++++++
>>>>   drivers/gpu/drm/i915/intel_uc.h    |  1 +
>>>>   drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++++++
>>>>   6 files changed, 35 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 14c855b..ae2c4ba 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct 
>>>> drm_i915_private *dev_priv)
>>>>       }
>>>>         i915_gem_revoke_fences(dev_priv);
>>>> +    intel_uc_prepare_to_reset(dev_priv);
>>>>         return err;
>>>>   }
>>>> @@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct 
>>>> drm_i915_private *i915)
>>>>        * it may impact the display and we are uncertain about the 
>>>> stability
>>>>        * of the reset, so this could be applied to even earlier gen.
>>>>        */
>>>> -    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>>>> +    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
>>>> +        intel_uc_prepare_to_reset(i915);
>>>
>>> This leaves the status with an incorrect value if we boot with 
>>> i915.reset=0, 
>> It depends on whether WOPCM is locked (on resume from S3 I 
>> have often seen that to be the case).
>> In that case we also need not reload GuC, as long as we are not doing a full GPU reset.
>>> but I think this is still the right place to add this in. 
>> Yes
>>> There are several things with GuC that are going to break if we use 
>>> reset=0 (e.g. doorbell cleanup) 
>> Can you elaborate on how it might break?
>> i915 isn't currently communicating with GuC (destroy_doorbell) during 
>> doorbell cleanup, and if we start communicating then it should
>> not fail, as GuC will be available with reset=0. Also, 
>> __intel_uc_reset_hw isn't gated by the reset modparam.
>
> As you said we do always reset GuC no matter the value of the 
> modparam, but that does not reset the doorbell HW. If we're coming out 
> of S3 and the state has been preserved, this will cause the doorbell HW 
> to be left in an unclean state, which could cause spurious doorbell 
> interrupts to be sent to GuC; I'm not sure how the firmware handles those. 
> The code has moved since the last time I looked at this in detail and I 
> think we're now most likely going to overwrite those unclean 
> doorbells, but there are unlikely corner cases (preempt context 
> failing to be created) where we might not do so.
> More generally, my concern was that in the code flow we assume GuC and 
> related HW to be reset and in need of a re-init when we come out of 
> suspend, when actually, as you reported, that might not be the case if we 
> have reset=0. Even if we have no major concerns now, issues might 
> arise in the future after code reworks or new feature additions if we 
> start from a wrong assumption. Instead of changing the flow to 
> consider the reset=0 case (which isn't really a supported scenario), I think 
> it'd be more useful to just enforce the fact that we don't support 
> that use-case with GuC, hence my suggestion. And yes, I'm probably 
> just being uber-paranoid :P
>
Makes sense. Agreed on sanitizing the modparams to not allow reset=0 with GuC.
We could also fix this if we could reset the doorbell unit alone at resume 
and acquire the needed doorbells, but AFAIK the earlier guc_init_doorbell_hw 
was the way to reset all doorbells (and that needed GuC). As you said, we can 
skip these changes though, since reset=0 isn't a supported scenario.
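
A rough sketch of what that sanitizing could look like, purely for
illustration. The struct, its fields and the helper name are made up for
this mail (they are not the real i915_modparams layout or an existing
function), and forcing reset back to 1 is only one possible policy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant modparams; not the real struct. */
struct sketch_params {
	int reset;      /* 0 = GPU reset disabled, as with i915.reset=0 */
	int enable_guc; /* non-zero = GuC is in use */
};

/*
 * Hypothetical helper: reset=0 is not a supported combination with GuC,
 * so override it when GuC is in use.
 * Returns true if the reset parameter had to be overridden.
 */
static bool sketch_sanitize_reset_param(struct sketch_params *p)
{
	assert(p);

	if (p->enable_guc && p->reset == 0) {
		fprintf(stderr, "reset=0 is not supported with GuC, enabling reset\n");
		p->reset = 1; /* assumed policy: fall back to reset enabled */
		return true;
	}

	return false;
}
```

With something along these lines, the code flow could keep assuming that
GuC and related HW have been reset whenever GuC is in use.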
> Daniele
>
>>> so I wouldn't consider this a regression, but we might want to start 
>>> sanitizing the modparams to not allow reset=0 with GuC.
>>>
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>
>>> Daniele
>>>
>>>>           WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
>>>> +    }
>>>>   }
>>>>     int i915_gem_suspend(struct drm_i915_private *dev_priv)
>>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h 
>>>> b/drivers/gpu/drm/i915/intel_guc.h
>>>> index 52856a9..0f6adb1 100644
>>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>>> @@ -132,4 +132,9 @@ static inline u32 guc_ggtt_offset(struct 
>>>> i915_vma *vma)
>>>>   struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, 
>>>> u32 size);
>>>>   u32 intel_guc_wopcm_size(struct drm_i915_private *dev_priv);
>>>>   +static inline void intel_guc_prepare_to_reset(struct intel_guc 
>>>> *guc)
>>>> +{
>>>> +    intel_uc_fw_prepare_to_reset(&guc->fw);
>>>> +}
>>>> +
>>>>   #endif
>>>> diff --git a/drivers/gpu/drm/i915/intel_huc.h 
>>>> b/drivers/gpu/drm/i915/intel_huc.h
>>>> index 40039db..96e24f9 100644
>>>> --- a/drivers/gpu/drm/i915/intel_huc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_huc.h
>>>> @@ -38,4 +38,9 @@ struct intel_huc {
>>>>   int intel_huc_init_hw(struct intel_huc *huc);
>>>>   int intel_huc_auth(struct intel_huc *huc);
>>>>   +static inline void intel_huc_prepare_to_reset(struct intel_huc 
>>>> *huc)
>>>> +{
>>>> +    intel_uc_fw_prepare_to_reset(&huc->fw);
>>>> +}
>>>> +
>>>>   #endif
>>>> diff --git a/drivers/gpu/drm/i915/intel_uc.c 
>>>> b/drivers/gpu/drm/i915/intel_uc.c
>>>> index 9f1bac6..8042d4b 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_uc.c
>>>> @@ -445,3 +445,17 @@ void intel_uc_fini_hw(struct drm_i915_private 
>>>> *dev_priv)
>>>>       if (USES_GUC_SUBMISSION(dev_priv))
>>>>           gen9_disable_guc_interrupts(dev_priv);
>>>>   }
>>>> +
>>>> +void intel_uc_prepare_to_reset(struct drm_i915_private *i915)
>>>> +{
>>>> +    struct intel_huc *huc = &i915->huc;
>>>> +    struct intel_guc *guc = &i915->guc;
>>>> +
>>>> +    if (!USES_GUC(i915))
>>>> +        return;
>>>> +
>>>> +    GEM_BUG_ON(!HAS_GUC(i915));
>>>> +
>>>> +    intel_huc_prepare_to_reset(huc);
>>>> +    intel_guc_prepare_to_reset(guc);
>>>> +}
>>>> diff --git a/drivers/gpu/drm/i915/intel_uc.h 
>>>> b/drivers/gpu/drm/i915/intel_uc.h
>>>> index f2984e0..7a8ae58 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uc.h
>>>> +++ b/drivers/gpu/drm/i915/intel_uc.h
>>>> @@ -39,6 +39,7 @@
>>>>   void intel_uc_fini_hw(struct drm_i915_private *dev_priv);
>>>>   int intel_uc_init(struct drm_i915_private *dev_priv);
>>>>   void intel_uc_fini(struct drm_i915_private *dev_priv);
>>>> +void intel_uc_prepare_to_reset(struct drm_i915_private *dev_priv);
>>>>     static inline bool intel_uc_is_using_guc(void)
>>>>   {
>>>> diff --git a/drivers/gpu/drm/i915/intel_uc_fw.h 
>>>> b/drivers/gpu/drm/i915/intel_uc_fw.h
>>>> index d5fd460..f1ee653 100644
>>>> --- a/drivers/gpu/drm/i915/intel_uc_fw.h
>>>> +++ b/drivers/gpu/drm/i915/intel_uc_fw.h
>>>> @@ -115,6 +115,12 @@ static inline bool 
>>>> intel_uc_fw_is_selected(struct intel_uc_fw *uc_fw)
>>>>       return uc_fw->path != NULL;
>>>>   }
>>>>   +static inline void intel_uc_fw_prepare_to_reset(struct 
>>>> intel_uc_fw *uc_fw)
>>>> +{
>>>> +    if (uc_fw->load_status == INTEL_UC_FIRMWARE_SUCCESS)
>>>> +        uc_fw->load_status = INTEL_UC_FIRMWARE_PENDING;
>>>> +}
>>>> +
>>>>   void intel_uc_fw_fetch(struct drm_i915_private *dev_priv,
>>>>                  struct intel_uc_fw *uc_fw);
>>>>   int intel_uc_fw_upload(struct intel_uc_fw *uc_fw,
>>>>
>>
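
For reference, the status handling that intel_uc_fw_prepare_to_reset adds
can be modelled standalone as below. This is only a userspace sketch: the
SKETCH_* enum and struct are stand-ins invented here, and the FAIL/NONE
values are assumptions that do not appear in the patch. Only the
SUCCESS -> PENDING transition mirrors the quoted code.

```c
#include <assert.h>

/* Stand-in for the firmware load status; FAIL and NONE are assumed values. */
enum sketch_fw_status {
	SKETCH_FW_FAIL = -1,
	SKETCH_FW_NONE = 0,
	SKETCH_FW_PENDING,
	SKETCH_FW_SUCCESS,
};

/* Stand-in for the firmware descriptor; only the field used here. */
struct sketch_uc_fw {
	enum sketch_fw_status load_status;
};

/*
 * Mirrors the logic in the patch: firmware that was loaded successfully
 * is marked as pending again ahead of a reset. Any other status is left
 * untouched.
 */
static void sketch_fw_prepare_to_reset(struct sketch_uc_fw *fw)
{
	assert(fw);

	if (fw->load_status == SKETCH_FW_SUCCESS)
		fw->load_status = SKETCH_FW_PENDING;
}
```

The point of only touching the SUCCESS state is that a firmware which was
never selected or which failed to load keeps its status across the reset.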

-- 
Thanks,
Sagar


