[Intel-gfx] [PATCH v2] drm/i915/uc: Start preparing GuC/HuC for reset

Sagar Arun Kamble sagar.a.kamble at intel.com
Tue Feb 27 06:54:46 UTC 2018



On 2/27/2018 2:22 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-02-26 16:57:11)
>>
>> On 25/02/18 22:17, Sagar Arun Kamble wrote:
>>>
>>> On 2/23/2018 10:31 PM, Daniele Ceraolo Spurio wrote:
>>>>
>>>> On 23/02/18 06:04, Michal Wajdeczko wrote:
>>>>> Right after GPU reset there will be a small window of time during which
>>>>> some of GuC/HuC fields will still show state before reset. Let's start
>>>>> to fix that by sanitizing firmware status as we will use it shortly.
>>>>>
>>>>> v2: s/reset_prepare/prepare_to_reset (Michel)
>>>>>       don't forget about gem_sanitize path (Daniele)
>>>>>
>>>>> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>>> Cc: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Cc: Michel Thierry <michel.thierry at intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_gem.c    |  5 ++++-
>>>>>    drivers/gpu/drm/i915/intel_guc.h   |  5 +++++
>>>>>    drivers/gpu/drm/i915/intel_huc.h   |  5 +++++
>>>>>    drivers/gpu/drm/i915/intel_uc.c    | 14 ++++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_uc.h    |  1 +
>>>>>    drivers/gpu/drm/i915/intel_uc_fw.h |  6 ++++++
>>>>>    6 files changed, 35 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index 14c855b..ae2c4ba 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -2981,6 +2981,7 @@ int i915_gem_reset_prepare(struct
>>>>> drm_i915_private *dev_priv)
>>>>>        }
>>>>>          i915_gem_revoke_fences(dev_priv);
>>>>> +    intel_uc_prepare_to_reset(dev_priv);
>>>>>          return err;
>>>>>    }
>>>>> @@ -4881,8 +4882,10 @@ void i915_gem_sanitize(struct drm_i915_private
>>>>> *i915)
>>>>>         * it may impact the display and we are uncertain about the
>>>>> stability
>>>>>         * of the reset, so this could be applied to even earlier gen.
>>>>>         */
>>>>> -    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
>>>>> +    if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915)) {
>>>>> +        intel_uc_prepare_to_reset(i915);
>>>> This leaves the status with an incorrect value if we boot with
>>>> i915.reset=0,
>>> It depends on whether WOPCM is locked (In case of resume from S3 I have
>>> seen it to be the case often).
>>> Then we need not reload GuC also unless we are not doing full GPU reset.
>>>> but I think this is still the right place to add this in.
>>> Yes
>>>> There are several things with GuC that are going to break if we use
>>>> reset=0 (e.g. doorbell cleanup)
>>> Can you elaborate how it might break.
>>> i915 isn't currently communicating to GuC (destroy_doorbell) during
>>> doorbell cleanup and if we start communicating then it should
>>> not fail as GuC will be available with reset=0.  Also
>>> __intel_uc_reset_hw isn't gated by reset modparam.
>> As you said we do always reset GuC no matter the value of the modparam,
>> but that does not reset the doorbell HW. If we're coming out of S3 and
>> the state as been preserved this will cause the doorbell HW to be left
>> in an unclean state, which could cause spurious doorbell interrupts to
>> be sent to GuC, not sure how the firmware handles those. The code as
>> moved since last time I looked at this in detail and I think we're now
>> most likely going to overwrite those unclean doorbells, but there are
>> unlikely corner cases (preempt context failing to be created) where we
>> might not do so.
> I'm still going "wait, we can put the device into D3 and the GuC is
> still powered?" Something feels wrong if the GuC retains state after the
> HW is powered down.
GuC will be powered down, with RC6. Just that firmware in WOPCM can get 
wiped off if
memory is reset/powered down during resume. In case of mem sleep 
generally WOPCM stays intact and if we exit
RC6 on resume from sleep, firmware will be restored into GuC without 
driver intervention.
But since we do full GPU reset as part of sanitize we have to load it 
from driver again.
>   (So I'm wondering why this isn't just part of the
> normal guc init path for module load/resume.)
> -Chris

-- 
Thanks,
Sagar



More information about the Intel-gfx mailing list