[Intel-gfx] [PATCH v2] drm/i915/huc: Don't try to check HuC status if it's not loaded

Tue May 21 11:08:52 UTC 2019

On Tue, 21 May 2019 13:04:12 +0200, Ye, Tony <tony.ye at intel.com> wrote:

> There are two users of I915_PARAM_HUC_STATUS, the intel-media driver and  
> the legacy intel-vaapi-driver.
>
> Both drivers check the return value like this:
>
>      has_huc = !! ret_value;
>
> So the ABI change will break the existing stack because negative values  
> are treated as 1, won't it?

No, since negative errors returned by i915 intel_huc_check_status()  
function
will be only used to indicate ioctl() call failure and then stored in  
errno.
Note that your "ret_value" will be modified only when ioctl is successful.

Michal

>
> On 5/20/2019 8:16 PM, Michal Wajdeczko wrote:
>
>> On Mon, 20 May 2019 12:44:43 +0200, Chris Wilson  
>> <chris at chris-wilson.co.uk> wrote:
>>
>>> Quoting Michal Wajdeczko (2019-05-20 11:24:37)
>>>> On Mon, 20 May 2019 11:35:26 +0200, Chris Wilson
>>>> <chris at chris-wilson.co.uk> wrote:
>>>>
>>>> > Quoting Michal Wajdeczko (2019-05-19 22:50:43)
>>>> >> If we never attempted to load HuC firmware, or we already wedged
>>>> >> or reset GuC/HuC, then there is no reason to wake up the device
>>>> >> to check one bit in the register that will be for sure cleared.
>>>> >>
>>>> >> v2: check if HuC was enabled; subtle change in ABI
>>>> >>     reuse hus_is_load helper
>>>> >>
>>>> >> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>> >> Cc: Tony Ye <tony.ye at intel.com>
>>>> >> ---
>>>> >>  drivers/gpu/drm/i915/intel_huc.c | 20 ++++++++++++--------
>>>> >>  drivers/gpu/drm/i915/intel_huc.h |  5 +++++
>>>> >>  2 files changed, 17 insertions(+), 8 deletions(-)
>>>> >>
>>>> >> diff --git a/drivers/gpu/drm/i915/intel_huc.c
>>>> >> b/drivers/gpu/drm/i915/intel_huc.c
>>>> >> index 1ff1fb015e58..bfdebec1cfc8 100644
>>>> >> --- a/drivers/gpu/drm/i915/intel_huc.c
>>>> >> +++ b/drivers/gpu/drm/i915/intel_huc.c
>>>> >> @@ -113,7 +113,7 @@ int intel_huc_auth(struct intel_huc *huc)
>>>> >>         u32 status;
>>>> >>         int ret;
>>>> >>
>>>> >> -       if (huc->fw.load_status != INTEL_UC_FIRMWARE_SUCCESS)
>>>> >> +       if (!intel_huc_is_loaded(huc))
>>>> >>                 return -ENOEXEC;
>>>> >>
>>>> >>         ret = intel_guc_auth_huc(guc,
>>>> >> @@ -150,21 +150,25 @@ int intel_huc_auth(struct intel_huc *huc)
>>>> >>   * This function reads status register to verify if HuC
>>>> >>   * firmware was successfully loaded.
>>>> >>   *
>>>> >> - * Returns: 1 if HuC firmware is loaded and verified,
>>>> >> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
>>>> >> - * is not present on this platform.
>>>> >> + * Returns: 1 if HuC firmware is loaded and verified, 0 if HuC
>>>> >> firmware is not
>>>> >> + * enabled, -ENOPKG if HuC firmware is not loaded and -ENODEV if  
>>>> HuC
>>>> >> is not
>>>> >> + * present on this platform.
>>>> >>   */
>>>> >>  int intel_huc_check_status(struct intel_huc *huc)
>>>> >>  {
>>>> >>         struct drm_i915_private *dev_priv = huc_to_i915(huc);
>>>> >>         intel_wakeref_t wakeref;
>>>> >> -       bool status = false;
>>>> >> +       bool verified = false;
>>>> >>
>>>> >>         if (!HAS_HUC(dev_priv))
>>>> >>                 return -ENODEV;
>>>> >>
>>>> >> -       with_intel_runtime_pm(dev_priv, wakeref)
>>>> >> -               status = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>>> >> +       if (!USES_HUC(dev_priv))
>>>> >> +               return 0;
>>>> >
>>>> > Hmm, EOPNOTSUPP here for the user disabled case?
>>>>
>>>> I'm not sure that user disabled case shall be reported as error,
>>>> since it perfectly fits into legacy ABI 0="not loaded".
>>>> The only small change is that now 0="not loaded (not enabled by user)"
>>>
>>> The only requirement for the intel-media driver seems to "HUC_STATUS >  
>>> 0".
>>> That's our only user, so I think we have the liberty to redefine <=0 as
>>> befits error reporting.
>>>
>>>>
>>>> >
>>>> > Then
>>>> >
>>>> > if (!intel_huc_is_loaded(huc))
>>>> >       return -ENOPKG;
>>>> >
>>>> > bool verified = false;
>>>> > with_intel_runtime_pm(dev_priv, wakeref)
>>>> >       verified = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>>> > return verified.
>>>> >
>>>> > That keeps it a bit flatter with the special casing separate and  
>>>> the 0/1
>>>> > result as given by the register status. Then if negative, we know  
>>>> one of
>>>> > the preconditions for using HuC is not available, and if 0  
>>>> something went
>>>> > wrong in mmio setup.
>>>> >
>>>> > Does that make sense?
>>>>
>>>> IMHO, if something went wrong it is better to report it as error  
>>>> rather
>>>> then weak 0 status. Compare:
>>>
>>> I disagree, <0 is the weak case. 0 is the HW reports something went
>>> wrong (and only that). 1 is the HW reports all went well.
>>
>> But I was assuming that we're defining ABI here, not simply exposing  
>> unified
>> register value. Since we want 1 to report all ok, and we can fail for  
>> many
>> reasons (no hw, no fw blob, wrong fw, bad signature, ...) we should at  
>> least
>> try to match all these failures into error code (today just ENOPKG, but  
>> later
>> we can extend into ENOEXEC=bad fw, ENOSPC=can't fit into WOPCM, ...).
>>
>> On other hand, at least for me, user decision to disable HuC should not
>> be treated the same way as other real failures, so 0 seems like perfect  
>> match.
>>
>> Then we'll have: 1 = enabled, 0 = disabled, <0 = problem
>>
>>>
>>> If we have more ways we can report the HW went wrong, sure expand that
>>> into extra error codes, but I don't see that. And if it's in the regs,
>>> why are not exporting them via reg_read_ioctl?
>>>
>>>> +---------------+-------+-------+-------+
>>>>   no hardware     -ENODEV -ENODEV -ENODEV
>>>>   fw disabled        0       0    -ENOTSUP
>>>>   fw not loaded      0    -ENOPKG -ENOPKG
>>>>   fw not verified*   0    -ENOPKG    0
>>>>   fw verified*       1       1       1
>>>> +---------------+-------+-------+-------+
>>>> *) registry access
>>>>
>>>> Note that in your case, fw verification problem can be reported both
>>>> as -ENOPKG or as 0 (former when verification fails at load time,  
>>>> latter
>>>> as result of runtime reg read). Very likely we will never see 0 at  
>>>> all,
>>>> since probably that can't change once fw was verified at load time.
>>>
>>> I see more distinction for ENODEV/ENOTSUP/ENOPKG/0/1. I am not getting
>>> your point that we have conflation here, as to me it looks more
>>
>> Note that we are verifying HuC firmware right after it was loaded into
>> hardware using exactly the same register as here (see intel_huc_auth),
>> and if this fails we immediately change fw status to FAIL, so we never
>> go beyond huc_is_loaded point.
>>
>> Assuming that once fw is verified it can't fail (to be verified later)
>> then your codes will be only ENODEV/ENOTSUP/ENOPKG/1 (no 0 value)
>>
>>> distinct, and it is clear when we report the value as given by the
>>> register and when we give an interpreted value for a driver error.
>>
>> Note that your initial idea was to reuse driver maintained fw status
>> instead of waking up device just to check register again (we don't do
>> that now as we need someone to confirm/verify that we can really skip
>> reg read). Use 0/1 for ABI as register values does not fit here, imho.
>>
>> There is also a risk that we will report some known failures as <0
>> error codes, while other, captured by hw as just 0.
>>
>>>
>>>> That might be viewed as undesired ABI change.
>>>
>>> grep says it is fine, so I'm happy that no one else cares (as can be
>>> seen by the lack of igt, no one has looked hard enough into how to
>>> distinguish the API failures ;)
>>
>> it's always hard to come out from gray area into solid spec ;)
>>
>>> -Chris