[Intel-gfx] [PATCH v2] drm/i915/huc: Don't try to check HuC status if it's not loaded

Tue May 21 11:04:12 UTC 2019

There are two users of I915_PARAM_HUC_STATUS, the intel-media driver and 
the legacy intel-vaapi-driver.

Both drivers check the return value like this:

     has_huc = !! ret_value;

So the ABI change will break the existing stack because negative values 
are treated as 1, won't it?

On 5/20/2019 8:16 PM, Michal Wajdeczko wrote:

> On Mon, 20 May 2019 12:44:43 +0200, Chris Wilson 
> <chris at chris-wilson.co.uk> wrote:
>
>> Quoting Michal Wajdeczko (2019-05-20 11:24:37)
>>> On Mon, 20 May 2019 11:35:26 +0200, Chris Wilson
>>> <chris at chris-wilson.co.uk> wrote:
>>>
>>> > Quoting Michal Wajdeczko (2019-05-19 22:50:43)
>>> >> If we never attempted to load HuC firmware, or we already wedged
>>> >> or reset GuC/HuC, then there is no reason to wake up the device
>>> >> to check one bit in the register that will be for sure cleared.
>>> >>
>>> >> v2: check if HuC was enabled; subtle change in ABI
>>> >>     reuse hus_is_load helper
>>> >>
>>> >> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> >> Cc: Tony Ye <tony.ye at intel.com>
>>> >> ---
>>> >>  drivers/gpu/drm/i915/intel_huc.c | 20 ++++++++++++--------
>>> >>  drivers/gpu/drm/i915/intel_huc.h |  5 +++++
>>> >>  2 files changed, 17 insertions(+), 8 deletions(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/i915/intel_huc.c
>>> >> b/drivers/gpu/drm/i915/intel_huc.c
>>> >> index 1ff1fb015e58..bfdebec1cfc8 100644
>>> >> --- a/drivers/gpu/drm/i915/intel_huc.c
>>> >> +++ b/drivers/gpu/drm/i915/intel_huc.c
>>> >> @@ -113,7 +113,7 @@ int intel_huc_auth(struct intel_huc *huc)
>>> >>         u32 status;
>>> >>         int ret;
>>> >>
>>> >> -       if (huc->fw.load_status != INTEL_UC_FIRMWARE_SUCCESS)
>>> >> +       if (!intel_huc_is_loaded(huc))
>>> >>                 return -ENOEXEC;
>>> >>
>>> >>         ret = intel_guc_auth_huc(guc,
>>> >> @@ -150,21 +150,25 @@ int intel_huc_auth(struct intel_huc *huc)
>>> >>   * This function reads status register to verify if HuC
>>> >>   * firmware was successfully loaded.
>>> >>   *
>>> >> - * Returns: 1 if HuC firmware is loaded and verified,
>>> >> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
>>> >> - * is not present on this platform.
>>> >> + * Returns: 1 if HuC firmware is loaded and verified, 0 if HuC
>>> >> firmware is not
>>> >> + * enabled, -ENOPKG if HuC firmware is not loaded and -ENODEV if 
>>> HuC
>>> >> is not
>>> >> + * present on this platform.
>>> >>   */
>>> >>  int intel_huc_check_status(struct intel_huc *huc)
>>> >>  {
>>> >>         struct drm_i915_private *dev_priv = huc_to_i915(huc);
>>> >>         intel_wakeref_t wakeref;
>>> >> -       bool status = false;
>>> >> +       bool verified = false;
>>> >>
>>> >>         if (!HAS_HUC(dev_priv))
>>> >>                 return -ENODEV;
>>> >>
>>> >> -       with_intel_runtime_pm(dev_priv, wakeref)
>>> >> -               status = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>> >> +       if (!USES_HUC(dev_priv))
>>> >> +               return 0;
>>> >
>>> > Hmm, EOPNOTSUPP here for the user disabled case?
>>>
>>> I'm not sure that user disabled case shall be reported as error,
>>> since it perfectly fits into legacy ABI 0="not loaded".
>>> The only small change is that now 0="not loaded (not enabled by user)"
>>
>> The only requirement for the intel-media driver seems to "HUC_STATUS 
>> > 0".
>> That's our only user, so I think we have the liberty to redefine <=0 as
>> befits error reporting.
>>
>>>
>>> >
>>> > Then
>>> >
>>> > if (!intel_huc_is_loaded(huc))
>>> >       return -ENOPKG;
>>> >
>>> > bool verified = false;
>>> > with_intel_runtime_pm(dev_priv, wakeref)
>>> >       verified = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>> > return verified.
>>> >
>>> > That keeps it a bit flatter with the special casing separate and 
>>> the 0/1
>>> > result as given by the register status. Then if negative, we know 
>>> one of
>>> > the preconditions for using HuC is not available, and if 0 
>>> something went
>>> > wrong in mmio setup.
>>> >
>>> > Does that make sense?
>>>
>>> IMHO, if something went wrong it is better to report it as error rather
>>> then weak 0 status. Compare:
>>
>> I disagree, <0 is the weak case. 0 is the HW reports something went
>> wrong (and only that). 1 is the HW reports all went well.
>
> But I was assuming that we're defining ABI here, not simply exposing 
> unified
> register value. Since we want 1 to report all ok, and we can fail for 
> many
> reasons (no hw, no fw blob, wrong fw, bad signature, ...) we should at 
> least
> try to match all these failures into error code (today just ENOPKG, 
> but later
> we can extend into ENOEXEC=bad fw, ENOSPC=can't fit into WOPCM, ...).
>
> On other hand, at least for me, user decision to disable HuC should not
> be treated the same way as other real failures, so 0 seems like 
> perfect match.
>
> Then we'll have: 1 = enabled, 0 = disabled, <0 = problem
>
>>
>> If we have more ways we can report the HW went wrong, sure expand that
>> into extra error codes, but I don't see that. And if it's in the regs,
>> why are not exporting them via reg_read_ioctl?
>>
>>> +---------------+-------+-------+-------+
>>>   no hardware     -ENODEV -ENODEV -ENODEV
>>>   fw disabled        0       0    -ENOTSUP
>>>   fw not loaded      0    -ENOPKG -ENOPKG
>>>   fw not verified*   0    -ENOPKG    0
>>>   fw verified*       1       1       1
>>> +---------------+-------+-------+-------+
>>> *) registry access
>>>
>>> Note that in your case, fw verification problem can be reported both
>>> as -ENOPKG or as 0 (former when verification fails at load time, latter
>>> as result of runtime reg read). Very likely we will never see 0 at all,
>>> since probably that can't change once fw was verified at load time.
>>
>> I see more distinction for ENODEV/ENOTSUP/ENOPKG/0/1. I am not getting
>> your point that we have conflation here, as to me it looks more
>
> Note that we are verifying HuC firmware right after it was loaded into
> hardware using exactly the same register as here (see intel_huc_auth),
> and if this fails we immediately change fw status to FAIL, so we never
> go beyond huc_is_loaded point.
>
> Assuming that once fw is verified it can't fail (to be verified later)
> then your codes will be only ENODEV/ENOTSUP/ENOPKG/1 (no 0 value)
>
>> distinct, and it is clear when we report the value as given by the
>> register and when we give an interpreted value for a driver error.
>
> Note that your initial idea was to reuse driver maintained fw status
> instead of waking up device just to check register again (we don't do
> that now as we need someone to confirm/verify that we can really skip
> reg read). Use 0/1 for ABI as register values does not fit here, imho.
>
> There is also a risk that we will report some known failures as <0
> error codes, while other, captured by hw as just 0.
>
>>
>>> That might be viewed as undesired ABI change.
>>
>> grep says it is fine, so I'm happy that no one else cares (as can be
>> seen by the lack of igt, no one has looked hard enough into how to
>> distinguish the API failures ;)
>
> it's always hard to come out from gray area into solid spec ;)
>
>> -Chris