[Intel-gfx] [PATCH v2] drm/i915/huc: Don't try to check HuC status if it's not loaded
Ye, Tony
tony.ye at intel.com
Tue May 21 11:04:12 UTC 2019
There are two users of I915_PARAM_HUC_STATUS, the intel-media driver and
the legacy intel-vaapi-driver.
Both drivers check the return value like this:
has_huc = !! ret_value;
So the ABI change will break the existing stack because negative values
are treated as 1, won't it?
On 5/20/2019 8:16 PM, Michal Wajdeczko wrote:
> On Mon, 20 May 2019 12:44:43 +0200, Chris Wilson
> <chris at chris-wilson.co.uk> wrote:
>
>> Quoting Michal Wajdeczko (2019-05-20 11:24:37)
>>> On Mon, 20 May 2019 11:35:26 +0200, Chris Wilson
>>> <chris at chris-wilson.co.uk> wrote:
>>>
>>> > Quoting Michal Wajdeczko (2019-05-19 22:50:43)
>>> >> If we never attempted to load HuC firmware, or we already wedged
>>> >> or reset GuC/HuC, then there is no reason to wake up the device
>>> >> to check one bit in the register that will be for sure cleared.
>>> >>
>>> >> v2: check if HuC was enabled; subtle change in ABI
>>> >> reuse hus_is_load helper
>>> >>
>>> >> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> >> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>> >> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> >> Cc: Tony Ye <tony.ye at intel.com>
>>> >> ---
>>> >> drivers/gpu/drm/i915/intel_huc.c | 20 ++++++++++++--------
>>> >> drivers/gpu/drm/i915/intel_huc.h | 5 +++++
>>> >> 2 files changed, 17 insertions(+), 8 deletions(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/i915/intel_huc.c
>>> >> b/drivers/gpu/drm/i915/intel_huc.c
>>> >> index 1ff1fb015e58..bfdebec1cfc8 100644
>>> >> --- a/drivers/gpu/drm/i915/intel_huc.c
>>> >> +++ b/drivers/gpu/drm/i915/intel_huc.c
>>> >> @@ -113,7 +113,7 @@ int intel_huc_auth(struct intel_huc *huc)
>>> >> u32 status;
>>> >> int ret;
>>> >>
>>> >> - if (huc->fw.load_status != INTEL_UC_FIRMWARE_SUCCESS)
>>> >> + if (!intel_huc_is_loaded(huc))
>>> >> return -ENOEXEC;
>>> >>
>>> >> ret = intel_guc_auth_huc(guc,
>>> >> @@ -150,21 +150,25 @@ int intel_huc_auth(struct intel_huc *huc)
>>> >> * This function reads status register to verify if HuC
>>> >> * firmware was successfully loaded.
>>> >> *
>>> >> - * Returns: 1 if HuC firmware is loaded and verified,
>>> >> - * 0 if HuC firmware is not loaded and -ENODEV if HuC
>>> >> - * is not present on this platform.
>>> >> + * Returns: 1 if HuC firmware is loaded and verified, 0 if HuC
>>> >> firmware is not
>>> >> + * enabled, -ENOPKG if HuC firmware is not loaded and -ENODEV if
>>> HuC
>>> >> is not
>>> >> + * present on this platform.
>>> >> */
>>> >> int intel_huc_check_status(struct intel_huc *huc)
>>> >> {
>>> >> struct drm_i915_private *dev_priv = huc_to_i915(huc);
>>> >> intel_wakeref_t wakeref;
>>> >> - bool status = false;
>>> >> + bool verified = false;
>>> >>
>>> >> if (!HAS_HUC(dev_priv))
>>> >> return -ENODEV;
>>> >>
>>> >> - with_intel_runtime_pm(dev_priv, wakeref)
>>> >> - status = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>> >> + if (!USES_HUC(dev_priv))
>>> >> + return 0;
>>> >
>>> > Hmm, EOPNOTSUPP here for the user disabled case?
>>>
>>> I'm not sure that user disabled case shall be reported as error,
>>> since it perfectly fits into legacy ABI 0="not loaded".
>>> The only small change is that now 0="not loaded (not enabled by user)"
>>
>> The only requirement for the intel-media driver seems to "HUC_STATUS
>> > 0".
>> That's our only user, so I think we have the liberty to redefine <=0 as
>> befits error reporting.
>>
>>>
>>> >
>>> > Then
>>> >
>>> > if (!intel_huc_is_loaded(huc))
>>> > return -ENOPKG;
>>> >
>>> > bool verified = false;
>>> > with_intel_runtime_pm(dev_priv, wakeref)
>>> > verified = I915_READ(HUC_STATUS2) & HUC_FW_VERIFIED;
>>> > return verified.
>>> >
>>> > That keeps it a bit flatter with the special casing separate and
>>> the 0/1
>>> > result as given by the register status. Then if negative, we know
>>> one of
>>> > the preconditions for using HuC is not available, and if 0
>>> something went
>>> > wrong in mmio setup.
>>> >
>>> > Does that make sense?
>>>
>>> IMHO, if something went wrong it is better to report it as error rather
>>> then weak 0 status. Compare:
>>
>> I disagree, <0 is the weak case. 0 is the HW reports something went
>> wrong (and only that). 1 is the HW reports all went well.
>
> But I was assuming that we're defining ABI here, not simply exposing
> unified
> register value. Since we want 1 to report all ok, and we can fail for
> many
> reasons (no hw, no fw blob, wrong fw, bad signature, ...) we should at
> least
> try to match all these failures into error code (today just ENOPKG,
> but later
> we can extend into ENOEXEC=bad fw, ENOSPC=can't fit into WOPCM, ...).
>
> On other hand, at least for me, user decision to disable HuC should not
> be treated the same way as other real failures, so 0 seems like
> perfect match.
>
> Then we'll have: 1 = enabled, 0 = disabled, <0 = problem
>
>>
>> If we have more ways we can report the HW went wrong, sure expand that
>> into extra error codes, but I don't see that. And if it's in the regs,
>> why are not exporting them via reg_read_ioctl?
>>
>>> +---------------+-------+-------+-------+
>>> no hardware -ENODEV -ENODEV -ENODEV
>>> fw disabled 0 0 -ENOTSUP
>>> fw not loaded 0 -ENOPKG -ENOPKG
>>> fw not verified* 0 -ENOPKG 0
>>> fw verified* 1 1 1
>>> +---------------+-------+-------+-------+
>>> *) registry access
>>>
>>> Note that in your case, fw verification problem can be reported both
>>> as -ENOPKG or as 0 (former when verification fails at load time, latter
>>> as result of runtime reg read). Very likely we will never see 0 at all,
>>> since probably that can't change once fw was verified at load time.
>>
>> I see more distinction for ENODEV/ENOTSUP/ENOPKG/0/1. I am not getting
>> your point that we have conflation here, as to me it looks more
>
> Note that we are verifying HuC firmware right after it was loaded into
> hardware using exactly the same register as here (see intel_huc_auth),
> and if this fails we immediately change fw status to FAIL, so we never
> go beyond huc_is_loaded point.
>
> Assuming that once fw is verified it can't fail (to be verified later)
> then your codes will be only ENODEV/ENOTSUP/ENOPKG/1 (no 0 value)
>
>> distinct, and it is clear when we report the value as given by the
>> register and when we give an interpreted value for a driver error.
>
> Note that your initial idea was to reuse driver maintained fw status
> instead of waking up device just to check register again (we don't do
> that now as we need someone to confirm/verify that we can really skip
> reg read). Use 0/1 for ABI as register values does not fit here, imho.
>
> There is also a risk that we will report some known failures as <0
> error codes, while other, captured by hw as just 0.
>
>>
>>> That might be viewed as undesired ABI change.
>>
>> grep says it is fine, so I'm happy that no one else cares (as can be
>> seen by the lack of igt, no one has looked hard enough into how to
>> distinguish the API failures ;)
>
> it's always hard to come out from gray area into solid spec ;)
>
>> -Chris
More information about the Intel-gfx
mailing list