[Intel-gfx] [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Feb 25 18:35:11 UTC 2022


On 25/02/2022 18:05, Michal Wajdeczko wrote:
> On 25.02.2022 18:18, Tvrtko Ursulin wrote:
>>
>> On 25/02/2022 16:46, John Harrison wrote:
>>
>>>>> driver we don't care that much that we failed to load HWconfig and
>>>>> 'notice' is enough.
>>>>>
>>>>> but I'm fine with all messages being drm_err (as we will not have to
>>>>> change that once again after HWconfig will be mandatory for the driver
>>>>> as well)
>>>>
>>>> I would be against drm_err.
>>>>
>>>> #define KERN_EMERG      KERN_SOH "0"    /* system is unusable */
>>>> #define KERN_ALERT      KERN_SOH "1"    /* action must be taken
>>>> immediately */
>>>> #define KERN_CRIT       KERN_SOH "2"    /* critical conditions */
>>>> #define KERN_ERR        KERN_SOH "3"    /* error conditions */
>>>> #define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
>>>> #define KERN_NOTICE     KERN_SOH "5"    /* normal but significant
>>>> condition */
>>>> #define KERN_INFO       KERN_SOH "6"    /* informational */
>>>> #define KERN_DEBUG      KERN_SOH "7"    /* debug-level messages */
>>>>
>>>>  From the point of view of the kernel driver, this is not an error to
>>>> its operation. It can at most be a warning, but notice is also fine
>>>> by me. One could argue when reading "normal but significant
>>>> condition" that it is not normal, when it is in fact unexpected, so
>>>> if people prefer warning that is also okay by me. I still lean
>>>> towards notice becuase of the hands-off nature i915 has with the
>>>> pass-through of this blob.
>>>   From the point of view of the KMD, i915 will load and be 'functional'
>>> if it can't talk to the hardware at all. The UMDs won't work at all but
>>
>> Well this reductio ad absurdum fails I think... :)
>>
>>> the driver load will be 'fine'. That's a requirement to be able to get
>>> the user to a software fallback desktop in order to work out why the
>>> hardware isn't working (e.g. no GuC firmware file). I would view this
>>> as similar. The KMD might have loaded but the UMDs are not functional.
>>> That is definitely an error condition to me.
>>
>> ... If GuC fails to load there is no command submission and driver will
>> obviously log that with drm_err.
>>
>> If blob fails to verify it depends on the userspace stack what will
>> happen. As stated before on some platforms, and/or after a certain time,
>> Mesa will not look at the blob at all. So i915 is fine (it is after all
>> just a conduit for opaque data!), system overall is fine, so it
>> definitely isn't a KERN_ERR level event.
>>
>>>>>>>> +               ERR_PTR(ret));
>>>>>>>> +
>>>>>>>>         ret = guc_enable_communication(guc);
>>>>>>>>         if (ret)
>>>>>>>>             goto err_log_capture;
>>>>>>>> @@ -562,6 +567,8 @@ static void __uc_fini_hw(struct intel_uc *uc)
>>>>>>>>         if (intel_uc_uses_guc_submission(uc))
>>>>>>>>             intel_guc_submission_disable(guc);
>>>>>>>>     +    intel_guc_hwconfig_fini(&guc->hwconfig);
>>>>>>>> +
>>>>>>>>         __uc_sanitize(uc);
>>>>>>>>     }
>>>>>>>>     diff --git a/drivers/gpu/drm/i915/i915_pci.c
>>>>>>>> b/drivers/gpu/drm/i915/i915_pci.c
>>>>>>>> index 76e590fcb903..1d31e35a5154 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>>>>>>> @@ -990,6 +990,7 @@ static const struct intel_device_info
>>>>>>>> adl_p_info = {
>>>>>>>>             BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) |
>>>>>>>> BIT(VCS2),
>>>>>>>>         .ppgtt_size = 48,
>>>>>>>>         .dma_mask_size = 39,
>>>>>>>> +    .has_guc_hwconfig = 1,
>>>>>>> Who requested this change? It was previously done this way but the
>>>>>>> instruction was that i915_pci.c is for hardware features only but
>>>>>>> that
>>>>>>> this, as you seem extremely keen about pointing out at every
>>>>>>> opportunity, is a software feature.
>>>>>>
>>>>>> This was requested by Michal as well. I definitely agree it is a
>>>>>> software feature, but I was not aware that "i915_pci.c is for hardware
>>>>>> features only".
>>>>>>
>>>>>> Michal, do you agree with this and returning to the previous method
>>>>>> for
>>>>>> enabling the feature?
>>>>>
>>>>> now I'm little confused as some arch direction was to treat FW as
>>>>> extension of the HW so for me it was natural to have 'has_guc_hwconfig'
>>>>> flag in device_info
>>>>>
>>>>> if still for some reason it is undesired to mix HW and FW/SW flags
>>>>> inside single group of flags then maybe we should just add separate
>>>>> group of immutable flags where has_guc_hwconfig could be defined.
>>>>>
>>>>> let our maintainers decide
>>>>
>>>> Bah.. :)
>>>>
>>>> And what was the previous method?
>>>>
>>>> [comes back later]
>>>>
>>>> Okay it was:
>>>>
>>>> +static bool has_table(struct drm_i915_private *i915)
>>>> +{
>>>> +    if (IS_ALDERLAKE_P(i915))
>>>> +        return true;
>>>>
>>>> Which sucks a bit if we want to argue it does not belong in device info.
>>>>
>>>> Why can't we ask the GuC if the blob exists? In fact what would
>>>> happen if one would call __guc_action_get_hwconfig on any GuC platform?
>>> That was how I originally wrote the code. However, other parties
>>> refuse to allow a H2G call to fail. The underlying CTB layers complain
>>> loudly on any CTB error. And the GuC architects insist that a call to
>>> query the table on an unsupported platform is an error and should
>>> return an 'unsupported' error code.
> 
> just for the background: IIRC the reason why arch didn't want 'query'
> mode that wont fail was that HWconfig is not optional on these selected
> platforms, so driver shall know up-front on which platforms it shall be
> working (and driver shall even fail if blob is not available)
> 
>>
>> Oh well, shrug, sounds silly but I will not pretend I am familiar with H2G
>>
>> In this case has_table does sound better since it indeed isn't a
>> hardware feature. It is a GuC fw thing and if we don't have a way to
>> probe things there at runtime, then at least limit the knowledge to GuC
>> files.
> 
> note that arch expectation is that driver itself shall be using this
> hwconfig (ie. will decode required data). while current approach is that
> driver acts only as a proxy to UMD, this has to change and refactoring
> will start (likely) soon. therefore flag has_guc_hwconfig fits better
> IMHO as this wont be just 'GuC' fw thing.

There we go, Jordan's validation gets a new unexpected use. :D

In this case it should maybe be like:

#define I915_NEEDS_GUC_HWCONFIG(i915) (IS_PLATFORM(xxx) || VER >= nnn)

Since it is still not a hardware feature like other device info flags.

Regards,

Tvrtko


More information about the Intel-gfx mailing list