[Intel-gfx] [PATCH v5 1/4] drm/i915/guc: Add fetch of hwconfig table

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Feb 25 17:18:51 UTC 2022


On 25/02/2022 16:46, John Harrison wrote:

>>> driver we don't care that much that we failed to load HWconfig and
>>> 'notice' is enough.
>>>
>>> but I'm fine with all messages being drm_err (as we will not have to
>>> change that once again after HWconfig will be mandatory for the driver
>>> as well)
>>
>> I would be against drm_err.
>>
>> #define KERN_EMERG      KERN_SOH "0"    /* system is unusable */
>> #define KERN_ALERT      KERN_SOH "1"    /* action must be taken immediately */
>> #define KERN_CRIT       KERN_SOH "2"    /* critical conditions */
>> #define KERN_ERR        KERN_SOH "3"    /* error conditions */
>> #define KERN_WARNING    KERN_SOH "4"    /* warning conditions */
>> #define KERN_NOTICE     KERN_SOH "5"    /* normal but significant condition */
>> #define KERN_INFO       KERN_SOH "6"    /* informational */
>> #define KERN_DEBUG      KERN_SOH "7"    /* debug-level messages */
>>
>> From the point of view of the kernel driver, this is not an error to 
>> its operation. It can at most be a warning, but notice is also fine by 
>> me. One could argue when reading "normal but significant condition" 
>> that it is not normal, when it is in fact unexpected, so if people 
>> prefer warning that is also okay by me. I still lean towards notice 
>> because of the hands-off nature i915 has with the pass-through of this 
>> blob.
>  From the point of view of the KMD, i915 will load and be 'functional' 
> if it can't talk to the hardware at all. The UMDs won't work at all but 

Well this reductio ad absurdum fails I think... :)

> the driver load will be 'fine'. That's a requirement to be able to get 
> the user to a software fallback desktop in order to work out why the 
> hardware isn't working (e.g. no GuC firmware file). I would view this as 
> similar. The KMD might have loaded but the UMDs are not functional. That 
> is definitely an error condition to me.

... If GuC fails to load there is no command submission and the driver will 
obviously log that with drm_err.

If the blob fails to verify, what happens next depends on the userspace 
stack. As stated before, on some platforms, and/or after a certain time, 
Mesa will not look at the blob at all. So i915 is fine (it is after all 
just a conduit for opaque data!), the system overall is fine, so it 
definitely isn't a KERN_ERR level event.
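
To illustrate the distinction I am drawing, here is a rough userspace sketch, 
not i915 code. The enums and failure_severity() are hypothetical names made 
up for this mail; only the numeric levels come from the KERN_* list quoted 
above. Whether the hwconfig case ends up as notice or warning is still open.

```c
#include <assert.h>
#include <stddef.h>

/* Severity digits as in the KERN_* list quoted above (KERN_SOH omitted). */
enum severity {
	SEV_ERR = 3,     /* error conditions */
	SEV_WARNING = 4, /* warning conditions */
	SEV_NOTICE = 5,  /* normal but significant condition */
};

/* Hypothetical failure kinds, for illustration only. */
enum init_failure {
	FAIL_GUC_LOAD,       /* no command submission without GuC */
	FAIL_HWCONFIG_FETCH, /* opaque blob for userspace is unavailable */
};

/*
 * Pick a severity from the driver's point of view: failures that break
 * the driver's own operation are errors, while a missing pass-through
 * blob is at most a notice (or a warning, if people prefer that).
 */
static int failure_severity(enum init_failure failure)
{
	switch (failure) {
	case FAIL_GUC_LOAD:
		return SEV_ERR;
	case FAIL_HWCONFIG_FETCH:
		return SEV_NOTICE;
	}
	return SEV_WARNING; /* unknown kinds: be conservative */
}

/* Map a severity to a printable name; returns NULL for other values. */
static const char *severity_name(int sev)
{
	assert(sev >= 0);
	switch (sev) {
	case SEV_ERR:     return "err";
	case SEV_WARNING: return "warning";
	case SEV_NOTICE:  return "notice";
	}
	return NULL;
}
```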

>>>>>> +               ERR_PTR(ret));
>>>>>> +
>>>>>>        ret = guc_enable_communication(guc);
>>>>>>        if (ret)
>>>>>>            goto err_log_capture;
>>>>>> @@ -562,6 +567,8 @@ static void __uc_fini_hw(struct intel_uc *uc)
>>>>>>        if (intel_uc_uses_guc_submission(uc))
>>>>>>            intel_guc_submission_disable(guc);
>>>>>>    +    intel_guc_hwconfig_fini(&guc->hwconfig);
>>>>>> +
>>>>>>        __uc_sanitize(uc);
>>>>>>    }
>>>>>>    diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
>>>>>> index 76e590fcb903..1d31e35a5154 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_pci.c
>>>>>> +++ b/drivers/gpu/drm/i915/i915_pci.c
>>>>>> @@ -990,6 +990,7 @@ static const struct intel_device_info adl_p_info = {
>>>>>>            BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VCS0) | BIT(VCS2),
>>>>>>        .ppgtt_size = 48,
>>>>>>        .dma_mask_size = 39,
>>>>>> +    .has_guc_hwconfig = 1,
>>>>> Who requested this change? It was previously done this way but the
>>>>> instruction was that i915_pci.c is for hardware features only but that
>>>>> this, as you seem extremely keen about pointing out at every
>>>>> opportunity, is a software feature.
>>>>
>>>> This was requested by Michal as well. I definitely agree it is a
>>>> software feature, but I was not aware that "i915_pci.c is for hardware
>>>> features only".
>>>>
>>>> Michal, do you agree with this and returning to the previous method for
>>>> enabling the feature?
>>>
>>> now I'm little confused as some arch direction was to treat FW as
>>> extension of the HW so for me it was natural to have 'has_guc_hwconfig'
>>> flag in device_info
>>>
>>> if still for some reason it is undesired to mix HW and FW/SW flags
>>> inside single group of flags then maybe we should just add separate
>>> group of immutable flags where has_guc_hwconfig could be defined.
>>>
>>> let our maintainers decide
>>
>> Bah.. :)
>>
>> And what was the previous method?
>>
>> [comes back later]
>>
>> Okay it was:
>>
>> +static bool has_table(struct drm_i915_private *i915)
>> +{
>> +    if (IS_ALDERLAKE_P(i915))
>> +        return true;
>>
>> Which sucks a bit if we want to argue it does not belong in device info.
>>
>> Why can't we ask the GuC if the blob exists? In fact what would happen 
>> if one would call __guc_action_get_hwconfig on any GuC platform?
> That was how I originally wrote the code. However, other parties refuse 
> to allow a H2G call to fail. The underlying CTB layers complain loudly 
> on any CTB error. And the GuC architects insist that a call to query the 
> table on an unsupported platform is an error and should return an 
> 'unsupported' error code.

Oh well, shrug, sounds silly, but I will not pretend I am familiar with H2G.

In this case has_table does sound better since it indeed isn't a 
hardware feature. It is a GuC fw thing and if we don't have a way to 
probe things there at runtime, then at least limit the knowledge to GuC 
files.
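
Roughly what I have in mind, as a standalone userspace mock rather than the 
real driver code. struct mock_i915, the platform enum and mock_hwconfig_init() 
are hypothetical stand-ins; only the has_table() name and the Alder Lake P 
check come from the earlier version of the patch quoted above. The point is 
that the platform check stays private to the GuC hwconfig file and no H2G 
query is issued where the table is unsupported.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the device structure and platform checks. */
enum mock_platform {
	MOCK_PLATFORM_OTHER,
	MOCK_PLATFORM_ALDERLAKE_P,
};

struct mock_i915 {
	enum mock_platform platform;
};

/* File-local: knowledge of which platforms have the table stays here. */
static bool has_table(const struct mock_i915 *i915)
{
	return i915->platform == MOCK_PLATFORM_ALDERLAKE_P;
}

/*
 * Mock init path: skip quietly where the table is unsupported so that
 * no H2G request is sent which the GuC would reject as an error.
 * *h2g_calls counts the queries that would have been issued.
 */
static int mock_hwconfig_init(const struct mock_i915 *i915, int *h2g_calls)
{
	if (!has_table(i915))
		return 0; /* nothing to fetch, and not a failure */

	(*h2g_calls)++; /* stands in for the real table query */
	return 0;
}
```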

Regards,

Tvrtko

