[Intel-xe] [PATCH 2/2] drm/xe/uc: Add GuC/HuC firmware path overrides

John Harrison john.c.harrison at intel.com
Tue Sep 12 00:30:44 UTC 2023


On 9/11/2023 17:22, Daniele Ceraolo Spurio wrote:
> On 9/11/2023 5:16 PM, John Harrison wrote:
>> On 9/11/2023 17:10, Daniele Ceraolo Spurio wrote:
>>> On 9/11/2023 5:00 PM, John Harrison wrote:
>>>> On 8/2/2023 14:26, Daniele Ceraolo Spurio wrote:
>>>>> When testing a new binary and/or debugging binary-related issues, 
>>>>> it is
>>>>> useful to have the option to change which binary is loaded without
>>>>> having to update and re-compile the kernel. To support this 
>>>>> option, this
>>>>> patch adds 2 new modparams to override the FW path for GuC and 
>>>>> HuC. The
>>>>> HuC modparam can also be set to an empty string to disable HuC 
>>>>> loading.
>>>> Why the difference between GuC & HuC? And is 'disabled' 
>>>> significantly different to just setting it to a non-existent file?
>>>
>>> There is the global switch (the one you commented on in the previous 
>>> patch) that can be used to turn off all the FWs. I see no reason for 
>>> wanting an additional switch to turn off just the GuC, as the other 
>>> FWs won't work without it.
>>>
>>> Also, Xe aborts the driver load if any FW file is not found, so yes 
>>> disabled is different than not found.
>> Hmm. Given the firmware paths are 'unsafe' module parameters, I would 
>> argue that it is better to keep the code simple and if the user wants 
>> to shoot themselves in the foot then that is their problem. That is, 
>> just treat all firmware file overrides exactly the same. And if the 
>> driver fails to load because the user has requested an unsupported 
>> configuration then who cares.
>
> I think allowing the override for GuC would actually make things more 
> complicated. The code assumes that the GuC is enabled if the flag is 
> set and that all the GuC features can be used, so having the GuC being 
> disabled under the hood via this override might actually make the 
> driver panic, which would be bad even if the parameter is unsafe. Not 
> allowing disabling the GuC is an extra half-line of code, which IMO is 
> overall simpler and cleaner compared to either risking panics or 
> having to secure all the paths against a missing GuC.
>
Hmm. Xe seems woefully fault intolerant and fragile.

> Daniele
>
>>
>> John.
>>
>>>
>>>>
>>>>>
>>>>> Note that those modparams only take effect on platforms where we 
>>>>> already
>>>>> have a default FW, so we're sure there is support for FW loading 
>>>>> and the
>>>>> kernel isn't going to explode in an undefined path.
>>>>>
>>>>> Signed-off-by: Daniele Ceraolo Spurio 
>>>>> <daniele.ceraolospurio at intel.com>
>>>>> Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/xe/xe_module.c | 10 ++++++++++
>>>>>   drivers/gpu/drm/xe/xe_module.h |  2 ++
>>>>>   drivers/gpu/drm/xe/xe_uc_fw.c  | 30 +++++++++++++++++++++++++++++-
>>>>>   3 files changed, 41 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.c 
>>>>> b/drivers/gpu/drm/xe/xe_module.c
>>>>> index de85494e2280..0660017c3e83 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_module.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_module.c
>>>>> @@ -30,6 +30,16 @@ int xe_guc_log_level = 5;
>>>>>   module_param_named(guc_log_level, xe_guc_log_level, int, 0600);
>>>>>   MODULE_PARM_DESC(guc_log_level, "GuC firmware logging level 
>>>>> (0=disable, 1..5=enable with verbosity min..max)");
>>>>>   +char *xe_guc_firmware_path = NULL;
>>>>> +module_param_named_unsafe(guc_firmware_path, 
>>>>> xe_guc_firmware_path, charp, 0400);
>>>>> +MODULE_PARM_DESC(guc_firmware_path,
>>>>> +         "GuC firmware path to use instead of the default one");
>>>>> +
>>>>> +char *xe_huc_firmware_path = NULL;
>>>>> +module_param_named_unsafe(huc_firmware_path, 
>>>>> xe_huc_firmware_path, charp, 0400);
>>>>> +MODULE_PARM_DESC(huc_firmware_path,
>>>>> +         "HuC firmware path to use instead of the default one - 
>>>>> empty string disables");
>>>>> +
>>>>>   char *xe_param_force_probe = CONFIG_DRM_XE_FORCE_PROBE;
>>>>>   module_param_named_unsafe(force_probe, xe_param_force_probe, 
>>>>> charp, 0400);
>>>>>   MODULE_PARM_DESC(force_probe,
>>>>> diff --git a/drivers/gpu/drm/xe/xe_module.h 
>>>>> b/drivers/gpu/drm/xe/xe_module.h
>>>>> index 2c1f9199f909..e1da1e9ca5cb 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_module.h
>>>>> +++ b/drivers/gpu/drm/xe/xe_module.h
>>>>> @@ -10,4 +10,6 @@ extern bool force_execlist;
>>>>>   extern bool enable_display;
>>>>>   extern u32 xe_force_vram_bar_size;
>>>>>   extern int xe_guc_log_level;
>>>>> +extern char *xe_guc_firmware_path;
>>>>> +extern char *xe_huc_firmware_path;
>>>>>   extern char *xe_param_force_probe;
>>>>> diff --git a/drivers/gpu/drm/xe/xe_uc_fw.c 
>>>>> b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> index fd53ef9e5c99..ef873e894714 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_uc_fw.c
>>>>> @@ -15,6 +15,7 @@
>>>>>   #include "xe_gt.h"
>>>>>   #include "xe_map.h"
>>>>>   #include "xe_mmio.h"
>>>>> +#include "xe_module.h"
>>>>>   #include "xe_uc_fw.h"
>>>>>     /*
>>>>> @@ -209,6 +210,30 @@ uc_fw_auto_select(struct xe_device *xe, 
>>>>> struct xe_uc_fw *uc_fw)
>>>>>       }
>>>>>   }
>>>>>   +static void
>>>>> +uc_fw_override(struct xe_uc_fw *uc_fw)
>>>>> +{
>>>>> +    char *path_override = NULL;
>>>>> +
>>>>> +    /* empty string disables, but it's not allowed for GuC */
>>>>> +    switch (uc_fw->type) {
>>>>> +    case XE_UC_FW_TYPE_GUC:
>>>>> +        if (xe_guc_firmware_path && *xe_guc_firmware_path)
>>>>> +            path_override = xe_guc_firmware_path;
>>>>> +        break;
>>>>> +    case XE_UC_FW_TYPE_HUC:
>>>>> +        path_override = xe_huc_firmware_path;
>>>>> +        break;
>>>>> +    default:
>>>>> +        break;
>>>>> +    }
>>>>> +
>>>>> +    if (path_override) {
>>>>> +        uc_fw->path = path_override;
>>>>> +        uc_fw->user_overridden = true;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>   /**
>>>>>    * xe_uc_fw_copy_rsa - copy fw RSA to buffer
>>>>>    *
>>>>> @@ -346,7 +371,10 @@ int xe_uc_fw_init(struct xe_uc_fw *uc_fw)
>>>>>       if (!xe_uc_fw_is_supported(uc_fw))
>>>>>           return 0;
>>>>>   -    if (!xe_device_guc_submission_enabled(xe)) {
>>>>> +    uc_fw_override(uc_fw);
>>>>> +
>>>>> +    /* user can provide an empty string via the override to 
>>>>> disable */
>>>> But only for HuC?
>>>
>>> And for GSC once I get around to add support for that. I wanted to 
>>> keep the sentence general here, since there is no visibility of 
>>> which FW it is and there is already a proper description of the 
>>> override in the modparam definition, but I can add a "non-GuC FWs" 
>>> at the end if you think it makes things clearer.
Maybe just go with 'an empty path means the firmware is disabled'. Given 
that the actual anti-GuC check is not here but up above, it doesn't 
really make sense to include that restriction in the comment here. And 
it doesn't actually matter whether the empty string came from a user or 
from a compile time edit or somewhere else entirely.

My question was really more about the test below being for 
'guc_submission_enabled' which logically has no connection to HuC or GSC.

With a simplified comment:
Reviewed-by: John Harrison <John.C.Harrison at Intel.com>


>>>
>>> Daniele
>>>
>>>>
>>>> John.
>>>>
>>>>> +    if (!xe_device_guc_submission_enabled(xe) || !(*uc_fw->path)) {
>>>>>           xe_uc_fw_change_status(uc_fw, XE_UC_FIRMWARE_DISABLED);
>>>>>           drm_dbg(&xe->drm, "%s disabled", 
>>>>> xe_uc_fw_type_repr(uc_fw->type));
>>>>>           return 0;
>>>>
>>>
>>
>



More information about the Intel-xe mailing list