[PATCH 2/2] drm/xe/vf: Don't update GuC reset policy when changing wedged mode
Michal Wajdeczko
michal.wajdeczko at intel.com
Thu Apr 3 16:49:48 UTC 2025
On 03.04.2025 17:17, Laguna, Lukasz wrote:
>
> On 4/3/2025 13:05, Michal Wajdeczko wrote:
>>
>> On 03.04.2025 11:41, Lukasz Laguna wrote:
>>> Prevent the VF from attempting to update the GuC reset policy when
>>> changing the wedged mode, as this operation is not supported for VFs.
>>>
>>> Log a message to indicate that GuC may still cause engine reset even
>>> with wedged_mode=2.
>>>
>>> Signed-off-by: Lukasz Laguna <lukasz.laguna at intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_debugfs.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/
>>> xe_debugfs.c
>>> index d0503959a8ed..062668d02365 100644
>>> --- a/drivers/gpu/drm/xe/xe_debugfs.c
>>> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
>>> @@ -171,6 +171,11 @@ static ssize_t wedged_mode_set(struct file *f,
>>> const char __user *ubuf,
>>> xe->wedged.mode = wedged_mode;
>>> + if (IS_SRIOV_VF(xe)) {
>>> + drm_info_once(&xe->drm, "VF can't change GuC's engine reset
>>> policy. GuC may still cause engine reset even with wedged_mode=2\n");
>> never use drm_info_once() logs for places where multiple different
>> devices can be reached, as then doing something similar on next device
>> there will be no trace at all
>
> ok
>
>>
>> also should we change xe->wedged.mode if it is N/A for VF?
>
> We can't change engine reset policy, but I wouldn't say that wedged.mode
> is N/A for VFs.
>
> We can still disable it (mode=0), or use mode=2 to easily wedge device
> with simple exec timeout in order to e.g. validate whether driver
> behavior in case of wedged device is correct (all IOCTLs are blocked
> until rebind).
I was referring to the code where first there is an assignment of the
new wedged mode and then we abort and eventually print message that it
wasn't applied - IMO this is wrong
>
>> btw, what is our approach if someone on the PF already set some policy
>> on GuC will not do engine reset? or it is n/a after a VF switch?
>
> It's global policy - engine resets will be disabled for PF and VFs, and
> that's something I would expect.
my question is whether ordinary VF user expects that such policy could
be set differently on different setups (or changed at any time on PF)
maybe for the production PF case we shouldn't allow changing PF wedge
mode (something like we do for EUDBG or CCS mode case in PF w/VFs)
>
>>> + return size;
>> shouldn't we return -EPERM instead?
>
> I returned size on purpose, as it's expected that VF is not able to
> change reset policy.
hmm, but by returning 'size' you're saying "OK"
not that VF can't change it
when read back, there will new wedge mode visible
(well, it won't work as expected, though)
> Still, informed user that GuC may cause engine
> resets even in mode=2. I can add extra info, that if needed, engine
> resets needs to be disabled by PF.
I'm not sure we need detailed messages for that in dmesg
good release notes about 'wedge_mode' should be sufficient
>
>>
>>> + }
>>> +
>>> xe_pm_runtime_get(xe);
>>> for_each_gt(gt, xe, id) {
>>> ret = xe_guc_ads_scheduler_policy_toggle_reset(>-
>>> >uc.guc.ads);
More information about the Intel-xe
mailing list