[PATCH 2/4] drm/xe: Add a helper function to set recovery method
Riana Tauro
riana.tauro at intel.com
Thu Jun 19 07:26:28 UTC 2025
Hi Raag,
Thank you for the review comments.
On 6/6/2025 8:42 PM, Raag Jadav wrote:
> On Tue, Jun 03, 2025 at 01:43:58PM +0530, Riana Tauro wrote:
>> Add a helper function to set recovery method. The recovery
>> method has to be set before declaring the device wedged and sending the
>> drm wedged uevent. If no method is set, default unbind/re-bind method
>> will be set
>>
>> Signed-off-by: Riana Tauro <riana.tauro at intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_device.c | 30 +++++++++++++++++++++-------
>> drivers/gpu/drm/xe/xe_device.h | 1 +
>> drivers/gpu/drm/xe/xe_device_types.h | 2 ++
>> 3 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 660b0c5126dc..3fd604ebdc6e 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -1120,16 +1120,28 @@ static void xe_device_wedged_fini(struct drm_device *drm, void *arg)
>> xe_pm_runtime_put(xe);
>> }
>>
>> +/**
>> + * xe_device_set_wedged_method - Set wedged recovery method
>> + * @xe: xe device instance
>
> Missing @method
Missed this. Will fix it.
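
For reference, a minimal sketch of what the helper could look like with the
missing @method line added. This is a userspace mock, not the final patch: the
struct below is a stand-in for the real struct xe_device, the wording of the
@method description is an assumption, and the assert() is a mock-only sanity
check that would not appear in the driver.

```c
#include <assert.h>

/* Userspace stand-in; the real struct xe_device has many more members. */
struct xe_device {
	struct {
		unsigned long method;
	} wedged;
};

/**
 * xe_device_set_wedged_method - Set wedged recovery method
 * @xe: xe device instance
 * @method: recovery method to be sent in the drm wedged uevent
 *
 * Set wedged recovery method to be sent using drm wedged uevent.
 */
void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method)
{
	assert(xe);	/* mock-only check, not part of the driver code */
	xe->wedged.method = method;
}
```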
>> + *
>> + * Set wedged recovery method to be sent using drm wedged uevent.
>> + */
>> +void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method)
>> +{
>> + xe->wedged.method = method;
>> +}
>> +
>> /**
>> * xe_device_declare_wedged - Declare device wedged
>> * @xe: xe device instance
>> *
>> - * This is a final state that can only be cleared with a module
>> - * re-probe (unbind + bind).
>> - * In this state every IOCTL will be blocked so the GT cannot be used.
>> + * This is a final state that can only be cleared with the method specified
>> + * in the drm wedged uevent. The method needs to be set using xe_device_set_wedged_method
>> + * before declaring the device as wedged or the default method of reprobe (unbind/re-bind)
>> + * will be sent. In this state every IOCTL will be blocked so the GT cannot be used.
>
> The file convention seems like 80 characters for kernel doc, so let's
> stick to it.
Okay.
>
>> * In general it will be called upon any critical error such as gt reset
>> - * failure or guc loading failure. Userspace will be notified of this state
>> - * through device wedged uevent.
>> + * failure or guc loading failure or firmware failure.
>> + * Userspace will be notified of this state through device wedged uevent.
>> * If xe.wedged module parameter is set to 2, this function will be called
>> * on every single execution timeout (a.k.a. GPU hang) right after devcoredump
>> * snapshot capture. In this mode, GT reset won't be attempted so the state of
>> @@ -1152,6 +1164,11 @@ void xe_device_declare_wedged(struct xe_device *xe)
>> return;
>> }
>>
>> + /* If no wedge recovery method is set, use default */
>> + if (!xe->wedged.method)
>> + xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND
>> + | DRM_WEDGE_RECOVERY_BUS_RESET);
>
> Although there are no strict rules about this, we usually don't begin a
> new line with a symbol.
Will fix this.
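
A rough sketch of the default fallback with the operator kept at the end of
the first line. It is a userspace mock under stated assumptions: the flag
values are stand-ins for the definitions in the DRM core headers, the struct
is a stand-in for struct xe_device, and wedged_method_apply_default() is a
hypothetical name used only to isolate the fallback for illustration.

```c
#include <assert.h>

/* Stand-in values for the mock; the real flags come from the DRM core. */
#define DRM_WEDGE_RECOVERY_REBIND	(1UL << 1)
#define DRM_WEDGE_RECOVERY_BUS_RESET	(1UL << 2)

/* Userspace stand-in for the real struct xe_device. */
struct xe_device {
	struct {
		unsigned long method;
	} wedged;
};

void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method)
{
	xe->wedged.method = method;
}

/* Hypothetical helper, only to show the fallback in isolation. */
void wedged_method_apply_default(struct xe_device *xe)
{
	assert(xe);	/* mock-only check */

	/* If no wedge recovery method is set, use default */
	if (!xe->wedged.method)
		xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
					    DRM_WEDGE_RECOVERY_BUS_RESET);
}
```

A method that was set earlier is left untouched; only an unset (zero) method
falls back to rebind plus bus reset.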
>
>> +
>> if (!atomic_xchg(&xe->wedged.flag, 1)) {
>> xe->needs_flr_on_fini = true;
>> drm_err(&xe->drm,
>> @@ -1161,8 +1178,7 @@ void xe_device_declare_wedged(struct xe_device *xe)
>> dev_name(xe->drm.dev));
>>
>> /* Notify userspace of wedged device */
>> - drm_dev_wedged_event(&xe->drm,
>> - DRM_WEDGE_RECOVERY_REBIND | DRM_WEDGE_RECOVERY_BUS_RESET);
>> + drm_dev_wedged_event(&xe->drm, xe->wedged.method);
>
> I was a bit late to realize it when I originally added this. The event
> call should be after xe_gt_declare_wedged() to comply with wedging rules.
> We notify userspace *after* we're done with driver cleanup.
Will move xe_gt_declare_wedged() before the uevent.
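
One possible shape of the reordering, as a userspace mock rather than the
actual follow-up patch. Everything prefixed with mock_ is hypothetical and
stands in for xe_gt_declare_wedged() and drm_dev_wedged_event(); the plain
bool replaces the atomic_xchg() on wedged.flag, and the trace buffer exists
only to make the call order visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MOCK_MAX_GT 2	/* hypothetical GT count for the mock */

/* Userspace stand-in for the real struct xe_device. */
struct xe_device {
	struct {
		bool flag;
		unsigned long method;
	} wedged;
	char trace[16];	/* call order: 'g' = GT wedged, 'e' = uevent sent */
};

static void mock_trace(struct xe_device *xe, char c)
{
	size_t n = strlen(xe->trace);

	assert(n + 1 < sizeof(xe->trace));
	xe->trace[n] = c;
	xe->trace[n + 1] = '\0';
}

/* Stand-in for xe_gt_declare_wedged() */
static void mock_gt_declare_wedged(struct xe_device *xe, int id)
{
	(void)id;
	mock_trace(xe, 'g');
}

/* Stand-in for drm_dev_wedged_event() */
static void mock_wedged_event(struct xe_device *xe, unsigned long method)
{
	(void)method;
	mock_trace(xe, 'e');
}

void mock_device_declare_wedged(struct xe_device *xe)
{
	bool first = !xe->wedged.flag;	/* the driver uses atomic_xchg() */

	xe->wedged.flag = true;

	/* Driver cleanup first: wedge every GT */
	for (int id = 0; id < MOCK_MAX_GT; id++)
		mock_gt_declare_wedged(xe, id);

	/* Notify userspace only after cleanup, and only once */
	if (first)
		mock_wedged_event(xe, xe->wedged.method);
}
```

As in the current code, the GTs are wedged on every call, while the uevent is
sent only on the first transition into the wedged state.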
Thanks
Riana
>
> Raag
>
>> }
>>
>> for_each_gt(gt, xe, id)
>> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> index 0bc3bc8e6803..06350740aac5 100644
>> --- a/drivers/gpu/drm/xe/xe_device.h
>> +++ b/drivers/gpu/drm/xe/xe_device.h
>> @@ -191,6 +191,7 @@ static inline bool xe_device_wedged(struct xe_device *xe)
>> }
>>
>> void xe_device_declare_wedged(struct xe_device *xe);
>> +void xe_device_set_wedged_method(struct xe_device *xe, unsigned long method);
>>
>> struct xe_file *xe_file_get(struct xe_file *xef);
>> void xe_file_put(struct xe_file *xef);
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index b93c04466637..fb3617956d63 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -559,6 +559,8 @@ struct xe_device {
>> atomic_t flag;
>> /** @wedged.mode: Mode controlled by kernel parameter and debugfs */
>> int mode;
>> + /** @wedged.method: Recovery method to be sent in the drm device wedged uevent */
>> + unsigned long method;
>> } wedged;
>>
>> /** @bo_device: Struct to control async free of BOs */
>> --
>> 2.47.1
>>