[PATCH 1/2] drm: Add GPU reset sysfs event

Sharma, Shashank shashank.sharma at amd.com
Tue Mar 8 11:56:42 UTC 2022



On 3/8/2022 11:32 AM, Christian König wrote:
> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>
>>
>> On 3/8/2022 8:06 AM, Christian König wrote:
>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>> From: Shashank Sharma <shashank.sharma at amd.com>
>>>>
>>>> This patch adds a new sysfs event, which will notify
>>>> userland about a GPU reset, and can also provide
>>>> some information like:
>>>> - which PID was involved in the GPU reset
>>>> - what was the GPU status (using flags)
>>>>
>>>> This patch also introduces the first flag of the flags
>>>> bitmap, which can be extended as and when required.
>>>
>>> Make sure to CC the dri-devel mailing list when reviewing this.
>> Got it,
>>
>> I was also curious if we want to move the reset_ctx structure itself
>> to the DRM layer, like:
>> struct drm_reset_event_ctx {
>>     u32 pid;
>>     u32 flags;
>>     char process_name[64];
>> };
> 
> I was entertaining that thought as well.
> 
> But if we do this I would go even a step further and move the reset
> work item into the DRM layer as well.
> 
> You might also look into migrating the existing i915 code, which uses
> udev to signal GPU resets, to this function as well.
> 
> Regards,
> Christian.

That seems like a good idea, let me quickly dive into i915 and check 
this out.

Shashank
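
For illustration, here is a minimal userspace sketch of how a context-based
variant of the helper could build its uevent variables. It is not kernel code
and not part of the patch: struct reset_event_ctx mirrors the
drm_reset_event_ctx proposal quoted above (with pid_t for the PID, as suggested
in the review of the patch), and format_reset_env() is a hypothetical name. The
flag follows the suggested VRAM_LOST polarity and reuses bit 0 from the patch.
No key is emitted for process_name, since the patch does not define one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Polarity suggested in review; the bit position is taken from the patch. */
#define DRM_GPU_RESET_FLAG_VRAM_LOST (1u << 0)

/* Hypothetical userspace mirror of the proposed drm_reset_event_ctx. */
struct reset_event_ctx {
    pid_t pid;
    uint32_t flags;
    char process_name[64];
};

/*
 * Worst-case sizes, including the terminating NUL:
 *   "PID="   (4) + "-2147483648" (11) + 1 = 16
 *   "FLAGS=" (6) + "4294967295"  (10) + 1 = 17
 */
#define RESET_PID_STR_LEN   16
#define RESET_FLAGS_STR_LEN 17

/*
 * Format the PID and FLAGS variables into caller-provided buffers.
 * Returns 0 on success, -1 if either string would have been truncated.
 */
static int format_reset_env(const struct reset_event_ctx *ctx,
                            char *pid_str, size_t pid_len,
                            char *flags_str, size_t flags_len)
{
    int n;

    n = snprintf(pid_str, pid_len, "PID=%d", (int)ctx->pid);
    if (n < 0 || (size_t)n >= pid_len)
        return -1;

    n = snprintf(flags_str, flags_len, "FLAGS=%u", (unsigned int)ctx->flags);
    if (n < 0 || (size_t)n >= flags_len)
        return -1;

    return 0;
}
```

Sizing the buffers for the widest possible value means snprintf() never has to
truncate; the length check is kept as a guard in case the format strings change.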
> 
>>
>> and then:
>> void drm_sysfs_reset_event(struct drm_device *dev,
>>                            struct drm_reset_event_ctx *ctx);
>>
>>>
>>>>
>>>> Cc: Alexander Deucher <alexander.deucher at amd.com>
>>>> Cc: Christian Koenig <christian.koenig at amd.com>
>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>   include/drm/drm_sysfs.h     |  3 +++
>>>>   2 files changed, 27 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>>> index 430e00b16eec..52a015161431 100644
>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>>>>   }
>>>>   EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>> +/**
>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>>>> + * @dev: DRM device
>>>> + * @pid: The process ID involved in the reset
>>>> + * @flags: Any other information about the GPU status
>>>> + *
>>>> + * Send a uevent for the DRM device specified by @dev. This notifies
>>>> + * userspace that a GPU reset has occurred, so that an interested client
>>>> + * can take recovery or profiling measures when required.
>>>> + */
>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
>>>
>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>
>>>> +{
>>>> +    char pid_str[25], flags_str[17];
>>>> +    char reset_str[] = "RESET=1";
>>>> +    char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>> +
>>>> +    DRM_DEBUG("generating reset event\n");
>>>> +
>>>> +    snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%llu", pid);
>>>> +    snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>> +    kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>> +}
>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>> +
>>>>   /**
>>>>    * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
>>>>    * change
>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>> index 6273cac44e47..63f00fe8054c 100644
>>>> --- a/include/drm/drm_sysfs.h
>>>> +++ b/include/drm/drm_sysfs.h
>>>> @@ -2,6 +2,8 @@
>>>>   #ifndef _DRM_SYSFS_H_
>>>>   #define _DRM_SYSFS_H_
>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>
>>> Probably better to define that the other way around, e.g. 
>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>
>>> Apart from that looks good to me.
>>>
>> Got it, noted.
>> - Shashank
>>
>>> Christian.
>>>
>>>> +
>>>>   struct drm_device;
>>>>   struct device;
>>>>   struct drm_connector;
>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>   void drm_class_device_unregister(struct device *dev);
>>>>   void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
>>>>   void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
>>>>   void drm_sysfs_connector_status_event(struct drm_connector *connector,
>>>>                         struct drm_property *property);
>>>
> 
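
On the receiving side, a client that reads the kernel uevent payload (a buffer
of NUL-separated KEY=VALUE strings) could pick out the reset variables roughly
as follows. This is a sketch under assumptions: parse_reset_uevent() and
struct reset_info are hypothetical names, the keys RESET, PID and FLAGS are
taken from the patch above, and the flag uses the VRAM_LOST polarity suggested
in review with the bit position from the patch. How the buffer is obtained
(netlink socket, libudev) is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Polarity suggested in review; the bit position is taken from the patch. */
#define DRM_GPU_RESET_FLAG_VRAM_LOST (1u << 0)

/* Hypothetical result type for this sketch. */
struct reset_info {
    bool is_reset;   /* true if "RESET=1" was present */
    long pid;        /* -1 if no PID variable was found */
    uint32_t flags;  /* 0 if no FLAGS variable was found */
};

/*
 * Walk a buffer of NUL-separated "KEY=VALUE" strings and extract the
 * reset-related variables. Unknown entries are skipped; an unterminated
 * trailing entry is ignored.
 */
static struct reset_info parse_reset_uevent(const char *buf, size_t len)
{
    struct reset_info info = { false, -1, 0 };
    size_t off = 0;

    while (off < len) {
        const char *kv = buf + off;
        const char *end = memchr(kv, '\0', len - off);

        if (!end)
            break;

        if (strcmp(kv, "RESET=1") == 0)
            info.is_reset = true;
        else if (strncmp(kv, "PID=", 4) == 0)
            info.pid = strtol(kv + 4, NULL, 10);
        else if (strncmp(kv, "FLAGS=", 6) == 0)
            info.flags = (uint32_t)strtoul(kv + 6, NULL, 10);

        off = (size_t)(end - buf) + 1;
    }

    return info;
}
```

A client would then check info.is_reset first and, if it is set, test
info.flags & DRM_GPU_RESET_FLAG_VRAM_LOST to decide whether buffer contents
need to be re-uploaded.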