[PATCH 1/2] drm: Add GPU reset sysfs event

Andrey Grodzovsky andrey.grodzovsky at amd.com
Tue Mar 8 16:36:13 UTC 2022


On 2022-03-08 11:35, Sharma, Shashank wrote:
>
>
> On 3/8/2022 5:25 PM, Andrey Grodzovsky wrote:
>>
>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>> From: Shashank Sharma <shashank.sharma at amd.com>
>>>
>>> This patch adds a new sysfs event that notifies
>>> userland of a GPU reset and can also provide
>>> information such as:
>>> - which PID was involved in the GPU reset
>>> - what was the GPU status (using flags)
>>>
>>> This patch also introduces the first flag of the flags
>>> bitmap, which can be extended as and when required.
>>
>>
>> I am reminding you again about another important piece of info that
>> you can add here: the Smart Trace Buffer dump [1]. The buffer size is
>> HW specific, but from what I see there is no problem appending it as
>> part of the envp[] initialization below.
>>
>> The interface to get the buffer is smu_stb_collect_info, and its usage
>> can be seen in the debugfs interface in smu_stb_debugfs_open.
>>
>> [1] - https://www.spinics.net/lists/amd-gfx/msg70751.html
>>
>
> Noted Andrey, thanks for the reminder. As you can see, this patch is
> going into the DRM layer, so as of now we are accommodating the PID and
> VRAM validity information, which is common to all the DRM drivers (not
> only AMDGPU). As a next step, we will extend this interface to
> provide driver-specific custom data as well, and that is where we
> will start digging into STB.
> - Shashank


Got it.

Andrey


>
>> Andrey
>>
>>
>>>
>>> Cc: Alexander Deucher <alexander.deucher at amd.com>
>>> Cc: Christian Koenig <christian.koenig at amd.com>
>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>   include/drm/drm_sysfs.h     |  3 +++
>>>   2 files changed, 27 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
>>> index 430e00b16eec..52a015161431 100644
>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct drm_device *dev)
>>>   }
>>>   EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>> +/**
>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU reset
>>> + * @dev: DRM device
>>> + * @pid: The process ID involved in the reset
>>> + * @flags: Any other information about the GPU status
>>> + *
>>> + * Send a uevent for the DRM device specified by @dev. This notifies
>>> + * userspace that a GPU reset has occurred, so that interested clients
>>> + * can take any recovery or profiling measures when required.
>>> + */
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t flags)
>>> +{
>>> +    char pid_str[25], flags_str[17];
>>> +    char reset_str[] = "RESET=1";
>>> +    char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>> +
>>> +    DRM_DEBUG("generating reset event\n");
>>> +
>>> +    snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%llu", (unsigned long long)pid);
>>> +    snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>> +    kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>> +}
>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>> +
>>>   /**
>>>    * drm_sysfs_connector_hotplug_event - generate a DRM uevent for any connector
>>>    * change
>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>> index 6273cac44e47..63f00fe8054c 100644
>>> --- a/include/drm/drm_sysfs.h
>>> +++ b/include/drm/drm_sysfs.h
>>> @@ -2,6 +2,8 @@
>>>   #ifndef _DRM_SYSFS_H_
>>>   #define _DRM_SYSFS_H_
>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>> +
>>>   struct drm_device;
>>>   struct device;
>>>   struct drm_connector;
>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>   void drm_class_device_unregister(struct device *dev);
>>>   void drm_sysfs_hotplug_event(struct drm_device *dev);
>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, uint32_t reset_flags);
>>>   void drm_sysfs_connector_hotplug_event(struct drm_connector *connector);
>>>   void drm_sysfs_connector_status_event(struct drm_connector *connector,
>>>                         struct drm_property *property);
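
For anyone wanting to see what a consumer of this event might look like, here is
a minimal userspace sketch. It assumes the payload arrives as NUL-separated
"KEY=VALUE" strings, as kernel uevent messages do on a NETLINK_KOBJECT_UEVENT
socket, and it only looks for the three keys emitted by the quoted patch
(RESET, PID, FLAGS). The struct gpu_reset_info type and the
parse_reset_uevent() helper are hypothetical names made up for this
illustration; they are not part of the patch or of any existing library, and
the socket setup itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the flag added to include/drm/drm_sysfs.h by the quoted patch. */
#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)

/* Hypothetical userspace-side container, not part of the patch. */
struct gpu_reset_info {
	bool reset;     /* "RESET=1" was present */
	uint64_t pid;   /* value carried by "PID=..." */
	uint32_t flags; /* value carried by "FLAGS=..." */
};

/*
 * Walk a buffer of NUL-terminated "KEY=VALUE" strings and pick out the
 * keys of the reset event. Returns true only if "RESET=1" was seen.
 * An unterminated trailing fragment is ignored.
 */
static bool parse_reset_uevent(const char *buf, size_t len,
			       struct gpu_reset_info *out)
{
	size_t off = 0;

	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	while (off < len) {
		const char *s = buf + off;
		const char *end = memchr(s, '\0', len - off);
		size_t n;

		if (!end)
			break; /* no terminator: stop before reading past len */
		n = (size_t)(end - s);

		if (n == 7 && memcmp(s, "RESET=1", 7) == 0)
			out->reset = true;
		else if (n > 4 && memcmp(s, "PID=", 4) == 0)
			out->pid = strtoull(s + 4, NULL, 10);
		else if (n > 6 && memcmp(s, "FLAGS=", 6) == 0)
			out->flags = (uint32_t)strtoul(s + 6, NULL, 10);

		off += n + 1; /* skip the string and its NUL terminator */
	}
	return out->reset;
}
```

A client would call parse_reset_uevent() on each received message and, when it
returns true, test info.flags & DRM_GPU_RESET_FLAG_VRAM_VALID to decide whether
VRAM contents can still be trusted. Driver-specific data such as an STB dump
would need additional keys, which this sketch does not attempt to guess.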

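On sizing the pid_str and flags_str buffers in drm_sysfs_reset_event(): the
worst case is a 64-bit PID value and a 32-bit flags value printed in decimal
after their "PID=" and "FLAGS=" prefixes. The sketch below is plain userspace
C, not kernel code, and simply asks snprintf() for the formatted length of the
largest possible values. The two helper names are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed, including the terminating NUL, for the longest "PID=" string. */
static size_t reset_pid_str_size(void)
{
	/* snprintf(NULL, 0, ...) returns the length it would have written. */
	int n = snprintf(NULL, 0, "PID=%llu", (unsigned long long)UINT64_MAX);

	assert(n > 0);
	return (size_t)n + 1;
}

/* Bytes needed, including the terminating NUL, for the longest "FLAGS=" string. */
static size_t reset_flags_str_size(void)
{
	int n = snprintf(NULL, 0, "FLAGS=%u", (unsigned int)UINT32_MAX);

	assert(n > 0);
	return (size_t)n + 1;
}
```

With a 20-digit maximum for the PID and a 10-digit maximum for the flags, this
gives 25 and 17 bytes respectively. Smaller buffers would not overflow, since
snprintf() truncates, but userspace could then receive a shortened value.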
