[PATCH 1/2] drm: Add GPU reset sysfs event

Christian König ckoenig.leichtzumerken at gmail.com
Wed Mar 9 08:02:25 UTC 2022


Am 08.03.22 um 16:37 schrieb Somalapuram, Amaranath:
>
> On 3/8/2022 5:26 PM, Sharma, Shashank wrote:
>>
>>
>> On 3/8/2022 11:32 AM, Christian König wrote:
>>> Am 08.03.22 um 10:31 schrieb Sharma, Shashank:
>>>>
>>>>
>>>> On 3/8/2022 8:06 AM, Christian König wrote:
>>>>> Am 07.03.22 um 17:26 schrieb Shashank Sharma:
>>>>>> From: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>
>>>>>> This patch adds a new sysfs event, which will indicate
>>>>>> the userland about a GPU reset, and can also provide
>>>>>> some information like:
>>>>>> - which PID was involved in the GPU reset
>>>>>> - what was the GPU status (using flags)
>>>>>>
>>>>>> This patch also introduces the first flag of the flags
>>>>>> bitmap, which can be appended as and when required.
>>>>>
>>>>> Make sure to CC the dri-devel mailing list when reviewing this.
>>>> Got it,
>>>>
>>>> I was also curious if we want to move the reset_ctx structure 
>>>> itself to DRM layer, like
>>>> drm_reset_event_ctx {
>>>>     u32 pid;
>>>>     u32 flags;
>>>>     char process_name[64];
>>>> };
>>>
>>> I was entertaining that thought as well.
>>>
>>> But if we do this I would go even a step further and also move the 
>>> reset work item into the DRM layer as well.
>>>
>>> You might also look like into migrating the exiting i915 code which 
>>> uses udev to signal GPU resets to this function as well.
>>>
> Hi Christian,
>
> Can we access adev in common drm (even if we can access adev it will 
> not be common code)

No, that won't work.

>
> move work function to drm need to be protected(i.e reset_domain->sem), 
> adding something like reset_sem to drm_device?

Hui what? Why do you think the work function needs any protection? That 
doesn't make much sense.

Regards,
Christian.

>
> Regards,
>
> S.Amarnath
>
>>> Regards,
>>> Christian.
>>
>> That seems like a good idea, let me quickly dive into i915 and check 
>> this out.
>>
>> Shashank
>>>
>>>>
>>>> and then:
>>>> void drm_sysfs_reset_event(struct drm_device *dev, 
>>>> drm_reset_event_ctx *ctx);
>>>>
>>>>>
>>>>>>
>>>>>> Cc: Alexandar Deucher <alexander.deucher at amd.com>
>>>>>> Cc: Christian Koenig <christian.koenig at amd.com>
>>>>>> Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>>>   include/drm/drm_sysfs.h     |  3 +++
>>>>>>   2 files changed, 27 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_sysfs.c 
>>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>>> index 430e00b16eec..52a015161431 100644
>>>>>> --- a/drivers/gpu/drm/drm_sysfs.c
>>>>>> +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>>> @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct 
>>>>>> drm_device *dev)
>>>>>>   }
>>>>>>   EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>>> +/**
>>>>>> + * drm_sysfs_reset_event - generate a DRM uevent to indicate GPU 
>>>>>> reset
>>>>>> + * @dev: DRM device
>>>>>> + * @pid: The process ID involve with the reset
>>>>>> + * @flags: Any other information about the GPU status
>>>>>> + *
>>>>>> + * Send a uevent for the DRM device specified by @dev. This 
>>>>>> indicates
>>>>>> + * user that a GPU reset has occurred, so that the interested 
>>>>>> client
>>>>>> + * can take any recovery or profiling measure, when required.
>>>>>> + */
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, 
>>>>>> uint32_t flags)
>>>>>
>>>>> The PID is usually only 32bit, but even better would be to use pid_t.
>>>>>
>>>>>> +{
>>>>>> +    unsigned char pid_str[21], flags_str[15];
>>>>>> +    unsigned char reset_str[] = "RESET=1";
>>>>>> +    char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>>> +
>>>>>> +    DRM_DEBUG("generating reset event\n");
>>>>>> +
>>>>>> +    snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>>> +    snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", flags);
>>>>>> + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>>> +
>>>>>>   /**
>>>>>>    * drm_sysfs_connector_hotplug_event - generate a DRM uevent 
>>>>>> for any connector
>>>>>>    * change
>>>>>> diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>>> index 6273cac44e47..63f00fe8054c 100644
>>>>>> --- a/include/drm/drm_sysfs.h
>>>>>> +++ b/include/drm/drm_sysfs.h
>>>>>> @@ -2,6 +2,8 @@
>>>>>>   #ifndef _DRM_SYSFS_H_
>>>>>>   #define _DRM_SYSFS_H_
>>>>>> +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>>
>>>>> Probably better to define that the other way around, e.g. 
>>>>> DRM_GPU_RESET_FLAG_VRAM_LOST.
>>>>>
>>>>> Apart from that looks good to me.
>>>>>
>>>> Got it, noted.
>>>> - Shashank
>>>>
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>>   struct drm_device;
>>>>>>   struct device;
>>>>>>   struct drm_connector;
>>>>>> @@ -11,6 +13,7 @@ int drm_class_device_register(struct device *dev);
>>>>>>   void drm_class_device_unregister(struct device *dev);
>>>>>>   void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>>> +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t pid, 
>>>>>> uint32_t reset_flags);
>>>>>>   void drm_sysfs_connector_hotplug_event(struct drm_connector 
>>>>>> *connector);
>>>>>>   void drm_sysfs_connector_status_event(struct drm_connector 
>>>>>> *connector,
>>>>>>                         struct drm_property *property);
>>>>>
>>>



More information about the amd-gfx mailing list