[PATCH 1/2] drm: Add GPU reset sysfs event

Somalapuram, Amaranath asomalap at amd.com
Tue Mar 8 17:04:30 UTC 2022


On 3/8/2022 10:27 PM, Sharma, Shashank wrote:
>
>
> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
>> You can read on their side here - 
>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-Linux-5.17 
>> and see their patch. THey don't have as clean
>> interface as we do to retrieve the buffer and currently it's 
>> hard-coded for debugfs dump but it looks like pretty straight forward 
>> to expose their buffer to external
>> client like amdgpu.
>
Customer requirement is to get reset notification for there daemon with 
other info (like PID process name vram status).

Regards,
S.Amarnath
> Noted, thanks for the pointer.
> - Shashank
>>
>> Andrey
>>
>> On 2022-03-08 11:46, Sharma, Shashank wrote:
>>> I have a very limited understanding of PMC driver and its 
>>> interfaces, so I would just go ahead and rely on Andrey's 
>>> judgement/recommendation on this :)
>>>
>>> - Shashank
>>>
>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
>>>> As long as PMC driver provides clear interface to retrieve the info 
>>>> there should be no issue to call either amdgpu interface or PMC 
>>>> interface using IS_APU (or something alike in the code)
>>>> We probably should add a wrapper function around this logic in amdgpu.
>>>>
>>>> Andrey
>>>>
>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
>>>>>
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>> +Mario
>>>>>
>>>>> I guess that means the functionality needs to be present in amdgpu 
>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>> ------------------------------------------------------------------------ 
>>>>>
>>>>> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf 
>>>>> of Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
>>>>> *To:* Shashank Sharma <contactshashanksharma at gmail.com>; 
>>>>> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher at amd.com>; Somalapuram, 
>>>>> Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian 
>>>>> <Christian.Koenig at amd.com>; Sharma, Shashank 
>>>>> <Shashank.Sharma at amd.com>
>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
>>>>>
>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
>>>>> > From: Shashank Sharma <shashank.sharma at amd.com>
>>>>> >
>>>>> > This patch adds a new sysfs event, which will indicate
>>>>> > the userland about a GPU reset, and can also provide
>>>>> > some information like:
>>>>> > - which PID was involved in the GPU reset
>>>>> > - what was the GPU status (using flags)
>>>>> >
>>>>> > This patch also introduces the first flag of the flags
>>>>> > bitmap, which can be appended as and when required.
>>>>>
>>>>>
>>>>> I am reminding again about another important piece of info which 
>>>>> you can add
>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
>>>>> specific but
>>>>> from what I see there is no problem to just amend it as part of 
>>>>> envp[]
>>>>> initialization.
>>>>> bellow.
>>>>>
>>>>> The interface to get the buffer is smu_stb_collect_info and usage 
>>>>> can be
>>>>> seen from
>>>>> frebugfs interface in smu_stb_debugfs_open
>>>>>
>>>>> [1] - 
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0 
>>>>> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Famd-gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&reserved=0> 
>>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>> >
>>>>> > Cc: Alexandar Deucher <alexander.deucher at amd.com>
>>>>> > Cc: Christian Koenig <christian.koenig at amd.com>
>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
>>>>> > ---
>>>>> >   drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
>>>>> >   include/drm/drm_sysfs.h     |  3 +++
>>>>> >   2 files changed, 27 insertions(+)
>>>>> >
>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c 
>>>>> b/drivers/gpu/drm/drm_sysfs.c
>>>>> > index 430e00b16eec..52a015161431 100644
>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct 
>>>>> drm_device *dev)
>>>>> >   }
>>>>> >   EXPORT_SYMBOL(drm_sysfs_hotplug_event);
>>>>> >
>>>>> > +/**
>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate 
>>>>> GPU reset
>>>>> > + * @dev: DRM device
>>>>> > + * @pid: The process ID involve with the reset
>>>>> > + * @flags: Any other information about the GPU status
>>>>> > + *
>>>>> > + * Send a uevent for the DRM device specified by @dev. This 
>>>>> indicates
>>>>> > + * user that a GPU reset has occurred, so that the interested 
>>>>> client
>>>>> > + * can take any recovery or profiling measure, when required.
>>>>> > + */
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t 
>>>>> pid, uint32_t flags)
>>>>> > +{
>>>>> > +     unsigned char pid_str[21], flags_str[15];
>>>>> > +     unsigned char reset_str[] = "RESET=1";
>>>>> > +     char *envp[] = { reset_str, pid_str, flags_str, NULL };
>>>>> > +
>>>>> > +     DRM_DEBUG("generating reset event\n");
>>>>> > +
>>>>> > +     snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
>>>>> > +     snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u", 
>>>>> flags);
>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, envp);
>>>>> > +}
>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
>>>>> > +
>>>>> >   /**
>>>>> >    * drm_sysfs_connector_hotplug_event - generate a DRM uevent 
>>>>> for any connector
>>>>> >    * change
>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
>>>>> > index 6273cac44e47..63f00fe8054c 100644
>>>>> > --- a/include/drm/drm_sysfs.h
>>>>> > +++ b/include/drm/drm_sysfs.h
>>>>> > @@ -2,6 +2,8 @@
>>>>> >   #ifndef _DRM_SYSFS_H_
>>>>> >   #define _DRM_SYSFS_H_
>>>>> >
>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
>>>>> > +
>>>>> >   struct drm_device;
>>>>> >   struct device;
>>>>> >   struct drm_connector;
>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device 
>>>>> *dev);
>>>>> >   void drm_class_device_unregister(struct device *dev);
>>>>> >
>>>>> >   void drm_sysfs_hotplug_event(struct drm_device *dev);
>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t 
>>>>> pid, uint32_t reset_flags);
>>>>> >   void drm_sysfs_connector_hotplug_event(struct drm_connector 
>>>>> *connector);
>>>>> >   void drm_sysfs_connector_status_event(struct drm_connector 
>>>>> *connector,
>>>>> >                                      struct drm_property 
>>>>> *property);


More information about the amd-gfx mailing list