[PATCH 1/2] drm: Add GPU reset sysfs event

Limonciello, Mario Mario.Limonciello at amd.com
Tue Mar 8 18:10:48 UTC 2022


[AMD Official Use Only]



> -----Original Message-----
> From: Sharma, Shashank <Shashank.Sharma at amd.com>
> Sent: Tuesday, March 8, 2022 12:09
> To: Limonciello, Mario <Mario.Limonciello at amd.com>; Grodzovsky, Andrey
> <Andrey.Grodzovsky at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; Shashank
> Sharma <contactshashanksharma at gmail.com>; amd-
> gfx at lists.freedesktop.org; S-k, Shyam-sundar <Shyam-sundar.S-k at amd.com>;
> Goswami, Sanket <Sanket.Goswami at amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Somalapuram,
> Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> Subject: Re: [PATCH 1/2] drm: Add GPU reset sysfs event
> 
> Hey Mario,
> 
> On 3/8/2022 6:27 PM, Limonciello, Mario wrote:
> > On 3/8/2022 10:57, Sharma, Shashank wrote:
> >>
> >>
> >> On 3/8/2022 5:55 PM, Andrey Grodzovsky wrote:
> >>> You can read on their side here -
> >>> https://www.phoronix.com/scan.php?page=news_item&px=AMD-STB-
> Linux-5.17 and
> >>> see their patch. THey don't have as clean
> >>> interface as we do to retrieve the buffer and currently it's
> >>> hard-coded for debugfs dump but it looks like pretty straight forward
> >>> to expose their buffer to external
> >>> client like amdgpu.
> 
> >>
> >> Noted, thanks for the pointer.
> >> - Shashank
> >
> > Yeah I think this is a great idea for APU, but need to keep in mind
> > amd-pmc is only activated if APU is configured for s0i3.  So in the
> > IS_APU check you will need to test for set for s0i3 and return an error
> > code if not set.
> >
> > For APU it's currently fetched on demand from debugfs.  If you can
> > "easily" export a symbol I say go for it and include a patch in your
> > series.
> 
> Yep, if the info is available via debugfs, I don't see a reason why
> can't we fetch that directly from its lower layers. May I know the name
> of this debugfs entry ?

It's called "stb_read" from debugfs.

> 
> - Shashank
> 
> If not my suggestion is to stub this out and let Shyam and
> > Sanket fill in the stub after they can sort out the exporting it to
> > another driver.
> >
> >>>
> >>> Andrey
> >>>
> >>> On 2022-03-08 11:46, Sharma, Shashank wrote:
> >>>> I have a very limited understanding of PMC driver and its
> >>>> interfaces, so I would just go ahead and rely on Andrey's
> >>>> judgement/recommendation on this :)
> >>>>
> >>>> - Shashank
> >>>>
> >>>> On 3/8/2022 5:39 PM, Andrey Grodzovsky wrote:
> >>>>> As long as PMC driver provides clear interface to retrieve the info
> >>>>> there should be no issue to call either amdgpu interface or PMC
> >>>>> interface using IS_APU (or something alike in the code)
> >>>>> We probably should add a wrapper function around this logic in amdgpu.
> >>>>>
> >>>>> Andrey
> >>>>>
> >>>>> On 2022-03-08 11:36, Lazar, Lijo wrote:
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>>
> >>>>>> +Mario
> >>>>>>
> >>>>>> I guess that means the functionality needs to be present in amdgpu
> >>>>>> for APUs also. Presently, this is taken care by PMC driver for APUs.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>> ------------------------------------------------------------------------
> >>>>>>
> >>>>>> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf
> >>>>>> of Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> >>>>>> *Sent:* Tuesday, March 8, 2022 9:55:03 PM
> >>>>>> *To:* Shashank Sharma <contactshashanksharma at gmail.com>;
> >>>>>> amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
> >>>>>> *Cc:* Deucher, Alexander <Alexander.Deucher at amd.com>;
> Somalapuram,
> >>>>>> Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian
> >>>>>> <Christian.Koenig at amd.com>; Sharma, Shashank
> >>>>>> <Shashank.Sharma at amd.com>
> >>>>>> *Subject:* Re: [PATCH 1/2] drm: Add GPU reset sysfs event
> >>>>>>
> >>>>>> On 2022-03-07 11:26, Shashank Sharma wrote:
> >>>>>> > From: Shashank Sharma <shashank.sharma at amd.com>
> >>>>>> >
> >>>>>> > This patch adds a new sysfs event, which will indicate
> >>>>>> > the userland about a GPU reset, and can also provide
> >>>>>> > some information like:
> >>>>>> > - which PID was involved in the GPU reset
> >>>>>> > - what was the GPU status (using flags)
> >>>>>> >
> >>>>>> > This patch also introduces the first flag of the flags
> >>>>>> > bitmap, which can be appended as and when required.
> >>>>>>
> >>>>>>
> >>>>>> I am reminding again about another important piece of info which
> >>>>>> you can add
> >>>>>> here and that is Smart Trace Buffer dump [1]. The buffer size is HW
> >>>>>> specific but
> >>>>>> from what I see there is no problem to just amend it as part of
> >>>>>> envp[]
> >>>>>> initialization.
> >>>>>> bellow.
> >>>>>>
> >>>>>> The interface to get the buffer is smu_stb_collect_info and usage
> >>>>>> can be
> >>>>>> seen from
> >>>>>> frebugfs interface in smu_stb_debugfs_open
> >>>>>>
> >>>>>> [1] -
> >>>>>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.sp
> inics.net%2Flists%2Famd-
> gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc
> 3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7
> C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a
> mp;sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&re
> served=0
> >>>>>>
> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.s
> pinics.net%2Flists%2Famd-
> gfx%2Fmsg70751.html&data=04%7C01%7Clijo.lazar%40amd.com%7C80bc
> 3f07e2d0441d44a108da012036dc%7C3dd8961fe4884e608e11a82d994e183d%7
> C0%7C0%7C637823535167679490%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC
> 4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&a
> mp;sdata=53l7KlTf%2BICKkZkLVwFh6nRTjkAh%2FDpOat5DRoyKIx0%3D&re
> served=0>
> >>>>>>
> >>>>>>
> >>>>>> Andrey
> >>>>>>
> >>>>>>
> >>>>>> >
> >>>>>> > Cc: Alexandar Deucher <alexander.deucher at amd.com>
> >>>>>> > Cc: Christian Koenig <christian.koenig at amd.com>
> >>>>>> > Signed-off-by: Shashank Sharma <shashank.sharma at amd.com>
> >>>>>> > ---
> >>>>>> >   drivers/gpu/drm/drm_sysfs.c | 24 ++++++++++++++++++++++++
> >>>>>> >   include/drm/drm_sysfs.h     |  3 +++
> >>>>>> >   2 files changed, 27 insertions(+)
> >>>>>> >
> >>>>>> > diff --git a/drivers/gpu/drm/drm_sysfs.c
> >>>>>> b/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > index 430e00b16eec..52a015161431 100644
> >>>>>> > --- a/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > +++ b/drivers/gpu/drm/drm_sysfs.c
> >>>>>> > @@ -409,6 +409,30 @@ void drm_sysfs_hotplug_event(struct
> >>>>>> drm_device *dev)
> >>>>>> >   }
> >>>>>> >   EXPORT_SYMBOL(drm_sysfs_hotplug_event);
> >>>>>> >
> >>>>>> > +/**
> >>>>>> > + * drm_sysfs_reset_event - generate a DRM uevent to indicate
> >>>>>> GPU reset
> >>>>>> > + * @dev: DRM device
> >>>>>> > + * @pid: The process ID involve with the reset
> >>>>>> > + * @flags: Any other information about the GPU status
> >>>>>> > + *
> >>>>>> > + * Send a uevent for the DRM device specified by @dev. This
> >>>>>> indicates
> >>>>>> > + * user that a GPU reset has occurred, so that the interested
> >>>>>> client
> >>>>>> > + * can take any recovery or profiling measure, when required.
> >>>>>> > + */
> >>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
> >>>>>> pid, uint32_t flags)
> >>>>>> > +{
> >>>>>> > +     unsigned char pid_str[21], flags_str[15];
> >>>>>> > +     unsigned char reset_str[] = "RESET=1";
> >>>>>> > +     char *envp[] = { reset_str, pid_str, flags_str, NULL };
> >>>>>> > +
> >>>>>> > +     DRM_DEBUG("generating reset event\n");
> >>>>>> > +
> >>>>>> > +     snprintf(pid_str, ARRAY_SIZE(pid_str), "PID=%lu", pid);
> >>>>>> > +     snprintf(flags_str, ARRAY_SIZE(flags_str), "FLAGS=%u",
> >>>>>> flags);
> >>>>>> > + kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
> envp);
> >>>>>> > +}
> >>>>>> > +EXPORT_SYMBOL(drm_sysfs_reset_event);
> >>>>>> > +
> >>>>>> >   /**
> >>>>>> >    * drm_sysfs_connector_hotplug_event - generate a DRM uevent
> >>>>>> for any connector
> >>>>>> >    * change
> >>>>>> > diff --git a/include/drm/drm_sysfs.h b/include/drm/drm_sysfs.h
> >>>>>> > index 6273cac44e47..63f00fe8054c 100644
> >>>>>> > --- a/include/drm/drm_sysfs.h
> >>>>>> > +++ b/include/drm/drm_sysfs.h
> >>>>>> > @@ -2,6 +2,8 @@
> >>>>>> >   #ifndef _DRM_SYSFS_H_
> >>>>>> >   #define _DRM_SYSFS_H_
> >>>>>> >
> >>>>>> > +#define DRM_GPU_RESET_FLAG_VRAM_VALID (1 << 0)
> >>>>>> > +
> >>>>>> >   struct drm_device;
> >>>>>> >   struct device;
> >>>>>> >   struct drm_connector;
> >>>>>> > @@ -11,6 +13,7 @@ int drm_class_device_register(struct device
> >>>>>> *dev);
> >>>>>> >   void drm_class_device_unregister(struct device *dev);
> >>>>>> >
> >>>>>> >   void drm_sysfs_hotplug_event(struct drm_device *dev);
> >>>>>> > +void drm_sysfs_reset_event(struct drm_device *dev, uint64_t
> >>>>>> pid, uint32_t reset_flags);
> >>>>>> >   void drm_sysfs_connector_hotplug_event(struct drm_connector
> >>>>>> *connector);
> >>>>>> >   void drm_sysfs_connector_status_event(struct drm_connector
> >>>>>> *connector,
> >>>>>> >                                      struct drm_property
> >>>>>> *property);
> >


More information about the amd-gfx mailing list