[PATCH v2 1/2] drm: Add GPU reset sysfs event

Sharma, Shashank shashank.sharma at amd.com
Thu Mar 10 19:14:06 UTC 2022



On 3/10/2022 7:33 PM, Abhinav Kumar wrote:
> 
> 
> On 3/10/2022 9:40 AM, Rob Clark wrote:
>> On Thu, Mar 10, 2022 at 9:19 AM Sharma, Shashank
>> <shashank.sharma at amd.com> wrote:
>>>
>>>
>>>
>>> On 3/10/2022 6:10 PM, Rob Clark wrote:
>>>> On Thu, Mar 10, 2022 at 8:21 AM Sharma, Shashank
>>>> <shashank.sharma at amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 3/10/2022 4:24 PM, Rob Clark wrote:
>>>>>> On Thu, Mar 10, 2022 at 1:55 AM Christian König
>>>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Am 09.03.22 um 19:12 schrieb Rob Clark:
>>>>>>>> On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
>>>>>>>> <contactshashanksharma at gmail.com> wrote:
>>>>>>>>> From: Shashank Sharma <shashank.sharma at amd.com>
>>>>>>>>>
>>>>>>>>> This patch adds a new sysfs event, which will indicate
>>>>>>>>> the userland about a GPU reset, and can also provide
>>>>>>>>> some information like:
>>>>>>>>> - process ID of the process involved with the GPU reset
>>>>>>>>> - process name of the involved process
>>>>>>>>> - the GPU status info (using flags)
>>>>>>>>>
>>>>>>>>> This patch also introduces the first flag of the flags
>>>>>>>>> bitmap, which can be appended as and when required.
>>>>>>>> Why invent something new, rather than using the already
>>>>>>>> existing devcoredump?
>>>>>>>
>>>>>>> Yeah, that's a really valid question.
>>>>>>>
>>>>>>>> I don't think we need (or should encourage/allow) something drm
>>>>>>>> specific when there is already an existing solution used by both
>>>>>>>> drm and non-drm drivers.  Userspace should not have to learn to
>>>>>>>> support yet another mechanism to do the same thing.
>>>>>>>
>>>>>>> Question is how is userspace notified about new available core 
>>>>>>> dumps?
>>>>>>
>>>>>> I haven't looked into it too closely, as the CrOS userspace
>>>>>> crash-reporter already had support for devcoredump, so it "just
>>>>>> worked" out of the box[1].  I believe a udev event is what triggers
>>>>>> the crash-reporter to go read the devcore dump out of sysfs.
>>>>>
>>>>> I had a quick look at the devcoredump code, and it doesn't look like
>>>>> that is sending an event to the user, so we still need an event to
>>>>> indicate a GPU reset.
>>>>
>>>> There definitely is an event to userspace, I suspect somewhere down
>>>> the device_add() path?
>>>>
>>>
>>> Let me check that out as well, hope that is not due to a driver-private
>>> event for GPU reset, coz I think I have seen some of those in a few DRM
>>> drivers.
>>>
>>
>> Definitely no driver private event for drm/msm .. I haven't dug
>> through it all but this is the collector for devcoredump, triggered
>> somehow via udev.  Most likely from event triggered by device_add()
>>
>> https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/crash-reporter/udev_collector.cc
>>
> 
> Yes, that is correct. the uevent for devcoredump is from device_add()
> 
Yes, I could confirm in the code that device_add() sends a uevent.

kobject_uevent(&dev->kobj, KOBJ_ADD);

I was trying to map Chromium OS's udev rules to the event being sent
from device_add(). As far as I can see, there is only one udev rule for
DRM subsystem events in Chromium OS's 99-crash-reporter.rules:

ACTION=="change", SUBSYSTEM=="drm", KERNEL=="card0", ENV{ERROR}=="1", \
   RUN+="/sbin/crash_reporter --udev=KERNEL=card0:SUBSYSTEM=drm:ACTION=change"

Can someone confirm that this is the rule that gets triggered when a
devcoredump is generated? I could not find an ERROR=1 string in the
env[] sent with this event from device_add(). Also, device_add() sends
KOBJ_ADD, while this rule only matches ACTION=="change".
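
To make the mismatch concrete, here is a minimal userspace sketch of
the check that rule performs on a uevent environment. It assumes the
environment is a buffer of NUL-terminated "KEY=VALUE" strings; the
helper names (uevent_get, value_is, matches_drm_error_rule) are made up
for illustration and are not taken from udev or crash_reporter. The
KERNEL=="card0" match is left out, since udev derives that from the
device name rather than from an environment key.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up `key` in a uevent environment, i.e. a
 * buffer of NUL-terminated "KEY=VALUE" strings that is `len` bytes long.
 * Returns a pointer to the value, or NULL if the key is absent.
 */
static const char *uevent_get(const char *env, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = 0;

    while (off < len) {
        const char *entry = env + off;
        const char *end = memchr(entry, '\0', len - off);
        size_t elen;

        if (!end)       /* unterminated trailing entry: ignore it */
            break;
        elen = (size_t)(end - entry);

        if (elen > klen && entry[klen] == '=' &&
            strncmp(entry, key, klen) == 0)
            return entry + klen + 1;

        off += elen + 1;
    }
    return NULL;
}

/* True if `key` is present in the environment with exactly value `want`. */
static bool value_is(const char *env, size_t len,
                     const char *key, const char *want)
{
    const char *v = uevent_get(env, len, key);

    return v != NULL && strcmp(v, want) == 0;
}

/*
 * Mirrors the environment part of the rule quoted above:
 * ACTION=="change", SUBSYSTEM=="drm", ENV{ERROR}=="1".
 */
static bool matches_drm_error_rule(const char *env, size_t len)
{
    return value_is(env, len, "ACTION", "change") &&
           value_is(env, len, "SUBSYSTEM", "drm") &&
           value_is(env, len, "ERROR", "1");
}
```

Fed an environment such as "ACTION=add", "SUBSYSTEM=devcoredump" (what
I would expect from the device_add() path, though I have not captured a
real one), matches_drm_error_rule() returns false, so some other rule
would have to be picking up the devcoredump.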

- Shashank

> https://github.com/torvalds/linux/blob/master/drivers/base/core.c#L3340
> 
> 
>>
>> BR,
>> -R

