On 3/10/2022 8:35 PM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 11:14 AM Sharma, Shashank shashank.sharma@amd.com wrote:
On 3/10/2022 7:33 PM, Abhinav Kumar wrote:
On 3/10/2022 9:40 AM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 9:19 AM Sharma, Shashank shashank.sharma@amd.com wrote:
On 3/10/2022 6:10 PM, Rob Clark wrote:
On Thu, Mar 10, 2022 at 8:21 AM Sharma, Shashank shashank.sharma@amd.com wrote:
>
> On 3/10/2022 4:24 PM, Rob Clark wrote:
>> On Thu, Mar 10, 2022 at 1:55 AM Christian König
>> ckoenig.leichtzumerken@gmail.com wrote:
>>>
>>> On 09.03.22 at 19:12, Rob Clark wrote:
>>>> On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
>>>> contactshashanksharma@gmail.com wrote:
>>>>> From: Shashank Sharma shashank.sharma@amd.com
>>>>>
>>>>> This patch adds a new sysfs event, which will indicate
>>>>> the userland about a GPU reset, and can also provide
>>>>> some information like:
>>>>> - process ID of the process involved with the GPU reset
>>>>> - process name of the involved process
>>>>> - the GPU status info (using flags)
>>>>>
>>>>> This patch also introduces the first flag of the flags
>>>>> bitmap, which can be appended as and when required.
>>>> Why invent something new, rather than using the already existing
>>>> devcoredump?
>>>
>>> Yeah, that's a really valid question.
>>>
>>>> I don't think we need (or should encourage/allow) something drm
>>>> specific when there is already an existing solution used by both drm
>>>> and non-drm drivers. Userspace should not have to learn to support
>>>> yet another mechanism to do the same thing.
>>>
>>> Question is how is userspace notified about new available core
>>> dumps?
>>
>> I haven't looked into it too closely, as the CrOS userspace
>> crash-reporter already had support for devcoredump, so it "just
>> worked" out of the box[1]. I believe a udev event is what triggers
>> the crash-reporter to go read the devcore dump out of sysfs.
>
> I had a quick look at the devcoredump code, and it doesn't look like
> that is sending an event to the user, so we still need an event to
> indicate a GPU reset.
There definitely is an event to userspace, I suspect somewhere down the device_add() path?
Let me check that out as well. I hope that is not due to a driver-private event for GPU reset, because I think I have seen some of those in a few DRM drivers.
Definitely no driver-private event for drm/msm. I haven't dug through it all, but this is the collector for devcoredump, triggered somehow via udev, most likely from the event triggered by device_add():
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fchromium.g...
Yes, that is correct. The uevent for devcoredump is from device_add().
Yes, I could confirm in the code that device_add() sends a uevent.
kobject_uevent(&dev->kobj, KOBJ_ADD);
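For reference, a minimal sketch (Python, illustrative only) of what a userspace listener would see when the kernel emits that KOBJ_ADD uevent. The kernel's netlink uevent payload is an `action@devpath` header followed by NUL-separated KEY=VALUE pairs. The sample buffer, the `devcd1` path and the SEQNUM value below are made up for illustration, and `parse_kernel_uevent` is a hypothetical helper, not part of any library.

```python
def parse_kernel_uevent(buf: bytes):
    """Split a raw kernel uevent buffer into (action, devpath, properties)."""
    # Fields are NUL-terminated; drop the empty trailing element.
    parts = [p.decode() for p in buf.split(b"\0") if p]
    # The first field is the "action@devpath" header.
    action, _, devpath = parts[0].partition("@")
    props = {}
    for item in parts[1:]:
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not KEY=VALUE
            props[key] = value
    return action, devpath, props


# Hypothetical payload for a devcoredump device being added.
SAMPLE = (
    b"add@/devices/virtual/devcoredump/devcd1\0"
    b"ACTION=add\0"
    b"DEVPATH=/devices/virtual/devcoredump/devcd1\0"
    b"SUBSYSTEM=devcoredump\0"
    b"SEQNUM=4242\0"
)

action, devpath, props = parse_kernel_uevent(SAMPLE)
print(action, devpath, props["SUBSYSTEM"])
```

Note that nothing in such a payload says which driver produced the dump; only ACTION, DEVPATH and SUBSYSTEM are available to match on.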
I was trying to map ChromiumOS's udev event rules to the event being sent from device_add(). What I could see is that there is only one udev rule for DRM subsystem events in ChromiumOS's 99-crash-reporter.rules:
ACTION=="change", SUBSYSTEM=="drm", KERNEL=="card0", ENV{ERROR}=="1", \
  RUN+="/sbin/crash_reporter --udev=KERNEL=card0:SUBSYSTEM=drm:ACTION=change"
Can someone confirm that this is the rule which gets triggered when a devcoredump is generated? I could not find an ERROR=1 string in the env[] when this event is sent from device_add().
I think it is actually this rule:
ACTION=="add", SUBSYSTEM=="devcoredump", \
  RUN+="/sbin/crash_reporter --udev=SUBSYSTEM=devcoredump:ACTION=add:KERNEL_NUMBER=%n"
It is something non-drm specific because it supports devcoredumps from non-drm drivers. I know at least some of the wifi and remoteproc drivers use it.
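To make the difference between the two rules concrete, here is a rough sketch (Python, illustrative only) that reduces each rule to exact-match conditions on uevent properties. This is an assumption-laden simplification: real udev matching also supports globs and sysfs attributes, `rule_matches` and `kernel_name` are hypothetical helpers, and the DEVPATH values are invented.

```python
# Simplified stand-ins for the two udev rules quoted in this thread.
# Only exact string comparison is modelled here, not udev's glob matching.
DEVCOREDUMP_RULE = {"ACTION": "add", "SUBSYSTEM": "devcoredump"}
DRM_ERROR_RULE = {"ACTION": "change", "SUBSYSTEM": "drm",
                  "KERNEL": "card0", "ERROR": "1"}


def kernel_name(event: dict) -> str:
    """udev's KERNEL key is the device name: the last DEVPATH component."""
    return event["DEVPATH"].rsplit("/", 1)[-1]


def rule_matches(rule: dict, event: dict) -> bool:
    """Return True if every condition in the rule equals the event's value."""
    props = dict(event, KERNEL=kernel_name(event))
    return all(props.get(key) == value for key, value in rule.items())


# Hypothetical events; the DEVPATH values are made up for illustration.
devcd_event = {"ACTION": "add", "SUBSYSTEM": "devcoredump",
               "DEVPATH": "/devices/virtual/devcoredump/devcd1"}
drm_event = {"ACTION": "change", "SUBSYSTEM": "drm", "ERROR": "1",
             "DEVPATH": "/devices/platform/gpu/drm/card0"}

for name, ev in (("devcoredump add", devcd_event), ("drm change", drm_event)):
    print(name,
          "| devcoredump rule:", rule_matches(DEVCOREDUMP_RULE, ev),
          "| drm rule:", rule_matches(DRM_ERROR_RULE, ev))
```

In this model the devcoredump rule fires for any driver's dump, since it has no condition that identifies the originating subsystem.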
Ah, this seems like a problem to me. I understand it will work well for a reset/recovery app, but for a DRM client (like a compositor) that wants to listen only to DRM events (like a GPU reset), wouldn't this create a lot of noise? Every time any subsystem produces a coredump, there will be a change in the devcoredump subsystem, and the client will have to parse the core file and then decide whether to react to it or ignore it.
Wouldn't a GPU reset event, specific to the DRM subsystem, serve better in such a case?
- Shashank
BR, -R