[PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler

Deucher, Alexander Alexander.Deucher at amd.com
Fri Feb 4 18:44:52 UTC 2022


[Public]

Seems like this functionality should be moved up into the callers.  Maybe add a new IP callback (dump_reset_registers()) so that each IP can specify which registers are relevant for reset debugging; then we can walk the IP list and call that callback before we call the asic_reset callbacks.
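
A minimal sketch of that idea, written as standalone C rather than actual amdgpu code. Everything here other than the dump_reset_registers() name is a hypothetical stand-in (struct sketch_ip_funcs, struct sketch_device, the demo_* helpers); the real driver's IP function table and device structure look different.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-IP function table; NOT the real amdgpu structure. */
struct sketch_ip_funcs {
	const char *name;
	/*
	 * Optional callback: dump the registers this IP considers relevant
	 * for reset debugging. Returns the number of registers dumped.
	 */
	int (*dump_reset_registers)(void *handle);
};

/* Hypothetical device: a list of IP blocks plus an ASIC reset callback. */
struct sketch_device {
	const struct sketch_ip_funcs *const *ip_blocks;
	size_t num_ip_blocks;
	int (*asic_reset)(struct sketch_device *dev);
};

/* Walk the IP list and call the dump callback where an IP provides one. */
static int dump_all_reset_registers(struct sketch_device *dev)
{
	int total = 0;

	for (size_t i = 0; i < dev->num_ip_blocks; i++) {
		const struct sketch_ip_funcs *funcs = dev->ip_blocks[i];

		if (!funcs || !funcs->dump_reset_registers)
			continue; /* this IP has nothing to report */
		total += funcs->dump_reset_registers(dev);
	}
	return total;
}

/* Caller side: dump first, then invoke the ASIC reset callback. */
static int reset_with_dump(struct sketch_device *dev)
{
	dump_all_reset_registers(dev);
	return dev->asic_reset(dev);
}

/* Demo callbacks, for illustration only. */
static int demo_gfx_dump(void *handle)
{
	(void)handle;
	printf("gfx: dumping 2 reset-related registers\n");
	return 2;
}

static int demo_asic_reset(struct sketch_device *dev)
{
	(void)dev;
	return 0;
}
```

With this shape the ASIC reset handler itself stays unchanged; an IP that has nothing useful to dump simply leaves the callback NULL.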

Alex

________________________________
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Deucher, Alexander <Alexander.Deucher at amd.com>
Sent: Friday, February 4, 2022 1:41 PM
To: Sharma, Shashank <Shashank.Sharma at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Somalapuram, Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler


[Public]



In the suspend and hibernate cases, we don't care.  In most cases the power rail will be cut once the system enters suspend, so it doesn't really matter.  That's why we call the asic reset callback directly rather than going through the whole recovery process.  The device is already quiescent at this point; we just want to make sure it is in a known state when we come out of suspend (in case suspend overall fails).

Alex


________________________________
From: Sharma, Shashank <Shashank.Sharma at amd.com>
Sent: Friday, February 4, 2022 12:22 PM
To: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Somalapuram, Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler



On 2/4/2022 6:20 PM, Lazar, Lijo wrote:
> [AMD Official Use Only]
>
> One more thing
>        In the suspend-reset case, won't this cause a work item to be scheduled on suspend? I don't know if that is a good idea; ideally we would like to clean up all work items before going to suspend.
>
> Thanks,
> Lijo

Again, this opens scope for discussion. What if there is a GPU error
during suspend-reset, which is a very probable case?

- Shashank

>
> -----Original Message-----
> From: Sharma, Shashank <Shashank.Sharma at amd.com>
> Sent: Friday, February 4, 2022 10:47 PM
> To: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Somalapuram, Amaranath <Amaranath.Somalapuram at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
> Subject: Re: [PATCH 4/4] drm/amdgpu/nv: add navi GPU reset handler
>
>
>
> On 2/4/2022 6:11 PM, Lazar, Lijo wrote:
>> BTW, since this already provides a set of values, it would be useful to provide one more field for the reset reason: RAS error recovery, GPU hang recovery, or something else.
>
> Adding this additional parameter, instead of blocking something in the kernel, seems like a better idea. The app can filter and read only what it is interested in.
>
> - Shashank
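
One way the reset-reason field and the app-side filtering discussed above could look, as a standalone C sketch. The enum values, struct sketch_reset_event, and both helper functions are hypothetical names chosen for illustration; the thread does not define an actual interface.

```c
#include <stdio.h>

/* Hypothetical reset reasons, following the cases named in the thread. */
enum sketch_reset_reason {
	SKETCH_RESET_UNKNOWN = 0,   /* "something else" */
	SKETCH_RESET_RAS_RECOVERY,  /* RAS error recovery */
	SKETCH_RESET_GPU_HANG,      /* GPU hang recovery */
};

/* Hypothetical event handed to userspace alongside the dumped values. */
struct sketch_reset_event {
	enum sketch_reset_reason reason;
	unsigned int num_regs;
};

/* Human-readable name for logging. */
static const char *sketch_reset_reason_name(enum sketch_reset_reason reason)
{
	switch (reason) {
	case SKETCH_RESET_RAS_RECOVERY:
		return "ras-recovery";
	case SKETCH_RESET_GPU_HANG:
		return "gpu-hang";
	default:
		return "unknown";
	}
}

/*
 * App-side filter: interest_mask has bit N set if the app wants events
 * whose reason has enum value N. Returns 1 if the event is wanted.
 */
static int sketch_app_wants_event(const struct sketch_reset_event *ev,
				  unsigned int interest_mask)
{
	return (int)((interest_mask >> ev->reason) & 1u);
}
```

The kernel side reports every reset with its reason, and each application decides which reasons it cares about instead of the kernel suppressing events.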