[PATCH] drm/amdkfd: print unmap queue status for RAS poison consumption (v2)

Zhou1, Tao Tao.Zhou1 at amd.com
Tue Mar 22 02:57:17 UTC 2022



> -----Original Message-----
> From: Paul Menzel <pmenzel at molgen.mpg.de>
> Sent: Monday, March 21, 2022 6:47 PM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Zhang, Hawking
> <Hawking.Zhang at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>; Yang,
> Stanley <Stanley.Yang at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
> Subject: Re: [PATCH] drm/amdkfd: print unmap queue status for RAS poison
> consumption (v2)
> 
> Dear Tao,
> 
> 
> Thank you for the patch.
> 
> 
> On 21.03.22 at 10:38, Tao Zhou wrote:
> > Print the status out when it passes, and also tell the user that a gpu
> > reset is triggered when we fall back to the legacy way.
> >
> > v2: make the message more explicit.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 11 +++++++----
> >   1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > index 56902b5bb7b6..32c451f21db7 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > @@ -105,8 +105,6 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >   	if (old_poison)
> >   		return;
> >
> > -	pr_warn("RAS poison consumption handling: client id %d\n", client_id);
> > -
> >   	switch (client_id) {
> >   	case SOC15_IH_CLIENTID_SE0SH:
> >   	case SOC15_IH_CLIENTID_SE1SH:
> > @@ -130,10 +128,15 @@ static void event_interrupt_poison_consumption(struct kfd_dev *dev,
> >   	/* resetting queue passes, do page retirement without gpu reset
> >   	 * resetting queue fails, fallback to gpu reset solution
> >   	 */
> > -	if (!ret)
> > +	if (!ret) {
> > +		pr_warn("RAS poison consumption, unmap queue flow succeeds: client id %d\n",
> > +				client_id);
> 
> succeeded? As it’s a success message, should it be an informational message?

[Tao] Thanks, I will change it to "succeeded" before pushing. Although the message reports success, poison consumption is not a usual event, so a warning still seems appropriate.

> 
> >   		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, false);
> > -	else
> > +	} else {
> > +		pr_warn("RAS poison consumption, fallback to gpu reset flow: client id %d\n",
> 
> Fall back.
> 
> > +				client_id);
> >   		amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev, true);
> 
> Could the log be moved somehow to the handler?

[Tao] No, it cannot. The unmap queue call is not made in the handler, and client_id is not passed to the handler.
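
To illustrate the point about where the log can live, here is a minimal user-space sketch, not the kernel code. The names sketch_poison_handler, sketch_poison_consumption and last_log are hypothetical stand-ins, and the assumption that the handler's second argument means "request a gpu reset" is inferred from the hunk above. The caller knows both the unmap result and client_id, while the handler only receives the reset flag.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the dmesg line that pr_warn() would emit. */
static char last_log[128];

/*
 * Stand-in for amdgpu_amdkfd_ras_poison_consumption_handler(adev, reset).
 * Assumption: it only receives the reset flag, so it has no client id
 * and no unmap result available to print.
 */
static bool sketch_poison_handler(bool reset)
{
	return reset; /* report whether a gpu reset was requested */
}

/*
 * Stand-in for the interrupt-side caller. It holds both the unmap
 * result and the client id, so the status message is built here.
 */
static bool sketch_poison_consumption(int unmap_ret, int client_id)
{
	if (!unmap_ret) {
		snprintf(last_log, sizeof(last_log),
			 "RAS poison consumption, unmap queue flow succeeded: client id %d",
			 client_id);
		return sketch_poison_handler(false);
	}

	snprintf(last_log, sizeof(last_log),
		 "RAS poison consumption, fall back to gpu reset flow: client id %d",
		 client_id);
	return sketch_poison_handler(true);
}
```

Moving the message into the handler would require passing client_id (and the unmap result) down as extra parameters, which is a larger interface change than this patch intends.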

> 
> > +	}
> >   }
> >
> >   static bool event_interrupt_isr_v9(struct kfd_dev *dev,
> 
> Unrelated to the patch: as a user, I would at least wish these warnings were
> more elaborate, telling me what the problem is, what effects it has, and what
> to do to fix it.

[Tao] That is difficult. You would need a document, rather than the dmesg log, to give you all the details.

> 
> 
> Kind regards,
> 
> Paul
