[PATCH] drm/amdkfd: print unmap queue status for RAS poison consumption (v2)
Felix Kuehling
felix.kuehling at amd.com
Tue Mar 22 14:05:09 UTC 2022
On 2022-03-21 at 23:17, Zhou1, Tao wrote:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>> Sent: Monday, March 21, 2022 7:21 PM
>> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Zhang,
>> Hawking <Hawking.Zhang at amd.com>; Kuehling, Felix
>> <Felix.Kuehling at amd.com>; Yang, Stanley <Stanley.Yang at amd.com>; Chai,
>> Thomas <YiPeng.Chai at amd.com>
>> Subject: Re: [PATCH] drm/amdkfd: print unmap queue status for RAS poison
>> consumption (v2)
>>
>>
>>
>> On 3/21/2022 3:08 PM, Tao Zhou wrote:
>>> Print the status out when it passes, and also tell user gpu reset is
>>> triggered when we fallback to legacy way.
>>>
>>> v2: make the message more explicit.
>>>
>>> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 11 +++++++----
>>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> index 56902b5bb7b6..32c451f21db7 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>> @@ -105,8 +105,6 @@ static void
>> event_interrupt_poison_consumption(struct kfd_dev *dev,
>>> if (old_poison)
>>> return;
>>>
>>> - pr_warn("RAS poison consumption handling: client id %d\n", client_id);
>>> -
>>> switch (client_id) {
>>> case SOC15_IH_CLIENTID_SE0SH:
>>> case SOC15_IH_CLIENTID_SE1SH:
>>> @@ -130,10 +128,15 @@ static void
>> event_interrupt_poison_consumption(struct kfd_dev *dev,
>>> /* resetting queue passes, do page retirement without gpu reset
>>> * resetting queue fails, fallback to gpu reset solution
>>> */
>>> - if (!ret)
>>> + if (!ret) {
>>> + pr_warn("RAS poison consumption, unmap queue flow succeeds:
>> client id %d\n",
>>> + client_id);
>> As discussed in another patch, I understand that pr_* is the legacy usage in the
>> file. But it won't be helpful for this case with multiple devices. Would suggest to
>> change to dev_info() - the message here and below seems informational about
>> the handling of this situation rather than warning of something bad.
>>
>> Thanks,
>> Lijo
> [Tao] I'll replace pr_warn with dev_info. I think we need a dedicated cleanup to retire all pr_* format messages in amdgpu.
> RAS poison consumption is a special event that should be paid attention to, so I think a warning is also reasonable.
Or you could make the "unmap success" case a dev_info and the "gpu
reset" case a dev_warn.
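
A minimal sketch of that split, written so it compiles outside the kernel
tree. The struct device, dev_info() and dev_warn() below are hypothetical
userspace stand-ins for the kernel ones, and report_poison_consumption() is
an illustrative helper name, not something from the patch. Which struct
device pointer the driver would actually pass is left open here.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel's struct device,
 * dev_info() and dev_warn(); the driver would use the real ones.
 */
struct device { const char *name; };

enum log_level { LVL_NONE, LVL_INFO, LVL_WARN };
static enum log_level last_level = LVL_NONE; /* last severity that was used */

static void dev_vlog(enum log_level lvl, const struct device *d,
		     const char *fmt, va_list ap)
{
	last_level = lvl;
	/* Prefix with severity and device name so multiple GPUs can be told apart */
	fprintf(stderr, "%s %s: ", lvl == LVL_WARN ? "warn" : "info", d->name);
	vfprintf(stderr, fmt, ap);
}

static void dev_info(const struct device *d, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	dev_vlog(LVL_INFO, d, fmt, ap);
	va_end(ap);
}

static void dev_warn(const struct device *d, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	dev_vlog(LVL_WARN, d, fmt, ap);
	va_end(ap);
}

/* ret == 0: resetting the queue passed, no gpu reset is needed.
 * ret != 0: resetting the queue failed, fall back to gpu reset.
 * Returns the reset flag that the patch passes to
 * amdgpu_amdkfd_ras_poison_consumption_handler().
 */
static bool report_poison_consumption(const struct device *d, int ret,
				      int client_id)
{
	if (!ret) {
		dev_info(d, "RAS poison consumption, unmap queue flow succeeds: client id %d\n",
			 client_id);
		return false;
	}

	dev_warn(d, "RAS poison consumption, fallback to gpu reset flow: client id %d\n",
		 client_id);
	return true;
}
```

The success path stays informational while the fallback that triggers a gpu
reset is logged as a warning, and both messages carry the device name.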

Either way, v3 of your patch looks good to me and is

Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>

Regards,
  Felix

>
>>> amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev,
>> false);
>>> - else
>>> + } else {
>>> + pr_warn("RAS poison consumption, fallback to gpu reset flow:
>> client id %d\n",
>>> + client_id);
>>> amdgpu_amdkfd_ras_poison_consumption_handler(dev->adev,
>> true);
>>> + }
>>> }
>>>
>>> static bool event_interrupt_isr_v9(struct kfd_dev *dev,
>>>