[PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
Ziya, Mohammad zafar
Mohammadzafar.Ziya at amd.com
Mon Mar 28 09:49:48 UTC 2022
[AMD Official Use Only]
Dear Paul,
Comment inline.
Regards,
Zafar
>-----Original Message-----
>From: Paul Menzel <pmenzel at molgen.mpg.de>
>Sent: Monday, March 28, 2022 3:08 PM
>To: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>; Zhou1, Tao
><Tao.Zhou1 at amd.com>
>Cc: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org; Zhang,
>Hawking <Hawking.Zhang at amd.com>
>Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
>
>
>Dear Mohammad,
>
>
>Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:
>
>[…]
>
>>> -----Original Message-----
>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>> Sent: Monday, March 28, 2022 1:39 PM
>
>>> Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
>>>
>>> […]
>>>
>>>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>>>> Sent: Monday, March 28, 2022 1:22 PM
>
>>>>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
>>>>>> -----Original Message-----
>>>>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>
>>>>>> Sent: Monday, March 28, 2022 2:25 PM
>>>
>>> […]
>>>
>>>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
>>>>>> +			uint32_t instance, uint32_t sub_block) {
>>>>>> +	uint32_t poison_stat = 0, reg_value = 0;
>>>>>> +
>>>>>> +	switch (sub_block) {
>>>>>> +	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>>>> +		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>>>> +		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		break;
>>>>>> +	};
>>>>>> +
>>>>>> +	if (poison_stat)
>>>>>> +		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
>>>>>> +			instance, sub_block);
>>>>>
>>>>> What should a user do with that information? Faulty hardware, …?
>>>>
>>>> [Mohammad]: This message helps to identify the faulty hardware. The
>>>> hardware ID is logged along with the poison message, which helps to
>>>> identify the device among multiple devices installed in the system.
>>>
>>> Thank you for clarifying. If it’s indeed faulty hardware, should the
>>> log level be increased to be an error? Keep in mind, that normal
>>> ignorant users (like me) are reading the message, and it’d be great
>>> to guide them a little. They do not know what “Poison” means, I guess.
>>> Maybe:
>>>
>>> A hardware corruption was found indicating the device might be faulty.
>>> (Poison detected in VCN%d, sub_block%d)\n
>>>
>>> (Keep in mind, I do not know anything about RAS.)
>>
>> [Mohammad]: It is an error condition, but this is just an informational
>> message, which could have been ignored as well, because VCN only
>> consumed the poison and did not create it.
>
>Sorry, I have never seen this message in `dmesg`, so could you please give
>an example log showing what the user would see?
>
[Mohammad]: [ 231.181316] amdgpu 0000:8a:00.0: amdgpu: Poison detected in VCN0, sub_block0
Sample message from amdgpu: "[ 237.013029] amdgpu 0000:8a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available"
>
>Kind regards,
>
>Paul