[PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support

Ziya, Mohammad zafar Mohammadzafar.Ziya at amd.com
Mon Mar 28 09:49:48 UTC 2022



Dear Paul,

Comment inline.

Regards,
Zafar

>-----Original Message-----
>From: Paul Menzel <pmenzel at molgen.mpg.de>
>Sent: Monday, March 28, 2022 3:08 PM
>To: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>; Zhou1, Tao
><Tao.Zhou1 at amd.com>
>Cc: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org; Zhang,
>Hawking <Hawking.Zhang at amd.com>
>Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
>
>
>Dear Mohammad,
>
>
>Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:
>
>[…]
>
>>> -----Original Message-----
>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>> Sent: Monday, March 28, 2022 1:39 PM
>
>>> Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
>>>
>>> […]
>>>
>>>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>>>> Sent: Monday, March 28, 2022 1:22 PM
>
>>>>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
>>>>>> -----Original Message-----
>>>>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>
>>>>>> Sent: Monday, March 28, 2022 2:25 PM
>>>
>>> […]
>>>
>>>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
>>>>>> +			uint32_t instance, uint32_t sub_block) {
>>>>>> +	uint32_t poison_stat = 0, reg_value = 0;
>>>>>> +
>>>>>> +	switch (sub_block) {
>>>>>> +	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>>>> +		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>>>> +		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>>>> +		break;
>>>>>> +	default:
>>>>>> +		break;
>>>>>> +	};
>>>>>> +
>>>>>> +	if (poison_stat)
>>>>>> +		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
>>>>>> +			instance, sub_block);
>>>>>
>>>>> What should a user do with that information? Faulty hardware, …?
>>>>
>>>> [Mohammad]: This message helps identify the faulty hardware. The
>>>> hardware ID is logged along with the poison message, which helps
>>>> identify the affected device when multiple devices are installed in the system.
>>>
>>> Thank you for clarifying. If it’s indeed faulty hardware, should the
>>> log level be raised to an error? Keep in mind that normal,
>>> ignorant users (like me) read these messages, and it’d be great
>>> to guide them a little. I guess they do not know what “Poison” means.
>>> Maybe:
>>>
>>> A hardware corruption was found indicating the device might be faulty.
>>> (Poison detected in VCN%d, sub_block%d)\n
>>>
>>> (Keep in mind, I do not know anything about RAS.)
>>
>> [Mohammad]: It is an error condition, but this is only an informational
>> message that could just as well be ignored, because VCN merely consumed
>> the poison; it did not create it.
>
>Sorry, I have never seen these messages in `dmesg`. Could you please give
>an example log of what the user would see?
>

[Mohammad]: Example of the poison message:

[  231.181316] amdgpu 0000:8a:00.0: amdgpu: Poison detected in VCN0, sub_block0

For comparison, another sample message from amdgpu:

[  237.013029] amdgpu 0000:8a:00.0: amdgpu: HDCP: optional hdcp ta ucode is not available
>
>Kind regards,
>
>Paul

