[PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support

Ziya, Mohammad zafar Mohammadzafar.Ziya at amd.com
Mon Mar 28 08:47:29 UTC 2022


[AMD Official Use Only]

Dear Paul,

Thank you for review. 
Comment added inline.

Regards,
Zafar

>-----Original Message-----
>From: Paul Menzel <pmenzel at molgen.mpg.de>
>Sent: Monday, March 28, 2022 1:39 PM
>To: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>; Zhou1, Tao
><Tao.Zhou1 at amd.com>
>Cc: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org; Zhang,
>Hawking <Hawking.Zhang at amd.com>
>Subject: Re: [PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support
>
>Dear Mohammad,
>
>
>Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
>
>[…]
>
>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>> Sent: Monday, March 28, 2022 1:22 PM
>
>[…]
>
>>> Tao, it’s hard to find your reply in the quote, as the message is not
>>> quoted correctly (> prepended). Is it possible to use a different email
>client?
>>>
>>>
>>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
>>>> -----Original Message-----
>>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>
>>>> Sent: Monday, March 28, 2022 2:25 PM
>
>[…]
>
>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct
>amdgpu_device *adev,
>>>> +			uint32_t instance, uint32_t sub_block) {
>>>> +	uint32_t poison_stat = 0, reg_value = 0;
>>>> +
>>>> +	switch (sub_block) {
>>>> +	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>> +		reg_value = RREG32_SOC15(VCN, instance,
>mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>> +		poison_stat = REG_GET_FIELD(reg_value,
>UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	};
>>>> +
>>>> +	if (poison_stat)
>>>> +		dev_info(adev->dev, "Poison detected in VCN%d,
>sub_block%d\n",
>>>> +			instance, sub_block);
>>>
>>> What should a user do with that information? Faulty hardware, …?
>>
>> [Mohammad]: This message will help to identify the faulty hardware,
>> the hardware ID will also log along with poison, help to identify
>> among multiple hardware installed on the system.
>
>Thank you for clarifying. If it’s indeed faulty hardware, should the log level be
>increased to be an error? Keep in mind, that normal ignorant users (like me)
>are reading the message, and it’d be great to guide them a little. They do not
>know what “Poison“ means I guess. Maybe:
>
>A hardware corruption was found indicating the device might be faulty.
>(Poison detected in VCN%d, sub_block%d)\n
>
>(Keep in mind, I do not know anything about RAS.)

[Mohammad]: It is an error condition, but this is just an information message which could have been ignored as well because VCN just consumed the poison, not created.

>
>>>> +
>>>> +	return poison_stat;
>>>> +}
>>>> +
>>>> +static bool vcn_v2_6_query_poison_status(struct amdgpu_device
>*adev) {
>>>> +	uint32_t inst, sub;
>>>> +	uint32_t poison_stat = 0;
>>>> +
>>>> +	for (inst = 0; inst < adev->vcn.num_vcn_inst; inst++)
>>>> +		for (sub = 0; sub < AMDGPU_VCN_V2_6_MAX_SUB_BLOCK;
>sub++)
>>>> +			poison_stat +=
>>>> +			vcn_v2_6_query_poison_by_instance(adev, inst,
>sub);
>>>> +
>>>> +	return poison_stat ? true : false;
>>>>
>>>> [Tao] just want to confirm the logic, if the POISONED_PF of one
>>>> instance is 1
>>> and another is 0, the poison_stat is true, correct?
>>>> Can we add a print message for this case? Same question to JPEG.
>>
>> [Mohammad]: Message will be printed on function block ahead of the
>function.
>>
>>> Is the `dev_info` message in `vcn_v2_6_query_poison_by_instance()`
>>> doing what you want?
>>>
>>> Also, instead of `poison_stat ? true : false;` a common pattern is
>>> `!!poison_stat` I believe.
>
>[…]
>
>> [Mohammad]: Noted the change. Will make to return !!poison_stat ? true
>> : false;
>
>No, just
>
>     return !!poison_stat;

[Mohammad]: Noted. I realized !!poison_stat  is enough after sending the reply.
>
>[…]
>
>
>Kind regards,
>
>Paul


More information about the amd-gfx mailing list