[PATCH v4 5/6] drm/amdgpu/vcn: VCN ras error query support

Paul Menzel pmenzel at molgen.mpg.de
Mon Mar 28 09:37:40 UTC 2022


Dear Mohammad,


Am 28.03.22 um 10:47 schrieb Ziya, Mohammad zafar:

[…]

>> -----Original Message-----
>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>> Sent: Monday, March 28, 2022 1:39 PM

>> Am 28.03.22 um 10:00 schrieb Ziya, Mohammad zafar:
>>
>> […]
>>
>>>> From: Paul Menzel <pmenzel at molgen.mpg.de>
>>>> Sent: Monday, March 28, 2022 1:22 PM

>>>> Am 28.03.22 um 09:43 schrieb Zhou1, Tao:
>>>>> -----Original Message-----
>>>>> From: Ziya, Mohammad zafar <Mohammadzafar.Ziya at amd.com>
>>>>> Sent: Monday, March 28, 2022 2:25 PM
>>
>> […]
>>
>>>>> +static uint32_t vcn_v2_6_query_poison_by_instance(struct amdgpu_device *adev,
>>>>> +			uint32_t instance, uint32_t sub_block)
>>>>> +{
>>>>> +	uint32_t poison_stat = 0, reg_value = 0;
>>>>> +
>>>>> +	switch (sub_block) {
>>>>> +	case AMDGPU_VCN_V2_6_VCPU_VCODEC:
>>>>> +		reg_value = RREG32_SOC15(VCN, instance, mmUVD_RAS_VCPU_VCODEC_STATUS);
>>>>> +		poison_stat = REG_GET_FIELD(reg_value, UVD_RAS_VCPU_VCODEC_STATUS, POISONED_PF);
>>>>> +		break;
>>>>> +	default:
>>>>> +		break;
>>>>> +	}
>>>>> +
>>>>> +	if (poison_stat)
>>>>> +		dev_info(adev->dev, "Poison detected in VCN%d, sub_block%d\n",
>>>>> +			instance, sub_block);
>>>>
>>>> What should a user do with that information? Faulty hardware, …?
>>>
>>> [Mohammad]: This message helps to identify the faulty hardware. The
>>> hardware ID is logged along with the poison, which helps to identify
>>> the affected device among multiple devices installed in the system.
>>
>> Thank you for clarifying. If it’s indeed faulty hardware, should the log level be
>> increased to an error? Keep in mind that normal ignorant users (like me)
>> are reading the message, and it’d be great to guide them a little. I guess they
>> do not know what “Poison” means. Maybe:
>>
>> A hardware corruption was found indicating the device might be faulty.
>> (Poison detected in VCN%d, sub_block%d)\n
>>
>> (Keep in mind, I do not know anything about RAS.)
>
> [Mohammad]: It is an error condition, but this is just an informational
> message, which could just as well be ignored, because VCN only
> consumed the poison and did not create it.

Sorry, I have never seen this message in `dmesg`, so could you please give
an example log showing what the user would see?
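
To make the question concrete, here is a minimal user-space sketch (plain C, not kernel code) that only formats the wording I suggested above. The helper name `format_poison_msg` is made up for illustration and is not part of the patch. The device prefix that `dev_info()` prepends in `dmesg` is deliberately left out, as I do not know what it looks like for this driver. I also used `%u` instead of `%d`, assuming that both arguments stay `uint32_t`.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: formats the suggested
 * message text into buf so the wording can be checked without hardware.
 * Returns what snprintf() returns, i.e. the length the full message
 * needs (excluding the terminating NUL), or a negative value on error.
 */
static int format_poison_msg(char *buf, size_t len,
			     uint32_t instance, uint32_t sub_block)
{
	return snprintf(buf, len,
			"A hardware corruption was found indicating the device might be faulty. "
			"(Poison detected in VCN%u, sub_block%u)",
			(unsigned int)instance, (unsigned int)sub_block);
}
```

Calling it with instance 0 and sub-block 1 and printing the buffer shows the text a user would have to interpret, minus whatever prefix the kernel adds.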


Kind regards,

Paul

