[PATCH 1/2] drm/amdgpu: Warn when bad pages approaches 90% threshold

Felix Kuehling felix.kuehling at amd.com
Thu Oct 21 17:18:59 UTC 2021


Am 2021-10-21 um 12:35 p.m. schrieb Luben Tuikov:
> On 2021-10-21 11:57, Kent Russell wrote:
>> dmesg doesn't warn when the number of bad pages approaches the
>> threshold for page retirement. WARN when the number of bad pages
>> is at 90% or greater for easier checks and planning, instead of waiting
>> until the GPU is full of bad pages.
>>
>> Cc: Luben Tuikov <luben.tuikov at amd.com>
>> Cc: Mukul Joshi <Mukul.Joshi at amd.com>
>> Signed-off-by: Kent Russell <kent.russell at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>> index f4c05ff4b26c..ce5089216474 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
>> @@ -1077,6 +1077,12 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
>>  		if (res)
>>  			DRM_ERROR("RAS table incorrect checksum or error:%d\n",
>>  				  res);
>> +
>> +		/* Warn if we are at 90% of the threshold or above */
> The kernel uses a couple of styles, this is one of them:
>
> /* Warn ...
>  */

That's a waste of space. That means there can be no single-line comments?

checkpatch.pl complains about multi-line comments that don't have the
final */ on their own line. But a single line comment is OK. Let's stick
with the coding style recommended by checkpatch.pl. If we make up our
own arbitrary rules for different reviewers and different parts of the
code, I think it becomes a mine-field of pointless cosmetic fixes for
anyone trespassing on unfamiliar code.


> if (...)
>
> Please use this style as it is used extensively in the amdgpu_ras_eeprom.c file.
>
>> +		if ((10 * control->ras_num_recs) >= (ras->bad_page_cnt_threshold * 9))
> You don't need the extra parenthesis around multiplication--it has higher precedence than relational operators--drop the extra parenthesis.

I agree. With that fixed, the patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>


>
> Regards,
> Luben
>
>> +			DRM_WARN("RAS records:%u exceeds 90%% of threshold:%d",
>> +					control->ras_num_recs,
>> +					ras->bad_page_cnt_threshold);
>>  	} else if (hdr->header == RAS_TABLE_HDR_BAD &&
>>  		   amdgpu_bad_page_threshold != 0) {
>>  		res = __verify_ras_table_checksum(control);


More information about the amd-gfx mailing list