[PATCH] drm/amdgpu: update ras sysfs feature info

Koenig, Christian Christian.Koenig at amd.com
Thu Aug 8 15:10:35 UTC 2019


Hi Hawking,

a multi line value is not the problem, but here you have multiple values.

E.g. in the case of the pp tables that is one big array of power 
profiles and we actually had a discussion if exposing them like this is 
ok or not.

But in the case of the ras features you got multiple different things in 
the same file. And that in turn is a clear violation of the sysfs rules.

I don't think we can't upstream the interface like this. See here for a 
good summary of the rules as well: https://lwn.net/Articles/378884/

Regards,
Christian.

Am 08.08.19 um 17:00 schrieb Zhang, Hawking:
> Hi Chris,
>
> I'm not aware of how ROCM SMI using feature nodes. but not all the sysfs are intended to be used by upper level apps/libs.
>
> There are bunches of sysfs entries that have multiple line value. The most complicate one would be pp_power_profile_mode, which looks like.
>
> 0 BOOTUP_DEFAULT*:
>                      0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
>                      1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
>                      2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
> .......
> 1 3D_FULL_SCREEN :
>                      0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
>                      1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0
>
> Regards,
> Hawking
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: 2019年8月8日 22:25
> To: Zhang, Hawking <Hawking.Zhang at amd.com>; Russell, Kent <Kent.Russell at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Hi Hawking,
>
> looks like you skipped my response.
>
> Even the current way how sysfs is used in the ras code is a clear NO-GO and should be fixed before this is pushed upstream.
>
> A sysfs entry should seriously NOT return a multi line value which needs to be extensively parsed by the application.
>
> Regards,
> Christian.
>
> Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
>> Understood and agree we should keep stable interfaces.
>>
>> But the information in feature node is not correct and makes people confusing. Basically, each IP blocks can support all the four error types, not just multi-uncorrectable. As a result, any upper level apps/libs that read from this file will just get confusing information as well. The feature mask is already good enough for this node.
>>
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: Russell, Kent <Kent.Russell at amd.com>
>> Sent: 2019年8月8日 20:51
>> To: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao
>> <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui
>> <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
>> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> +Chris Freehill
>>
>> While I can understand this change, this broke our SMI interface, which was expecting a specific string format for the ras/features file. This has happened a few times now, where changes to the RAS sysfs files has broke the SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs format set up before publishing patches? This is creating a lot of extra work for developers with the SMI to constantly keep up with the changes being made to sysfs files. Thank you.
>>
>>    Kent
>>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
>> Zhang, Hawking
>> Sent: Monday, August 5, 2019 4:15 AM
>> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org;
>> Pan, Xinhui <Xinhui.Pan at amd.com>
>> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
>> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
>>
>> Regards,
>> Hawking
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao
>> Zhou
>> Sent: 2019年8月5日 16:04
>> To: amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>;
>> Zhang, Hawking <Hawking.Zhang at amd.com>
>> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
>> Subject: [PATCH] drm/amdgpu: update ras sysfs feature info
>>
>> remove confused ras error type info
>>
>> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +++++------------
>>    1 file changed, 5 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index d2e8a85f6e38..369651247b23 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct device *dev,
>>    	struct amdgpu_device *adev = ddev->dev_private;
>>    	struct ras_common_if head;
>>    	int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
>> -	int i;
>> +	int i, enabled;
>>    	ssize_t s;
>> -	struct ras_manager *obj;
>>    
>>    	s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n",
>> con->features);
>>    
>>    	for (i = 0; i < ras_block_count; i++) {
>>    		head.block = i;
>> +		enabled = amdgpu_ras_is_feature_enabled(adev, &head);
>>    
>> -		if (amdgpu_ras_is_feature_enabled(adev, &head)) {
>> -			obj = amdgpu_ras_find_obj(adev, &head);
>> -			s += scnprintf(&buf[s], PAGE_SIZE - s,
>> -					"%s: %s\n",
>> -					ras_block_str(i),
>> -					ras_err_str(obj->head.type));
>> -		} else
>> -			s += scnprintf(&buf[s], PAGE_SIZE - s,
>> -					"%s: disabled\n",
>> -					ras_block_str(i));
>> +		s += scnprintf(&buf[s], PAGE_SIZE - s,
>> +				"%s ras feature mask: %s\n",
>> +				ras_block_str(i), enabled?"on":"off");
>>    	}
>>    
>>    	return s;



More information about the amd-gfx mailing list