[PATCH] drm/amdgpu: update ras sysfs feature info

Zhang, Hawking Hawking.Zhang at amd.com
Thu Aug 8 15:00:28 UTC 2019


Hi Chris,

I'm not aware of how ROCm SMI uses the feature nodes, but not all sysfs entries are intended to be used by upper-level apps/libs.

There are many sysfs entries that return multi-line values. The most complicated one is probably pp_power_profile_mode, whose output looks like this:

0 BOOTUP_DEFAULT*:
                    0(       GFXCLK)       0       0       1       0       4     800 4587520  -65536       0
                    1(       SOCCLK)       0       0       1       0       4     800  327680   -6553       0
                    2(         UCLK)       0       0       1       0       4     800  327680  -65536       0
.......
1 3D_FULL_SCREEN :
                    0(       GFXCLK)       0       1       1       0       4     800 4587520  -65536       0
                    1(       SOCCLK)       0       1       4     850       4     800  327680  -65536       0

Regards,
Hawking
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com> 
Sent: August 8, 2019 22:25
To: Zhang, Hawking <Hawking.Zhang at amd.com>; Russell, Kent <Kent.Russell at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info

Hi Hawking,

looks like you skipped my response.

Even the way sysfs is currently used in the RAS code is a clear NO-GO and should be fixed before this is pushed upstream.

A sysfs entry should seriously NOT return a multi-line value that needs to be extensively parsed by the application.

Regards,
Christian.

Am 08.08.19 um 15:50 schrieb Zhang, Hawking:
> Understood and agree we should keep stable interfaces.
>
> But the information in the feature node is not correct and confuses people. Basically, each IP block can support all four error types, not just multi-uncorrectable. As a result, any upper-level apps/libs that read from this file will get confusing information as well. The feature mask is already good enough for this node.
>
> Regards,
> Hawking
> -----Original Message-----
> From: Russell, Kent <Kent.Russell at amd.com>
> Sent: August 8, 2019 20:51
> To: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao 
> <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui 
> <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> +Chris Freehill
>
> While I can understand this change, it broke our SMI interface, which was expecting a specific string format for the ras/features file. This has happened a few times now, where changes to the RAS sysfs files have broken the SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs format set up before publishing patches? This is creating a lot of extra work for SMI developers, who constantly have to keep up with the changes being made to sysfs files. Thank you.
>
>   Kent
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of 
> Zhang, Hawking
> Sent: Monday, August 5, 2019 4:15 AM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; 
> Pan, Xinhui <Xinhui.Pan at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
>
> Regards,
> Hawking
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao 
> Zhou
> Sent: August 5, 2019 16:04
> To: amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>; 
> Zhang, Hawking <Hawking.Zhang at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Remove confusing RAS error type info
>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +++++------------
>   1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d2e8a85f6e38..369651247b23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct device *dev,
>   	struct amdgpu_device *adev = ddev->dev_private;
>   	struct ras_common_if head;
>   	int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
> -	int i;
> +	int i, enabled;
>   	ssize_t s;
> -	struct ras_manager *obj;
>   
>   	s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n", con->features);
>   
>   	for (i = 0; i < ras_block_count; i++) {
>   		head.block = i;
> +		enabled = amdgpu_ras_is_feature_enabled(adev, &head);
>   
> -		if (amdgpu_ras_is_feature_enabled(adev, &head)) {
> -			obj = amdgpu_ras_find_obj(adev, &head);
> -			s += scnprintf(&buf[s], PAGE_SIZE - s,
> -					"%s: %s\n",
> -					ras_block_str(i),
> -					ras_err_str(obj->head.type));
> -		} else
> -			s += scnprintf(&buf[s], PAGE_SIZE - s,
> -					"%s: disabled\n",
> -					ras_block_str(i));
> +		s += scnprintf(&buf[s], PAGE_SIZE - s,
> +				"%s ras feature mask: %s\n",
> +				ras_block_str(i), enabled?"on":"off");
>   	}
>   
>   	return s;
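
To make the format dependency discussed in this thread concrete, here is a minimal user-space sketch of how a tool might parse the text that the patched amdgpu_ras_sysfs_features_read() emits: one "feature mask: 0x..." line followed by one "<block> ras feature mask: on|off" line per block. The names parse_ras_features, struct ras_features, struct ras_block_state, MAX_BLOCKS and NAME_LEN are hypothetical and chosen only for illustration; this is not how ROCm SMI actually reads the file.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BLOCKS 32   /* hypothetical upper bound on block lines */
#define NAME_LEN   32   /* hypothetical maximum block-name length */

struct ras_block_state {
    char name[NAME_LEN];
    int enabled;        /* 1 for "on", 0 for "off" */
};

struct ras_features {
    unsigned int mask;  /* value from the "feature mask: 0x..." line */
    int count;          /* number of per-block lines parsed */
    struct ras_block_state blocks[MAX_BLOCKS];
};

/*
 * Parse text in the post-patch format:
 *   feature mask: 0x<hex>
 *   <block> ras feature mask: on|off     (one line per block)
 * Returns 0 on success and -1 on any line that does not match.
 */
int parse_ras_features(const char *text, struct ras_features *out)
{
    const char *p;
    char name[NAME_LEN];
    char state[8];

    memset(out, 0, sizeof(*out));

    if (sscanf(text, "feature mask: 0x%x", &out->mask) != 1)
        return -1;

    for (p = strchr(text, '\n'); p != NULL; p = strchr(p, '\n')) {
        p++;                        /* start of the next line */
        if (*p == '\0')
            break;                  /* trailing newline at end of input */
        if (*p == '\n')
            continue;               /* tolerate blank lines */
        if (out->count >= MAX_BLOCKS)
            return -1;
        /* Field widths keep name and state inside their buffers. */
        if (sscanf(p, "%31s ras feature mask: %7s", name, state) != 2)
            return -1;

        if (strcmp(state, "on") == 0)
            out->blocks[out->count].enabled = 1;
        else if (strcmp(state, "off") == 0)
            out->blocks[out->count].enabled = 0;
        else
            return -1;

        snprintf(out->blocks[out->count].name, NAME_LEN, "%s", name);
        out->count++;
    }
    return 0;
}
```

A caller would read the whole features file into a NUL-terminated buffer and pass it to parse_ras_features(). Feeding the same function a line in the pre-patch style, such as "umc: disabled", makes it return -1, which is the kind of breakage Kent describes whenever the string format changes.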


