[PATCH] drm/amdgpu: update ras sysfs feature info
Zhang, Hawking
Hawking.Zhang at amd.com
Thu Aug 8 15:00:28 UTC 2019
Hi Chris,
I'm not aware of how ROCm SMI uses the feature nodes, but not all sysfs entries are intended to be used by upper-level apps/libs.
There are plenty of sysfs entries that return multi-line values. The most complicated one is probably pp_power_profile_mode, which looks like this:
0 BOOTUP_DEFAULT*:
0( GFXCLK) 0 0 1 0 4 800 4587520 -65536 0
1( SOCCLK) 0 0 1 0 4 800 327680 -6553 0
2( UCLK) 0 0 1 0 4 800 327680 -65536 0
.......
1 3D_FULL_SCREEN :
0( GFXCLK) 0 1 1 0 4 800 4587520 -65536 0
1( SOCCLK) 0 1 4 850 4 800 327680 -65536 0
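To give an idea of the parsing an application needs for a value like this, here is a rough userspace sketch that picks out the profile header lines and reports which one carries the '*' marker. It assumes '*' marks the currently active profile and that header lines have the shape "<index> <NAME>[*]:" as in the listing above. The function names are hypothetical and not taken from any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* Try to parse a profile header line such as "0 BOOTUP_DEFAULT*:".
 * Returns 1 and fills index, name and active (1 if the name is
 * followed by '*') on success; returns 0 for any other line. */
static int parse_profile_header(const char *line, int *index,
                                char *name, size_t name_size, int *active)
{
    char tmp[64];
    int consumed = 0;
    const char *p;

    /* Per-clock lines such as "0( GFXCLK) ..." fail here because
     * '(' is not part of the name character set. */
    if (sscanf(line, "%d %63[A-Za-z0-9_]%n", index, tmp, &consumed) != 2)
        return 0;

    p = line + consumed;
    while (*p == ' ')
        p++;

    *active = 0;
    if (*p == '*') {            /* assumption: '*' marks the active profile */
        *active = 1;
        p++;
    }
    if (*p != ':')
        return 0;

    snprintf(name, name_size, "%s", tmp);
    return 1;
}

/* Scan a whole multi-line value and return the index of the profile
 * marked active, or -1 if no header line carries the '*' marker. */
static int find_active_profile(const char *text)
{
    char line[256];

    while (*text != '\0') {
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;
        char name[64];
        int index, active;

        memcpy(line, text, copy);
        line[copy] = '\0';

        if (parse_profile_header(line, &index, name, sizeof(name), &active)
            && active)
            return index;

        text += len;
        if (*text == '\n')
            text++;
    }
    return -1;
}
```

Even this only covers the header lines; the per-clock rows would need their own parsing on top.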
Regards,
Hawking
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: August 8, 2019 22:25
To: Zhang, Hawking <Hawking.Zhang at amd.com>; Russell, Kent <Kent.Russell at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
Subject: Re: [PATCH] drm/amdgpu: update ras sysfs feature info
Hi Hawking,
looks like you skipped my response.
Even the way sysfs is currently used in the RAS code is a clear NO-GO and should be fixed before this is pushed upstream.
A sysfs entry should seriously NOT return a multi-line value that needs to be extensively parsed by the application.
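For illustration only, here is a minimal userspace sketch of how little parsing a single-value node needs. It assumes the node returned just the hex feature mask on one line and that bit N of the mask corresponds to RAS block index N. Both points are assumptions, and the helper names are hypothetical, not taken from the driver or from ROCm SMI.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a single-value attribute such as "0x5\n" into a mask.
 * Returns 0 on success, -1 if the text is not exactly one number. */
static int parse_feature_mask(const char *text, unsigned long *mask)
{
    char *end;

    errno = 0;
    *mask = strtoul(text, &end, 0);   /* base 0 accepts the 0x prefix */
    if (errno != 0 || end == text)
        return -1;

    /* Allow only one trailing newline after the number. */
    if (*end == '\n')
        end++;
    return *end == '\0' ? 0 : -1;
}

/* Assumption: bit N of the mask corresponds to RAS block index N. */
static int block_enabled(unsigned long mask, unsigned int block)
{
    if (block >= sizeof(mask) * CHAR_BIT)
        return 0;
    return (int)((mask >> block) & 1UL);
}
```

With a value of "0x5\n", block_enabled() reports blocks 0 and 2 as on and block 1 as off, with no string matching on block names.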
Regards,
Christian.
On 08.08.19 at 15:50, Zhang, Hawking wrote:
> Understood and agree we should keep stable interfaces.
>
> But the information in the feature node is not correct and confuses people. Basically, each IP block can support all four error types, not just multi-uncorrectable. As a result, any upper-level apps/libs that read from this file will get confusing information as well. The feature mask is already good enough for this node.
>
> Regards,
> Hawking
> -----Original Message-----
> From: Russell, Kent <Kent.Russell at amd.com>
> Sent: August 8, 2019 20:51
> To: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Pan, Xinhui
> <Xinhui.Pan at amd.com>; Freehill, Chris <Chris.Freehill at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> +Chris Freehill
>
> While I can understand this change, it broke our SMI interface, which was expecting a specific string format for the ras/features file. This has happened a few times now, where changes to the RAS sysfs files have broken the SMI CLI and/or SMI LIB. Can we please get a stable interface and sysfs format set up before publishing patches? This is creating a lot of extra work for the SMI developers, who constantly have to keep up with the changes being made to sysfs files. Thank you.
>
> Kent
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Zhang, Hawking
> Sent: Monday, August 5, 2019 4:15 AM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org;
> Pan, Xinhui <Xinhui.Pan at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: RE: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
>
> Regards,
> Hawking
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao
> Zhou
> Sent: August 5, 2019 16:04
> To: amd-gfx at lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan at amd.com>;
> Zhang, Hawking <Hawking.Zhang at amd.com>
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: [PATCH] drm/amdgpu: update ras sysfs feature info
>
> remove confusing ras error type info
>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 17 +++++------------
> 1 file changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d2e8a85f6e38..369651247b23 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -787,25 +787,18 @@ static ssize_t amdgpu_ras_sysfs_features_read(struct device *dev,
> struct amdgpu_device *adev = ddev->dev_private;
> struct ras_common_if head;
> int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
> - int i;
> + int i, enabled;
> ssize_t s;
> - struct ras_manager *obj;
>
> s = scnprintf(buf, PAGE_SIZE, "feature mask: 0x%x\n",
> con->features);
>
> for (i = 0; i < ras_block_count; i++) {
> head.block = i;
> + enabled = amdgpu_ras_is_feature_enabled(adev, &head);
>
> - if (amdgpu_ras_is_feature_enabled(adev, &head)) {
> - obj = amdgpu_ras_find_obj(adev, &head);
> - s += scnprintf(&buf[s], PAGE_SIZE - s,
> - "%s: %s\n",
> - ras_block_str(i),
> - ras_err_str(obj->head.type));
> - } else
> - s += scnprintf(&buf[s], PAGE_SIZE - s,
> - "%s: disabled\n",
> - ras_block_str(i));
> + s += scnprintf(&buf[s], PAGE_SIZE - s,
> + "%s ras feature mask: %s\n",
> + ras_block_str(i), enabled?"on":"off");
> }
>
> return s;