[PATCH 1/5] drm/amdgpu: Make default ras error type to none

Pan, Xinhui Xinhui.Pan at amd.com
Mon Apr 8 14:19:02 UTC 2019


Wait, this patchset intriduced a bug. will fix that later.

Currently, all ras ta cmds must be performed with corresponding ras objects.
So if vbios enable ras for us, we need set up the ras objects as soon as possible, otherwise the status does not match. This patch moves the ras_enable_all who set up the ras objects after all IP's initialization. So it is impossible to perform a ras ta disable cmd as there is no object.

there are 2 scenarios,
1) ras is enabled by vbios.
case enable:
enable_on_boot will bypass psp, just set up object.
case disable:
enable_on_boot need set up object first, then perform a ras ta disable cmd and update the object.

2) ras is disabled when boot by default.
case enable:
enable_on_boot will make a ras ta enable cmd and set up object.
case disable:
do nothing.

This patch only handle #1 case enable and #2 case enable and case disable. Will fix #1 case enable later.

Evan, the question you raised is good.
Performing a ras ta disable cmd without a ras object makes the code logic incorrect. looks a little weird, but code is simple.
________________________________
From: Quan, Evan
Sent: Monday, April 8, 2019 8:17:44 PM
To: Pan, Xinhui; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander
Subject: RE: [PATCH 1/5] drm/amdgpu: Make default ras error type to none

This series is fine. Please feel free to add my RB.
But it reminds me another thing: it's a little weird to enable all RAS features and disable those not needed afterwards.
Is it possible to only enable those which are really needed(no disablement will be needed then)?

Regards,
Evan
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Pan,
> Xinhui
> Sent: 2019年4月8日 15:08
> To: amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>
> Subject: [PATCH 1/5] drm/amdgpu: Make default ras error type to none
>
> Unless IP has implemented its own ras, use ERROR_NONE as the default type.
>
> Signed-off-by: xinhui pan <xinhui.pan at amd.com>
> Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 +++++++++++++++------
> ---
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 2b7a420d5a1f..6cbf1d34c5b4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -664,11 +664,13 @@ static int amdgpu_ras_enable_all_features(struct
> amdgpu_device *adev,
>        struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>        int ras_block_count = AMDGPU_RAS_BLOCK_COUNT;
>        int i;
> +     const enum amdgpu_ras_error_type default_ras_type =
> +             AMDGPU_RAS_ERROR__NONE;
>
>        for (i = 0; i < ras_block_count; i++) {
>                struct ras_common_if head = {
>                        .block = i,
> -                     .type =
> AMDGPU_RAS_ERROR__MULTI_UNCORRECTABLE,
> +                     .type = default_ras_type,
>                        .sub_block_index = 0,
>                };
>                strcpy(head.name, ras_block_str(i));
> @@ -1474,9 +1476,6 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>
>        amdgpu_ras_mask &= AMDGPU_RAS_BLOCK_MASK;
>
> -     if (con->flags & AMDGPU_RAS_FLAG_INIT_BY_VBIOS)
> -             amdgpu_ras_enable_all_features(adev, 1);
> -
>        if (amdgpu_ras_fs_init(adev))
>                goto fs_out;
>
> @@ -1504,18 +1503,25 @@ void amdgpu_ras_post_init(struct
> amdgpu_device *adev)
>        if (!con)
>                return;
>
> -     /* We enable ras on all hw_supported block, but as boot parameter
> might
> -      * disable some of them and one or more IP has not implemented
> yet.
> -      * So we disable them on behalf.
> -      */
>        if (con->flags & AMDGPU_RAS_FLAG_INIT_BY_VBIOS) {
> +             /* Set up all other IPs which are not implemented. There is a
> +              * tricky thing that IP's actual ras error type should be
> +              * MULTI_UNCORRECTABLE, but as driver does not handle it,
> so
> +              * ERROR_NONE make sense anyway.
> +              */
> +             amdgpu_ras_enable_all_features(adev, 1);
> +
> +             /* We enable ras on all hw_supported block, but as boot
> +              * parameter might disable some of them and one or more IP
> has
> +              * not implemented yet. So we disable them on behalf.
> +              */
>                list_for_each_entry_safe(obj, tmp, &con->head, node) {
>                        if (!amdgpu_ras_is_supported(adev, obj-
> >head.block)) {
>                                amdgpu_ras_feature_enable(adev, &obj-
> >head, 0);
>                                /* there should be no any reference. */
>                                WARN_ON(alive_obj(obj));
>                        }
> -             };
> +             }
>        }
>  }
>
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190408/91f3361c/attachment.html>


More information about the amd-gfx mailing list