[PATCH] drm/amdgpu: set default noretry=1 to fix kfd SVM issues for raven

Alex Deucher alexdeucher at gmail.com
Wed Jul 28 13:21:46 UTC 2021


On Wed, Jul 28, 2021 at 2:36 AM Changfeng <Changfeng.Zhu at amd.com> wrote:
>
> From: changzhu <Changfeng.Zhu at amd.com>
>
> From: Changfeng <Changfeng.Zhu at amd.com>
>
> Testing on raven found no issues with noretry=1 other than two SVM
> migrate failures. By contrast, noretry=0 causes most SVM test cases to
> fail. The two SVM migrate failures also occur with noretry=0, so set
> the default to noretry=1 for raven for now to fix most SVM failures.
>
> Change-Id: Idb5cb3c1a04104013e4ab8aed2ad4751aaec4bbc
> Signed-off-by: Changfeng <Changfeng.Zhu at amd.com>

I would suggest testing this on a wide variety of raven systems,
including some OEM ones if possible.  Last time we did this it caused
tons of stability issues with raven systems.

Alex


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 09edfb64cce0..d7f69dbd48e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -606,19 +606,20 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>                  * noretry = 0 will cause kfd page fault tests fail
>                  * for some ASICs, so set default to 1 for these ASICs.
>                  */
> +       case CHIP_RAVEN:
> +               /*
> +                * TODO: noretry = 1 fixes most SVM issues on Raven.
> +                * However, two kfd migrate tests still fail with
> +                * noretry = 1. These two migrate failures on Raven
> +                * still need to be root caused.
> +                */
>                 if (amdgpu_noretry == -1)
>                         gmc->noretry = 1;
>                 else
>                         gmc->noretry = amdgpu_noretry;
>                 break;
> -       case CHIP_RAVEN:
>         default:
> -               /* Raven currently has issues with noretry
> -                * regardless of what we decide for other
> -                * asics, we should leave raven with
> -                * noretry = 0 until we root cause the
> -                * issues.
> -                *
> +               /*
>                  * default this to 0 for now, but we may want
>                  * to change this in the future for certain
>                  * GPUs as it can increase performance in
> --
> 2.17.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
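
For readers following the diff, the selection logic after this patch can be sketched as a standalone C function. This is only an illustration under stated assumptions: the enum, its stand-in members, and the function name resolve_noretry are hypothetical and not the kernel's definitions, and the default branch is assumed to fall back to 0 as its comment in the diff says ("default this to 0 for now").

```c
/*
 * Standalone sketch of the post-patch noretry selection.
 * All names here are illustrative, not taken from amdgpu_gmc.c.
 */
enum chip_family_sketch {
    SKETCH_CHIP_OTHER,           /* stand-in: ASICs handled by the default branch */
    SKETCH_CHIP_NORETRY_DEFAULT, /* stand-in: ASICs that already default to 1 */
    SKETCH_CHIP_RAVEN            /* moved into the noretry = 1 group by the patch */
};

/*
 * module_param mirrors the amdgpu_noretry convention visible in the diff:
 * -1 means "pick the per-ASIC default", any other value is used as given.
 */
static int resolve_noretry(enum chip_family_sketch chip, int module_param)
{
    if (module_param != -1)
        return module_param;   /* explicit user override wins */

    switch (chip) {
    case SKETCH_CHIP_NORETRY_DEFAULT:
    case SKETCH_CHIP_RAVEN:
        return 1;              /* per-ASIC default after the patch */
    default:
        return 0;              /* assumed fallback, per the diff's comment */
    }
}
```

In this sketch, resolve_noretry(SKETCH_CHIP_RAVEN, -1) yields 1, while passing 0 as the second argument still forces noretry off, which is the override path a tester would use to compare both settings on a given raven system.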

