[PATCH] drm/amdgpu: set default noretry=1 to fix kfd SVM issues for raven
Alex Deucher
alexdeucher at gmail.com
Wed Jul 28 13:21:46 UTC 2021
On Wed, Jul 28, 2021 at 2:36 AM Changfeng <Changfeng.Zhu at amd.com> wrote:
>
> From: changzhu <Changfeng.Zhu at amd.com>
>
> From: Changfeng <Changfeng.Zhu at amd.com>
>
> Testing found no issues with noretry=1 other than two SVM migrate
> failures. By contrast, most SVM test cases fail with noretry=0.
> The two SVM migrate failures also occur with noretry=0, so set the
> default to noretry=1 for raven for now to fix most of the SVM failures.
>
> Change-Id: Idb5cb3c1a04104013e4ab8aed2ad4751aaec4bbc
> Signed-off-by: Changfeng <Changfeng.Zhu at amd.com>
I would suggest testing this on a wide variety of raven systems,
including some OEM ones if possible. Last time we did this it caused
tons of stability issues with raven systems.
Alex
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 09edfb64cce0..d7f69dbd48e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -606,19 +606,20 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>  		 * noretry = 0 will cause kfd page fault tests fail
>  		 * for some ASICs, so set default to 1 for these ASICs.
>  		 */
> +	case CHIP_RAVEN:
> +		/*
> +		 * TODO: noretry = 1 fixes most SVM issues on Raven,
> +		 * but two kfd migrate tests still fail with
> +		 * noretry = 1. These two migrate failures still
> +		 * need to be root caused.
> +		 */
>  		if (amdgpu_noretry == -1)
>  			gmc->noretry = 1;
>  		else
>  			gmc->noretry = amdgpu_noretry;
>  		break;
> -	case CHIP_RAVEN:
>  	default:
> -		/* Raven currently has issues with noretry
> -		 * regardless of what we decide for other
> -		 * asics, we should leave raven with
> -		 * noretry = 0 until we root cause the
> -		 * issues.
> -		 *
> +		/*
>  		 * default this to 0 for now, but we may want
>  		 * to change this in the future for certain
>  		 * GPUs as it can increase performance in
> --
> 2.17.1
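
For readers skimming the hunk, the net effect on the default selection can be sketched as a small standalone C program. This is only an illustration under stated assumptions, not the kernel code: the sketch_* names are hypothetical stand-ins for the driver's ASIC enum and GMC struct, SKETCH_NORETRY_ASIC stands for whichever ASICs were already in the noretry = 1 group (they are not visible in the hunk), and the default branch body is assumed from its comment ("default this to 0 for now").

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's ASIC enum and GMC state. */
enum sketch_asic {
	SKETCH_NORETRY_ASIC,	/* any ASIC already defaulting to noretry = 1 */
	SKETCH_RAVEN,
	SKETCH_OTHER
};

struct sketch_gmc {
	int noretry;
};

/*
 * Mirrors the shape of the patched switch: a parameter value of -1
 * means "pick the per-ASIC default"; any other value is an explicit
 * user setting and is used unchanged.
 */
void sketch_noretry_set(struct sketch_gmc *gmc, enum sketch_asic asic,
			int noretry_param)
{
	switch (asic) {
	case SKETCH_NORETRY_ASIC:
	case SKETCH_RAVEN:	/* the patch moves Raven into this group */
		gmc->noretry = (noretry_param == -1) ? 1 : noretry_param;
		break;
	default:
		/* assumed from the comment in the hunk: default to 0 */
		gmc->noretry = (noretry_param == -1) ? 0 : noretry_param;
		break;
	}
}

/* Small self-check of the behaviour described above. */
void sketch_check(void)
{
	struct sketch_gmc gmc;

	sketch_noretry_set(&gmc, SKETCH_RAVEN, -1);
	assert(gmc.noretry == 1);	/* new default for Raven */

	sketch_noretry_set(&gmc, SKETCH_RAVEN, 0);
	assert(gmc.noretry == 0);	/* explicit setting still wins */

	sketch_noretry_set(&gmc, SKETCH_OTHER, -1);
	assert(gmc.noretry == 0);	/* other ASICs keep the old default */
}
```

Because an explicit value always overrides the default, the old Raven behaviour should remain reachable for the wider testing suggested above by setting the parameter to 0, assuming amdgpu_noretry is exposed as the driver's noretry module parameter.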