[PATCH v2] drm/amd/amdgpu: set the default value of noretry to 1 for some dGPUs

Felix Kuehling felix.kuehling at amd.com
Tue Oct 13 15:32:08 UTC 2020


Do you have more details about those test failures? In theory that test
should pass with noretry=0. If it fails, I'd rather look into the
problem than hide it with a workaround.

Regards,
  Felix

Am 2020-10-13 um 11:13 a.m. schrieb Chengming Gui:
> noretry = 0 causes some dGPUs' KFD page fault tests to fail,
> so set noretry to 1 by default for these ASICs:
> vega20/navi10/navi14/ARCTURUS
>
> v2: merge the raven and default cases since they use the same setting
>
> Signed-off-by: Chengming Gui <Jack.Gui at amd.com>
> Change-Id: I3be70f463a49b0cd5c56456431d6c2cb98b13872
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 36604d751d62..3b7b9a5e9749 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -425,20 +425,28 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>  	struct amdgpu_gmc *gmc = &adev->gmc;
>  
>  	switch (adev->asic_type) {
> -	case CHIP_RAVEN:
> -		/* Raven currently has issues with noretry
> -		 * regardless of what we decide for other
> -		 * asics, we should leave raven with
> -		 * noretry = 0 until we root cause the
> -		 * issues.
> +	case CHIP_VEGA20:
> +	case CHIP_NAVI10:
> +	case CHIP_NAVI14:
> +	case CHIP_ARCTURUS:
> +		/*
> +		 * noretry = 0 will cause kfd page fault tests fail
> +		 * for some ASICs, so set default to 1 for these ASICs.
>  		 */
>  		if (amdgpu_noretry == -1)
> -			gmc->noretry = 0;
> +			gmc->noretry = 1;
>  		else
>  			gmc->noretry = amdgpu_noretry;
>  		break;
> +	case CHIP_RAVEN:
>  	default:
> -		/* default this to 0 for now, but we may want
> +		/* Raven currently has issues with noretry
> +		 * regardless of what we decide for other
> +		 * asics, we should leave raven with
> +		 * noretry = 0 until we root cause the
> +		 * issues.
> +		 *
> +		 * default this to 0 for now, but we may want
>  		 * to change this in the future for certain
>  		 * GPUs as it can increase performance in
>  		 * certain cases.


More information about the amd-gfx mailing list