[PATCH 2/2] drm/amdgpu: add an auto setting to the noretry parameter

Wed Sep 23 14:48:02 UTC 2020

Series is Reviewed-by: Luben Tuikov <luben.tuikov at amd.com>

This is a good change!

Regards,
Luben

On 2020-09-23 10:08, Alex Deucher wrote:
> This allows us to set different defaults on a per asic basis.  This
> way we can enable noretry on dGPUs where it can increase performance
> in certain cases and disable it on chips where it can be problematic.
> 
> For now the default is 0 for all asics, but we may want to try and
> enable it again for newer dGPUs.
> 
> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  9 +++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 26 ++++++++++++++++++++++++-
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index a4b518211b1f..f3e2fbcfadfb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -147,7 +147,7 @@ int amdgpu_async_gfx_ring = 1;
>  int amdgpu_mcbp = 0;
>  int amdgpu_discovery = -1;
>  int amdgpu_mes = 0;
> -int amdgpu_noretry;
> +int amdgpu_noretry = -1;
>  int amdgpu_force_asic_type = -1;
>  int amdgpu_tmz = 0;
>  int amdgpu_reset_method = -1; /* auto */
> @@ -596,8 +596,13 @@ MODULE_PARM_DESC(mes,
>  	"Enable Micro Engine Scheduler (0 = disabled (default), 1 = enabled)");
>  module_param_named(mes, amdgpu_mes, int, 0444);
>  
> +/**
> + * DOC: noretry (int)
> + * Disable retry faults in the GPU memory controller.
> + * (0 = retry enabled, 1 = retry disabled, -1 auto (default))
> + */
>  MODULE_PARM_DESC(noretry,
> -	"Disable retry faults (0 = retry enabled (default), 1 = retry disabled)");
> +	"Disable retry faults (0 = retry enabled, 1 = retry disabled, -1 auto (default))");
>  module_param_named(noretry, amdgpu_noretry, int, 0644);
>  
>  /**
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 3572629fef0a..36604d751d62 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -424,7 +424,31 @@ void amdgpu_gmc_noretry_set(struct amdgpu_device *adev)
>  {
>  	struct amdgpu_gmc *gmc = &adev->gmc;
>  
> -	gmc->noretry = amdgpu_noretry;
> +	switch (adev->asic_type) {
> +	case CHIP_RAVEN:
> +		/* Raven currently has issues with noretry
> +		 * regardless of what we decide for other
> +		 * asics, we should leave raven with
> +		 * noretry = 0 until we root cause the
> +		 * issues.
> +		 */
> +		if (amdgpu_noretry == -1)
> +			gmc->noretry = 0;
> +		else
> +			gmc->noretry = amdgpu_noretry;
> +		break;
> +	default:
> +		/* default this to 0 for now, but we may want
> +		 * to change this in the future for certain
> +		 * GPUs as it can increase performance in
> +		 * certain cases.
> +		 */
> +		if (amdgpu_noretry == -1)
> +			gmc->noretry = 0;
> +		else
> +			gmc->noretry = amdgpu_noretry;
> +		break;
> +	}
>  }
>  
>  void amdgpu_gmc_set_vm_fault_masks(struct amdgpu_device *adev, int hub_type,
>