[PATCH] drm/amdgpu: support gpu recovery tests on compute rings

Christian König ckoenig.leichtzumerken at gmail.com
Fri Apr 26 07:33:50 UTC 2019


On 26.04.19 at 09:24, Evan Quan wrote:
> A new module parameter is added for determining
> whether or not to enforce timeout on compute jobs.

Can we rework that a bit and, instead of a bool, have a separate
timeout in milliseconds for compute?

E.g. the default is 0, which means MAX_SCHEDULE_TIMEOUT unless we are
under SRIOV.
Any other value is just the timeout in milliseconds.
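
A minimal userspace sketch of the selection logic suggested above, not the actual driver change. The function compute_ring_timeout(), its parameters, the HZ value, and the msecs_to_jiffies_sketch() helper are all hypothetical stand-ins. The fallback to the general lockup timeout under SRIOV when the value is 0 is an assumption taken from the existing else branch in amdgpu_fence_driver_init_ring().

```c
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel definitions (assumptions for this sketch). */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX
#define HZ 250 /* assumed tick rate, only for illustration */

/* Simplified stand-in for the kernel's msecs_to_jiffies(): rounds up. */
static long msecs_to_jiffies_sketch(unsigned int ms)
{
	return ((long)ms * HZ + 999) / 1000;
}

/*
 * Hypothetical helper: pick the job timeout for a compute ring.
 *   compute_timeout_ms == 0 -> no timeout (MAX_SCHEDULE_TIMEOUT), unless
 *                              running as an SRIOV VF, where we assume the
 *                              general lockup timeout still applies.
 *   compute_timeout_ms  > 0 -> that value, in milliseconds.
 */
static long compute_ring_timeout(unsigned int compute_timeout_ms,
				 unsigned int lockup_timeout_ms,
				 bool is_sriov_vf)
{
	if (compute_timeout_ms > 0)
		return msecs_to_jiffies_sketch(compute_timeout_ms);

	if (is_sriov_vf)
		return msecs_to_jiffies_sketch(lockup_timeout_ms);

	return MAX_SCHEDULE_TIMEOUT;
}
```

With this shape, a GPU recovery test on a compute ring would only need to load the module with a non-zero compute timeout, and the default behaviour on bare metal stays unchanged.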

Christian.

>
> Change-Id: If14b75977312e42dac0431072456e5b69cf1bc2f
> Signed-off-by: Evan Quan <evan.quan at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h       | 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   | 8 ++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++-
>   3 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index e16dcee2bf75..ee624d993df7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -166,6 +166,7 @@ extern int amdgpu_si_support;
>   #ifdef CONFIG_DRM_AMDGPU_CIK
>   extern int amdgpu_cik_support;
>   #endif
> +extern bool amdgpu_compute_timeout_enforced;
>   
>   #define AMDGPU_VM_MAX_NUM_CTX			4096
>   #define AMDGPU_SG_THRESHOLD			(256*1024*1024)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 13a68f62bcc8..91de3e90fae9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -140,6 +140,7 @@ struct amdgpu_mgpu_info mgpu_info = {
>   };
>   int amdgpu_ras_enable = -1;
>   uint amdgpu_ras_mask = 0xffffffff;
> +bool amdgpu_compute_timeout_enforced = false;
>   
>   /**
>    * DOC: vramlimit (int)
> @@ -234,6 +235,13 @@ module_param_named(msi, amdgpu_msi, int, 0444);
>   MODULE_PARM_DESC(lockup_timeout, "GPU lockup timeout in ms > 0 (default 10000)");
>   module_param_named(lockup_timeout, amdgpu_lockup_timeout, int, 0444);
>   
> +/**
> + * DOC: compute_timeout_enforced (bool)
> + * Whether or not to enforce timeout on compute jobs (1 = enable, 0 = disable). The default is 0.
> + */
> +MODULE_PARM_DESC(compute_timeout_enforced, "Enforce timeout on compute jobs (1 = enable, 0 = disable (default))");
> +module_param_named(compute_timeout_enforced, amdgpu_compute_timeout_enforced, bool, 0444);
> +
>   /**
>    * DOC: dpm (int)
>    * Override for dynamic power management setting (1 = enable, 0 = disable). The default is -1 (auto).
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 4dee2326b29c..4adffad04dbc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -453,7 +453,8 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
>   		/* for non-sriov case, no timeout enforce on compute ring */
>   		if ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE)
> -				&& !amdgpu_sriov_vf(ring->adev))
> +				&& !amdgpu_sriov_vf(ring->adev)
> +				&& !amdgpu_compute_timeout_enforced)
>   			timeout = MAX_SCHEDULE_TIMEOUT;
>   		else
>   			timeout = msecs_to_jiffies(amdgpu_lockup_timeout);


