[PATCH v4] drm/amd/amdgpu:Fix compute ring unable to detect hang.
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Sep 19 08:14:24 UTC 2019
On 19.09.19 at 10:00, Jesse Zhang wrote:
> When a compute fence does not signal, the compute ring cannot detect a
> hardware hang because its timeout value is infinite by default.
>
> In SR-IOV and passthrough mode, if the user does not declare a custom
> timeout value for the compute ring, use the gfx ring timeout value as the
> default, so that the compute ring can detect a true hardware hang.
>
> Change-Id: I794ec0868c6c0aad407749457260ecfee0617c10
> Signed-off-by: Jesse Zhang <zhexi.zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 5 +----
> drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++++++++++
> 2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index cbcaa7c..963b6d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -468,10 +468,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> * For sriov case, always use the timeout
> * as gfx ring
> */
Please also remove the comment above, since it is now stale.
Apart from that, looks good to me.
Christian.
> - if (!amdgpu_sriov_vf(ring->adev))
> - timeout = adev->compute_timeout;
> - else
> - timeout = adev->gfx_timeout;
> + timeout = adev->compute_timeout;
> break;
> case AMDGPU_RING_TYPE_SDMA:
> timeout = adev->sdma_timeout;
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index 7c7e9f5..6cd5548 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -687,6 +687,16 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
> adev->rev_id = soc15_get_rev_id(adev);
> adev->nbio.funcs->detect_hw_virt(adev);
>
> + /*
> + * If running under SR-IOV or passthrough mode and the user did not set
> + * a custom value for the compute ring timeout, set it to the same value
> + * as the gfx ring timeout so that the compute ring can detect a true
> + * hang.
> + */
> + if ((amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) &&
> + (adev->compute_timeout == MAX_SCHEDULE_TIMEOUT))
> + adev->compute_timeout = adev->gfx_timeout;
> +
> if (amdgpu_sriov_vf(adev))
> adev->virt.ops = &xgpu_ai_virt_ops;
>
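
For readers following the thread, the defaulting rule in the soc15.c hunk above can be modelled outside the kernel roughly as below. This is only a sketch under stated assumptions: struct sketch_dev, its fields, and sketch_fixup_compute_timeout() are hypothetical stand-ins for amdgpu_device, amdgpu_sriov_vf() and amdgpu_passthrough(), and SKETCH_MAX_SCHEDULE_TIMEOUT mirrors the kernel's MAX_SCHEDULE_TIMEOUT, which is LONG_MAX.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT (defined as LONG_MAX). */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * Hypothetical, trimmed-down model of the device state the patch reads.
 * The real driver derives the two flags from amdgpu_sriov_vf() and
 * amdgpu_passthrough(); here they are plain booleans.
 */
struct sketch_dev {
	bool sriov_vf;        /* running as an SR-IOV virtual function */
	bool passthrough;     /* running in passthrough mode */
	long gfx_timeout;     /* gfx ring timeout */
	long compute_timeout; /* compute ring timeout, "infinite" by default */
};

/*
 * Under virtualization, replace an untouched (infinite) compute timeout
 * with the gfx timeout so that a hung compute job can time out at all.
 * A value the user set explicitly is left alone.
 */
static void sketch_fixup_compute_timeout(struct sketch_dev *dev)
{
	if ((dev->sriov_vf || dev->passthrough) &&
	    dev->compute_timeout == SKETCH_MAX_SCHEDULE_TIMEOUT)
		dev->compute_timeout = dev->gfx_timeout;
}
```

With this model, a bare-metal device keeps its infinite compute timeout, while an SR-IOV or passthrough device that still has the default inherits the gfx value. Doing the fixup once at IP-block setup is what allows amdgpu_fence_driver_init_ring() to read compute_timeout unconditionally.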