[PATCH v5] drm/amd/amdgpu: Fix compute ring unable to detect hang.
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Sep 19 12:12:15 UTC 2019
On 19.09.19 at 12:09, Jesse Zhang wrote:
> When a compute fence does not signal, the compute ring cannot detect the
> hardware hang because its timeout value is infinite by default.
>
> In SR-IOV and passthrough mode, if the user does not declare a custom
> timeout value for the compute ring, use the gfx ring timeout value as the
> default, so that the compute ring can detect a true hardware hang.
>
> Change-Id: I794ec0868c6c0aad407749457260ecfee0617c10
> Signed-off-by: Jesse Zhang <zhexi.zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 ++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
> 2 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 3b5282b..03ac5a1da 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1024,12 +1024,6 @@ static int amdgpu_device_check_arguments(struct amdgpu_device *adev)
>
> amdgpu_device_check_block_size(adev);
>
> - ret = amdgpu_device_get_job_timeout_settings(adev);
> - if (ret) {
> - dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> - return ret;
> - }
> -
> adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
>
> return ret;
> @@ -2732,6 +2726,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> if (r)
> return r;
>
> + r = amdgpu_device_get_job_timeout_settings(adev);
> + if (r) {
> + dev_err(adev->dev, "invalid lockup_timeout parameter syntax\n");
> + return r;
> + }
> +

I assume you moved the code because the SR-IOV/passthrough setting was
not yet available at the previous call site?

But even with the call moved here, you can still remove the extra SR-IOV
check in amdgpu_fence.c.

Regards,
Christian.

> /* doorbell bar mapping and doorbell index init*/
> amdgpu_device_doorbell_init(adev);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 420888e..1236245 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1378,10 +1378,12 @@ int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
> }
> /*
> * There is only one value specified and
> - * it should apply to all non-compute jobs.
> + * it should apply to all jobs.
> */
> if (index == 1)
> adev->sdma_timeout = adev->video_timeout = adev->gfx_timeout;
> + if (amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev))
> + adev->compute_timeout = adev->gfx_timeout;
> }
>
> return ret;
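
For readers following the thread, the single-value rule in the amdgpu_drv.c hunk above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the driver code: struct ring_timeouts, NO_TIMEOUT and apply_single_timeout() are hypothetical names, the virtualized flag stands in for amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev), and the sketch assumes the compute timeout is otherwise left at "infinite" as the commit message describes.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's "never time out" value. */
#define NO_TIMEOUT LONG_MAX

/* Hypothetical reduction of the per-ring timeout fields in struct amdgpu_device. */
struct ring_timeouts {
	long gfx;
	long compute;
	long sdma;
	long video;
};

/*
 * Model of the case where lockup_timeout carries exactly one value:
 * gfx takes the value, sdma and video copy it, and compute copies it
 * only when running under SR-IOV or passthrough. On bare metal the
 * compute timeout is assumed to stay infinite.
 */
static void apply_single_timeout(struct ring_timeouts *t, long value,
				 bool virtualized)
{
	t->gfx = value;
	t->sdma = t->video = t->gfx;
	t->compute = virtualized ? t->gfx : NO_TIMEOUT;
}
```

With virtualized set, a hung compute job would time out after the same interval as a gfx job; without it, the compute timeout stays unbounded and no hang is reported.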