[PATCH v2] drm/amd/amdgpu: Fix compute ring being unable to detect hang
Jesse Zhang
zhexi.zhang at amd.com
Thu Sep 19 07:08:55 UTC 2019
When a compute fence does not signal, the compute ring cannot detect the
hardware hang, because its timeout value is infinite (MAX_SCHEDULE_TIMEOUT)
by default.
In SR-IOV and passthrough mode, if the user does not set a custom timeout
value for the compute ring, use the gfx ring timeout value as the default, so
that the compute ring can detect a true hardware hang.
Change-Id: I794ec0868c6c0aad407749457260ecfee0617c10
Signed-off-by: Jesse Zhang <zhexi.zhang at amd.com>
---
drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7c7e9f5..6cd5548 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -687,6 +687,16 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
adev->rev_id = soc15_get_rev_id(adev);
adev->nbio.funcs->detect_hw_virt(adev);
+ /*
+ * If running under SR-IOV or passthrough mode and the user did not set
+ * a custom value for the compute ring timeout, set the timeout to be the
+ * same as the gfx ring timeout so that the compute ring can detect a true
+ * hang.
+ */
+ if ((amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) &&
+ (adev->compute_timeout == MAX_SCHEDULE_TIMEOUT))
+ adev->compute_timeout = adev->gfx_timeout;
+
if (amdgpu_sriov_vf(adev))
adev->virt.ops = &xgpu_ai_virt_ops;
--
2.7.4