[PATCH 1/2] SWDEV-195825 drm/amd/amdgpu:[Gibraltar][V320] tdr-1 test failed after 2 rounds

Jesse Zhang zhexi.zhang at amd.com
Tue Sep 17 06:31:45 UTC 2019


Issue:
quark did not trigger TDR correctly on the compute ring.

Root cause:
The default timeout for the compute ring is infinite
(MAX_SCHEDULE_TIMEOUT), so a hung compute job never times out.

Solution:
In SR-IOV and passthrough mode, if the user has set a compute
ring timeout, use that value; otherwise use the same value as
the gfx ring timeout.

Signed-off-by: Jesse Zhang <zhexi.zhang at amd.com>
---
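Note: the selection rule described in the Solution section can be
sketched as a small user-space program. This is only an illustration
under stated assumptions, not driver code: struct demo_dev and
demo_apply_compute_timeout() are hypothetical stand-ins for
struct amdgpu_device and the check added to soc15_set_ip_blocks(),
and MAX_SCHEDULE_TIMEOUT is redefined locally to mirror the kernel
constant.

```c
#include <limits.h>
#include <stdbool.h>

/* Local stand-in for the kernel constant, which is LONG_MAX there too. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical, simplified stand-in for struct amdgpu_device. */
struct demo_dev {
	bool sriov_vf;        /* models amdgpu_sriov_vf(adev) */
	bool passthrough;     /* models amdgpu_passthrough(adev) */
	long gfx_timeout;     /* gfx ring timeout */
	long compute_timeout; /* MAX_SCHEDULE_TIMEOUT means "never time out" */
};

/*
 * Under virtualization, replace an infinite compute timeout with the
 * gfx timeout. A finite compute timeout (assumed to be user-set) and
 * the bare-metal case are left untouched.
 */
void demo_apply_compute_timeout(struct demo_dev *dev)
{
	if ((dev->sriov_vf || dev->passthrough) &&
	    dev->compute_timeout == MAX_SCHEDULE_TIMEOUT)
		dev->compute_timeout = dev->gfx_timeout;
}
```

With sriov_vf set and compute_timeout left at MAX_SCHEDULE_TIMEOUT,
the call copies gfx_timeout into compute_timeout. With neither
virtualization flag set, or with a finite compute_timeout, the
structure is unchanged.
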
 drivers/gpu/drm/amd/amdgpu/soc15.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
index 7c7e9f5..4155cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -687,6 +687,16 @@ int soc15_set_ip_blocks(struct amdgpu_device *adev)
 	adev->rev_id = soc15_get_rev_id(adev);
 	adev->nbio.funcs->detect_hw_virt(adev);
 
+	/*
+	 * When running under SR-IOV or in passthrough mode, if the user did
+	 * not set a custom compute ring timeout, use the gfx ring timeout
+	 * instead. Otherwise the compute ring keeps its infinite default
+	 * and a true hang on it is never detected.
+	 */
+	if ((amdgpu_sriov_vf(adev) || amdgpu_passthrough(adev)) &&
+	    adev->compute_timeout == MAX_SCHEDULE_TIMEOUT)
+		adev->compute_timeout = adev->gfx_timeout;
+
 	if (amdgpu_sriov_vf(adev))
 		adev->virt.ops = &xgpu_ai_virt_ops;
 
-- 
2.7.4


