[PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

Liu, Monk Monk.Liu at amd.com
Tue Dec 12 03:18:21 UTC 2017

NAK, your change breaks the SRIOV logic:

Without lockup_timeout set, gpu_recover() won't get called at all, unless your IB triggers an invalid instruction and that IRQ invokes
amdgpu_gpu_recover(). In that case you should disable the logic in that IRQ instead of changing gpu_recover() itself, because
for SRIOV we need gpu_recover() even when lockup_timeout is zero.
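
To illustrate the point above, here is a minimal standalone sketch, not actual driver code. Every name in it (sketch_lockup_timeout, sketch_gpu_recover, sketch_illegal_inst_irq, sketch_sriov_recover) is a hypothetical stand-in, and it assumes the only non-timeout caller on bare metal is the invalid-instruction IRQ path. The idea is that the lockup_timeout check sits in the IRQ path, while the recover function itself stays unconditional so SRIOV can still use it:

```c
/* Hypothetical stand-ins; none of these are real amdgpu symbols. */
static int sketch_lockup_timeout;  /* models amdgpu.lockup_timeout */
static int sketch_recover_calls;   /* counts recoveries, for illustration only */

/* Models gpu_recover(): deliberately has no lockup_timeout check. */
static int sketch_gpu_recover(void)
{
	sketch_recover_calls++;
	return 0;
}

/* Models the invalid-instruction IRQ path: the timeout check lives here. */
static void sketch_illegal_inst_irq(void)
{
	/* lockup_timeout == 0 means the user does not want a reset from this path. */
	if (sketch_lockup_timeout == 0)
		return;
	sketch_gpu_recover();
}

/* Models an SRIOV-initiated recovery: must run regardless of lockup_timeout. */
static void sketch_sriov_recover(void)
{
	sketch_gpu_recover();
}
```

With sketch_lockup_timeout set to 0, the IRQ path returns without recovering, while the SRIOV path still reaches sketch_gpu_recover().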

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf Of Marek Olšák
Sent: December 12, 2017 5:30
To: amd-gfx at lists.freedesktop.org
Subject: [PATCH] amdgpu: disable GPU reset if amdgpu.lockup_timeout=0

From: Marek Olšák <marek.olsak at amd.com>

Signed-off-by: Marek Olšák <marek.olsak at amd.com>

Is this really correct? I have no easy way to test it.

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8d03baa..56c41cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3018,20 +3018,24 @@ static int amdgpu_reset_sriov(struct amdgpu_device *adev, uint64_t *reset_flags,
  * Attempt to reset the GPU if it has hung (all asics).
  * Returns 0 for success or an error on failure.
  */
 int amdgpu_gpu_recover(struct amdgpu_device *adev, struct amdgpu_job *job)
 {
 	struct drm_atomic_state *state = NULL;
 	uint64_t reset_flags = 0;
 	int i, r, resched;
 
+	/* amdgpu.lockup_timeout=0 disables GPU reset. */
+	if (amdgpu_lockup_timeout == 0)
+		return 0;
+
 	if (!amdgpu_check_soft_reset(adev)) {
 		DRM_INFO("No hardware hang detected. Did some blocks stall?\n");
 		return 0;
 	}
 
 	dev_info(adev->dev, "GPU reset begin!\n");
 	adev->in_gpu_reset = 1;
