[PATCH] drm/amdgpu: fix the nullptr issue when reenter GPU recovery
Zhang, Hawking
Hawking.Zhang at amd.com
Thu Aug 20 08:24:22 UTC 2020
[AMD Public Use]
Hi Dennis,
Can you elaborate on the case where the driver re-enters GPU recovery on an sGPU system? I'm wondering whether this is a valid case or whether we should prevent it from the beginning.
Regards,
Hawking
-----Original Message-----
From: Dennis Li <Dennis.Li at amd.com>
Sent: Thursday, August 20, 2020 10:21
To: amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Li, Dennis <Dennis.Li at amd.com>
Subject: [PATCH] drm/amdgpu: fix the nullptr issue when reenter GPU recovery
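On a single-GPU system, if the driver re-enters GPU recovery, amdgpu_device_lock_adev() returns false, but hive is NULL in that case, so the bail-out path's mutex_unlock(&hive->hive_lock) dereferences a NULL pointer. Jump to the common exit path instead, where the hive cleanup is already guarded by if (hive).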
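A minimal user-space model of why the second entry fails. It assumes, without confirmation from this thread, that amdgpu_device_lock_adev() gates on an atomic compare-and-swap of a per-device "in reset" flag. `struct fake_adev`, `fake_lock_adev()` and `fake_unlock_adev()` are hypothetical stand-ins, not the driver's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device: only the reset flag. */
struct fake_adev {
	atomic_int in_gpu_reset; /* 0 = idle, 1 = recovery in progress */
};

/*
 * Assumed behaviour of amdgpu_device_lock_adev(): atomically flip the
 * flag from 0 to 1. If it is already 1, another recovery owns the
 * device and the caller must bail out.
 */
static bool fake_lock_adev(struct fake_adev *adev)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&adev->in_gpu_reset, &expected, 1);
}

static void fake_unlock_adev(struct fake_adev *adev)
{
	/* Only the owner of the reset should release it. */
	assert(atomic_load(&adev->in_gpu_reset) == 1);
	atomic_store(&adev->in_gpu_reset, 0);
}
```

Under this model, a re-entry on the same device sees `false` from the lock and takes the "Bailing on TDR" branch in the patch. That branch runs whether or not the device belongs to an XGMI hive.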
Signed-off-by: Dennis Li <Dennis.Li at amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 82242e2f5658..81b1d9a1dca0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4371,8 +4371,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
if (!amdgpu_device_lock_adev(tmp_adev)) {
DRM_INFO("Bailing on TDR for s_job:%llx, as another already in progress",
job ? job->base.id : -1);
- mutex_unlock(&hive->hive_lock);
- return 0;
+ r = 0;
+ goto skip_recovery;
}
/*
@@ -4505,6 +4505,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
amdgpu_device_unlock_adev(tmp_adev);
}
+skip_recovery:
if (hive) {
atomic_set(&hive->in_reset, 0);
mutex_unlock(&hive->hive_lock);
--
2.17.1
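The fix uses the common kernel single-exit cleanup pattern: the bail-out path jumps to the shared label instead of releasing the hive lock itself. Below is a user-space sketch of the patched control flow. `struct fake_hive` and `fake_gpu_recover()` are hypothetical. A bool stands in for hive->hive_lock, and the ordering of the hive setup is simplified and assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the XGMI hive info. */
struct fake_hive {
	bool hive_locked; /* models hive->hive_lock being held */
	int in_reset;     /* models hive->in_reset */
};

/*
 * Simplified model of the patched amdgpu_device_gpu_recover():
 * hive is NULL on a single-GPU system, so every hive access sits
 * behind if (hive), including the one on the shared exit path.
 */
static int fake_gpu_recover(struct fake_hive *hive, bool lock_adev_ok)
{
	int r = 0;

	if (hive) {
		hive->in_reset = 1;
		hive->hive_locked = true;
	}

	if (!lock_adev_ok) {
		/* Before the patch this path unlocked hive->hive_lock
		 * directly, which dereferences NULL when hive == NULL. */
		r = 0;
		goto skip_recovery;
	}

	/* ... the actual reset work would happen here ... */

skip_recovery:
	if (hive) {
		hive->in_reset = 0;
		hive->hive_locked = false;
	}
	return r;
}
```

Calling `fake_gpu_recover(NULL, false)` models the single-GPU re-entry case and returns 0 without touching a hive. With a non-NULL hive, the bail-out path and the normal path release the hive state in the same place.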