[PATCH v4 3/3] dmr/amdgpu: Add system auto reboot to RAS.
Zhang, Hawking
Hawking.Zhang at amd.com
Tue Sep 3 19:13:07 UTC 2019
Series is Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Andrey Grodzovsky
Sent: 2019年9月4日 0:44
To: amd-gfx at lists.freedesktop.org
Cc: alexdeucher at gmail.com; ckoenig.leichtzumerken at gmail.com; Zhou1, Tao <Tao.Zhou1 at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: [PATCH v4 3/3] dmr/amdgpu: Add system auto reboot to RAS.
In case of RAS error allow user configure auto system reboot through ras_ctrl.
This is also part of the temproray work around for the RAS hang problem.
v4: Use latest kernel API for disk sync.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 ++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9 ++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index be52318..69e0820 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -64,6 +64,8 @@
#include "amdgpu_ras.h"
#include "amdgpu_pmu.h"
+#include <linux/suspend.h>
+
MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3760,6 +3762,18 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
int i, r = 0;
bool in_ras_intr = amdgpu_ras_intr_triggered();
+ /*
+ * Flush RAM to disk so that after reboot
+ * the user can read log and see why the system rebooted.
+ */
+ if (in_ras_intr && amdgpu_ras_get_context(adev)->reboot) {
+
+ DRM_WARN("Emergency reboot.");
+
+ ksys_sync_helper();
+ emergency_restart();
+ }
+
need_full_reset = job_signaled = false;
INIT_LIST_HEAD(&device_list);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 1cc34de..a8eecb8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -154,6 +154,8 @@ static int amdgpu_ras_debugfs_ctrl_parse_data(struct file *f,
op = 1;
else if (sscanf(str, "inject %32s %8s", block_name, err) == 2)
op = 2;
+ else if (sscanf(str, "reboot %32s", block_name) == 1)
+ op = 3;
else if (str[0] && str[1] && str[2] && str[3])
/* ascii string, but commands are not matched. */
return -EINVAL;
@@ -287,6 +289,9 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user *
/* data.inject.address is offset instead of absolute gpu address */
ret = amdgpu_ras_error_inject(adev, &data.inject);
break;
+ case 3:
+ amdgpu_ras_get_context(adev)->reboot = true;
+ break;
default:
ret = -EINVAL;
break;
@@ -1733,6 +1738,8 @@ int amdgpu_ras_fini(struct amdgpu_device *adev) void amdgpu_ras_global_ras_isr(struct amdgpu_device *adev) {
if (atomic_cmpxchg(&amdgpu_ras_in_intr, 0, 1) == 0) {
- DRM_WARN("RAS event of type ERREVENT_ATHUB_INTERRUPT detected! Stopping all GPU jobs.\n");
+ DRM_WARN("RAS event of type ERREVENT_ATHUB_INTERRUPT detected!\n");
+
+ amdgpu_ras_reset_gpu(adev, false);
}
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 3ec2a87..a83ec99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -333,6 +333,7 @@ struct amdgpu_ras {
struct mutex recovery_lock;
uint32_t flags;
+ bool reboot;
};
struct ras_fs_data {
--
2.7.4
_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list