[PATCH 3/3] drm/amdgpu: skip umc ras irq handling in poison mode (v2)
Zhang, Hawking
Hawking.Zhang at amd.com
Wed Sep 22 11:41:35 UTC 2021
[AMD Official Use Only]
Let's replace " RAS poison mode " with "poison is created, no user action is needed" other than that, the patch is
Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Regards,
Hawking
-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1 at amd.com>
Sent: Wednesday, September 22, 2021 18:33
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Clements, John <John.Clements at amd.com>; Yang, Stanley <Stanley.Yang at amd.com>
Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: [PATCH 3/3] drm/amdgpu: skip umc ras irq handling in poison mode (v2)
In ras poison mode, umc uncorrectable error will be ignored until the corrupted data consumed by another ras module (such as gfx, sdma).
v2: simplify the debug message and replace dev_warn with dev_info.
Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 ++++++++++++++-----------
1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 5b362e944541..6fad3e1b8c94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1544,22 +1544,27 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager *obj)
data->rptr = (data->aligned_element_size +
data->rptr) % data->ring_size;
- /* Let IP handle its data, maybe we need get the output
- * from the callback to udpate the error type/count, etc
- */
if (data->cb) {
- ret = data->cb(obj->adev, &err_data, &entry);
- /* ue will trigger an interrupt, and in that case
- * we need do a reset to recovery the whole system.
- * But leave IP do that recovery, here we just dispatch
- * the error.
- */
- if (ret == AMDGPU_RAS_SUCCESS) {
- /* these counts could be left as 0 if
- * some blocks do not count error number
+ if (amdgpu_ras_is_poison_supported(obj->adev) &&
+ obj->head.block == AMDGPU_RAS_BLOCK__UMC)
+ dev_info(obj->adev->dev, "RAS poison mode\n");
+ else {
+ /* Let IP handle its data, maybe we need get the output
+ * from the callback to udpate the error type/count, etc
+ */
+ ret = data->cb(obj->adev, &err_data, &entry);
+ /* ue will trigger an interrupt, and in that case
+ * we need do a reset to recovery the whole system.
+ * But leave IP do that recovery, here we just dispatch
+ * the error.
*/
- obj->err_data.ue_count += err_data.ue_count;
- obj->err_data.ce_count += err_data.ce_count;
+ if (ret == AMDGPU_RAS_SUCCESS) {
+ /* these counts could be left as 0 if
+ * some blocks do not count error number
+ */
+ obj->err_data.ue_count += err_data.ue_count;
+ obj->err_data.ce_count += err_data.ce_count;
+ }
}
}
}
--
2.17.1
More information about the amd-gfx
mailing list