[PATCH 3/3] drm/amdgpu: skip umc ras irq handling in poison mode (v2)
Tao Zhou
tao.zhou1 at amd.com
Wed Sep 22 10:32:48 UTC 2021
In ras poison mode, umc uncorrectable error will be ignored until
the corrupted data consumed by another ras module (such as gfx, sdma).
v2: simplify the debug message and replace dev_warn with dev_info.
Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 33 ++++++++++++++-----------
1 file changed, 19 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 5b362e944541..6fad3e1b8c94 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -1544,22 +1544,27 @@ static void amdgpu_ras_interrupt_handler(struct ras_manager *obj)
data->rptr = (data->aligned_element_size +
data->rptr) % data->ring_size;
- /* Let IP handle its data, maybe we need get the output
- * from the callback to udpate the error type/count, etc
- */
if (data->cb) {
- ret = data->cb(obj->adev, &err_data, &entry);
- /* ue will trigger an interrupt, and in that case
- * we need do a reset to recovery the whole system.
- * But leave IP do that recovery, here we just dispatch
- * the error.
- */
- if (ret == AMDGPU_RAS_SUCCESS) {
- /* these counts could be left as 0 if
- * some blocks do not count error number
+ if (amdgpu_ras_is_poison_supported(obj->adev) &&
+ obj->head.block == AMDGPU_RAS_BLOCK__UMC)
+ dev_info(obj->adev->dev, "RAS poison mode\n");
+ else {
+ /* Let IP handle its data, maybe we need get the output
+ * from the callback to udpate the error type/count, etc
+ */
+ ret = data->cb(obj->adev, &err_data, &entry);
+ /* ue will trigger an interrupt, and in that case
+ * we need do a reset to recovery the whole system.
+ * But leave IP do that recovery, here we just dispatch
+ * the error.
*/
- obj->err_data.ue_count += err_data.ue_count;
- obj->err_data.ce_count += err_data.ce_count;
+ if (ret == AMDGPU_RAS_SUCCESS) {
+ /* these counts could be left as 0 if
+ * some blocks do not count error number
+ */
+ obj->err_data.ue_count += err_data.ue_count;
+ obj->err_data.ce_count += err_data.ce_count;
+ }
}
}
}
--
2.17.1
More information about the amd-gfx
mailing list