[PATCH 0/4] enable umc ras ce interrupt

Chen, Guchun Guchun.Chen at amd.com
Thu Aug 1 08:21:51 UTC 2019


1) In Patch 1, it looks like our callback always returns the UE case, but I assume the CE case should also be covered. Maybe that's a separate topic.
	if (ret == AMDGPU_RAS_UE) {
+		/* these counts could be left as 0 if
+		 * some blocks do not count error number
+		 */
 		obj->err_data.ue_count += err_data.ue_count;
+		obj->err_data.ce_count += err_data.ce_count;
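
A minimal sketch of how the CE case might be covered alongside UE, assuming a handler that accumulates both counts and flags a reset only for uncorrectable errors. The names ras_result, ras_err_data and ras_accumulate are hypothetical stand-ins and are not taken from the amdgpu driver.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the amdgpu definitions. */
enum ras_result { RAS_SUCCESS = 0, RAS_UE, RAS_CE };

struct ras_err_data {
	unsigned long ue_count;
	unsigned long ce_count;
};

/*
 * Add the counts from one callback invocation to the running totals.
 * Returns true only when an uncorrectable error was reported, which is
 * the one case assumed here to need a GPU reset.
 */
static bool ras_accumulate(struct ras_err_data *total,
			   const struct ras_err_data *delta,
			   enum ras_result ret)
{
	if (ret != RAS_UE && ret != RAS_CE)
		return false;

	/* Either count may be 0 if a block does not report it. */
	total->ue_count += delta->ue_count;
	total->ce_count += delta->ce_count;

	return ret == RAS_UE;
}
```

In this sketch a CE result still updates the totals but does not ask for a reset.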

2) In Patch 2, there is an unused variable "ras_error_status". Should we remove it?

static void umc_v6_1_ras_init(struct amdgpu_device *adev)  {
+	void *ras_error_status = NULL;
 
+	amdgpu_umc_for_each_channel(umc_v6_1_ras_init_per_channel);
 }

Regards,
Guchun

-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang at amd.com> 
Sent: Thursday, August 1, 2019 3:52 PM
To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org; Li, Dennis <Dennis.Li at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: RE: [PATCH 0/4] enable umc ras ce interrupt

1) Please fix the typo in the patch #2 description: ec --> ce
2) Patch #2:

+	ecc_err_cnt_sel = REG_SET_FIELD(ecc_err_cnt_sel, UMCCH0_0_EccErrCntSel,
+					EccErrInt, 0x1);
For the EccErrInt field, it should be programmed to (MAX - INIT), correct? The hardcoded value does not seem to match the value calculated by those macros.

Regards,
Hawking
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao Zhou
Sent: August 1, 2019 14:54
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Li, Dennis <Dennis.Li at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>
Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: [PATCH 0/4] enable umc ras ce interrupt

These patches add support for the umc ce interrupt. The interrupt is controlled by an error count threshold.
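
One common way to drive an interrupt from an error count threshold is to preload a hardware counter so that it reaches its maximum after the threshold number of errors. The sketch below illustrates that arithmetic only. The macro names, the 16-bit counter width and the threshold of 16 are assumptions for illustration, not values taken from these patches.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define CE_CNT_MAX        0xffffu   /* assumed 16-bit counter */
#define CE_INT_THRESHOLD  0x0010u   /* hypothetical: interrupt after 16 errors */
#define CE_CNT_INIT       (CE_CNT_MAX - CE_INT_THRESHOLD)

/* Errors seen since the counter was last preloaded with CE_CNT_INIT. */
static uint32_t ce_errors_since_init(uint32_t raw_count)
{
	return raw_count >= CE_CNT_INIT ? raw_count - CE_CNT_INIT : 0;
}

/* Nonzero once the counter has reached its maximum, i.e. the threshold is hit. */
static int ce_threshold_reached(uint32_t raw_count)
{
	return raw_count >= CE_CNT_MAX;
}
```

Under this scheme the number of errors recorded since the last preload is the raw count minus the initial value, so any error count read back from the counter would need that adjustment.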

Tao Zhou (4):
  drm/amdgpu: support ce interrupt in ras module
  drm/amdgpu: implement umc ras init function
  drm/amdgpu: update the calc algorithm of umc ecc error count
  drm/amdgpu: only uncorrectable error needs gpu reset

 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 ++++---
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c   |  6 +++-
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.c   | 42 ++++++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/umc_v6_1.h   |  7 +++++
 4 files changed, 58 insertions(+), 9 deletions(-)

--
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

