[PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query

Clements, John John.Clements at amd.com
Fri Jan 3 03:18:41 UTC 2020


[AMD Public Use]

Hello GuChun,

Good point. It makes sense to make the function static inline here. I think I will also rename the function from get_umc_reg_offset to get_umc_6_reg_offset.

Thank you,
John Clements

From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Friday, January 3, 2020 11:09 AM
To: Clements, John <John.Clements at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Yes, John, that concern was cleared up after I looked into the code.

One more issue: wouldn't it be better to make get_umc_reg_offset a static inline function? With this problem fixed, the patch is: Reviewed-by: Guchun Chen <guchun.chen at amd.com>

+uint32_t get_umc_reg_offset(struct amdgpu_device *adev,
+                            uint32_t umc_inst,
+                            uint32_t ch_inst)
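
For illustration, a minimal sketch of what the static inline variant might look like. This is not the code from the patch: struct mock_device is a hypothetical stand-in for struct amdgpu_device, the value of UMC_6_INST_DIST is a placeholder, and the assert is only there for the sketch. The expression itself follows the UMC_REG_OFFSET macro quoted further down this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration only; not taken from the patch. */
#define UMC_6_INST_DIST 0x40000u

/* Hypothetical stand-in for the part of struct amdgpu_device used here. */
struct mock_device {
	struct {
		uint32_t channel_offs;	/* register distance between channels */
	} umc;
};

/*
 * static inline gives the helper internal linkage and, unlike a macro,
 * type-checked arguments.
 */
static inline uint32_t get_umc_6_reg_offset(const struct mock_device *adev,
					    uint32_t umc_inst,
					    uint32_t ch_inst)
{
	assert(adev);
	return adev->umc.channel_offs * ch_inst + UMC_6_INST_DIST * umc_inst;
}
```

With these placeholder numbers, each channel step adds channel_offs and each UMC instance step adds UMC_6_INST_DIST to the offset.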

Regards,
Guchun

From: Clements, John <John.Clements at amd.com>
Sent: Friday, January 3, 2020 10:58 AM
To: Chen, Guchun <Guchun.Chen at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

Hello GuChun/Hawking,

Thank you for your feedback. I have updated the patch with the following amendments:

  *   Removed "+#define UMC_REG_OFFSET" (I forgot to remove this in the original patch; I prefer the function over the macro)
  *   Updated the brace style in the for loops so that the opening brace is on the same line as the for statement

GuChun,
Regarding your concern about umc_v6_1_query_ras_error_count: when the UE/CE error counter registers are read, the local SW error counters are only ever incremented, never cleared, throughout the iteration over the UMC error counter registers, so counts from earlier channels are not overwritten.
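
A small sketch of this accumulate-only pattern, under stated assumptions: the instance and channel counts, the register table, and the mock_ helpers are all hypothetical and do not come from the driver. The point is only that the nested loops add into the caller's total with "+=" rather than assigning to it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counts, for illustration only. */
#define MOCK_UMC_INST_NUM 2u
#define MOCK_CHANNEL_NUM  4u

/* Hypothetical stand-in for reading one channel's error counter register. */
static uint32_t mock_read_error_count(uint32_t regs[MOCK_UMC_INST_NUM][MOCK_CHANNEL_NUM],
				      uint32_t umc_inst, uint32_t ch_inst)
{
	return regs[umc_inst][ch_inst];
}

/* Add the counter of every instance/channel pair to *total. */
static void mock_query_error_count(uint32_t regs[MOCK_UMC_INST_NUM][MOCK_CHANNEL_NUM],
				   unsigned long *total)
{
	uint32_t umc_inst, ch_inst;

	assert(total);
	for (umc_inst = 0; umc_inst < MOCK_UMC_INST_NUM; umc_inst++) {
		for (ch_inst = 0; ch_inst < MOCK_CHANNEL_NUM; ch_inst++) {
			/* "+=" accumulates; "=" would keep only the last channel. */
			*total += mock_read_error_count(regs, umc_inst, ch_inst);
		}
	}
}
```

Because the total is never reset inside the loops, a caller that starts from a non-zero value also keeps that earlier count.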

Thank you,
John Clements

From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Friday, January 3, 2020 9:07 AM
To: Zhang, Hawking <Hawking.Zhang at amd.com>; Clements, John <John.Clements at amd.com>; amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Public Use]

+#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) ((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST*(umc_inst))
Coding style problem: the spaces around the last "*" are missing.
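
As a sketch, the same macro with a space on both sides of every "*". The struct mock_device type, the value of UMC_6_INST_DIST, and the mock_next_channel_offset helper are assumptions made only so the fragment compiles on its own; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder value for illustration only; not taken from the patch. */
#define UMC_6_INST_DIST 0x40000u

/* Hypothetical stand-in for the fields the macro dereferences. */
struct mock_device {
	struct {
		uint32_t channel_offs;
	} umc;
};

/* Same expression as the quoted macro, with spaces around every "*". */
#define UMC_REG_OFFSET(adev, ch_inst, umc_inst) \
	((adev)->umc.channel_offs * (ch_inst) + UMC_6_INST_DIST * (umc_inst))

/* Hypothetical helper: offset of the channel that follows ch_inst. */
static uint32_t mock_next_channel_offset(const struct mock_device *adev,
					 uint32_t ch_inst, uint32_t umc_inst)
{
	assert(adev);
	/* The parentheses around (ch_inst) keep "ch_inst + 1" as one factor. */
	return UMC_REG_OFFSET(adev, ch_inst + 1, umc_inst);
}
```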

+            for (umc_inst = 0; umc_inst < adev->umc.umc_inst_num; umc_inst++)
+            {
Another coding style problem: "{" should go on the same line as the for statement, not on a new line.

Thirdly, in umc_v6_1_query_ras_error_count we use nested loops to query the error counters of all UMC channels, but we always use the same variable for the query. Will the value be overwritten by each new one? If so, we would lose any earlier error counts. Correct?

Regards,
Guchun

From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Zhang, Hawking
Sent: Thursday, January 2, 2020 8:38 PM
To: Clements, John <John.Clements at amd.com>; amd-gfx at lists.freedesktop.org; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: RE: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Don't UMC_REG_OFFSET(adev, ch_inst, umc_inst) and the function get_umc_reg_offset actually do the same thing? I guess you only want to keep one of them, right?

Regards,
Hawking

From: Clements, John <John.Clements at amd.com>
Sent: Thursday, January 2, 2020 18:31
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>
Subject: [PATCH] drm/amdgpu: resolved bug in UMC 6 error counter query


[AMD Official Use Only - Internal Distribution Only]

Added a patch to resolve an issue where error counter detection was not iterating over all UMC instances/channels.
Removed support for accessing UMC error counters via MMIO.

Thank you,
John Clements