[PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

Clements, John John.Clements at amd.com
Fri May 8 09:35:45 UTC 2020

[AMD Public Use]

Responses inline

From: Chen, Guchun <Guchun.Chen at amd.com>
Sent: Friday, May 8, 2020 3:11 PM
To: Clements, John <John.Clements at amd.com>; amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

[AMD Public Use]

Patch 1,
1. maybe it's better to split it into 2, one is for header file change, and another is functional patch.
2.  Can below codes be merged to one return instead? Variable 'en' can serve the function input in smu_send_smc_msg_with_param.
[JC] I think what you are suggesting would probably work, but my concern is that it is not immediate clear to me if bool ec will compile to 1. The spec for the SMU cmd defines 1 to enable and 0 to disable
if (en)
+                           return smu_send_smc_msg_with_param(smu,
+                                                                                       SMU_MSG_GmiPwrDnControl,
+                                                                                       1,
+                                                                                       NULL);
+            return smu_send_smc_msg_with_param(smu,
+                                                                         SMU_MSG_GmiPwrDnControl,
+                                                                         0,
+                                                                         NULL);

Patch 2: Reviewed-by: Guchun Chen guchun.chen at amd.com<mailto:guchun.chen at amd.com>

Patch 3:
1.function amdgpu_ras_error_inject_xgmi should be one static function? [JC] That makes sense, I'll update function to be static
2. Regarding below return case, is there one dpm_allow_xgmi_power_down leak? As we have disabled xgmi link power down.
[JC] I intentionally did this, when the RAS int is triggered I don't want to create any races conditions to access the SMU to perform functions and entering BACO reset, the XGMI per link settings will be restored anyway after the BACO reset.
[JC] That being said I understand that this looks like it could be a bug, I'll add comments there to clarify the sequence
+            if (amdgpu_ras_intr_triggered())
+                           return ret;


From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org<mailto:amd-gfx-bounces at lists.freedesktop.org>> On Behalf Of Clements, John
Sent: Friday, May 8, 2020 2:51 PM
To: amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>; Zhang, Hawking <Hawking.Zhang at amd.com<mailto:Hawking.Zhang at amd.com>>
Subject: [PATCH] Series to disallow XGMI link power down during RAS XGMI error injection

[AMD Official Use Only - Internal Distribution Only]

Submitting 3 patches for review:

  *   Added host to SMU FW cmd to enable/disable XGMI link power down
  *   Added DPM function for XGMI link power down control
  *   Disable XGMI link power down prior to issuing a XGMI RAS error
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20200508/b44a57d7/attachment-0001.htm>

More information about the amd-gfx mailing list