[PATCH] drm/amd/pm: Ignore initial value in smu response register

Lazar, Lijo Lijo.Lazar at amd.com
Mon Jul 8 10:09:37 UTC 2024


[Public]

One problem is it's also bypassing a valid 0 response which usually means FW may not have completed processing the previous message.

What I thought was is it shouldn't even attempt sending a message if it identified a FW hang.

Is there a possibility to have the same problem whenever there is SRIOV full access - as in before/after reset etc.?

If state == FW_INIT, ignore response state before sending the message.
If there is no expected response to a message, make the state to FW_HANG. This part is tricky as what qualifies as a FW hang could change based on the specific SOC's message. Avoiding bool for this reason; to keep it open for having other FW states.
If state == FW_HANG don't even attempt to send the message.

Move FW state to FW_INIT whenever there is init/resume sequence - hw_init/hw_resume?

Thanks,
Lijo
-----Original Message-----
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Danijel Slivka
Sent: Monday, July 8, 2024 1:37 PM
To: amd-gfx at lists.freedesktop.org
Cc: Slivka, Danijel <Danijel.Slivka at amd.com>
Subject: [PATCH] drm/amd/pm: Ignore initial value in smu response register

Why:
If the reg mmMP1_SMN_C2PMSG_90 is being written to during amdgpu driver load or driver unload, subsequent amdgpu driver load will fail at smu_hw_init. The default of mmMP1_SMN_C2PMSG_90 register at a clean environment is 0x1 and if value differs from expected, amdgpu driver load will fail.

How to fix:
Ignore the initial value in smu response register before the first smu message is sent, proceed further to send the message. If register holds
0x0 or an unexpected value after smu message was sent set fw_state_hang flag and no further smu messages will be sent.

Signed-off-by: Danijel Slivka <danijel.slivka at amd.com>
---
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 +
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c        | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index a34c802f52be..bfe08fa0db6d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -562,6 +562,7 @@ struct smu_context {
        uint32_t smc_fw_if_version;
        uint32_t smc_fw_version;
        uint32_t smc_fw_caps;
+       bool smc_fw_state_hang;

        bool uploading_custom_pp_table;
        bool dc_controlled_by_gpio;
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 5592fd825aa3..9e4e62dcbee7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -421,7 +421,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
        if (poll) {
                reg = __smu_cmn_poll_stat(smu);
                res = __smu_cmn_reg2errno(smu, reg);
-               if (reg == SMU_RESP_NONE || res == -EREMOTEIO) {
+               if ((reg == SMU_RESP_NONE || res == -EREMOTEIO) &&
+smu->smc_fw_state_hang) {
                        __smu_cmn_reg_print_error(smu, reg, index, param, msg);
                        goto Out;
                }
@@ -429,8 +429,11 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
        __smu_cmn_send_msg(smu, (uint16_t) index, param);
        reg = __smu_cmn_poll_stat(smu);
        res = __smu_cmn_reg2errno(smu, reg);
-       if (res != 0)
+       if (res != 0) {
+               if (reg == SMU_RESP_NONE || res == -EREMOTEIO)
+                       smu->smc_fw_state_hang = true;
                __smu_cmn_reg_print_error(smu, reg, index, param, msg);
+       }
        if (read_arg) {
                smu_cmn_read_arg(smu, read_arg);
                dev_dbg(adev->dev, "smu send message: %s(%d) param: 0x%08x, resp: 0x%08x,\
--
2.34.1



More information about the amd-gfx mailing list