[PATCH] drm/amd/pm: Ignore initial value in smu response register

Slivka, Danijel Danijel.Slivka at amd.com
Mon Jul 8 13:23:13 UTC 2024


[Public]

>-----Original Message-----
>From: Lazar, Lijo <Lijo.Lazar at amd.com>
>Sent: Monday, July 8, 2024 12:13 PM
>To: Slivka, Danijel <Danijel.Slivka at amd.com>; amd-gfx at lists.freedesktop.org
>Cc: Slivka, Danijel <Danijel.Slivka at amd.com>
>Subject: RE: [PATCH] drm/amd/pm: Ignore initial value in smu response register
>
>[Public]
>
>One problem is it's also bypassing a valid 0 response which usually means FW
>may not have completed processing the previous message.
>

The zero value is bypassed because 0 could also be a garbage value.
However, I am adding an HW_INIT state as the initial state, and any register value is ignored while the HW_INIT state is set.

>What I thought was that it shouldn't even attempt sending a message if it
>identified a FW hang.
>
>Is there a possibility to have the same problem whenever there is SRIOV full
>access - as in before/after reset etc.?

Yes, this could occur whenever the VF is in full access.

>
>If state == FW_INIT, ignore response state before sending the message.
>If there is no expected response to a message, set the state to FW_HANG.
>This part is tricky as what qualifies as a FW hang could change based on the
>specific SOC's message. Avoiding bool for this reason; to keep it open for having
>other FW states.
>If state == FW_HANG don't even attempt to send the message.
>
>Move FW state to FW_INIT whenever there is init/resume sequence -
>hw_init/hw_resume?
>
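
For reference, a minimal userspace sketch of the state handling suggested above. This is not the v2 patch: the enum, struct and function names are hypothetical, the response register is simulated by a struct field, and for brevity any response other than 0x1 is treated as a hang (the real driver distinguishes more response codes and returns -EREMOTEIO rather than -EIO).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical names; the thread only mentions FW_INIT and FW_HANG. */
enum fw_state {
	FW_STATE_RUNNING = 0, /* response register is trusted */
	FW_STATE_INIT,        /* after hw_init/hw_resume: register may hold garbage */
	FW_STATE_HANG,        /* firmware did not answer: stop sending */
};

#define RESP_NONE 0x0u /* firmware has not responded */
#define RESP_OK   0x1u /* firmware completed the message */

struct fake_smu {
	enum fw_state state;
	uint32_t resp_reg;      /* stands in for mmMP1_SMN_C2PMSG_90 */
	uint32_t next_fw_reply; /* value the simulated firmware writes back */
};

/* Init/resume path: forget any earlier hang and distrust the register. */
static void fake_hw_init(struct fake_smu *smu)
{
	smu->state = FW_STATE_INIT;
}

static int fake_send_msg(struct fake_smu *smu)
{
	/* A known hang means no message is attempted at all. */
	if (smu->state == FW_STATE_HANG)
		return -EIO;

	/* Outside FW_STATE_INIT, a non-OK register means the previous
	 * message has not completed, so do not send another one. */
	if (smu->state != FW_STATE_INIT && smu->resp_reg != RESP_OK)
		return -EBUSY;

	/* "Send" the message; the simulated firmware writes its reply. */
	smu->resp_reg = smu->next_fw_reply;

	if (smu->resp_reg != RESP_OK) {
		smu->state = FW_STATE_HANG;
		return -EIO;
	}

	smu->state = FW_STATE_RUNNING;
	return 0;
}

/* Walks through init, a hang, and recovery via re-init. */
int fw_state_demo(void)
{
	struct fake_smu smu = { FW_STATE_RUNNING, 0xdeadbeefu, RESP_OK };

	fake_hw_init(&smu);
	assert(fake_send_msg(&smu) == 0);    /* stale value ignored */
	smu.next_fw_reply = RESP_NONE;
	assert(fake_send_msg(&smu) == -EIO); /* no response: mark hang */
	assert(smu.state == FW_STATE_HANG);
	smu.next_fw_reply = RESP_OK;
	assert(fake_send_msg(&smu) == -EIO); /* refused while hung */
	fake_hw_init(&smu);
	assert(fake_send_msg(&smu) == 0);    /* re-init clears the hang */
	return 0;
}
```

Using an enum rather than a bool keeps room for further firmware states, as suggested in the review.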

I applied the suggestion and sent out a new patch, v2:

[PATCH v2] drm/amd/pm: Ignore initial value in smu response register

Thanks,
BR,
Danijel Slivka

>Thanks,
>Lijo
>-----Original Message-----
>From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Danijel Slivka
>Sent: Monday, July 8, 2024 1:37 PM
>To: amd-gfx at lists.freedesktop.org
>Cc: Slivka, Danijel <Danijel.Slivka at amd.com>
>Subject: [PATCH] drm/amd/pm: Ignore initial value in smu response register
>
>Why:
>If the register mmMP1_SMN_C2PMSG_90 is written to during amdgpu driver load
>or driver unload, the subsequent amdgpu driver load will fail at smu_hw_init.
>The default value of the mmMP1_SMN_C2PMSG_90 register in a clean environment
>is 0x1, and if the value differs from the expected one, amdgpu driver load
>will fail.
>
>How to fix:
>Ignore the initial value in the smu response register before the first smu
>message is sent, and proceed to send the message. If the register holds
>0x0 or an unexpected value after the smu message was sent, set the
>fw_state_hang flag so that no further smu messages are sent.
>
>Signed-off-by: Danijel Slivka <danijel.slivka at amd.com>
>---
> drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h | 1 +
> drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c        | 7 +++++--
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
>index a34c802f52be..bfe08fa0db6d 100644
>--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
>+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
>@@ -562,6 +562,7 @@ struct smu_context {
>        uint32_t smc_fw_if_version;
>        uint32_t smc_fw_version;
>        uint32_t smc_fw_caps;
>+       bool smc_fw_state_hang;
>
>        bool uploading_custom_pp_table;
>        bool dc_controlled_by_gpio;
>diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>index 5592fd825aa3..9e4e62dcbee7 100644
>--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>@@ -421,7 +421,7 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>        if (poll) {
>                reg = __smu_cmn_poll_stat(smu);
>                res = __smu_cmn_reg2errno(smu, reg);
>-               if (reg == SMU_RESP_NONE || res == -EREMOTEIO) {
>+               if ((reg == SMU_RESP_NONE || res == -EREMOTEIO) &&
>+smu->smc_fw_state_hang) {
>                        __smu_cmn_reg_print_error(smu, reg, index, param, msg);
>                        goto Out;
>                }
>@@ -429,8 +429,11 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>        __smu_cmn_send_msg(smu, (uint16_t) index, param);
>        reg = __smu_cmn_poll_stat(smu);
>        res = __smu_cmn_reg2errno(smu, reg);
>-       if (res != 0)
>+       if (res != 0) {
>+               if (reg == SMU_RESP_NONE || res == -EREMOTEIO)
>+                       smu->smc_fw_state_hang = true;
>                __smu_cmn_reg_print_error(smu, reg, index, param, msg);
>+       }
>        if (read_arg) {
>                smu_cmn_read_arg(smu, read_arg);
>                dev_dbg(adev->dev, "smu send message: %s(%d) param: 0x%08x,
>resp: 0x%08x,\
>--
>2.34.1
>


