[PATCH] drm/amd/pm: Send message when resp status is 0xFC
Lazar, Lijo
Lijo.Lazar at amd.com
Tue Mar 8 05:58:05 UTC 2022
[Public]
Hi Evan,
Any comments on this?
Thanks,
Lijo
-----Original Message-----
From: Lazar, Lijo
Sent: Friday, February 25, 2022 7:30 PM
To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Wang, Yang(Kevin) <KevinYang.Wang at amd.com>
Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
On 2/25/2022 6:33 PM, Quan, Evan wrote:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>> Sent: Friday, February 25, 2022 3:43 PM
>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
>> <KevinYang.Wang at amd.com>
>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
>> 0xFC
>>
>>
>>
>> On 2/25/2022 1:02 PM, Quan, Evan wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>> Sent: Friday, February 25, 2022 2:03 PM
>>>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
>>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
>>>> <KevinYang.Wang at amd.com>
>>>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
>>>> 0xFC
>>>>
>>>>
>>>>
>>>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>>> Sent: Friday, February 25, 2022 1:47 PM
>>>>>> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
>>>>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
>>>>>> <KevinYang.Wang at amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>> That is, the caller can perform something like issuing the same
>>>>>>> message again without a prerequisites check on PMFW busy
>>>>>>
>>>>>> This patch expects this method. The caller may try to resend the
>>>>>> message. As part of sending a message, the driver first checks the
>>>>>> response register. The current logic blocks sending any message if it
>>>>>> sees 0xFC in the response register; this patch addresses that.
>>>>> [Quan, Evan] Yes, I know. But the caller here could be another one. I
>>>>> mean there may be another caller stepping in.
>>>>>
>>>>
>>>> That shouldn't cause an issue for the second caller if it got the message
>>>> mutex. The second caller should also be able to send a message if PMFW
>>>> got free by that time. The first caller can retry when it gets the
>>>> message mutex back. FW doesn't maintain any state for a 0xFC response;
>>>> any other message may be sent after that. If the driver keeps state
>>>> based on two callers, that is a logic problem in the driver. I don't
>>>> think we have any flow like that.
>>> [Quan, Evan] Yeah, but there may be some cases where messages issued by
>>> the two callers have a dependency.
>>> That means the message issued by the 2nd caller should only come after
>>> the 1st one.
>>> The one I can think of is that the "EnableAllSmuFeatures" message should
>>> come after the "SetAllowedFeatures" message.
>>> Although that should not cause any problem, I'm not sure whether there
>>> are other similar cases.
>>>
>>> What I suggest is something like below. We just do it again in
>>> smu_cmn_send_smc_msg_with_param() when PMFW is busy.
>>>
>>> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>>>                                     enum smu_message_type msg,
>>>                                     uint32_t param,
>>>                                     uint32_t *read_arg)
>>> {
>>> ...
>>> ...
>>>         mutex_lock(&smu->message_lock);
>>>         reg = __smu_cmn_poll_stat(smu);
>>>         res = __smu_cmn_reg2errno(smu, reg);
>>>         if (reg == SMU_RESP_NONE ||
>>>             reg == SMU_RESP_BUSY_OTHER ||
>>>             res == -EREMOTEIO) {
>>>                 __smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>>                 goto Out;
>>>         }
>>> +retry:
>>>         __smu_cmn_send_msg(smu, (uint16_t) index, param);
>>>         reg = __smu_cmn_poll_stat(smu);
>>>         res = __smu_cmn_reg2errno(smu, reg);
>>> +       if (reg == SMU_RESP_BUSY_OTHER) {
>>> +               mdelay(1);
>>> +               goto retry;
>>> +       }
>>
>> I suppose the retry option should be left to the caller. Regardless of
>> retry or not, the patch is still valid.
>>
>> Example situation -
>>
>> rocm-smi is trying to get metrics and another app is trying to set a
>> performance profile. If the metrics fetch fails and even a retry of the
>> metrics fetch fails after some loops, rocm-smi is free to fetch the
>> metrics again after, say, 5s. That also shouldn't prevent the second app
>> from sending the performance profile message, and that app may also retry later.
> [Quan, Evan] I have no doubt the second app may still be able to send the
> performance profile message.
> However, the metrics data retrieved by rocm-smi will be different under
> such a scenario. That is, after the performance profile setting the clock
> frequencies may go up.
> If the first app is sensitive to that (suppose it wants to compare the
> clock frequencies before and after the performance profile setting), that
> will be a problem.
Ah, my intention was to describe a different use case; it was probably a bad example.
What I intended is -
rocm-smi is running periodically, collecting metrics data, and another app which doesn't care about the background data collection is trying to do something else. Thus a metrics fetch failure shouldn't prevent the other app from changing the performance profile or whatever it intended to do.
Also, rocm-smi may be able to fetch data during the next polling interval, when PMFW is relatively free.
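To make it concrete, a runtime caller along the lines of the rough sketch below is what I have in mind. This is only an illustration, not part of the patch - the helper name is made up, the message is passed in as a parameter rather than naming a real one, and it assumes a 0xFC response reaches the caller as -EBUSY:

/* Illustrative sketch only - not part of the patch. */
static int smu_poll_one_sample(struct smu_context *smu,
                               enum smu_message_type msg,
                               uint32_t *value)
{
        int ret;

        ret = smu_cmn_send_smc_msg_with_param(smu, msg, 0, value);
        if (ret == -EBUSY) {
                /*
                 * PMFW was busy with other work; drop this sample and let
                 * the next polling interval try again instead of blocking.
                 */
                return 0;
        }

        return ret;
}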
>
> I reconsidered this a bit. Can we add one more parameter to smu_cmn_send_smc_msg_with_param()?
> That parameter can tell what the caller wants (retry or abort) under a PMFW busy scenario.
There is one complication to this. The same message could return 0xFC under different conditions, and it may be fine for situation A but not for situation B. For example, during driver load or init after a reset, we don't expect PMFW to be busy. The driver may send a message to get metrics data to check, say, whether smart shift is enabled or not. A failure with 0xFC is not a good sign at that point and we should abort there.
The same message sent during runtime (like rocm-smi data collection) may fail with 0xFC due to workload conditions.
Adding another parameter will complicate things, as we would need to check every condition of message usage. In the present case, a message failure with 0xFC is treated as a failure during init and we don't proceed. For runtime use, apps always have the choice whether or not to retry the same API.
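As a rough illustration of what I mean (the function names and retry count below are made up, and again I am assuming a 0xFC response is reported to the caller as -EBUSY), the policy would then sit with the call site rather than inside smu_cmn_send_smc_msg_with_param():

/* Init/reset path: PMFW is not expected to be busy, so any failure aborts. */
static int smu_init_query(struct smu_context *smu,
                          enum smu_message_type msg, uint32_t *out)
{
        return smu_cmn_send_smc_msg_with_param(smu, msg, 0, out);
}

/* Runtime path: the caller itself decides to retry a few times on busy. */
static int smu_runtime_query(struct smu_context *smu,
                             enum smu_message_type msg, uint32_t *out)
{
        int ret, tries = 5;

        do {
                ret = smu_cmn_send_smc_msg_with_param(smu, msg, 0, out);
                if (ret != -EBUSY)
                        break;
                msleep(1);      /* give PMFW a moment before retrying */
        } while (--tries);

        return ret;
}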
Thanks,
Lijo
>
> BR
> Evan
>>
>> Thanks,
>> Lijo
>>
>>> ...
>>> ...
>>> }
>>>
>>> BR
>>> Evan
>>>>
>>>> Basically, 0xFC is not a valid precondition check for sending any
>>>> message. As per the PMFW team, it only means that PMFW was busy when a
>>>> previous message was sent, and PMFW won't change the response status
>>>> when it becomes free.
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> BR
>>>>> Evan
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Quan, Evan <Evan.Quan at amd.com>
>>>>>> Sent: Friday, February 25, 2022 11:07 AM
>>>>>> To: Lazar, Lijo <Lijo.Lazar at amd.com>;
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
>>>>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
>>>>>> <KevinYang.Wang at amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>> This may introduce some problems for two-caller scenarios. That is,
>>>>>> the 2nd one will still proceed even if the 1st one was already blocked.
>>>>>> Maybe the logic here should be performed by the caller. That is, the
>>>>>> caller can perform something like issuing the same message again
>>>>>> without a prerequisites check on PMFW busy.
>>>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
>>>>>> another try on PMFW busy.
>>>>>>
>>>>>> BR
>>>>>> Evan
>>>>>>> -----Original Message-----
>>>>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>>>>> Sent: Friday, February 25, 2022 12:22 PM
>>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
>>>>>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
>>>>>>> <KevinYang.Wang at amd.com>; Quan, Evan <Evan.Quan at amd.com>
>>>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>>>>>>
>>>>>>> When PMFW is really busy, it will respond with 0xFC. However, it
>>>>>>> doesn't change the response register state when it becomes free.
>>>>>>> Driver should retry and proceed to send message if the response
>>>>>>> status is 0xFC.
>>>>>>>
>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>>>>>>> 1 file changed, 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> index 590a6ed12d54..92161b9d8c1a 100644
>>>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
>>>>>>>         reg = __smu_cmn_poll_stat(smu);
>>>>>>>         res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>         if (reg == SMU_RESP_NONE ||
>>>>>>> -           reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>             res == -EREMOTEIO)
>>>>>>>                 goto Out;
>>>>>>>         __smu_cmn_send_msg(smu, msg_index, param);
>>>>>>> @@ -391,7 +390,6 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>>>>>>>         reg = __smu_cmn_poll_stat(smu);
>>>>>>>         res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>         if (reg == SMU_RESP_NONE ||
>>>>>>> -           reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>             res == -EREMOTEIO) {
>>>>>>>                 __smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>>>>>>                 goto Out;
>>>>>>> --
>>>>>>> 2.25.1