[PATCH] drm/amd/pm: Send message when resp status is 0xFC

Quan, Evan Evan.Quan at amd.com
Fri Feb 25 13:03:04 UTC 2022


[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Friday, February 25, 2022 3:43 PM
> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang at amd.com>
> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> 
> 
> On 2/25/2022 1:02 PM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> >> Sent: Friday, February 25, 2022 2:03 PM
> >> To: Quan, Evan <Evan.Quan at amd.com>; amd-gfx at lists.freedesktop.org
> >> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
> >> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
> >> <KevinYang.Wang at amd.com>
> >> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>
> >>
> >>
> >> On 2/25/2022 11:25 AM, Quan, Evan wrote:
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> >>>> Sent: Friday, February 25, 2022 1:47 PM
> >>>> To: Quan, Evan <Evan.Quan at amd.com>; amd-
> gfx at lists.freedesktop.org
> >>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
> >>>> <KevinYang.Wang at amd.com>
> >>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >>>> 0xFC
> >>>>
> >>>> [AMD Official Use Only]
> >>>>
> >>>>> That is the caller can perform something like issuing the same
> >>>>> message again without prerequisites check on PMFW busy
> >>>>
> >>>> This patch expects this method. Caller may try to resend message
> >>>> again. As part of message sending, driver first checks response
> >>>> register. Current logic blocks sending any message if it sees 0xFC
> >>>> in response register, this patch is to address that.
> >>> [Quan, Evan] Yes, I know. But the caller here could be another one.
> >>> I mean
> >> there may be another caller stepped in.
> >>>
> >>
> >> That shouldn't cause an issue to the second caller if it got message mutex.
> >> The second caller also should be able to send message if PMFW got
> >> free by that time. The first caller can retry when it gets back the
> >> message mutex. FW doesn't maintain any state for 0xFC response. Any
> >> other message may be sent after that. If driver keeps the state based
> >> on two callers, that is a logic problem in driver. I don't think we have any
> flow like that.
> > [Quan, Evan] Yeah, but there may be some case that messages issued by
> the two callers have dependence.
> > That means the message issued by the 2nd caller should be only after the
> 1st one.
> > The one i can think of is "EnableAllSmuFeatures" message should be after
> "SetAllowedFeatures" message.
> > Although that should not cause any problem, I'm not sure whether there is
> other similar case.
> >
> > What I suggest is something like below. We just do it again in
> smu_cmn_send_smc_msg_with_param() on PMFW busy.
> >
> > int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> > 				    enum smu_message_type msg,
> > 				    uint32_t param,
> > 				    uint32_t *read_arg)
> > {
> > ...
> > ...
> > 	mutex_lock(&smu->message_lock);
> > 	reg = __smu_cmn_poll_stat(smu);
> > 	res = __smu_cmn_reg2errno(smu, reg);
> > 	if (reg == SMU_RESP_NONE ||
> > 	    reg == SMU_RESP_BUSY_OTHER ||
> > 	    res == -EREMOTEIO) {
> > 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> > 		goto Out;
> > 	}
> > +retry:
> > 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
> > 	reg = __smu_cmn_poll_stat(smu);
> > 	res = __smu_cmn_reg2errno(smu, reg);
> > +	if (reg == SMU_RESP_BUSY_OTHER) {
> > +		mdelay(1);
> > +		goto retry;
> > +	}
> 
> I suppose the retry option should be left to caller. Regardless of retry or not,
> the patch is still valid.
> 
> Example situation -
> 
> rocm-smi is trying to get metrics and another app is trying set performance
> profile. If metrics fetch fail and even retry of metrics fetch fail after some
> loops, rocm-smi is free to fetch the metrics again after say 5s. That also
> shouldn't prevent the second app to send performance profile message and
> that app also may retry the same later.
[Quan, Evan] I have no doubt the second app may still be able to send performance profile message.
However, the metrics data retrieved by rocm-smi will be different under such scenario.
That is after performance profile setting the clock frequencies may go up.
If the first app is sensitive to that(suppose it wants to compare the clock frequencies before and after performance profile setting), that will be a problem.

I reconsider this a bit. Can we add one more parameter for smu_cmn_send_smc_msg_with_param()?
That parameter can tell what the caller wants(retry or abort) under PMFW busy scenario.

BR
Evan
> 
> Thanks,
> Lijo
> 
> > ...
> > ...
> > }
> >
> > BR
> > Evan
> >>
> >> Basically, 0xFC is not valid pre-condition check for sending any
> >> message. As per PMFW team - it only means that PMFW was busy when a
> >> previous message was sent and PMFW won't change the response status
> >> when it becomes free.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> BR
> >>> Evan
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>> -----Original Message-----
> >>>> From: Quan, Evan <Evan.Quan at amd.com>
> >>>> Sent: Friday, February 25, 2022 11:07 AM
> >>>> To: Lazar, Lijo <Lijo.Lazar at amd.com>; amd-gfx at lists.freedesktop.org
> >>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
> >>>> <KevinYang.Wang at amd.com>
> >>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >>>> 0xFC
> >>>>
> >>>> [AMD Official Use Only]
> >>>>
> >>>> This may introduce some problems for two callers scenarios. That is
> >>>> the 2nd one will still proceed even if the 1st one was already blocked.
> >>>> Maybe the logics here should be performed by the caller. That is
> >>>> the caller can perform something like issuing the same message
> >>>> again without prerequisites check on PMFW busy.
> >>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
> >>>> another try on PMFW busy.
> >>>>
> >>>> BR
> >>>> Evan
> >>>>> -----Original Message-----
> >>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> >>>>> Sent: Friday, February 25, 2022 12:22 PM
> >>>>> To: amd-gfx at lists.freedesktop.org
> >>>>> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher,
> Alexander
> >>>>> <Alexander.Deucher at amd.com>; Wang, Yang(Kevin)
> >>>>> <KevinYang.Wang at amd.com>; Quan, Evan <Evan.Quan at amd.com>
> >>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is
> 0xFC
> >>>>>
> >>>>> When PMFW is really busy, it will respond with 0xFC. However, it
> >>>>> doesn't change the response register state when it becomes free.
> >>>>> Driver should retry and proceed to send message if the response
> >>>>> status is
> >>>> 0xFC.
> >>>>>
> >>>>> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
> >>>>> ---
> >>>>>    drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >>>>>    1 file changed, 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> index 590a6ed12d54..92161b9d8c1a 100644
> >>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> @@ -297,7 +297,6 @@ int
> smu_cmn_send_msg_without_waiting(struct
> >>>>> smu_context *smu,
> >>>>>    	reg = __smu_cmn_poll_stat(smu);
> >>>>>    	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>    	if (reg == SMU_RESP_NONE ||
> >>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>    	    res == -EREMOTEIO)
> >>>>>    		goto Out;
> >>>>>    	__smu_cmn_send_msg(smu, msg_index, param); @@ -
> 391,7 +390,6
> >>>> @@ int
> >>>>> smu_cmn_send_smc_msg_with_param(struct
> >>>>> smu_context *smu,
> >>>>>    	reg = __smu_cmn_poll_stat(smu);
> >>>>>    	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>    	if (reg == SMU_RESP_NONE ||
> >>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>    	    res == -EREMOTEIO) {
> >>>>>    		__smu_cmn_reg_print_error(smu, reg, index, param,
> msg);
> >>>>>    		goto Out;
> >>>>> --
> >>>>> 2.25.1


More information about the amd-gfx mailing list