[PATCH] drm/msm: Disable frequency clamping on a630

Sun Aug 8 14:32:59 UTC 2021


On 07/08/2021 21:04, Rob Clark wrote:
> On Sat, Aug 7, 2021 at 12:21 PM Caleb Connolly
> <caleb.connolly at linaro.org> wrote:
>>
>> Hi Rob, Akhil,
>>
>> On 29/07/2021 21:53, Rob Clark wrote:
>>> On Thu, Jul 29, 2021 at 1:28 PM Caleb Connolly
>>> <caleb.connolly at linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 29/07/2021 21:24, Rob Clark wrote:
>>>>> On Thu, Jul 29, 2021 at 1:06 PM Caleb Connolly
>>>>> <caleb.connolly at linaro.org> wrote:
>>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> I've done some more testing! It looks like before that patch ("drm/msm: Devfreq tuning") the GPU would never get above
>>>>>> the second frequency in the OPP table (342MHz) (at least, not in glxgears). With the patch applied it would more
>>>>>> aggressively jump up to the max frequency which seems to be unstable at the default regulator voltages.
>>>>>
>>>>> *ohh*, yeah, ok, that would explain it
>>>>>
>>>>>> Hacking the pm8005 s1 regulator (which provides VDD_GFX) up to 0.988v (instead of the stock 0.516v) makes the GPU stable
>>>>>> at the higher frequencies.
>>>>>>
>>>>>> Applying this patch reverts the behaviour, and the GPU never goes above 342MHz in glxgears, losing ~30% performance in
>>>>>> glxgear.
>>>>>>
>>>>>> I think (?) that enabling CPR support would be the proper solution to this - that would ensure that the regulators run
>>>>>> at the voltage the hardware needs to be stable.
>>>>>>
>>>>>> Is hacking the voltage higher (although ideally not quite that high) an acceptable short term solution until we have
>>>>>> CPR? Or would it be safer to just not make use of the higher frequencies on a630 for now?
>>>>>>
>>>>>
>>>>> tbh, I'm not sure about the regulator stuff and CPR.. Bjorn is already
>>>>> on CC and I added sboyd, maybe one of them knows better.
>>>>>
>>>>> In the short term, removing the higher problematic OPPs from dts might
>>>>> be a better option than this patch (which I'm dropping), since there
>>>>> is nothing stopping other workloads from hitting higher OPPs.
>>>> Oh yeah that sounds like a more sensible workaround than mine .
>>>>>
>>>>> I'm slightly curious why I didn't have problems at higher OPPs on my
>>>>> c630 laptop (sdm850)
>>>> Perhaps you won the sillicon lottery - iirc sdm850 is binned for higher clocks as is out of the factory.
>>>>
>>>> Would it be best to drop the OPPs for all devices? Or just those affected? I guess it's possible another c630 might
>>>> crash where yours doesn't?
>>>
>>> I've not heard any reports of similar issues from the handful of other
>>> folks with c630's on #aarch64-laptops.. but I can't really say if that
>>> is luck or not.
>> It looks like this affects at least the OnePlus 6 and PocoPhone F1, I've done some more poking and the following diff
>> seems to fix the stability issues completely, it seems the delay is required to let the update propagate.
>>
>> This doesn't feel like the right fix, but hopefully it's enough to come up with a better solution than disabling the new
>> devfreq behaviour on a630.
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index d7cec7f0dde0..69e2a5e84dae 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -139,6 +139,10 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>>                   return;
>>           }
>>
>> +       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>> +
>> +       usleep_range(300, 500);
>> +
> 
> Hmm, this is going to be in the critical path on idle -> active
> transition (ie. think response time to user-input).. so we defn don't
> want to do this unconditionally..
> 
> If I understand the problem, we just want to limit how far we jump the
> gpu freq in one go.. maybe deleting the lowest (and perhaps highest)
> OPP would accomplish that?  Could that be done in the board(s)'s
> toplevel dts files?
That would be a workaround, however I'd really like to avoid limiting performance as a solution if I can help it, 
especially as the fix might just be "set the opp first, wait for it to apply, then set the core clock".

Is there a sensible way to get a callback from the opp notify chain? Or from rpmh directly? Or is this solution really 
not the right way to go?
> 
> BR,
> -R
> 
>>           gmu_write(gmu, REG_A6XX_GMU_DCVS_ACK_OPTION, 0);
>>
>>           gmu_write(gmu, REG_A6XX_GMU_DCVS_PERF_SETTING,
>> @@ -158,7 +162,6 @@ void a6xx_gmu_set_freq(struct msm_gpu *gpu, struct dev_pm_opp *opp)
>>           if (ret)
>>                   dev_err(gmu->dev, "GMU set GPU frequency error: %d\n", ret);
>>
>> -       dev_pm_opp_set_opp(&gpu->pdev->dev, opp);
>>           pm_runtime_put(gmu->dev);
>>    }
>>>
>>> Maybe just remove it for affected devices?  But I'll defer to Bjorn.
>>>
>>> BR,
>>> -R
>>>
>>
>> --
>> Kind Regards,
>> Caleb (they/them)

-- 
Kind Regards,
Caleb (they/them)