[PATCH v2 2/5] thermal: devfreq_cooling: get a copy of device status
Lukasz Luba
lukasz.luba at arm.com
Thu Dec 3 15:38:03 UTC 2020
On 12/3/20 1:09 PM, Daniel Lezcano wrote:
> On 18/11/2020 13:03, Lukasz Luba wrote:
>> Devfreq cooling needs to now the correct status of the device in order
>> to operate. Do not rely on Devfreq last_status which might be a stale data
>> and get more up-to-date values of the load.
>>
>> Devfreq framework can change the device status in the background. To
>> mitigate this situation make a copy of the status structure and use it
>> for internal calculations.
>>
>> In addition this patch adds normalization function, which also makes sure
>> that whatever data comes from the device, it is in a sane range.
>>
>> Signed-off-by: Lukasz Luba <lukasz.luba at arm.com>
>> ---
>> drivers/thermal/devfreq_cooling.c | 52 +++++++++++++++++++++++++------
>> 1 file changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
>> index 659c0143c9f0..925523694462 100644
>> --- a/drivers/thermal/devfreq_cooling.c
>> +++ b/drivers/thermal/devfreq_cooling.c
>> @@ -227,20 +227,46 @@ static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
>> voltage);
>> }
>>
>> +static void _normalize_load(struct devfreq_dev_status *status)
>> +{
>> + /* Make some space if needed */
>> + if (status->busy_time > 0xffff) {
>> + status->busy_time >>= 10;
>> + status->total_time >>= 10;
>> + }
>> +
>> + if (status->busy_time > status->total_time)
>> + status->busy_time = status->total_time;
>
> How the condition above is possible?
They should, be checked by the driver, but I cannot trust
and have to check for all corner cases: (div by 0, overflow
one of them, etc). The busy_time and total_time are unsigned long,
which means 4B on 32bit machines.
If these values are coming from device counters, which count every
busy cycle and total cycles of a clock of a device running at e.g.
1GHz they would overflow every ~4s.
Normally IPA polling are 1s and 100ms, it's platform specific. But there
are also 'empty' periods when IPA sees temperature very low and does not
even call the .get_requested_power() callbacks for the cooling devices,
just grants max freq to all. This is problematic. I am investigating it
and will propose a solution for IPA soon.
I would avoid all of this if devfreq core would have default for all
devices a reliable polling timer... Let me check some possibilities also
for this case.
>
>> + status->busy_time *= 100;
>> + status->busy_time /= status->total_time ? : 1;
>> +
>> + /* Avoid division by 0 */
>> + status->busy_time = status->busy_time ? : 1;
>> + status->total_time = 100;
>
> Why not base the normalization on 1024? and use an intermediate u64.
You are the 2nd reviewer who is asking this. I tried to keep 'load' as
in range [0, 100] since we also have 'load' in cpufreq cooling in this
range. Maybe I should switch to 1024 (Ionela was also asking for this).
>
> For example:
>
> static u32 _normalize_load(struct devfreq_dev_status *status)
> {
> u64 load = 0;
>
> /* Prevent divison by zero */
> if (!status->busy_time)
> return 0;
>
> /*
> * Assuming status->total_time is always greater or equal
> * to status->busy_time, it can not be equal to zero because
> * of the test above
> */
> load = status->busy_time * 1024;
> load /= status->total_time;
I wanted to avoid any divisions which involve 64bit var on 32bit
machine.
>
> /*
> * load is always [1..1024[, so it can not be truncated by a
> * u64 -> u32 coercive cast
> */
> return (u32)load;
> }
>
>
>> +}
>>
>> static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cdev,
>> u32 *power)
>> {
>> struct devfreq_cooling_device *dfc = cdev->devdata;
>> struct devfreq *df = dfc->devfreq;
>> - struct devfreq_dev_status *status = &df->last_status;
>> + struct devfreq_dev_status status;
>> unsigned long state;
>> - unsigned long freq = status->current_frequency;
>> + unsigned long freq;
>> unsigned long voltage;
>> u32 dyn_power = 0;
>> u32 static_power = 0;
>> int res;
>>
>> + mutex_lock(&df->lock);
>> + res = df->profile->get_dev_status(df->dev.parent, &status);
>> + mutex_unlock(&df->lock);
>> + if (res)
>> + return res;
>> +
>> + freq = status.current_frequency;
>> +
>> state = freq_get_state(dfc, freq);
>> if (state == THERMAL_CSTATE_INVALID) {
>> res = -EAGAIN;
>> @@ -268,16 +294,18 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
>> } else {
>> dyn_power = dfc->power_table[state];
>>
>> + _normalize_load(&status);
>
> load = _normalize_load(&status);
>
>> +
>> /* Scale dynamic power for utilization */
>> - dyn_power *= status->busy_time;
>> - dyn_power /= status->total_time;
>> + dyn_power *= status.busy_time;
>> + dyn_power /= status.total_time;
>
> /*
> * May be change dyn_power to a u64 to prevent overflow
> * when multiplied by 'load'
> */
> dyn_power = (dyn_power * load) / 1024;
dyn_power value from EM should fit in 16bit [1], so we should be safe.
I will experiment with the 1024 code and check some corner cases.
Thank you Daniel for the review!
Regards,
Lukasz
[1]
https://elixir.bootlin.com/linux/v5.10-rc5/source/kernel/power/energy_model.c#L135
More information about the dri-devel
mailing list