[Intel-gfx] [PATCH] drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Tue Apr 10 10:54:23 UTC 2018


On 10/04/2018 11:34, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-10 11:22:55)
>>
>> On 10/04/2018 10:57, Chris Wilson wrote:
>>> But I'm not understanding the failure -- why is the estimate bad? At the
>>> very least we still ensure that it is monotonic? Is it just the jitter
>>> you are worrying about? (If the estimate is bad here, isn't it always
>>> bad?)
>>
>> As far as I have seen, the failures from CI are all cases of the
>> estimate being too large (no jitter and no going backwards).
>>
>> What I suspect is going bad in either case, is that we must not add the
>> delta from current jiffies to internal runtime pm counters if state is
>> not suspended. If we do that we are accounting an unknown period of time
>> as suspended time and that would explain the over-estimation.
>>
>> In other words we are only allowed to estimate if the current state is
>> definitely suspended. If it is anything else we need to report either
>> the last estimated value, or the last real value, depending on which is
>> more recent.
> 
> i.e. we must not use kdev->power.suspended_jiffies before we know it is
> set.
> 
> Ok, that is starting to make sense. Thanks, can you update the commitmsg
> with this (pretty much verbatim as it is a good explanation).

Can do.
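
To make the rule above concrete, here is a minimal sketch in plain
userspace C. It is not the patch itself: the struct, the enum and the
function names are all hypothetical stand-ins, and the timestamps are
passed in by the caller instead of being read from jiffies or the
runtime PM core. It only illustrates "extend the estimate when
definitely suspended, otherwise repeat the most recent value":

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime PM status; not the kernel enum. */
enum pm_state { PM_ACTIVE, PM_SUSPENDING, PM_SUSPENDED, PM_RESUMING };

/* Hypothetical bookkeeping; field names are illustrative only. */
struct rc6_estimator {
	uint64_t last_real_ns;     /* last value read from the hardware counters */
	uint64_t last_reported_ns; /* most recent value handed to the reader */
	uint64_t suspended_at_ns;  /* when suspend began; valid only in PM_SUSPENDED */
};

/* Device is awake: report the real counter, never going backwards. */
static uint64_t rc6_report_real(struct rc6_estimator *e, uint64_t hw_ns)
{
	e->last_real_ns = hw_ns;
	if (hw_ns > e->last_reported_ns)
		e->last_reported_ns = hw_ns;
	return e->last_reported_ns;
}

/* Device is not awake: estimate only if it is definitely suspended. */
static uint64_t rc6_report_asleep(struct rc6_estimator *e,
				  enum pm_state state, uint64_t now_ns)
{
	if (state == PM_SUSPENDED) {
		uint64_t est;

		/* The suspend timestamp must not be in the future. */
		assert(now_ns >= e->suspended_at_ns);

		est = e->last_real_ns + (now_ns - e->suspended_at_ns);
		if (est > e->last_reported_ns)
			e->last_reported_ns = est;
	}

	/* In any other state, repeat the most recent value unchanged. */
	return e->last_reported_ns;
}
```

With this shape, a read in a transitional state never adds an unknown
period to the total, which is the over-estimation path described above.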

The patch makes sense - but I still cannot explain the failures since 
the test is supposed to be running in a controlled environment:

1. enter runtime suspend
2. sample rc6 (this sets the estimation state)
3. sleep for 2s
4. sample rc6

And the diff between the two rc6 samples can show 10% to 50% more RC6 
elapsed time than sleep time, and even up to 300% more in some reports. 
If i915 is runtime suspended the whole time, and dmesg says it is, I 
don't know how this is possible.

There must be another flaw somewhere which I am not seeing currently.
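
For reference, the arithmetic behind those percentages can be sketched
as below. This is not the actual test code: the helper names and the
tolerance parameter are hypothetical, and the caller is assumed to
supply the two rc6 samples and the slept time in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Percentage by which the RC6 delta exceeds the time slept.
 * Positive means over-estimation, negative means under-estimation.
 * slept_ns is assumed to be non-zero.
 */
static double rc6_excess_percent(uint64_t rc6_before_ns,
				 uint64_t rc6_after_ns,
				 uint64_t slept_ns)
{
	/* Subtract as doubles so a backwards counter gives a negative delta. */
	double delta = (double)rc6_after_ns - (double)rc6_before_ns;

	return 100.0 * (delta - (double)slept_ns) / (double)slept_ns;
}

/* Hypothetical pass/fail check; the tolerance is chosen by the caller. */
static bool rc6_within_tolerance(uint64_t rc6_before_ns,
				 uint64_t rc6_after_ns,
				 uint64_t slept_ns,
				 double tolerance_percent)
{
	double excess = rc6_excess_percent(rc6_before_ns, rc6_after_ns,
					   slept_ns);

	return excess <= tolerance_percent && excess >= -tolerance_percent;
}
```

For a 2s sleep, an rc6 delta of 3s comes out as 50% over and a delta of
8s as 300% over, which matches the range seen in the reports.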

Regards,

Tvrtko

