[Intel-gfx] [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Nov 4 08:29:38 UTC 2022
On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:
> On Thu, Nov 03, 2022 at 12:28:46PM +0000, Tvrtko Ursulin wrote:
>>
>> On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:
>>> Engine busyness samples over a 10ms period are failing, with busyness
>>> ranging from approx. 87% to 115%. The expected range is +/- 5% of the
>>> sample period.
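As a sketch of what "+/- 5% of the sample period" means for the two values the selftest computes (de, the reported busy time, and dt, the elapsed CPU time), the check below uses integer cross-multiplication instead of division. The helper name busyness_within_5pct is hypothetical, and the actual selftest condition may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the busy time reported over a
 * sample window (de_ns) is within +/- 5% of the window length (dt_ns).
 * Cross-multiplying keeps the check in integer arithmetic, as kernel code
 * generally avoids floating point.
 */
static bool busyness_within_5pct(int64_t de_ns, int64_t dt_ns)
{
	assert(dt_ns > 0); /* an empty window cannot be judged */

	/* de/dt >= 0.95  <=>  100 * de >= 95 * dt */
	/* de/dt <= 1.05  <=>  100 * de <= 105 * dt */
	return 100 * de_ns >= 95 * dt_ns && 100 * de_ns <= 105 * dt_ns;
}
```

With a 10ms window, the reported extremes of 115% (11.5ms busy) and 87% (8.7ms busy) both fall outside this band, while 101.5% of a 100ms window stays inside it.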
>>>
>>> When determining busyness of active engine, the GuC based engine
>>> busyness implementation relies on a 64 bit timestamp register read. The
>>> latency incurred by this register read causes the failure.
>>>
>>> On DG1, when the test fails, the observed latencies range from 900us to
>>> 1.5ms.
>>
>> Do I read this right - that the latency of a 64 bit timestamp register
>> read is 0.9 - 1.5ms? That would be the read in guc_update_pm_timestamp?
>
> Correct. That is total time taken by intel_uncore_read64_2x32() measured
> with local_clock().
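A userspace analog of this measurement, for illustration only: local_clock() is kernel-only, so the sketch below assumes clock_gettime(CLOCK_MONOTONIC) as a stand-in, and fake_read64 is a hypothetical placeholder for intel_uncore_read64_2x32(). It simply brackets one read with two clock samples and returns the difference in nanoseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-in for the kernel's local_clock(): monotonic time in ns.
 * CLOCK_MONOTONIC is assumed to be supported (it is on Linux). */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-in for the 64-bit register read being timed. */
static uint64_t fake_read64(void *ctx)
{
	return *(const volatile uint64_t *)ctx;
}

/*
 * Time a single call of 'op' and return the elapsed nanoseconds; the value
 * read is passed back through *out. This mirrors wrapping the read with two
 * local_clock() calls in the kernel.
 */
static int64_t time_read_ns(uint64_t (*op)(void *), void *ctx, uint64_t *out)
{
	int64_t start;

	assert(op != NULL && out != NULL);
	start = now_ns();
	*out = op(ctx);
	return now_ns() - start;
}
```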
>
> One other thing I missed in the comments is that enable_dc=0 also
> resolves the issue, but the display team confirmed there is no relation
> to display in this case, other than that it somehow introduces latency
> in the reg read.
Could it be the DMC wreaking havoc, similar to b68763741aa2
("drm/i915: Restore GT performance in headless mode with DMC loaded")?
>>> One solution tried was to reduce the latency between the reg read and
>>> the CPU timestamp capture, but such an optimization does not add value
>>> for the user, since the CPU timestamp obtained here is only used for
>>> (1) the selftest and (2) the i915 rps implementation specific to the
>>> execlist scheduler. Also, this solution only reduces the frequency of
>>> failure and does not eliminate it.
>
> Note that this solution is here -
> https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1
>
> but I am not intending to use it since it only reduces the frequency of
> failures; the inherent issue still exists.
Right, I'd just go with that as well if it makes a significant
improvement. Or even just refactor intel_uncore_read64_2x32 to be under
one spinlock/fw. I don't see that it can have an excuse to be less
efficient since there's a loop in there.
Regards,
Tvrtko
> Regards,
> Umesh
>
>>>
>>> In order to make the selftest more robust and account for such
>>> latencies, increase the sample period to 100 ms.
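Why a longer period helps can be seen with a simplified worked bound. Assume the engine is fully busy over a sample period T and that the register-read latency \varepsilon lands entirely as unmatched error on one end of the window (a simplification, not a claim about the exact GuC code path). The reported ratio de/dt then deviates from 1 by \varepsilon / T:

```latex
\[
\frac{de}{dt} \approx \frac{T \pm \varepsilon}{T} = 1 \pm \frac{\varepsilon}{T},
\qquad
\text{pass} \iff \frac{\varepsilon}{T} \le 0.05 \iff \varepsilon \le 0.05\,T
\]
\[
T = 10\,\mathrm{ms}:\quad \varepsilon \le 500\,\mu\mathrm{s},\quad
\varepsilon \in [0.9, 1.5]\,\mathrm{ms} \Rightarrow \frac{\varepsilon}{T} \in [9\%, 15\%]\ \text{(fail)}
\]
\[
T = 100\,\mathrm{ms}:\quad \varepsilon \le 5\,\mathrm{ms},\quad
\varepsilon \in [0.9, 1.5]\,\mathrm{ms} \Rightarrow \frac{\varepsilon}{T} \in [0.9\%, 1.5\%]\ \text{(pass)}
\]
```

The 10ms case is consistent with the 87% to 115% excursions reported on DG1, while at 100ms the same latencies stay well inside the +/- 5% band.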
>>>
>>> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> index 0dcb3ed44a73..87c94314cf67 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>> @@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>>> ENGINE_TRACE(engine, "measuring busy time\n");
>>> preempt_disable();
>>> de = intel_engine_get_busy_time(engine, &t[0]);
>>> - mdelay(10);
>>> + mdelay(100);
>>> de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>>> preempt_enable();
>>> dt = ktime_sub(t[1], t[0]);