[Intel-gfx] [PATCH] drm/i915/selftest: Bump up sample period for busy stats selftest

Fri Nov 4 15:45:53 UTC 2022

On 04/11/2022 14:58, Umesh Nerlige Ramappa wrote:
> On Fri, Nov 04, 2022 at 08:29:38AM +0000, Tvrtko Ursulin wrote:
>>
>> On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:
>>> On Thu, Nov 03, 2022 at 12:28:46PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:
>>>>> Engine busyness sampled over a 10ms period is failing with busyness
>>>>> ranging approx. from 87% to 115%. The expected range is +/- 5% of the
>>>>> sample period.
>>>>>
>>>>> When determining busyness of active engine, the GuC based engine
>>>>> busyness implementation relies on a 64 bit timestamp register read. The
>>>>> latency incurred by this register read causes the failure.
>>>>>
>>>>> On DG1, when the test fails, the observed latencies range from 900us -
>>>>> 1.5ms.
>>>>
>>>> Do I read this right - that the latency of a 64 bit timestamp 
>>>> register read is 0.9 - 1.5ms? That would be the read in 
>>>> guc_update_pm_timestamp?
>>>
>>> Correct. That is total time taken by intel_uncore_read64_2x32() 
>>> measured with local_clock().
>>>
>>> One other thing I missed out in the comments is that enable_dc=0 also 
>>> resolves the issue, but display team confirmed there is no relation 
>>> to display in this case other than that it somehow introduces a 
>>> latency in the reg read.
>>
>> Could it be the DMC wreaking havoc, something similar to b68763741aa2 
>> ("drm/i915: Restore GT performance in headless mode with DMC loaded")?
>>
> 
> __gt_unpark is already doing a
> gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);
> 
> I would assume that __gt_unpark was called prior to running the 
> selftest, need to confirm that though.

Right, I meant maybe something similar but not necessarily the same. 
Similar in the sense that it may be DMC doing many MMIO accesses invisible 
to i915 and so introducing latency.

>>>>> One solution tried was to reduce the latency between reg read and
>>>>> CPU timestamp capture, but such optimization does not add value to user
>>>>> since the CPU timestamp obtained here is only used for (1) selftest and
>>>>> (2) i915 rps implementation specific to execlist scheduler. Also, this
>>>>> solution only reduces the frequency of failure and does not eliminate
>>>>> it.
>>>
>>> Note that this solution is here - 
>>> https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1
>>>
>>> but I am not intending to use it since it just reduces the frequency 
>>> of failures, but the inherent issue still exists.
>>
>> Right, I'd just go with that as well if it makes a significant 
>> improvement. Or even just refactor intel_uncore_read64_2x32 to be 
>> under one spinlock/fw. I don't see that it can have an excuse to be 
>> less efficient since there's a loop in there.
> 
> The patch did reduce the failure to once in 200 runs vs once in 10 runs.
> I will refactor the helper in that case.

Yeah it makes sense to make it efficient. But feel free to go with the 
msleep increase as well to work around the issue fully.
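
For illustration only, a minimal userspace sketch of the "one lock around 
the whole loop" idea. This is not the i915 code: struct fake_dev, 
fake_lock()/fake_unlock() and the raw_read_*() helpers are hypothetical 
stand-ins for the uncore lock/forcewake and the two 32-bit register reads, 
and the retry bound of 2 is an assumption. It compares an upper/lower/upper 
read loop that locks on every 32-bit access with one that locks once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit counter exposed as two 32-bit registers. */
struct fake_dev {
	uint64_t counter;        /* simulated free-running hardware counter */
	uint64_t tick_per_read;  /* how far the counter advances on each 32-bit read */
	unsigned int lock_count; /* number of lock/forcewake acquisitions so far */
};

/* Stand-ins for taking and releasing the lock/forcewake (assumption). */
static void fake_lock(struct fake_dev *d)   { d->lock_count++; }
static void fake_unlock(struct fake_dev *d) { (void)d; }

/* Unlocked 32-bit reads; each read lets the simulated counter advance. */
static uint32_t raw_read_lower(struct fake_dev *d)
{
	uint32_t v = (uint32_t)d->counter;

	d->counter += d->tick_per_read;
	return v;
}

static uint32_t raw_read_upper(struct fake_dev *d)
{
	uint32_t v = (uint32_t)(d->counter >> 32);

	d->counter += d->tick_per_read;
	return v;
}

/* 32-bit reads that take and drop the lock around every single access. */
static uint32_t locked_read_lower(struct fake_dev *d)
{
	uint32_t v;

	fake_lock(d);
	v = raw_read_lower(d);
	fake_unlock(d);
	return v;
}

static uint32_t locked_read_upper(struct fake_dev *d)
{
	uint32_t v;

	fake_lock(d);
	v = raw_read_upper(d);
	fake_unlock(d);
	return v;
}

/* Variant A: upper/lower/upper with retry, locking on every 32-bit read. */
static uint64_t read64_per_access_lock(struct fake_dev *d)
{
	uint32_t upper = locked_read_upper(d), lower, old_upper;
	int loop = 0;

	do {
		old_upper = upper;
		lower = locked_read_lower(d);
		upper = locked_read_upper(d);
	} while (upper != old_upper && loop++ < 2);

	return ((uint64_t)upper << 32) | lower;
}

/* Variant B: same loop, but the lock is taken once around all the reads. */
static uint64_t read64_single_lock(struct fake_dev *d)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	fake_lock(d);
	upper = raw_read_upper(d);
	do {
		old_upper = upper;
		lower = raw_read_lower(d);
		upper = raw_read_upper(d);
	} while (upper != old_upper && loop++ < 2);
	fake_unlock(d);

	return ((uint64_t)upper << 32) | lower;
}
```

In this model both variants return the same value; the per-access variant 
takes the lock at least three times per 64-bit read, the single-lock 
variant exactly once.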

Regards,

Tvrtko