[igt-dev] [Intel-gfx] [PATCH i-g-t v4] tests/perf_pmu: Avoid RT thread for accuracy test

Mon Apr 16 09:55:29 UTC 2018

On 14/04/2018 12:35, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-04-11 14:52:36)
>>
>> On 11/04/2018 14:23, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-04-04 10:51:52)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>
>>>> Realtime scheduling interferes with execlists submission (tasklet) so try
>>>> to simplify the PWM loop in a few ways:
>>>>
>>>>    * Drop RT.
>>>>    * Longer batches for smaller systematic error.
>>>>    * More truthful test duration calculation.
>>>>    * Less clock queries.
>>>>    * No self-adjust - instead just report the achieved cycle and let the
>>>>      parent check against it.
>>>>    * Report absolute cycle error.
>>>>
>>>> v2:
>>>>    * Bring back self-adjust. (Chris Wilson)
>>>>      (But slightly fixed version with no overflow.)
>>>>
>>>> v3:
>>>>    * Log average and mean calibration for each pass.
>>>>
>>>> v4:
>>>>    * Eliminate development leftovers.
>>>>    * Fix variance logging.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>
>>>   From a pragmatic point of view, there's no point waiting for me to be
>>> happy with the convergence if CI is, and the variance will definitely be
>>> interesting (although you could have used igt_mean to compute the
>>> iterative variance), so
>>>
>>> Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>>
>> Thanks, I've pushed it and so we'll see.
> 
> We should resurrect the RT variant in the near future. It's definitely
> an issue in our driver that random userspace can impact execution of
> unconnected others. (Handling RT starvation of workers is something we
> have to be aware of elsewhere, commonly hits oom if we don't have an
> escape clause.) Lots of words just to say, we should add a test for RT
> to exercise the bad behaviour. Hmm, doesn't need to be pmu, just we need
> an assertion that execution latency is bounded and no RT hog will delay
> it.

Agreed, I can add a simple test to gem_exec_latency.
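
As a rough illustration only, the pass/fail check of such a test could be
sketched as below. This is not IGT code: the helper names, the factor and
the slack are all hypothetical, and collecting the latency samples (with
and without an RT hog running) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest sample in the set, 0 if the set is empty. */
static uint64_t max_latency_ns(const uint64_t *samples, size_t n)
{
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++)
		if (samples[i] > max)
			max = samples[i];

	return max;
}

/*
 * Hypothetical policy, assumed for illustration: the worst latency seen
 * while an RT hog is running must not exceed the worst idle latency
 * multiplied by 'factor', plus a fixed 'slack_ns' allowance.
 */
static bool latency_is_bounded(const uint64_t *idle, size_t n_idle,
			       const uint64_t *loaded, size_t n_loaded,
			       uint64_t factor, uint64_t slack_ns)
{
	uint64_t limit;

	assert(factor > 0);

	limit = max_latency_ns(idle, n_idle) * factor + slack_ns;

	return max_latency_ns(loaded, n_loaded) <= limit;
}
```

The test would then fail if latency_is_bounded() returns false for the
samples gathered under the RT hog. What factor and slack are reasonable
would have to come from real measurements.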

As for how to fix this: re-enabling direct submission (rather than
submitting only indirectly via the tasklet) sounds simplest in theory,
although I do remember you raising some issues with this route the last
time I mentioned it. It does sound like the conceptually correct thing
to do.

As an alternative, we could explore how much effort a conversion to a
threaded irq handler would take and what latencies would result.

I now remember that you also had a patch to improve tasklet scheduling
in some cases. We can try that as well after I write the test, although
I have no idea how hard a sell that would be.

Regards,

Tvrtko