[igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Mar 9 17:37:38 UTC 2018


On 09/03/2018 17:09, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-09 16:45:31)
>>
>> Much better accuracy with these tweaks.
>>
>> Looks like WC writes and ioctls were slow and were affecting the
>> self-calibration. Although I have no explanation for why the 50% tests
>> were most affected, especially compared with the 2% ones. Shrug.
> 
> loop duration for 2%: 2500 + 122500 = 125000us
> loop duration for 50%: 2500 + 2500 = 5000us
> 
> So 25x more ioctls at 50%.
> 
> Bound to be the ioctls, scary. We need to track down the cause of that.
> A latency histogram just to see the distribution? kcov tracing for the
> outliers? Though ftrace is probably better, if we assume that it's
> likely outside forces (the path through i915 should be pretty static --
> or is it???).
> 
> Something like capture an ftrace snippet for each ioctl (having
> reproduced something that shows the spikes or whatever); then throw away
> all traces that lie within the normal distribution and look for patterns
> in the outliers?
> 
> Hmm, my guess would be ksoftirqd. If the submit wasn't immediate then it
> would only be run when the RT calibration thread slept (the submit will
> be from the same cpu because it's a tasklet). Ho hum, that sounds very
> plausible.
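
A rough sketch of the outlier filtering suggested in the quote above: keep only the samples that fall outside the bulk of the distribution, together with their index, so the matching trace snippets can be inspected. The latency values are synthetic, and the function name and the 3-sigma cutoff are assumptions for illustration, not anything measured or agreed in this thread.

```python
import statistics

def split_outliers(latencies_us, n_sigma=3.0):
    """Split latency samples into the normal bulk and the outliers.

    Returns (threshold_us, normal, outliers), where outliers is a list of
    (sample_index, latency_us) pairs so each can be matched to its trace.
    The n_sigma cutoff is an assumed heuristic.
    """
    mean = statistics.fmean(latencies_us)
    stdev = statistics.pstdev(latencies_us)
    threshold = mean + n_sigma * stdev
    normal = [x for x in latencies_us if x <= threshold]
    outliers = [(i, x) for i, x in enumerate(latencies_us) if x > threshold]
    return threshold, normal, outliers

# Synthetic data: 50 "fast" ioctls and one slow spike (hypothetical values).
samples_us = [10] * 50 + [500]
threshold_us, normal, outliers = split_outliers(samples_us)
print(f"threshold={threshold_us:.1f}us, outliers={outliers}")
```

A mean-based cutoff is itself pulled upwards by large spikes, so a median-based cutoff may be more robust on real data.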

Yes, tasklet delays. I even realized, well suspected, that while
tweaking the test. By the time the shard results came in it was long
forgotten. :(

The huge difference in the number of execbufs did not strike me though,
well spotted. There are fewest towards the edges (2% and 98%) and most in
the middle (50%), so it makes perfect sense. (98% comes up as 160000us
busy, 3264us idle.)
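
The arithmetic behind this can be sketched as below, using only the busy/idle pairs quoted in this thread. The one-second window and the premise that each busy/idle cycle costs a fixed number of ioctls are assumptions for illustration; the helper name is hypothetical.

```python
def loop_stats(busy_us, idle_us, window_us=1_000_000):
    """Return (period_us, duty_cycle, cycles_per_window) for one busy/idle loop.

    window_us is an assumed measurement window, used only to compare how
    many cycles (and hence submissions) each target needs.
    """
    period_us = busy_us + idle_us
    duty_cycle = busy_us / period_us
    cycles = window_us / period_us
    return period_us, duty_cycle, cycles

# Busy/idle durations in microseconds, as quoted in the thread.
targets = {
    "2%": (2500, 122500),
    "50%": (2500, 2500),
    "98%": (160000, 3264),
}

stats = {name: loop_stats(busy, idle) for name, (busy, idle) in targets.items()}
for name, (period, duty, cycles) in stats.items():
    print(f"{name}: period={period}us duty={duty:.2%} cycles/window={cycles:.1f}")

# How many more cycles the 50% target runs than the 2% target.
ratio_50_vs_2 = stats["50%"][2] / stats["2%"][2]
print(f"50% vs 2%: {ratio_50_vs_2:.0f}x more cycles")
```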

Regards,

Tvrtko

