[igt-dev] ✗ Fi.CI.IGT: failure for tests/perf_pmu: Use absolute tolerance in accuracy tests (rev2)

Chris Wilson chris at chris-wilson.co.uk
Fri Mar 9 17:09:23 UTC 2018


Quoting Tvrtko Ursulin (2018-03-09 16:45:31)
> 
> Much better accuracy with these tweaks.
> 
> Looks like WC writes and ioctls were slow and were affecting the 
> self-calibration. Although I don't have an explanation for why the 
> 50% tests were most affected, especially compared with the 2% ones. Shrug.

loop duration for 2%:  2500us busy + 122500us idle = 125000us
loop duration for 50%: 2500us busy +   2500us idle =   5000us

So, for the same total runtime, 25x more loop iterations -- and hence
25x more ioctls -- at 50%.
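
(For reference, the loop shape I'm assuming in the numbers above -- a
sketch, not the actual perf_pmu code; submit_spin_batch() and
end_spin_batch() are stand-ins for the real execbuf/terminate ioctls:)

/* Sketch of the assumed calibration loop: one submit/terminate ioctl
 * pair per iteration, so the ioctl rate scales inversely with the
 * period (busy_us + idle_us). */
#include <unistd.h>

extern void submit_spin_batch(void);	/* stand-in: execbuf ioctl */
extern void end_spin_batch(void);	/* stand-in: terminate + sync */

static void calibration_loop(unsigned long busy_us, unsigned long idle_us,
			     unsigned long total_us)
{
	unsigned long t;

	for (t = 0; t < total_us; t += busy_us + idle_us) {
		submit_spin_batch();
		usleep(busy_us);	/* busy portion of the period */
		end_spin_batch();
		usleep(idle_us);	/* idle portion of the period */
	}
}

/*  2%: 125000us period ->   8 iterations/s
 * 50%:   5000us period -> 200 iterations/s, i.e. 25x the ioctls */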

Bound to be the ioctls, scary. We need to track down the cause of that.
A latency histogram just to see the distribution? kcov tracing for the
outliers? Though ftrace is probably better, if we assume that it's
likely outside forces (the path through i915 should be pretty static --
or is it???).
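
For the histogram, a quick wrapper like this around every execbuf would
do (a sketch; the log2 bucketing and the wrapper are made up, not igt
code):

/* Sketch: time each ioctl and bin the latency into log2(us) buckets
 * so the outliers stand out against the normal distribution. */
#include <stdio.h>
#include <sys/ioctl.h>
#include <time.h>

static unsigned long hist[32];

static int timed_ioctl(int fd, unsigned long request, void *arg)
{
	struct timespec start, end;
	long us;
	int bucket, ret;

	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = ioctl(fd, request, arg);
	clock_gettime(CLOCK_MONOTONIC, &end);

	us = (end.tv_sec - start.tv_sec) * 1000000 +
	     (end.tv_nsec - start.tv_nsec) / 1000;
	for (bucket = 0; us > 1 && bucket < 31; bucket++)
		us >>= 1;
	hist[bucket]++;

	return ret;
}

static void print_hist(void)
{
	int i;

	for (i = 0; i < 32; i++)
		if (hist[i])
			printf("%8luus: %lu\n", 1ul << i, hist[i]);
}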

Something like capturing an ftrace snippet for each ioctl (having
reproduced something that shows the spikes or whatever); then throw away
all traces that lie within the normal distribution and look for patterns
in the outliers?
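
In code terms, maybe something like this (a sketch; assumes the tracer
has already been configured through tracefs):

/* Sketch: drop begin/end markers into the ftrace buffer around each
 * ioctl; post-processing then keeps only the windows whose duration
 * lies outside the normal distribution. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

static int marker_fd = -1;

static int traced_ioctl(int fd, unsigned long request, void *arg,
			unsigned long seq)
{
	int ret;

	if (marker_fd < 0)
		marker_fd = open("/sys/kernel/tracing/trace_marker",
				 O_WRONLY);

	dprintf(marker_fd, "ioctl-begin %lu\n", seq);
	ret = ioctl(fd, request, arg);
	dprintf(marker_fd, "ioctl-end %lu\n", seq);

	return ret;
}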

Hmm, my guess would be ksoftirqd. If the submit wasn't immediate then it
would only be run when the RT calibration thread slept (the submit will
be from the same cpu because it's a tasklet). Ho hum, that sounds very
plausible.
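
One cheap way to check that guess: sample ksoftirqd's
/proc/<pid>/schedstat (first field is on-cpu time in ns) before and
after each ioctl and see whether it only advances across the slow ones.
Rough sketch; finding the ksoftirqd/N pid for our cpu is left out:

/* Sketch: read the on-cpu runtime of ksoftirqd from
 * /proc/<pid>/schedstat; if it advances only across the outlier
 * ioctls, that points the finger at softirq scheduling. */
#include <stdio.h>
#include <sys/types.h>

static unsigned long long ksoftirqd_runtime_ns(pid_t pid)
{
	char path[64];
	unsigned long long ns = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/schedstat", (int)pid);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%llu", &ns) != 1)
			ns = 0;
		fclose(f);
	}

	return ns;
}

/* usage:
 *	before = ksoftirqd_runtime_ns(pid);
 *	ret = ioctl(fd, request, arg);
 *	after = ksoftirqd_runtime_ns(pid);
 *	slow ioctl && after > before
 *		-> ksoftirqd ran on our cpu during the ioctl
 */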
-Chris

