[Intel-gfx] [PATCH v4 i-g-t] tests/kms: increase max threshold time for edid read

Jani Nikula jani.nikula at linux.intel.com
Mon Aug 14 08:43:34 UTC 2017


On Fri, 11 Aug 2017, Clint Taylor <clinton.a.taylor at intel.com> wrote:
> On 08/11/2017 12:49 AM, Lofstedt, Marta wrote:
>> We can't have this lspcon as an argument to the test; it will not work with the automated piglit testing we use for CI.
>> Theoretically we could add a "has_lspcon" to debugfs, but I believe this test has become over-complicated.
> The test would need to know if an LSPcon is connected on a port-by-port 
> basis. This is easy if the LSPcon driver is loaded, but in the case of 
> USB_C->HDMI it gets a little more complicated (not impossible) to figure 
> out. Even if we know exactly what device is connected, failures can still 
> occur if the TCON/monitor clock-stretches the EDID read.
>
>> The intention of the test is to do a sanity check so that we don't drift off here, so I actually prefer the V1.
> Unfortunately, with the timing differences (3 ms to 96 ms) depending on the 
> monitor type connected and the EDID size, there is no way for a 
> one-size-fits-all sanity check to be reliable. If the original point of this 
> test was to figure out whether probe caused more than one EDID read, maybe 
> we should actually count EDID reads and not infer it by measuring time.
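
Counting could look roughly like the sketch below. This is just an
illustration; the counter file is hypothetical, i915 doesn't expose anything
like it today, so it would have to be added first.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <xf86drmMode.h>

/* Read a hypothetical per-connector "EDID reads" counter from debugfs. */
static int read_edid_counter(const char *path)
{
        FILE *f = fopen(path, "r");
        int val = -1;

        if (f) {
                if (fscanf(f, "%d", &val) != 1)
                        val = -1;
                fclose(f);
        }

        return val;
}

static void check_single_edid_read(int drm_fd, uint32_t connector_id,
                                   const char *counter_path)
{
        int before = read_edid_counter(counter_path);

        /* Force a full probe; it should read the EDID at most once. */
        drmModeFreeConnector(drmModeGetConnector(drm_fd, connector_id));

        int after = read_edid_counter(counter_path);
        assert(before >= 0 && after - before <= 1);
}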

I'll take a step back here and try to look at the big picture.

I think the point of the test originally was to catch regressions in the
EDID code that would slow down EDID reading. That's a valid thing to
test per se, but unfortunately it's not easy to do that accurately. The
basic alternatives are, from most accurate to least accurate:

1) Start off with a reference timing, and make relative comparisons
against it. This is not unlike performance testing. The problem is that, to
avoid comparing apples and oranges, you'll need to take into account the
platform, connector type, adapters, cables, hubs, EDID size, and the
display. Change one, and you can't make the comparison anymore. You end
up tracking configurations and timings, a lot.
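
For illustration, 1) more or less means carrying around data like this per
configuration; the struct fields and the 20% regression margin are invented,
not something from the test.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One reference timing entry per exact configuration; change any field and
 * the comparison is no longer valid. */
struct edid_timing_ref {
        const char *platform;
        uint32_t connector_type;        /* DRM_MODE_CONNECTOR_* */
        bool lspcon;
        size_t edid_size;
        uint64_t reference_us;
};

/* Relative check: flag anything more than 20% slower than the stored
 * reference for this exact configuration. */
static bool edid_read_regressed(const struct edid_timing_ref *ref,
                                uint64_t measured_us)
{
        return measured_us > ref->reference_us + ref->reference_us / 5;
}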

2) Try to make a reasonable estimate of the absolute maximum based on
some subset of the configuration (such as connector type and
adapters). If you stay below the limit, regardless of the timing changes, you
assume it's fine, and otherwise you consider it a regression. So you try
to keep the limits tight enough to catch regressions, but loose enough not to
flag normal behaviour. You end up accumulating heuristics for the
configuration and timing, because, let's face it, there is a lot of
variance in what's acceptable. (This is v2+ of the patch at hand.)
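
Purely to illustrate what 2) tends to turn into, a lookup along these lines;
every number below is a placeholder, not a measured value.

#include <stdbool.h>
#include <stdint.h>
#include <xf86drmMode.h>

/* Per-connector-type absolute maximum for a single probe; the real values
 * would have to come from the heuristics described above. */
static uint64_t edid_read_max_us(uint32_t connector_type, bool lspcon)
{
        uint64_t max_us;

        switch (connector_type) {
        case DRM_MODE_CONNECTOR_HDMIA:
        case DRM_MODE_CONNECTOR_HDMIB:
                max_us = 50 * 1000;     /* placeholder */
                break;
        case DRM_MODE_CONNECTOR_DisplayPort:
                max_us = 20 * 1000;     /* placeholder */
                break;
        default:
                max_us = 100 * 1000;    /* placeholder worst case */
                break;
        }

        /* Adapters get extra slack, LSPCON being the obvious example. */
        if (lspcon)
                max_us += 50 * 1000;    /* placeholder */

        return max_us;
}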

3) Try to make a reasonable estimate of the absolute maximum independent
of the configuration. You'll never catch regressions in the fast
configurations, because your limits are based on the worst case. You'll
only catch the really pathological problems. Note that the current code
uses the mean, so it evens out connector-type-specific problems; we might
also fail to catch pathological problems in, say, DP if HDMI is
fast. (This is the original and v1 of the patch.)
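
And 3) is, roughly sketched, what the test does today; the threshold below is
a placeholder, not the value from the patch.

#include <assert.h>
#include <stdint.h>

/* One configuration-independent limit, checked against the mean over all
 * connectors; a slow DP port can hide behind a fast HDMI port here. */
static void check_mean_probe_time(const uint64_t *probe_us, int n_connectors)
{
        uint64_t sum = 0;

        assert(n_connectors > 0);

        for (int i = 0; i < n_connectors; i++)
                sum += probe_us[i];

        assert(sum / n_connectors < 100 * 1000);        /* placeholder */
}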

I think 1) is more trouble than it's worth. 3) is simplistic but
easy. The question is, do we get enough benefit from 2) to offset the
complications?


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center

