[Intel-gfx] [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on snb

Martin Peres martin.peres at linux.intel.com
Mon Oct 23 09:28:46 UTC 2017


On 20/10/17 12:26, Joonas Lahtinen wrote:
> + Petri
> 
> On Thu, 2017-10-19 at 16:29 +0300, Martin Peres wrote:
>> On 19/10/17 12:51, Daniel Vetter wrote:
>>> CI gets upset about it resulting in an incomplete, let's skip it until
>>> that's fixed to avoid havoc in the CI farm. Of course this should/will
>>> be reverted as soon as we have a fix (similar to how we dealt with the
>>> snb-dies-in-blt-hangs issue).
>>>
>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: "Lofstedt, Marta" <marta.lofstedt at intel.com>
>>> Cc: Martin Peres <martin.peres at linux.intel.com>
>>> References: https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-suspend.html
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=103289
>>> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> <SNIP>
> 
>> So, let's recap the problem here:
>>   - Any incomplete in sharded runs mean that the platform is unfit for 
>> pre-merge (because any other test after will go from pass to notrun)
>>   - We can't fix issues immediately, especially for old platforms
>>
>> This patch is sweeping the test under the rug by using the skip output, 
>> which is not only hard to track, it is also misleading.
>>
>> After discussing with Marta, Arek and Petri, we found some consensus on 
>> the following proposal (terminology is up for debate):
>>
>> - Introduce igt_dodge_on(cond, label): Report a pre-emptive 'fail' when 
>> the condition is true. Make sure this is over-ridable with IGT_DODGE=0 
>> so as we can easily run these tests without recompiling them.
> 
> Make this igt_skip_on_ci(cond) and require IGT_CI=1 to activate them.
> Much like with simulation.

replace skip with fail, and we agree. Skips are too easy to ignore!

> 
> Still, a BIOS update to one of the CI machines might mean (if it's not
> now the case, not very far fetched for the future) that we go churn in
> the IGT codebase to drop bunch of these. That's not the optimal
> workflow I can think of when we're discussing a separate mailing list
> for IGT discussion and patches to make it more self-contained. Then we
> bind that new mailing list to our CI farm contents, and bind making
> fixes to the CI farm operation directly to the IGT reviewing bandwidth?

Isn't what we are proposing doing exactly this? By changing the source
code of IGT, we allow people to send patches to remove some workarounds
and see if they pass or fail in the same way they would propose any
change to IGT. Moreover, we make the running of IGT in our farm as
transparent as possible.

> 
> I'm still thinking best way would be that CI would mask the known
> problematic ones from the failure/pass criteria, and then somebody
> actually looks at the masked on after their testing coverage is
> prioritized. I think IGT should try to provide a wide range of tests
> that are supposed to work on any certain hardware. If they don't, it's
> not a reason to change the tests itself.

This is true that some skips will be highly-machine specific, but isn't
our role as developers to know what and what doesn't work? By pushing
this to an external whitelist only for CI, we miss an opportunity to
improve this CI blacklist.

Now, let me remind you that this blacklist is *only* for tests that hang
the machine or leave it in an inconsistant state which will lead to a
crash later.

> 
> With the filter, we can grow the testing coverage for the new
> platforms, even if CI happens to have odd machines that may not pass
> those tests (and we may not have the resources to immediately fix
> those). All this without churning on the IGT codebase.

You are describing what cibuglog is already doing. Failing tests cases
are suppressed in pre-merge, and associated bugs[1].

See above for what this proposal is about.

[1] https://intel-gfx-ci.01.org/cibuglog/

> 
> But if this is the only technically viable solution in short-term, then
> so be it. I just see better options too.

Maybe we need a write up our workflow. This time, a public one!

I hope


More information about the Intel-gfx mailing list