[Intel-gfx] [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on snb

Lofstedt, Marta marta.lofstedt at intel.com
Thu Oct 26 10:57:26 UTC 2017


Since the discussion on this died and I believe that everyone are scared that the dodge suggestion would require someone to do some work.
I could Ack the patch if: 

+	igt_skip_on_f(IS_SANDYBRIDGE(intel_get_drm_devid(fd)),
+		      "random incompletes in CI with this test\n");
+

Was replaced with an igt_warn

/Marta

> -----Original Message-----
> From: Martin Peres [mailto:martin.peres at linux.intel.com]
> Sent: Monday, October 23, 2017 12:29 PM
> To: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>; Daniel Vetter:
> <daniel.vetter at ffwll.ch>; Intel Graphics Development <intel-
> gfx at lists.freedesktop.org>; Latvala, Petri <petri.latvala at intel.com>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>; Lofstedt, Marta
> <marta.lofstedt at intel.com>
> Subject: Re: [PATCH i-g-t] tests/gem_eio: Skip in-flight-suspend on snb
> 
> On 20/10/17 12:26, Joonas Lahtinen wrote:
> > + Petri
> >
> > On Thu, 2017-10-19 at 16:29 +0300, Martin Peres wrote:
> >> On 19/10/17 12:51, Daniel Vetter wrote:
> >>> CI gets upset about it resulting in an incomplete, let's skip it
> >>> until that's fixed to avoid havoc in the CI farm. Of course this
> >>> should/will be reverted as soon as we have a fix (similar to how we
> >>> dealt with the snb-dies-in-blt-hangs issue).
> >>>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> >>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Cc: "Lofstedt, Marta" <marta.lofstedt at intel.com>
> >>> Cc: Martin Peres <martin.peres at linux.intel.com>
> >>> References:
> >>> https://intel-gfx-ci.01.org/tree/drm-tip/igt@gem_eio@in-flight-suspe
> >>> nd.html
> >>> References: https://bugs.freedesktop.org/show_bug.cgi?id=103289
> >>> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> >
> > <SNIP>
> >
> >> So, let's recap the problem here:
> >>   - Any incomplete in sharded runs mean that the platform is unfit
> >> for pre-merge (because any other test after will go from pass to notrun)
> >>   - We can't fix issues immediately, especially for old platforms
> >>
> >> This patch is sweeping the test under the rug by using the skip
> >> output, which is not only hard to track, it is also misleading.
> >>
> >> After discussing with Marta, Arek and Petri, we found some consensus
> >> on the following proposal (terminology is up for debate):
> >>
> >> - Introduce igt_dodge_on(cond, label): Report a pre-emptive 'fail'
> >> when the condition is true. Make sure this is over-ridable with
> >> IGT_DODGE=0 so as we can easily run these tests without recompiling
> them.
> >
> > Make this igt_skip_on_ci(cond) and require IGT_CI=1 to activate them.
> > Much like with simulation.
> 
> replace skip with fail, and we agree. Skips are too easy to ignore!
> 
> >
> > Still, a BIOS update to one of the CI machines might mean (if it's not
> > now the case, not very far fetched for the future) that we go churn in
> > the IGT codebase to drop bunch of these. That's not the optimal
> > workflow I can think of when we're discussing a separate mailing list
> > for IGT discussion and patches to make it more self-contained. Then we
> > bind that new mailing list to our CI farm contents, and bind making
> > fixes to the CI farm operation directly to the IGT reviewing bandwidth?
> 
> Isn't what we are proposing doing exactly this? By changing the source code
> of IGT, we allow people to send patches to remove some workarounds and
> see if they pass or fail in the same way they would propose any change to
> IGT. Moreover, we make the running of IGT in our farm as transparent as
> possible.
> 
> >
> > I'm still thinking best way would be that CI would mask the known
> > problematic ones from the failure/pass criteria, and then somebody
> > actually looks at the masked on after their testing coverage is
> > prioritized. I think IGT should try to provide a wide range of tests
> > that are supposed to work on any certain hardware. If they don't, it's
> > not a reason to change the tests itself.
> 
> This is true that some skips will be highly-machine specific, but isn't our role
> as developers to know what and what doesn't work? By pushing this to an
> external whitelist only for CI, we miss an opportunity to improve this CI
> blacklist.
> 
> Now, let me remind you that this blacklist is *only* for tests that hang the
> machine or leave it in an inconsistant state which will lead to a crash later.
> 
> >
> > With the filter, we can grow the testing coverage for the new
> > platforms, even if CI happens to have odd machines that may not pass
> > those tests (and we may not have the resources to immediately fix
> > those). All this without churning on the IGT codebase.
> 
> You are describing what cibuglog is already doing. Failing tests cases are
> suppressed in pre-merge, and associated bugs[1].
> 
> See above for what this proposal is about.
> 
> [1] https://intel-gfx-ci.01.org/cibuglog/
> 
> >
> > But if this is the only technically viable solution in short-term,
> > then so be it. I just see better options too.
> 
> Maybe we need a write up our workflow. This time, a public one!
> 
> I hope


More information about the Intel-gfx mailing list