[igt-dev] [PATCH i-g-t] runner: Treat dmesg warnings as pure warnings
Daniel Vetter
daniel at ffwll.ch
Thu Nov 22 14:26:46 UTC 2018
On Thu, Nov 22, 2018 at 02:59:35PM +0200, Joonas Lahtinen wrote:
> Quoting Daniel Vetter (2018-11-22 11:41:26)
> > On Wed, Nov 21, 2018 at 12:05:45PM +0200, Joonas Lahtinen wrote:
> > > Quoting Chris Wilson (2018-11-21 11:18:43)
> > > > Quoting Joonas Lahtinen (2018-11-21 08:07:30)
> > > > > Quoting Ville Syrjälä (2018-11-14 15:53:26)
> > > > > > On Wed, Nov 14, 2018 at 01:34:10PM +0000, Chris Wilson wrote:
> > > > > > > Quoting Ville Syrjälä (2018-11-14 13:25:39)
> > > > > > > > On Wed, Nov 14, 2018 at 01:56:53PM +0200, Petri Latvala wrote:
> > > > > > > > > On Tue, Nov 13, 2018 at 05:25:47PM +0000, Chris Wilson wrote:
> > > > > > > > > > I have whinged on for ages about the dmesg-warnings being an expected
> > > > > > > > > > part of kernel testing (where else is the kernel meant to log its
> > > > > > > > > > errors?) and should be treated the same as our stderr for the test. That
> > > > > > > > > > is if a test fails, it fails and does not need to be conflated with
> > > > > > > > > > whether or not there was a dmesg warning (just as the test saying why it
> > > > > > > > > > failed on stderr does not need flagging), and that a passing test with a
> > > > > > > > > > dmesg warning is simply a warn.
> > > > > > > > > >
> > > > > > > > > > The effect is that we simply remove the "dmesg-" flagging from results
> > > > > > > > > > names, as the err/dmesg output is simply collated for the error report
> > > > > > > > > > already.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > > > > > > > Cc: Petri Latvala <petri.latvala at intel.com>
> > > > > > > > > > Cc: Martin Peres <martin.peres at linux.intel.com>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yep, now's a good a time as any to remove this cargo cult if ever.
> > > > > > > > >
> > > > > > > > > Some acks from kernel people in general would be nice.
> > > > > > > > >
> > > > > > > > > TODO:
> > > > > > > > > Martin: cibuglog filters ready for this
> > > > > > > > > Tomi et al: Visualization ready for this (tooltips etc)
> > > > > > > >
> > > > > > > > Can I still see from the color/etc. whether there was some dmesg spew or
> > > > > > > > not? I often glance at those and would prefer to not have to go through
> > > > > > > > all the fails to find them.
> > > > > > >
> > > > > > > It was never shown in the colour:
> > > > > > >
> > > > > > > warn/dmesg-warn == orange
> > > > > > > fail/dmesg-fail == red
> > > > > >
> > > > > > Indeed. Which makes me wonder whether I've been overlooking some
> > > > > > dmesg noise. Either that or it's just my memory playing tricks
> > > > > > and telling me that I've mostly been looking at the orange stuff.
> > > > > >
> > > > > > So let's consider this a feature request then. I would like to see
> > > > > > from the overview which tests produced dmesg noise, regardless of
> > > > > > whether the test otherwise succeeded or failed.
> > > > >
> > > > > I think this is a fair change.
> > > > >
> > > > > And for documentation purposes, what I proposed in IRC:
> > > > >
> > > > > My suggestion was to add letter "D" to the matrix when there is dmesg
> > > > > noise in the specific subtest (Similarly as there's letter T in
> > > > > timeout).
> > > >
> > > > I just don't understand what is "dmesg noise". dmesg is where I log my
> > > > errors, there is no other suitable log.
> > >
> > > My take on this is that it provides a quick differentation of:
> > >
> > > "The IGT test fails, functionality lost"
> > >
> > > "The IGT test passes, functionality is there, but makes kernel spew
> > > error (recoverably)"
> > >
> > > Both are errors, but there's informational value in seeing the
> > > difference, at least to me.
> >
> > There's lots of igt tests that entirely rely on the in-kernel checks to
> > validate stuff. dmesg output in that case means "test failed,
> > functionality lost".
> >
> > I'm kinda leaning towards just marking anything with dmesg warn-and-worse
> > as "fail". And restrict warn/orange to igt_warn and friends. If that's too
> > much we can think about a DRM_WARN and using warning level in syslog to
> > result in "warn", and error and worse levels in "fail".
> >
> > Kernel on fire, but hey look the test passed is not a conclusion we should
> > ever support imo. Given that Joonas has it, I'd say we need to be more
> > clear here that dmesg = stuff failed hard.
>
> Well, such tests sound like a candidate to be moved to kernel selftests,
> if they look to just make sure kernel doesn't trip over itself. Maybe
> helps to identify such tests, where only expected failure mode is kernel
> message.
>
> Because there for sure is a difference between having WARNs in your dmesg
> log while the program continues to work vs. program crashing/not working,
> when it comes to severity.
lockdep warnings, modeset state checker, GEM_BUG_ON, the list is _very_
long. You're essentially proposing that we move pretty much all of igt
into the kernel, that doesn't make sense.
Most of these really are nothing else than assert() calls in a different
place. If you test userspace, then if you hit an assert() in one of the
libraries you're testing, you're also not declaring those bugs to be just
warnings instead of full failures.
We could just convert them all to BUG_ON, which would more closely
resemble what we'd do in userspace. But that makes debugging unecesarily
painful. But "it's easier to debug if the kernel keeps chugging along" is
the only reason we're trying to keep going, not that these failures aren't
fairly severe issues. And I think this is the same case Chris made earlier
in this thread: WARN_ON() == assert() for the kernel, more or less. And
these WARNING backtraces are the clear majority of the dmesg failures we have.
Hence why I think if we're going to simplify all this, we should map dmesg
failures to an overall "fail" status. And then maybe later on figure out
how we could wire up the stderr output/"warn" level for the kernel too
(e.g. through a DRM_WARNING() set of macros, and corresponding log
levels).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the igt-dev
mailing list