[igt-dev] [PATCH i-g-t] runner: Make taint aborts more verbose

Petri Latvala petri.latvala at intel.com
Mon Nov 26 12:22:00 UTC 2018


On Mon, Nov 26, 2018 at 12:36:44PM +0200, Arkadiusz Hiler wrote:
> On Mon, Nov 26, 2018 at 09:12:46AM +0000, Chris Wilson wrote:
> > Quoting Arkadiusz Hiler (2018-11-26 08:52:22)
> > > Provide the reader with the taint names and a short explanation, as well
> > > as point in the direction of the dmesg for more details.
> > > 
> > > Cc: Petri Latvala <petri.latvala at intel.com>
> > > Cc: Martin Peres <martin.peres at linux.intel.com>
> > > Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > > ---
> > >  runner/executor.c | 42 ++++++++++++++++++++++++++++++++----------
> > >  1 file changed, 32 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/runner/executor.c b/runner/executor.c
> > > index 2038c3fd..954cd9a7 100644
> > > --- a/runner/executor.c
> > > +++ b/runner/executor.c
> > > @@ -139,24 +139,46 @@ static char *handle_lockdep(void)
> > >  
> > >  static char *handle_taint(void)
> > >  {
> > > -       const unsigned long bad_taints =
> > > -               0x20  | /* TAINT_PAGE */
> > > -               0x80  | /* TAINT_DIE */
> > > -               0x200; /* TAINT_OOPS */
> > > -       unsigned long taints = 0;
> > > +       /* see Linux's include/linux/kernel.h */
> > > +       static const struct {
> > > +               unsigned long bit;
> > > +               const char *explanation;
> > > +       } taints[] = {
> > > +         {(1 << 5), "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags."},
> > > +         {(1 << 7), "TAINT_DIE: Kernel has died - BUG/OOPS."},
> > > +         {(1 << 9), "TAINT_WARN: WARN_ON has happened."},
> > 
> > Pure noise, these don't add any useful information, so why? The only
> > information is the explanation in dmesg. Checking tainted is just
> > shorthand for us to determine a significant event.
> 
> People not knowing what taint means (or that it is a "bit field") were
> getting confused by those aborts. I just got tired of explaining that we
> don't check for "unsafe module option taint" which is usually the first
> result they get while greping dmesg for "taint".


Yeah this has been causing confusion in all kinds of circles.

In general it's always better to explain numbers in outputs meant for
humans. Surely if people ask about stuff, adding explanations to it is
useful information by definition?


> 
> > If you want to add the kmsg output filtered by WARN+, that would be more
> > useful than hinting to the user they need to do it themselves. But that
> > should be redundant as that is already part of the runner -- so is not
> > the problem the actual integration, that this appears as a separate test
> > and not flagging the igt itself, with exceptions to handle tainted
> > before a test?
> > -Chris
> 
> Providing more logs, as a part of the aborted results, indeed sounds
> good, but it is harder to implement, as we would have to keep track of
> dmesg between taint checks too. Petri, what do you think?


It's possible [citation needed] to inject the previous test
execution's dmesg into the aborted test result, but only in the form
of igt_runner CI is using today. Resuming, especially when using
multiple-mode, will make it harder. Actually, even today CI is not
fully able to visualize shard results when multiple shards abort.

What might be easier is injecting the aborting blame into the
previously executed test but even that is not a one-liner.

Either way, aborting and its reporting is not done and abandoned yet. WIP.



-- 
Petri Latvala


More information about the igt-dev mailing list