[igt-dev] [PATCH i-g-t v2] runner: Make taint abort messages more verbose

Arkadiusz Hiler arkadiusz.hiler at intel.com
Mon Dec 3 15:45:39 UTC 2018


On Mon, Dec 03, 2018 at 03:09:29PM +0000, Chris Wilson wrote:
> Quoting Arkadiusz Hiler (2018-12-03 15:00:12)
> > Since not everyone is familiar with kernel taints, and it is easy to get
> > confused and attribute an abort to an innocent TAINT_USER caused by an
> > unsafe module option, which is usually the first thing people find
> > greping dmesg for "taint", we should provide more guidance.
> > 
> > This patch extends the abort log by printing the taint names, as found
> > in the kernel, along with a short explanation, so people know what to
> > look for in the dmesg.
> > 
> > v2: rebase, reword
> > 
> > Cc: Petri Latvala <petri.latvala at intel.com>
> > Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler at intel.com>
> > ---
> >  runner/executor.c | 38 ++++++++++++++++++++++++++++++--------
> >  1 file changed, 30 insertions(+), 8 deletions(-)
> > 
> > diff --git a/runner/executor.c b/runner/executor.c
> > index 54c530b7..c0639a66 100644
> > --- a/runner/executor.c
> > +++ b/runner/executor.c
> > @@ -137,13 +137,23 @@ static char *handle_lockdep(void)
> >         return NULL;
> >  }
> >  
> > +/* see Linux's include/linux/kernel.h */
> > +static const struct {
> > +       unsigned long bit;
> > +       const char *explanation;
> > +} abort_taints[] = {
> > +  {(1 << 5), "TAINT_BAD_PAGE: Bad page reference or an unexpected page flags."},
> > +  {(1 << 7), "TAINT_DIE: Kernel has died - BUG/OOPS."},
> > +  {(1 << 9), "TAINT_WARN: WARN_ON has happened."},
> > +  {0, 0}};
> > +
> >  static unsigned long tainted(unsigned long *taints)
> >  {
> > -       const unsigned long bad_taints =
> > -               0x20  | /* TAINT_PAGE */
> > -               0x80  | /* TAINT_DIE */
> > -               0x200; /* TAINT_OOPS */
> >         FILE *f;
> > +       unsigned long bad_taints = 0;
> > +
> > +       for (typeof(*abort_taints) *taint = abort_taints; taint->bit; taint++)
> > +               bad_taints |= taint->bit;
> >  
> >         *taints = 0;
> >  
> > @@ -158,14 +168,26 @@ static unsigned long tainted(unsigned long *taints)
> >  
> >  static char *handle_taint(void)
> >  {
> > -       unsigned long taints, bad_taints;
> > +       unsigned long taints;
> >         char *reason;
> >  
> > -       bad_taints = tainted(&taints);
> > -       if (!bad_taints)
> > +       if (!tainted(&taints))
> >                 return NULL;
> >  
> > -       asprintf(&reason, "Kernel tainted (%#lx -- %lx)", taints, bad_taints);
> > +       asprintf(&reason, "Kernel badly tainted (%#lx) (check dmesg for details):\n",
> > +                taints);
> > +
> > +       for (typeof(*abort_taints) *taint = abort_taints; taint->bit; taint++) {
> > +               if (taint->bit & taints) {
> > +                       char *old_reason = reason;
> > +                       asprintf(&reason, "%s\t(%#lx) %s\n",
> > +                                       old_reason,
> > +                                       taint->bit,
> > +                                       taint->explanation);
> > +                       free(old_reason);
> 
> Still this gives no more information, than having to manually check
> dmesg for the explanation.

I still stand by what has been said last week:
https://lists.freedesktop.org/archives/igt-dev/2018-November/007128.html

I have got two more questions on why are we aborting on unsafe module
options since then.

> 	"TAINT_BAD_PAGE: Bad page reference or an unexpected page flags."},
> 	"TAINT_DIE: Kernel has died - BUG/OOPS."},
> 	"TAINT_WARN: WARN_ON has happened."},
> 
> are quite frankly misleading, and not informative.

Any suggestions on how to make them less misleading?

> And since the runner is handling the dmesg, do the work to present the
> relevant information.
> -Chris

WIP by Petri, but that is going to take a while.

-- 
Cheers,
Arek


More information about the igt-dev mailing list