[Intel-gfx] [PATCH igt] lib: Check and report if a subtest triggers a new kernel taint
Chris Wilson
chris at chris-wilson.co.uk
Tue Dec 5 12:38:19 UTC 2017
Quoting Petri Latvala (2017-12-05 12:32:36)
> On Tue, Dec 05, 2017 at 12:24:53PM +0000, Chris Wilson wrote:
> > Checking for a tainted kernel is a convenient way to see if the test
> > generated a critical error such as a oops, or machine check.
> >
> > v2: Docs?
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg at intel.com>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > ---
> > lib/Makefile.sources | 2 +
> > lib/igt_core.c | 19 +++++++-
> > lib/igt_kernel_taint.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++
> > lib/igt_kernel_taint.h | 34 ++++++++++++++
> > 4 files changed, 174 insertions(+), 1 deletion(-)
> > create mode 100644 lib/igt_kernel_taint.c
> > create mode 100644 lib/igt_kernel_taint.h
> >
> > diff --git a/lib/Makefile.sources b/lib/Makefile.sources
> > index 6e968d9f7..152153908 100644
> > --- a/lib/Makefile.sources
> > +++ b/lib/Makefile.sources
> > @@ -22,6 +22,8 @@ lib_source_list = \
> > igt_gt.h \
> > igt_gvt.c \
> > igt_gvt.h \
> > + igt_kernel_taint.c \
> > + igt_kernel_taint.h \
> > igt_primes.c \
> > igt_primes.h \
> > igt_rand.c \
> > diff --git a/lib/igt_core.c b/lib/igt_core.c
> > index 03fa6e4e8..486c5989d 100644
> > --- a/lib/igt_core.c
> > +++ b/lib/igt_core.c
> > @@ -63,6 +63,7 @@
> > #include "intel_chipset.h"
> > #include "intel_io.h"
> > #include "igt_debugfs.h"
> > +#include "igt_kernel_taint.h"
> > #include "version.h"
> > #include "config.h"
> >
> > @@ -261,6 +262,7 @@ static bool list_subtests = false;
> > static char *run_single_subtest = NULL;
> > static bool run_single_subtest_found = false;
> > static const char *in_subtest = NULL;
> > +static unsigned long saved_kernel_taint;
> > static struct timespec subtest_time;
> > static clockid_t igt_clock = (clockid_t)-1;
> > static bool in_fixture = false;
> > @@ -937,6 +939,8 @@ bool __igt_run_subtest(const char *subtest_name)
> > return false;
> > }
> >
> > + saved_kernel_taint = igt_read_kernel_taint();
> > +
> > kmsg(KERN_INFO "[IGT] %s: starting subtest %s\n", command_str, subtest_name);
> > igt_debug("Starting subtest: %s\n", subtest_name);
> >
> > @@ -1083,8 +1087,21 @@ void __igt_skip_check(const char *file, const int line,
> > void igt_success(void)
> > {
> > succeeded_one = true;
> > - if (in_subtest)
> > + if (in_subtest) {
> > + unsigned long new_kernel_taints =
> > + igt_read_kernel_taint() & ~saved_kernel_taint;
> > + unsigned int tainted = igt_kernel_tainted(new_kernel_taints);
> > +
> > + if (tainted) {
> > + igt_kernel_taint_print(new_kernel_taints);
> > + if (tainted & TAINT_ERROR)
> > + exit_subtest("FAIL");
> > + else
> > + exit_subtest("WARN");
> > + }
> > +
> > exit_subtest("SUCCESS");
> > + }
> > }
>
>
> If you change the result to FAIL or WARN here, succeeded_one should not be changed.
>
> What of tests that don't have subtests?
I don't know where they are tracked. If there is a location we can place
such a before/after check. Or even if we do want to change test status
at all, and just make it a warn if the kernel becomes tainted.
The ambition here isn't just to flag oops reliably, but to respond when
we know the HW is broken and requires rebooting.
-Chris
More information about the Intel-gfx
mailing list