[igt-dev] [PATCH i-g-t] runner: Also consider TAINT_MACHINE_CHECK as abortable taint

Petri Latvala petri.latvala at intel.com
Wed Jun 5 12:53:54 UTC 2019


On Wed, Jun 05, 2019 at 02:36:56PM +0200, Daniel Vetter wrote:
> On Wed, Jun 05, 2019 at 03:16:07PM +0300, Petri Latvala wrote:
> > Signed-off-by: Petri Latvala <petri.latvala at intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> 
> I've seen lots of machines where these happen as a normal side-effect of
> thermal throttling. For some value of "normal".
> 
> Do we really want to reboot on these? It could be like the network thing I
> recently disabled, and then everyone started screaming because our
> machines were constantly rebooting due to network cards/drivers
> temporarily having a bad time (but usually recovering).


I've seen some MCE log messages in dmesg, quite often on one of the
BXTs for example. How often those MCE events actually caused a taint is
another question.

Reading the mce code, it seems to be thermal _failure_ that causes a
taint. And all of these add_taint() calls also use
LOCKDEP_NOW_UNRELIABLE so we're already deep under the bus if we get
that taint.
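
For illustration, a rough sketch of what the check amounts to. This is
not the actual runner code: the function and variable names are made up,
and the set of other taints treated as abortable (bad page, die, warn)
is an assumption. The bit positions are the ones documented for
/proc/sys/kernel/tainted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit positions as documented for /proc/sys/kernel/tainted. */
#define TAINT_MACHINE_CHECK_BIT 4
#define TAINT_BAD_PAGE_BIT      5
#define TAINT_DIE_BIT           7
#define TAINT_WARN_BIT          9

/*
 * Hypothetical mask of taints that make continuing pointless.
 * Which bits besides TAINT_MACHINE_CHECK belong here is an assumption.
 */
static const unsigned long abortable_taints =
	(1UL << TAINT_MACHINE_CHECK_BIT) |
	(1UL << TAINT_BAD_PAGE_BIT) |
	(1UL << TAINT_DIE_BIT) |
	(1UL << TAINT_WARN_BIT);

/* Returns true if any abortable taint bit is set in the given value. */
static bool taints_are_abortable(unsigned long taints)
{
	return (taints & abortable_taints) != 0;
}

/* Reads the current taint mask; returns false if it cannot be read. */
static bool read_kernel_taints(unsigned long *taints)
{
	FILE *f = fopen("/proc/sys/kernel/tainted", "r");
	bool ok;

	if (!f)
		return false;

	ok = fscanf(f, "%lu", taints) == 1;
	fclose(f);

	return ok;
}
```

A caller would read the mask with read_kernel_taints() after each test
and stop executing further tests once taints_are_abortable() returns
true.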



-- 
Petri Latvala

