[Piglit] [PATCH][RFC] Add dmesg option for reboot policy
Martin Peres
martin.peres at linux.intel.com
Mon Nov 23 04:48:35 PST 2015
On 02/11/15 16:57, yann.argotti at linux.intel.com wrote:
> This adds a policy which advises when user should reboot system to avoid
> noisy test results due to system becoming unstable, for instance, and
> therefore continues testing successfully.
> To do this, a new Dmesg class is proposed which is not filtering dmesg and
> monitors whether or not one of the following event occurs:
> - gpu reset failed (not just gpu reset happened, that happens way too
> often and many tests even provoke hangs intentionally)
> - gpu crash,
> - Oops:
> - BUG
> - lockdep splat that causes the locking validator to get disabled
> If one of these issues happen, piglit test execution is stopped
> -terminating test thread pool- and exit with code 3 to inform that reboot
> is advised.
> Then test execution resume, after rebooting system or not, is done like
> usually with command line parameter "resume".
>
> To call it, use command line parameter: --dmesg monitored
Hello Yann,
The rationale behind this patch is very sound and we need something like
this. Here are however a list of nitpicks:
- Please send patches with git send-email, otherwise, it makes it
impossible for us to comment inline which is the usual process for patch
review. Please re-send :)
- varaiable -> variable; double space after "when a reboot may be required"
- I am not a big fan of changing the semantic of arguments that have
been there forever. Can you think of a case where the user would not
want the test to abort if we reach a state where we cannot trust the
result? I am including Dylan on this. Also, if we are to keep these
modes, can we rename the "simple" mode to "warning" and "monitored" to
"abort"? This would make more sense and clearly state the goal of the
modes.
- I see a potential issue with the dmesg-monitoring class, it can
generate false positives. Indeed, it completely ignores the fact that
there may be more than one GPU installed and that it may come from a
different vendor. It would be good to know which gpu the test got run on
and only track this gpu for bugs. This is however a complex task and
would require env_dump to be done properly. I guess we could pass an
argument to env_dump to track the gpu-vendor-dependent codepath and
leave the general cases (Oops, BUG and lockdep) up to python. Anyway,
better safe than sorry in the mean time!
- I am not a big fan of using dmesg to get information out of the
kernel because the output format may change. How about listening for
udev events coming from the gpu or reading i915_error_state/i915_wedged ?
That's it for the first round :)
Thanks for the proposal!
Martin
More information about the Piglit
mailing list