[Piglit] [PATCH][RFC] Add dmesg option for reboot policy
Dylan Baker
baker.dylan.c at gmail.com
Mon Nov 23 17:07:09 PST 2015
On Mon, Nov 23, 2015 at 02:48:35PM +0200, Martin Peres wrote:
> On 02/11/15 16:57, yann.argotti at linux.intel.com wrote:
> >This adds a policy which advises when user should reboot system to avoid
> >noisy test results due to system becoming unstable, for instance, and
> >therefore continues testing successfully.
> >To do this, a new Dmesg class is proposed which is not filtering dmesg and
> >monitors whether or not one of the following event occurs:
> >- gpu reset failed (not just gpu reset happened, that happens way too
> >often and many tests even provoke hangs intentionally)
> >- gpu crash,
> >- Oops:
> >- BUG
> >- lockdep splat that causes the locking validator to get disabled
> >If one of these issues happen, piglit test execution is stopped
> >-terminating test thread pool- and exit with code 3 to inform that reboot
> >is advised.
> >Then test execution resume, after rebooting system or not, is done like
> >usually with command line parameter "resume".
> >
> >To call it, use command line parameter: --dmesg monitored
>
> Hello Yann,
>
> The rationale behind this patch is very sound and we need something like
> this. Here are however a list of nitpicks:
>
> - Please send patches with git send-email, otherwise, it makes it
> impossible for us to comment inline which is the usual process for patch
> review. Please re-send :)
>
> - varaiable -> variable; double space after "when a reboot may be required"
>
> - I am not a big fan of changing the semantic of arguments that have been
> there forever. Can you think of a case where the user would not want the
> test to abort if we reach a state where we cannot trust the result? I am
> including Dylan on this. Also, if we are to keep these modes, can we rename
> the "simple" mode to "warning" and "monitored" to "abort"? This would make
> more sense and clearly state the goal of the modes.
Ilia, Daniel, Thomas, Glenn, I know that y'all use the dmesg support.
What do you think?
>
> - I see a potential issue with the dmesg-monitoring class, it can generate
> false positives. Indeed, it completely ignores the fact that there may be
> more than one GPU installed and that it may come from a different vendor. It
> would be good to know which gpu the test got run on and only track this gpu
> for bugs. This is however a complex task and would require env_dump to be
> done properly. I guess we could pass an argument to env_dump to track the
> gpu-vendor-dependent codepath and leave the general cases (Oops, BUG and
> lockdep) up to python. Anyway, better safe than sorry in the mean time!
>
> - I am not a big fan of using dmesg to get information out of the kernel
> because the output format may change. How about listening for udev events
> coming from the gpu or reading i915_error_state/i915_wedged ?
>
> That's it for the first round :)
>
> Thanks for the proposal!
> Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/piglit/attachments/20151123/7bc1145b/attachment.sig>
More information about the Piglit
mailing list