[Piglit] [PATCH][RFC] Add dmesg option for reboot policy

Ilia Mirkin imirkin at alum.mit.edu
Tue Nov 24 07:19:07 PST 2015


On Tue, Nov 24, 2015 at 5:33 AM, Martin Peres
<martin.peres at linux.intel.com> wrote:
> On 24/11/15 07:09, Dylan Baker wrote:
>>
>> On Mon, Nov 23, 2015 at 08:54:46PM -0500, Ilia Mirkin wrote:
>>>
>>> On Mon, Nov 23, 2015 at 8:41 PM, Dylan Baker <baker.dylan.c at gmail.com>
>>> wrote:
>>>>
>>>> On Mon, Nov 23, 2015 at 08:24:22PM -0500, Ilia Mirkin wrote:
>>>>>
>>>>> On Mon, Nov 23, 2015 at 8:07 PM, Dylan Baker <baker.dylan.c at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 02:48:35PM +0200, Martin Peres wrote:
>>>>>>>
>>>>>>> On 02/11/15 16:57, yann.argotti at linux.intel.com wrote:
>>>>>>>>
>>>>>>>> This adds a policy which advises when user should reboot system to
>>>>>>>> avoid
>>>>>>>> noisy test results due to system becoming unstable, for instance,
>>>>>>>> and
>>>>>>>> therefore continues testing successfully.
>>>>>>>> To do this, a new Dmesg class is proposed which is not filtering
>>>>>>>> dmesg and
>>>>>>>> monitors whether or not one of the following event occurs:
>>>>>>>> - gpu reset failed (not just gpu reset happened, that happens way
>>>>>>>> too
>>>>>>>> often and many tests even provoke hangs intentionally)
>>>>>>>> - gpu crash,
>>>>>>>> - Oops:
>>>>>>>> - BUG
>>>>>>>> - lockdep splat that causes the locking validator to get disabled
>>>>>>>> If one of these issues happen, piglit test execution is stopped
>>>>>>>> -terminating test thread pool- and exit with code 3 to inform that
>>>>>>>> reboot
>>>>>>>> is advised.
>>>>>>>> Then test execution resume, after rebooting system or not, is done
>>>>>>>> like
>>>>>>>> usually with command line parameter "resume".
>>>>>>>>
>>>>>>>> To call it, use command line parameter: --dmesg monitored
>>>>>>>
>>>>>>> Hello Yann,
>>>>>>>
>>>>>>> The rationale behind this patch is very sound and we need something
>>>>>>> like
>>>>>>> this. Here are however a list of nitpicks:
>>>>>>>
>>>>>>>   - Please send patches with git send-email, otherwise, it makes it
>>>>>>> impossible for us to comment inline which is the usual process for
>>>>>>> patch
>>>>>>> review. Please re-send :)
>>>>>>>
>>>>>>> - varaiable -> variable; double space after "when a reboot may be
>>>>>>> required"
>>>>>>>
>>>>>>>   - I am not a big fan of changing the semantic of arguments that
>>>>>>> have been
>>>>>>> there forever. Can you think of a case where the user would not want
>>>>>>> the
>>>>>>> test to abort if we reach a state where we cannot trust the result? I
>>>>>>> am
>>>>>>> including Dylan on this. Also, if we are to keep these modes, can we
>>>>>>> rename
>>>>>>> the "simple" mode to "warning" and "monitored" to "abort"? This would
>>>>>>> make
>>>>>>> more sense and clearly state the goal of the modes.
>>>>>>
>>>>>> Ilia, Daniel, Thomas, Glenn, I know that y'all use the dmesg support.
>>>>>> What do you think?
>>>>>
>>>>> Can you provide a summary of what this patch does? It was submitted as
>>>>> an attachment, so I can't (easily) look at it... either way, as long
>>>>> as running with --dmesg doesn't break, I probably don't care. I use
>>>>> --dmesg to know which tests cause the GPU to complain, which I then,
>>>>> in turn, use to pick which tests to debug further. (And since I
>>>>> normally run with -1 anyways, it's ~free to add...)
>>>>>
>>>>>    -ilia
>>>>
>>>> I'm going to interpret that as "I'd be annoyed if piglit just stopped
>>>> when there was dmesg chatter"?
>>>
>>> That would definitely be a deal breaker. My wifi goes in and out and
>>> complains loudly about it in dmesg :)
>
>
> That should not happen since it will wait for specific events in dmesg for
> this (gpu reset failed, gpu crash, oops. bug and lockdep splat).
> What does your wifi driver produce?

Nothing horrible... just errors about... stuff. And obviously nouveau
prints errors too, many of which are expected due to tests doing funny
things (like reading out-of-bounds things).

>
>>>
>>>    -ilia
>>
>> I think that answers that question Martin.
>>
>> I would lean towards adding a "--dmesg-abort" (or whatever name) flag to
>> be passed in addition to --dmesg when the user wants to stop on dmesg
>> errors.
>
>
> I would personally say that we should not call it dmesg at all and just use
> whatever information we have to detect when we need to abort. So, something
> like --abort-on-critical-error would make more sense and would not be
> misleading the users into thinking it is solely using dmesg as an input.

That makes a lot more sense. And if it did that automatically, I'd be
fine with that... as long as it had good "critical error" detection. I
can't imagine what that would look like. So until it's 100%, it should
stay off by default.

  -ilia


More information about the Piglit mailing list