[Piglit] [PATCH] Add dmesg option for reboot policy

Tue Nov 24 15:40:32 PST 2015

On Tue, Nov 24, 2015 at 04:14:23AM -0800, yann.argotti at linux.intel.com wrote:
> From: Yann Argotti <yann.argotti at linux.intel.com>
> Date: Tue, 24 Nov 2015 12:16:34 +0100
> 
>  This adds a policy which advises when user should reboot system to avoid
>  noisy test results due to system becoming unstable, for instance, and
>  therefore continues testing successfully. To do this, a new Dmesg class is
>  proposed which is not filtering dmesg and monitors whether or not one of
>  the following event occurs:
>   - gpu reset failed (not just gpu reset happened, that happens
>  way too often and many tests even provoke hangs intentionally)  - gpu crash,
>  - Oops:  - BUG  - lockdep splat that causes the locking validator to get
>  disabled If one of these issues happen, piglit test execution is stopped
>  -terminating test thread pool- and exit with code 3 to inform that reboot is
>  advised. Then test execution resume, after rebooting system or not, is done
>  like usually with command line parameter "resume".
> 
>  To call it, use command line parameter: --dmesg monitored
>  (this is original patch for review and therefore didn't apply any proposed
>  change yet)
> 
> Signed-off-by: Yann Argotti <yann.argotti at linux.intel.com>

I don't like the aproach of passing the returncode through every level
of piglit at all. I'd much rather use an exception for this, either by
extending PiglitFatalError or by adding a new exception, and then
catching it either in the exceptions.handler decorator, or if we have to
in run/resume.

I've got a couple more comments inline.

Also, please figure out where the line-wrapping is coming from and
remove it (if you're pasting this into a window in your email client you
will get line-wrap which breaks git am and git apply). Using git
send-email is really the best way to do this.

[snip]

> +
> +    def reboot_is_needed(self):
> +        """ Simply return the reboot_needed variable state """
> +        return self._reboot_needed

This is what a property is for:
@property
def reboot_needed(self):
return self._reboot_needed

Then you access the value as an attribute rather than as a function.

If self.reboot_needed:
    print('reboot now!')

> +
> +    def update_dmesg(self):
> +        """ Call dmesg using subprocess.check_output
> +
> +        Get the contents of dmesg, then calculate new messages, finally set
> +        self.dmesg equal to the just read contents of dmesg.
> +
> +        """
> +        dmesg = subprocess.check_output(self.DMESG_COMMAND).strip().splitlines()
> +
> +        # Find all new entries, do this by slicing the list of dmesg to only
> +        # returns elements after the last element stored. If there are not
> +        # matches a value error is raised, that means all of dmesg is new
> +        l = 0
> +        for index, item in enumerate(reversed(dmesg)):
> +            if item == self._last_message:
> +                l = len(dmesg) - index  # don't include the matched element
> +                break
> +        self._new_messages = dmesg[l:]
> +
> +        if (not self._first_dmesg_process):
> +            if (any("[drm] Failed to reset chip" in msg for msg in self._new_messages) or
> +                any("[drm] GPU crash" in msg for msg in self._new_messages) or
> +                any("Oops:" in msg for msg in self._new_messages) or
> +                any("BUG:" in msg for msg in self._new_messages) or
> +                any("turning off the locking correctness validator" in msg for msg in self._new_messages)):
> +               self._reboot_needed = True
> +        else:
> +           self._first_dmesg_process = False
> +
> +        # Attempt to store the last element of dmesg, unless there was no dmesg
> +        self._last_message = dmesg[-1] if dmesg else None
> +
> +    def update_result(self, result):
> +        """ Takes a TestResult object and updates it with dmesg statuses
> +
> +        There is no filtering/change of status in dmesg and return the
> +        original result passed in.
> +
> +        Arguments:
> +        result -- A TestResult instance
> +
> +        """
> +
> +        # Get a new snapshot of dmesg
> +        self.update_dmesg()
> +
> +        # if update_dmesg() found new entries replace the results of the test
> +        # and subtests
> +        if self._new_messages:
> +
> +            if self.regex:
> +                for line in self._new_messages:
> +                    if self.regex.search(line):
> +                        break
> +                else:
> +                    return result
> +
> +            # Add the dmesg values to the result
> +            result.dmesg = "\n".join(self._new_messages)
> +
> +        return result
> +

There is a ton of duplication between LinuxDmesg and
LinuxMonitoredDmesg. I haven't looked too closely, but it seems like
LinuxMonitoredDmesg could inherit from LinuxDmesg?

[snip]

Dylan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/piglit/attachments/20151124/be8b2c7d/attachment.sig>