[Piglit] [PATCH 2/3] framework: Add support for a test timeout

Tue Nov 12 06:35:12 PST 2013

On Tuesday, November 12, 2013 11:36:32 AM Daniel Vetter wrote:
> i-g-t tests can take a long time, and for a bunch of reasons (bad
> tuning on disparate set of platforms, stuck in the kernel which often
> can be recovered by interrupting with a signal, ...) that sometimes
> extends to a good approximation of forever.
> 
> Hence this adds a per-test timeout value and a bit of infrastructure
> to adjust the results. Test results adjusting is done after calling
> interpretResult so that we don't have to replicate this all over the
> place. This might need to be adjusted for the piglit-native subtest
> stuff, but otoh igt is a bit special with it's crazy long-running
> tests. So I think we can fix this once it's actually needed.
> 
> The default timeout is None, so this is purely opt-in.
> 
> Note on the implementation: SIG_ALARM doesn't exists on Windows and
> stackoverflow overwhelmingly recommended to go with this thread-based
> approach here. But only tested on my Linux box here.

I've been looking into this myself, could you link the SO that recomends this? 
I'm concerned that the GIL is going to bite us using a thread.

> ---
>  framework/exectest.py | 28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/framework/exectest.py b/framework/exectest.py
> index 986d803..32b0062 100644
> --- a/framework/exectest.py
> +++ b/framework/exectest.py
> @@ -23,6 +23,7 @@
>  import errno
>  import os
>  import subprocess
> +import threading
>  import shlex
>  import types
>  import re
> @@ -72,6 +73,7 @@ class ExecTest(Test):
>          self.command = command
>          self.split_command = os.path.split(self.command[0])[1]
>          self.env = {}
> +        self.timeout = None
> 
>          if isinstance(self.command, basestring):
>              self.command = shlex.split(str(self.command))
> @@ -113,7 +115,7 @@ class ExecTest(Test):
>                  else:
>                      if env.dmesg:
>                          old_dmesg = read_dmesg()
> -                    (out, err, returncode) = \
> +                    (out, err, returncode, timeout) = \
>                          self.get_command_result(command, fullenv)
>                      if env.dmesg:
>                          dmesg_diff = get_dmesg_diff(old_dmesg,
> read_dmesg()) @@ -172,6 +174,9 @@ class ExecTest(Test):
>              elif returncode != 0:
>                  results['note'] = 'Returncode was {0}'.format(returncode)
> 
> +            if timeout:
> +                results['result'] = 'timeout'
> +

I would prefer not creating a new result status, since this probably should 
just be a fail or a crash with a note. If we really do want another status 
you'll need to add a timeout status in framework/status.py and css code in 
templates/index.css.

>              if env.valgrind:
>                  # If the underlying test failed, simply report
>                  # 'skip' for this valgrind test.
> @@ -196,6 +201,7 @@ class ExecTest(Test):
>              results['returncode'] = returncode
>              results['command'] = ' '.join(self.command)
>              results['dmesg'] = dmesg_diff
> +            results['timeout'] = timeout
> 
>              self.handleErr(results, err)
> 
> @@ -222,7 +228,23 @@ class ExecTest(Test):
>                                      stderr=subprocess.PIPE,
>                                      env=fullenv,
>                                      universal_newlines=True)
> -            out, err = proc.communicate()
> +            out = ''
> +            err = ''
> +
> +            def thread_fn():
> +                out, err = proc.communicate()
> +
> +            thread = threading.Thread(target=thread_fn)
> +            thread.start()
> +
> +            thread.join(self.timeout)
> +
> +            if thread.is_alive():
> +                proc.terminate()
> +                thread.join()
> +                timeout = True
> +            else:
> +                timeout = False

Does it make more sense to put this in the threadpool rather than in each 
test? It would come with the limitation of one timeout value on a given run, 
but I doubt that most people run igt and glean/native tests together, and we 
could easily just make the timeout value a property of the .tests file, 
possibly with a commandline option?

>              returncode = proc.returncode
>          except OSError as e:
>              # Different sets of tests get built under
> @@ -237,7 +259,7 @@ class ExecTest(Test):
>                  returncode = None
>              else:
>                  raise e
> -        return out, err, returncode
> +        return out, err, returncode, timeout
> 
> 
>  class PlainExecTest(ExecTest):
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 490 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/piglit/attachments/20131112/19f66d4a/attachment.pgp>