[Piglit] [PATCH 2/3] framework: Add support for a test timeout

Tue Nov 12 10:13:22 PST 2013

On Tue, Nov 12, 2013 at 3:35 PM, Dylan Baker <baker.dylan.c at gmail.com> wrote:
> On Tuesday, November 12, 2013 11:36:32 AM Daniel Vetter wrote:
>> i-g-t tests can take a long time, and for a bunch of reasons (bad
>> tuning on disparate set of platforms, stuck in the kernel which often
>> can be recovered by interrupting with a signal, ...) that sometimes
>> extends to a good approximation of forever.
>>
>> Hence this adds a per-test timeout value and a bit of infrastructure
>> to adjust the results. Test results adjusting is done after calling
>> interpretResult so that we don't have to replicate this all over the
>> place. This might need to be adjusted for the piglit-native subtest
>> stuff, but otoh igt is a bit special with it's crazy long-running
>> tests. So I think we can fix this once it's actually needed.
>>
>> The default timeout is None, so this is purely opt-in.
>>
>> Note on the implementation: SIG_ALARM doesn't exists on Windows and
>> stackoverflow overwhelmingly recommended to go with this thread-based
>> approach here. But only tested on my Linux box here.
>
> I've been looking into this myself, could you link the SO that recomends this?
> I'm concerned that the GIL is going to bite us using a thread.

http://stackoverflow.com/questions/1191374/subprocess-with-timeout

>> ---
>>  framework/exectest.py | 28 +++++++++++++++++++++++++---
>>  1 file changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/framework/exectest.py b/framework/exectest.py
>> index 986d803..32b0062 100644
>> --- a/framework/exectest.py
>> +++ b/framework/exectest.py
>> @@ -23,6 +23,7 @@
>>  import errno
>>  import os
>>  import subprocess
>> +import threading
>>  import shlex
>>  import types
>>  import re
>> @@ -72,6 +73,7 @@ class ExecTest(Test):
>>          self.command = command
>>          self.split_command = os.path.split(self.command[0])[1]
>>          self.env = {}
>> +        self.timeout = None
>>
>>          if isinstance(self.command, basestring):
>>              self.command = shlex.split(str(self.command))
>> @@ -113,7 +115,7 @@ class ExecTest(Test):
>>                  else:
>>                      if env.dmesg:
>>                          old_dmesg = read_dmesg()
>> -                    (out, err, returncode) = \
>> +                    (out, err, returncode, timeout) = \
>>                          self.get_command_result(command, fullenv)
>>                      if env.dmesg:
>>                          dmesg_diff = get_dmesg_diff(old_dmesg,
>> read_dmesg()) @@ -172,6 +174,9 @@ class ExecTest(Test):
>>              elif returncode != 0:
>>                  results['note'] = 'Returncode was {0}'.format(returncode)
>>
>> +            if timeout:
>> +                results['result'] = 'timeout'
>> +
>
> I would prefer not creating a new result status, since this probably should
> just be a fail or a crash with a note. If we really do want another status
> you'll need to add a timeout status in framework/status.py and css code in
> templates/index.css.

Hm, I think the special timeout failure mode is valuable to check for
tests that take too long. I'll update the patch with the missing stuff
(I usually just run the tests without generating summaries here, hence
why I've missed this).

>>              if env.valgrind:
>>                  # If the underlying test failed, simply report
>>                  # 'skip' for this valgrind test.
>> @@ -196,6 +201,7 @@ class ExecTest(Test):
>>              results['returncode'] = returncode
>>              results['command'] = ' '.join(self.command)
>>              results['dmesg'] = dmesg_diff
>> +            results['timeout'] = timeout
>>
>>              self.handleErr(results, err)
>>
>> @@ -222,7 +228,23 @@ class ExecTest(Test):
>>                                      stderr=subprocess.PIPE,
>>                                      env=fullenv,
>>                                      universal_newlines=True)
>> -            out, err = proc.communicate()
>> +            out = ''
>> +            err = ''
>> +
>> +            def thread_fn():
>> +                out, err = proc.communicate()
>> +
>> +            thread = threading.Thread(target=thread_fn)
>> +            thread.start()
>> +
>> +            thread.join(self.timeout)
>> +
>> +            if thread.is_alive():
>> +                proc.terminate()
>> +                thread.join()
>> +                timeout = True
>> +            else:
>> +                timeout = False
>
> Does it make more sense to put this in the threadpool rather than in each
> test? It would come with the limitation of one timeout value on a given run,
> but I doubt that most people run igt and glean/native tests together, and we
> could easily just make the timeout value a property of the .tests file,
> possibly with a commandline option?

Oh right I didn't benchmark this, probably slows down normal piglit
runs quite a bit. I'll test that and if there's an measurable effect
add a check for timeout == None.

The reason that I've gone with a per-test timeout is that in igt I
only expect some tests to take long, others should complete fairly
quickly. So the next step would be to add a bit more smarts and adjust
the timeout a bit.

This was essentially just a quick hack for igt tests, if other people
think this is useful for them I can try to make it fancy.
-Daniel

>>              returncode = proc.returncode
>>          except OSError as e:
>>              # Different sets of tests get built under
>> @@ -237,7 +259,7 @@ class ExecTest(Test):
>>                  returncode = None
>>              else:
>>                  raise e
>> -        return out, err, returncode
>> +        return out, err, returncode, timeout
>>
>>
>>  class PlainExecTest(ExecTest):

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch