[Piglit] [PATCH 0/6] Recursion tests v2 and fix OOM tests

Thu Jul 28 10:58:35 PDT 2011

On 07/28/2011 04:22 AM, Jose Fonseca wrote:
> ----- Original Message -----
>> This patch series does a couple things.  First, it adds the ability
>> to
>> limit the amount of memory that a test can use.  There are several
>> test cases (including ones added in this series) that can exhaust
>> system memory on failing implementations.  Setting the rlimit
>> prevents
>> this unfriendly behavior.  The rlimit can be set either via the
>> command line (-rlimit option) or in the .shader_test file (rlimit in
>> the requirements section).
>>
>> The second thing it does is resubmit my GLSL recursion tests with an
>> rlimit set in all.tests.
>>
>> Finally, it sets an rlimit for the glsl-*-explosion tests and removes
>> them from the blacklist.
> 
> Another place to do this is in the piglit python framework, via subprocess's preexec_fn argument.
> 
> I have code to do that for several OSes in a bunch of scripts I use to automate many GL testsuites with Hudson/Jenkins. I'm working on folding some of these into piglit python framework, but I haven't been able to do this.
> 
> FWIW, below are the relevant bits:

That's really cool, and I think that will be useful for a bunch of other
things.  Chad had made a similar comment to me about running the tests
on embedded / small devices.  There we'll want to limit the memory usage
even more.  I especially like that it limits the memory usage to some
fraction of physical memory rather than being hardcoded to 256MB.  I
should probably add something like that to the code that I have.
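
As a rough sketch of the fraction-of-physical-memory idea, assuming a
Unix-like OS that exposes SC_PHYS_PAGES through sysconf (Linux does),
the limit could be computed without parsing /proc/meminfo.  The function
names here are hypothetical and not part of piglit:

```python
import os


def physical_memory_bytes():
    # Total RAM in bytes as reported by sysconf.  Assumes the platform
    # exposes SC_PHYS_PAGES and SC_PAGE_SIZE (Linux does).
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


def memory_limit_bytes(total_bytes, numer=3, denom=4):
    # Integer arithmetic keeps the result an int, which is what
    # resource.setrlimit() expects for its (soft, hard) pair.
    return total_bytes * numer // denom
```

Calling memory_limit_bytes(physical_memory_bytes()) gives 3/4 of RAM,
and the fraction can be lowered for embedded / small devices.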

The reason I made it an option on the test is so that a person can
reproduce the test run exactly in a debugger.  That's much harder to do
if an important bit of the test environment is controlled by the python
framework.

>     # http://www.velocityreviews.com/forums/t587425-python-system-information.html
> 
>     import re
>     import subprocess
>     import sys
> 
>     _meminfo_re = re.compile(r'^(?P<key>\S*):\s*(?P<value>\d+)\s*kB')
> 
>     def meminfo():
>         """-> dict of data from meminfo (str:int).
>         Values are converted from kB to bytes.
>         """
>         result = {}
>         with open('/proc/meminfo') as f:
>             for line in f:
>                 match = _meminfo_re.match(line)
>                 if match:
>                     # group() with names returns the two captures in order
>                     key, value = match.group('key', 'value')
>                     result[key] = int(value) * 1024
>         return result
> 
>     def total_physical_memory():
>         return meminfo()['MemTotal']
> 
>     def preexec_fn():
>         # http://stackoverflow.com/questions/1689505/python-ulimit-and-nice-for-subprocess-call-subprocess-popen
>         import resource
> 
>         # Generate core files so that we can see back traces
>         if sys.platform == 'darwin':
>             lim = resource.RLIM_INFINITY
>         else:
>             lim = 128*1024*1024
>         resource.setrlimit(resource.RLIMIT_CORE, (lim, lim))
>         
>         # Don't let the test program use more than 3/4 of the physical memory, to prevent excessive swapping or termination of the test harness programs by the OOM killer
>         maxmem = total_physical_memory() * 3 // 4
>         resource.setrlimit(resource.RLIMIT_AS, (maxmem, maxmem))
> 
>     p = subprocess.Popen(
>         args, 
>         preexec_fn = preexec_fn,
>     )
> 
> I also have code for timeouts etc.

Timeouts are tricky.  We want to be able to have tests that legitimately
take a long time, but we want to detect cases where the test isn't going
to make progress.
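
To make the difficulty concrete, here is a minimal sketch of the
simplest approach, a fixed wall-clock timeout.  It assumes Python 3's
subprocess.run(); run_with_timeout is a hypothetical helper, not an
existing piglit function:

```python
import subprocess
import sys


def run_with_timeout(args, timeout_s):
    """Run args and return (status, returncode).

    status is "timeout" if the child was killed for exceeding
    timeout_s seconds of wall-clock time, otherwise "done".
    """
    try:
        # subprocess.run() kills the child and re-raises TimeoutExpired
        # once the timeout elapses.
        proc = subprocess.run(args, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    return ("done", proc.returncode)


if __name__ == "__main__":
    # A child that sleeps longer than the timeout is reported as stuck,
    # even though it might have finished if given more time.
    print(run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], 1))
```

A wall-clock limit like this cannot tell a slow test from a stuck one,
so any single value is either too short for the legitimately long tests
or too slow at catching the hung ones.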

You and Ken should talk.  He's been working on some piglit hooks to
determine when a test causes a GPU hang, kernel oops, etc.  Ideally we'd
like to be able to detect these catastrophic cases, kill the test,
reboot, and continue the run with the next test.  Some of Chad's work on
the JSON serialization will also help.

> IMHO, the python harness is a better place for health monitoring / limit enforcing, first because in certain OSes (such as Windows) it is impossible to do from inside the program reliably; second because it makes it easier to integrate w
> 
> 
> I also think this should be enabled in all tests, not just these ones.

Part of the problem is that the rlimit used (RLIMIT_AS) applies to all
mapped memory.  I had tried to use RLIMIT_DATA, but malloc doesn't use
sbrk these days.  It uses mmap of anonymous memory, so that limit had no
effect.  Since RLIMIT_AS counts every mapping, a driver that maps a big
chunk of GPU memory can easily exceed the set limit.  I don't want my
texture test that uses a 4096x4096 texture to fail because it exceeds
the artificial rlimit.

This is even worse on DRI1-like drivers that map all of GPU (or AGP)
memory just for fun.  Those architectures would either fail every test
or wouldn't be able to prevent tests that OOM from disrupting the rest
of the test run.
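
The RLIMIT_AS behavior is easy to see outside of a GL driver.  The
sketch below assumes 64-bit Linux; the sizes are arbitrary, and an
untouched anonymous mapping stands in for a driver mapping GPU memory.
The child never writes to the mapping, yet the mmap call still fails
because the limit counts address space, not memory actually used:

```python
import resource
import subprocess
import sys

LIMIT = 1 << 30      # 1 GiB address-space cap (arbitrary, for illustration)
MAP_SIZE = 2 << 30   # 2 GiB anonymous mapping that is never touched

# Child program: exit 0 if the mapping succeeds, 1 if it is refused.
CHILD = """
import mmap, sys
try:
    m = mmap.mmap(-1, %d)
except (OSError, MemoryError):
    sys.exit(1)
sys.exit(0)
""" % MAP_SIZE


def limit_address_space():
    # Runs in the child between fork() and exec(), like the
    # preexec_fn in the quoted code above.
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))


def mapping_succeeds_under_limit():
    rc = subprocess.call([sys.executable, "-c", CHILD],
                         preexec_fn=limit_address_space)
    return rc == 0


if __name__ == "__main__":
    print(mapping_succeeds_under_limit())
```

A driver that maps all of GPU or AGP memory at startup would run into
the same refusal before the test itself allocated anything.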