[Piglit] [PATCH 0/6] Recursion tests v2 and fix OOM tests

Fri Jul 29 13:09:28 PDT 2011

----- Original Message -----
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 07/28/2011 04:22 AM, Jose Fonseca wrote:
> > ----- Original Message -----
> >> This patch series does a couple of things.  First, it adds the ability
> >> to limit the amount of memory that a test can use.  There are several
> >> test cases (including ones added in this series) that can exhaust
> >> system memory on failing implementations.  Setting the rlimit prevents
> >> this unfriendly behavior.  The rlimit can be set either via the
> >> command line (-rlimit option) or in the .shader_test file (rlimit in
> >> the requirements section).
> >>
> >> The second thing it does is resubmit my GLSL recursion tests with an
> >> rlimit set in all.tests.
> >>
> >> Finally, it sets an rlimit for the glsl-*-explosion tests and removes
> >> them from the blacklist.
> > 
> > Another place to do this is in the piglit python framework, via
> > subprocess's preexec_fn argument.
> > 
> > I have code to do that for several OSes in a bunch of scripts I use
> > to automate many GL testsuites with Hudson/Jenkins. I'm working on
> > > folding some of these into the piglit python framework, but I haven't
> > > been able to do so yet.
> > 
> > FWIW, below are the relevant bits:
> 
> That's really cool, and I think that will be useful for a bunch of other
> things.  Chad had made a similar comment to me about running the tests
> on embedded / small devices.  There we'll want to limit the memory usage
> even more.  I especially like that it limits the memory usage to some
> fraction of physical memory rather than being hardcoded to 256MB.  I
> should probably add something like that to the code that I have.
> 
> The reason I made it an option on the test is so that a person can
> reproduce the test run exactly in a debugger.  That's much harder to do
> if an important bit of the test environment is controlled by the python
> framework.
> 
> >     # http://www.velocityreviews.com/forums/t587425-python-system-information.html
> > 
> >     import re
> >     import subprocess
> >     import sys
> > 
> >     _meminfo_re = re.compile(r'^(?P<key>\S*):\s*(?P<value>\d+)\s*kB')
> > 
> >     def meminfo():
> >         """-> dict of data from meminfo (str:int).
> >         Values are converted from kB to bytes.
> >         """
> >         result = {}
> >         with open('/proc/meminfo') as f:
> >             for line in f:
> >                 match = _meminfo_re.match(line)
> >                 if match:
> >                     key, value = match.group('key', 'value')
> >                     result[key] = int(value) * 1024
> >         return result
> > 
> >     def total_physical_memory():
> >         return meminfo()['MemTotal']
> > 
> >     def preexec_fn():
> >         # http://stackoverflow.com/questions/1689505/python-ulimit-and-nice-for-subprocess-call-subprocess-popen
> >         import resource
> > 
> >         # Generate core files so that we can see back traces
> >         if sys.platform == 'darwin':
> >             lim = resource.RLIM_INFINITY
> >         else:
> >             lim = 128*1024*1024
> >         resource.setrlimit(resource.RLIMIT_CORE, (lim, lim))
> > 
> >         # Don't let the test program use more than 3/4 of the
> >         # physical memory, to prevent excessive swapping, or
> >         # termination of the test harness programs via OOM
> >         # (integer division: setrlimit needs integer limits)
> >         maxmem = total_physical_memory()*3//4
> >         resource.setrlimit(resource.RLIMIT_AS, (maxmem, maxmem))
> > 
> >     p = subprocess.Popen(
> >         args,
> >         preexec_fn = preexec_fn,
> >     )
> > 
> > I also have code for timeouts etc.
> 
> Timeouts are tricky.  We want to be able to have tests that legitimately
> take a long time, but we want to detect cases where the test isn't going
> to make progress.
> 
> You and Ken should talk.  He's been working on some piglit hooks to
> determine when a test causes a GPU hang, kernel oops, etc.  Ideally we'd
> like to be able to detect these catastrophic cases, kill the test,
> reboot, and continue the run with the next test.  Some of Chad's work on
> the JSON serialization will also help.
> 
> > IMHO, the python harness is a better place for health monitoring /
> > > limit enforcing, first because in certain OSes (such as Windows) it is
> > impossible to do from inside the program reliably; second because
> > it makes it easier to integrate w
> > 
> > 
> > I also think this should be enabled in all tests, not just these
> > ones.
> 
> Part of the problem is that the rlimit used (RLIMIT_AS) applies to all
> mapped memory.  I had tried RLIMIT_DATA, but malloc doesn't use sbrk
> these days.  It uses mmap of anonymous files.  As a result, if the
> driver maps a big chunk of GPU memory, it can easily exceed the set
> limit.  I don't want my texture test that uses a 4096x4096 texture to
> fail because it exceeds the artificial rlimit.
> 
> This is even worse on DRI1-like drivers that map all of GPU (or AGP)
> memory just for fun.  Those architectures would either fail every test
> or wouldn't be able to prevent tests that OOM from disrupting the rest
> of the test run.

Yes, that's tricky.
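
To make the problem concrete, here is a rough sketch (Linux only, not code from piglit) of how RLIMIT_AS rejects a single large mapping even though none of it is ever touched. The 1 GiB limit, the 2 GiB mapping size, and the names limit_address_space() and run_child() are arbitrary choices for illustration; the anonymous mmap only stands in for a driver mapping a big chunk of GPU memory.

```python
import resource
import subprocess
import sys

LIMIT = 1 << 30  # 1 GiB address-space cap (arbitrary, for illustration)

# Child program: create one anonymous 2 GiB mapping without touching it.
# Exit status 3 means the mapping was refused, 0 means it succeeded.
CHILD = (
    "import mmap, sys\n"
    "try:\n"
    "    mmap.mmap(-1, 2 << 30)\n"
    "except (OSError, MemoryError, ValueError):\n"
    "    sys.exit(3)\n"
    "sys.exit(0)\n"
)

def limit_address_space():
    # Runs in the child between fork() and exec().
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

def run_child(limited):
    """Run the child program, optionally under RLIMIT_AS; return its exit status."""
    return subprocess.call(
        [sys.executable, "-c", CHILD],
        preexec_fn=limit_address_space if limited else None,
    )
```

With the limit in place, run_child(True) should come back with status 3, because the mapping alone exceeds the allowed address space regardless of how much memory the process really uses.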

A possible solution would be to parse /proc/<pid>/maps periodically then.
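
Something along these lines, as an untested sketch rather than code I actually run. It assumes that mappings with inode 0 in /proc/<pid>/maps are anonymous memory (heap, stack, malloc's mmap regions), while driver mappings show up with a device path and a nonzero inode; whether that holds for every driver would need checking. The names parse_maps(), anonymous_bytes() and over_limit() are made up for this example. Note that it counts address space, not resident memory.

```python
def parse_maps(text):
    """Parse the text of /proc/<pid>/maps into (size, perms, inode, path) tuples."""
    entries = []
    for line in text.splitlines():
        # Fields: start-end perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split('-'))
        path = fields[5].strip() if len(fields) > 5 else ''
        entries.append((end - start, fields[1], int(fields[4]), path))
    return entries

def anonymous_bytes(text):
    """Total size in bytes of mappings with no backing file (inode 0)."""
    # Assumption: inode 0 means anonymous memory, so device mappings
    # such as /dev/dri/card0 are left out of the total.
    return sum(size for size, perms, inode, path in parse_maps(text)
               if inode == 0)

def over_limit(pid, limit):
    """True if the process's anonymous mappings exceed limit bytes."""
    with open('/proc/%d/maps' % pid) as f:
        return anonymous_bytes(f.read()) > limit
```

The harness would call over_limit() every so often on the test's pid and kill the test when it returns True. That keeps a big GPU mapping from counting against the test, at the cost of only noticing runaway allocations at the next poll.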

Jose