[igt-dev] [PATCH i-g-t] i915/gem_exec_parallel: Add skip_on_simulation to whole binary

Chris Wilson chris at chris-wilson.co.uk
Fri Feb 8 13:57:44 UTC 2019


Quoting Katarzyna Dec (2019-02-08 13:49:28)
> On Fri, Feb 08, 2019 at 01:42:42PM +0000, Chris Wilson wrote:
> > Quoting Katarzyna Dec (2019-02-08 13:32:38)
> > > On Fri, Feb 08, 2019 at 01:27:51PM +0000, Chris Wilson wrote:
> > > > Quoting Katarzyna Dec (2019-02-08 13:23:59)
> > > > > On Fri, Feb 08, 2019 at 01:14:26PM +0000, Chris Wilson wrote:
> > > > > > Quoting Katarzyna Dec (2019-02-08 13:10:37)
> > > > > > > On simulation environment these tests were failing. During debug
> > > > > > > we decided that whole binary should be skipped, because there is
> > > > > > > no additional coverage in simulation.
> > > > > > 
> > > > > > How did they fail? There are some paths here that are rarely touched
> > > > > > elsewhere.
> > > > > > -Chris
> > > > > I like you challenges Chris :)
> > > > > 
> > > > > gem_exec_parallel@*context* were failing with error:
> > > > > 
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Test assertion failure
> > > > > function gem_execbuf, file ../lib/ioctl_wrappers.c:605:
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: Failed assertion:
> > > > > __gem_execbuf(fd, execbuf) == 0
> > > > > (gem_exec_parallel:1245) ioctl_wrappers-CRITICAL: error: -28 != 0
> > > > 
> > > > Should not happen.
> > > But happends and is reproducible on some CI runs.
> > 
> > Well fix it. It is not a valid error in this case. ENOSPC is only valid
> > in the case the user submitted more than could be fitted into the GGTT
> > (or had an overlap in their softpin). In this instance, it means that
> > one _context_ impacted another client, and that is simply not allowed.
> >
> According to what you are saying we need to debug and fix the issue.

Yes, it'll be either unable to pin the context image or allocate the id.
Both of which should invoke a GC to recover resources (no client should
have a permanent allocation, mere temporarily leases) rather than fail
with ENOSPC, 

> > > > > and @*fds* with:
> > > > > 
> > > > > (gem_exec_parallel:1315) igt_debugfs-CRITICAL: Last errno: 24, Too many open
> > > > > files
> > > > 
> > > > That is boring, but more indicative of system setup failure. We should
> > > > be configuring the system to allow ourselves, as root, to use as many fd
> > > > as can fit in memory.
> > > > 
> > > > I'd recommend a pass through the basic stress test at least. It won't
> > > > take a lot to exercise some unusual conditions.
> > > > -Chris
> > > 
> > > What if we skip all *context* and all *fds* and leave rest as it is?
> > > Will it be resonable in this shape?
> > 
> > I would rewrite it such that the threads ran for a certain period of
> > time rather than iterations; set basic to something small (say 2s) and
> > left everything else to run for 150s (tends to be the common value I've
> Is this comment to *fds* testcases or general to whole binary?

Whole binary; I think it's useful enough to run at least 2 passes with a
1024 or more competing threads (more!) because this test has proven
itself to be an early warning sign of problems that have escaped the
"purpose built" tests. It doesn't have to be in the first round of
testing, and it doesn't have to be extensive, just enough to load the
system up to as much as it can handle and check it doesn't fall over.

It would be nice to extend the normal test out to huge number of threads
(even say 16,000 but there are realistic limits to both the number of fd
and contexts that can fit in memory that need to be determined to find
the ideal number).
-Chris


More information about the igt-dev mailing list