[Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability

Daniel Vetter daniel at ffwll.ch
Thu Dec 10 04:52:53 PST 2015


On Thu, Dec 10, 2015 at 11:51:29AM +0000, Morton, Derek J wrote:
> >
> >
> >-----Original Message-----
> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel Vetter
> >Sent: Thursday, December 10, 2015 10:13 AM
> >To: Morton, Derek J
> >Cc: intel-gfx at lists.freedesktop.org; Wood, Thomas
> >Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability
> >
> >On Tue, Dec 08, 2015 at 12:44:44PM +0000, Derek Morton wrote:
> >> gem_flink_race and prime_self_import have subtests which read the 
> >> number of open gem objects from debugfs to determine if objects have 
> >> leaked during the test. However the test can fail sporadically if the 
> >> number of gem objects changes due to other process activity.
> >> This patch introduces a change to check the number of gem objects 
> >> several times to filter out any fluctuations.
> >
> >Why exactly does this happen? IGT tests should be run on bare metal,
> >with everything else killed/subdued/shutup. If there's still things
> >going on that create objects, we need to stop them from doing that.
> >
> >If this only applies to Android, or some special Android deamon them
> >imo check for that at runtime and igt_skip("your setup is invalid,
> >deamon %s running\n"); is the correct fix. After all just because you
> >sampled for a bit doesn't mean that it wont still change right when you
> >start running the test for real, so this is still fragile.
> 
> Before running tests on android we do stop everything possible. I
> suspect the culprit is coreu getting automatically restarted after it is
> stopped. I had additional debug while developing this patch and what I
> saw was the system being mostly quiescent but with some very low level
> background activity. 1 extra object being created and then deleted
> occasionally. Depending on whether it occurred at the start or end of
> the test it was resulting in a reported leak of either 1 or -1 objects.
> The patch fixes that issue by taking several samples and requiring them
> to be the same, therefore filtering out the low level background noise.
> It would not help if something in the background allocated an object and
> kept it allocated, but I have not seen that happen. I only saw once the
> object count increasing for 2 consecutive reads hence the count to 4 to
> give a margin. The test was failing about 10%. With this patch I got
> 100% pass across 300 runs of each of the tests.

Hm, piglit checks that there's no other drm clients running. Have you
tried re-running that check to zero in on the culprit?

> If you are concerned about the behaviour when running the test with a
> load of background activity I could add code to limit to the reset of
> the count and fail the test in that instance. That would give a benefit
> of distinguishing a test fail due to excessive background activity from
> a detected leak.

I'm also concerned for the overhead this causes everyone else. If this
really is some Android trouble then I think it'd be good to only compile
this on Android. But would still be much better if you can get to a
reliably clean test environment.

> I would not want to just have the test skip as that introduces a hole in
> our test coverage.
> 
> >Also would be good to extract get_stable_obj_count to a proper igt
> >library function, if it indeed needs to be this tricky. And then add
> >the explanation for why we need this in the gtkdoc.
> 
> I  can move the code to an igt library. Which library would you suggest? Igt_debugfs ?

Hm yeah, it's a bit the dumping ground for all things debugfs access ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the Intel-gfx mailing list