[Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability

Morton, Derek J derek.j.morton at intel.com
Fri Dec 11 02:33:46 PST 2015


>
>
>-----Original Message-----
>From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel Vetter
>Sent: Thursday, December 10, 2015 12:53 PM
>To: Morton, Derek J
>Cc: Daniel Vetter; intel-gfx at lists.freedesktop.org; Wood, Thomas
>Subject: Re: [Intel-gfx] [PATCH i-g-t] gem_flink_race/prime_self_import: Improve test reliability
>
>On Thu, Dec 10, 2015 at 11:51:29AM +0000, Morton, Derek J wrote:
>> >
>> >
>> >-----Original Message-----
>> >From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of 
>> >Daniel Vetter
>> >Sent: Thursday, December 10, 2015 10:13 AM
>> >To: Morton, Derek J
>> >Cc: intel-gfx at lists.freedesktop.org; Wood, Thomas
>> >Subject: Re: [Intel-gfx] [PATCH i-g-t] 
>> >gem_flink_race/prime_self_import: Improve test reliability
>> >
>> >On Tue, Dec 08, 2015 at 12:44:44PM +0000, Derek Morton wrote:
>> >> gem_flink_race and prime_self_import have subtests which read the 
>> >> number of open gem objects from debugfs to determine if objects 
>> >> have leaked during the test. However the test can fail sporadically 
>> >> if the number of gem objects changes due to other process activity.
>> >> This patch introduces a change to check the number of gem objects 
>> >> several times to filter out any fluctuations.
>> >
>> >Why exactly does this happen? IGT tests should be run on bare metal, 
>> >with everything else killed/subdued/shutup. If there's still things 
>> >going on that create objects, we need to stop them from doing that.
>> >
>> >If this only applies to Android, or some special Android deamon them 
>> >imo check for that at runtime and igt_skip("your setup is invalid, 
>> >deamon %s running\n"); is the correct fix. After all just because you 
>> >sampled for a bit doesn't mean that it wont still change right when 
>> >you start running the test for real, so this is still fragile.
>> 
>> Before running tests on android we do stop everything possible. I 
>> suspect the culprit is coreu getting automatically restarted after it 
>> is stopped. I had additional debug while developing this patch and 
>> what I saw was the system being mostly quiescent but with some very 
>> low level background activity. 1 extra object being created and then 
>> deleted occasionally. Depending on whether it occurred at the start or 
>> end of the test it was resulting in a reported leak of either 1 or -1 objects.
>> The patch fixes that issue by taking several samples and requiring 
>> them to be the same, therefore filtering out the low level background noise.
>> It would not help if something in the background allocated an object 
>> and kept it allocated, but I have not seen that happen. I only saw 
>> once the object count increasing for 2 consecutive reads hence the 
>> count to 4 to give a margin. The test was failing about 10%. With this 
>> patch I got 100% pass across 300 runs of each of the tests.
>
>Hm, piglit checks that there's no other drm clients running. Have you tried re-running that check to zero in on the culprit?

We don't use piglet to run IGT tests on Android. I have had a look at what piglet does and added the same check to our scripts. (It reads a list of clients from /sys/kernel/debug/dri/0/clients)
For CHV it shows a process called 'y', though that seems to be some issue on CHV that all driver clients are called 'y'. I checked on BXT which properly shows the process names and it looks like it is the binder process (which  is handling some inter process communication). I don't think this is something we can stop. 

>
>> If you are concerned about the behaviour when running the test with a 
>> load of background activity I could add code to limit to the reset of 
>> the count and fail the test in that instance. That would give a 
>> benefit of distinguishing a test fail due to excessive background 
>> activity from a detected leak.
>
>I'm also concerned for the overhead this causes everyone else. If this really is some Android trouble then I think it'd be good to only compile this on Android. But would still be much better if you can get to a reliably clean test environment.

I will make the loop part android specific.


//Derek

>
>> I would not want to just have the test skip as that introduces a hole 
>> in our test coverage.
>> 
>> >Also would be good to extract get_stable_obj_count to a proper igt 
>> >library function, if it indeed needs to be this tricky. And then add 
>> >the explanation for why we need this in the gtkdoc.
>> 
>> I  can move the code to an igt library. Which library would you suggest? Igt_debugfs ?
>
>Hm yeah, it's a bit the dumping ground for all things debugfs access ;-) -Daniel
>--
>Daniel Vetter
>Software Engineer, Intel Corporation
>http://blog.ffwll.ch
>


More information about the Intel-gfx mailing list