[Piglit] [PATCH 00/35] Serialize profiles into XML at build time
Tomi Sarvela
tomi.p.sarvela at intel.com
Mon May 7 08:20:46 UTC 2018
On 05/07/2018 10:17 AM, Tomi Sarvela wrote:
> On 05/04/2018 07:57 PM, Dylan Baker wrote:
>> Quoting Juan A. Suarez Romero (2018-05-04 04:50:27)
>>> On Fri, 2018-05-04 at 12:03 +0200, Juan A. Suarez Romero wrote:
>>>> On Wed, 2018-05-02 at 13:57 -0700, Dylan Baker wrote:
>>>>> Quoting Juan A. Suarez Romero (2018-05-02 09:49:08)
>>>>>> Hi, Dylan.
>>>>>>
>>>>>> I see you've pushed this series.
>>>>>>
>>>>>> Now, when I'm trying to run some profiles (mainly, tests/crucible and
>>>>>> tests/khr_gl* ), seems they are broken:
>>>>>>
>>>>>> [0000/7776]
>>>>>> Traceback (most recent call last):
>>>>>>   File "./piglit", line 178, in <module>
>>>>>>     main()
>>>>>>   File "./piglit", line 174, in main
>>>>>>     sys.exit(runner(args))
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/exceptions.py", line 51, in _inner
>>>>>>     func(*args, **kwargs)
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/programs/run.py", line 370, in run
>>>>>>     backend.finalize({'time_elapsed': time_elapsed.to_json()})
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/backends/json.py", line 163, in finalize
>>>>>>     assert data['tests']
>>>>>> AssertionError
>>>>>>
>>>>>> J.A.
>>>>>>
>>>>>
>>>>> Dang.
>>>>>
>>>>> I can't reproduce any failures with crucible, though I did make it
>>>>> thread safe and fix the use of a config file :)
>>>>>
>>>>> I can't get the glcts binary to run; no matter what target I build
>>>>> for, I run into either EGL or GL errors.
>>>>>
>>>>
>>>> More info on this issue.
>>>>
>>>> It seems it happens with the profiles that require an external runner
>>>> (crucible, vk-gl-cts, deqp, ...).
>>>>
>>>>
>>>> When executing, it says it will run all the tests, but sometimes it
>>>> executes just one test, other times two, and other times none. The
>>>> error above is shown in the last case.
>>>>
>>>> Still don't know why.
>>>>
>>>
>>>
>>> Found the problem in this commit:
>>>
>>> commit 9461d92301e72807eba4776a16a05207e3a16477
>>> Author: Dylan Baker <dylan at pnwbakers.com>
>>> Date:   Mon Mar 26 15:23:17 2018 -0700
>>>
>>>     framework/profile: Add a __len__ method to TestProfile
>>>
>>>     This exposes a standard interface for getting the number of tests
>>>     in a profile, which is itself nice. It will also allow us to
>>>     encapsulate the differences between the various profiles added in
>>>     this series.
>>>
>>>     Tested-by: Rafael Antognolli <rafael.antognolli at intel.com>
>>>
>>>
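One way a __len__ addition could bite profiles whose test list is a
one-shot generator (this is only an assumption about the cause, not
something confirmed in the thread): counting the tests also consumes
them, so the run reports the full count but has almost nothing left to
schedule. A purely hypothetical sketch, not piglit's actual TestProfile
code:

class ExternalProfile:
    """Hypothetical profile whose test list is a one-shot generator."""

    def __init__(self, names):
        # An external-runner profile might build its case list lazily,
        # e.g. from a subprocess's case listing (assumption).
        self.test_list = (n for n in names)

    def __len__(self):
        # Counting walks the generator, so every name yielded here is
        # gone by the time the tests are actually scheduled.
        return sum(1 for _ in self.test_list)


profile = ExternalProfile(['case-a', 'case-b', 'case-c'])
print(len(profile))              # 3: the reported "will run" count
print(list(profile.test_list))   # []: nothing left to actually run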
>>
>> I'm really having trouble reproducing this; the Vulkan CTS and crucible
>> both run fine for me, no matter how many times I stop and start them. I
>> even tried with python2 and couldn't reproduce. Can you give me some
>> more information about your system?
>
> I think I've hit this same issue on our CI.
>
> The symptoms match: sometimes we run the whole 25k piglit gbm test set,
> sometimes we stop around test 400-600. This behaviour can change between
> subsequent runs without rebooting the machine. The test at which the run
> stops is usually the same, and it changes if the filters change.
>
> I can reproduce this with -d / --dry-run, so the tests themselves are
> not the issue. Filtering with a large -x / --exclude-tests list might
> play a part. The command line is at most 25 kB, so there shouldn't be a
> cutoff leaving a partial regex, which would then match too much.
>
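As a generic illustration of that concern (piglit's actual filter code
may differ): exclusion options like -x are typically compiled into
regexes and matched against test names, so a pattern cut off mid-way
would exclude far more tests than intended.

import re

# Hypothetical test names; real piglit names are longer but similar.
test_names = [
    'spec@arb_timestamp_query@timestamp-get',
    'shaders@glsl-routing',
    'spec@!opengl 1.1@texwrap-2d',
]

full = re.compile('timestamp-get')   # the intended exclusion
truncated = re.compile('g')          # a pattern cut off mid-way

print([n for n in test_names if not full.search(n)])
# -> only the timestamp test is excluded

print([n for n in test_names if not truncated.search(n)])
# -> [] : a truncated pattern excludes everything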
> I'm just starting to investigate where the test list size drops so
> dramatically, probably by sprinkling test-list size debug prints around
> to see where it takes me.
>
> Environment: Ubuntu 18.04 LTS with default mesa
> Kernel: DRM-Tip HEAD or Ubuntu default.
>
> The command line is built with a bash array from the blacklist. It looks
> correct, and sometimes it works correctly. E.g.
>
> ./piglit run tests/gpu ~/results -d -o -l verbose "${OPTIONS[@]}"
>
> where $OPTIONS is an array of
> '-x', 'timestamp-get',
> '-x', 'glsl-routing', ...
>
> Successful CI runlog:
> http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4148/pig-glk-j5005/run0.log
>
> Unsuccessful CI runlog:
> http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4149/pig-glk-j5005/run0.log
>
> Between those two runs, only the kernel has changed.
>
> The issue is easiest to reproduce with GLK. HSW seems to be somewhat
> affected too, so the host speed might play a part.
The patch below makes the issue disappear on my GLK test rig.

With multiprocessing.pool.imap I'm getting roughly 50% correct behaviour
and 50% early exits on dry runs. With multiprocessing.pool.map I'm not
getting early exits at all. Sample size is ~50 runs for both setups.

With a test set of 26179 tests on a GLK dry run, the runtime difference
is negligible: pool.map 49s vs pool.imap 50s.
piglit/framework$ diff -c profile.py.orig profile.py
*** profile.py.orig 2018-05-07 19:11:37.649994643 +0300
--- profile.py 2018-05-07 19:11:46.880994608 +0300
***************
*** 584,591 ****
# more code, and adding side-effects
test_list = (x for x in test_list if filterby(x))
! pool.imap(lambda pair: test(pair[0], pair[1], profile, pool),
! test_list, chunksize)
def run_profile(profile, test_list):
"""Run an individual profile."""
--- 584,591 ----
# more code, and adding side-effects
test_list = (x for x in test_list if filterby(x))
! pool.map(lambda pair: test(pair[0], pair[1], profile, pool),
! test_list, chunksize)
def run_profile(profile, test_list):
"""Run an individual profile."""
Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo