[Piglit] [PATCH 00/35] Serialize profiles into XML at build time

Mon May 7 07:17:58 UTC 2018

On 05/04/2018 07:57 PM, Dylan Baker wrote:
> Quoting Juan A. Suarez Romero (2018-05-04 04:50:27)
>> On Fri, 2018-05-04 at 12:03 +0200, Juan A. Suarez Romero wrote:
>>> On Wed, 2018-05-02 at 13:57 -0700, Dylan Baker wrote:
>>>> Quoting Juan A. Suarez Romero (2018-05-02 09:49:08)
>>>>> Hi, Dylan.
>>>>>
>>>>> I see you've pushed this series.
>>>>>
>>>>> Now, when I try to run some profiles (mainly tests/crucible and
>>>>> tests/khr_gl*), they seem to be broken:
>>>>>
>>>>> [0000/7776]
>>>>> Traceback (most recent call last):
>>>>>    File "./piglit", line 178, in <module>
>>>>>      main()
>>>>>    File "./piglit", line 174, in main
>>>>>      sys.exit(runner(args))
>>>>>    File "/home/igalia/jasuarez/piglit/framework/exceptions.py", line 51, in _inner
>>>>>      func(*args, **kwargs)
>>>>>    File "/home/igalia/jasuarez/piglit/framework/programs/run.py", line 370, in run
>>>>>      backend.finalize({'time_elapsed': time_elapsed.to_json()})
>>>>>    File "/home/igalia/jasuarez/piglit/framework/backends/json.py", line 163, in finalize
>>>>>      assert data['tests']
>>>>> AssertionError
>>>>>
>>>>>          J.A.
>>>>>
>>>>
>>>> Dang.
>>>>
>>>> I can't reproduce any failures with crucible, though I did make it thread safe
>>>> and fix using a config file :)
>>>>
>>>> I can't get the glcts binary to run; no matter what target I build for, I run
>>>> into either EGL errors or GL errors.
>>>>
>>>
>>> More info on this issue.
>>>
>>> It seems to happen with the profiles that require an external runner
>>> (crucible, vk-gl-cts, deqp, ...).
>>>
>>>
>>> When executing, it says it will run all the tests, but sometimes it executes
>>> just one test, other times two, and other times none. It is in the last case
>>> that the error above is shown.
>>>
>>> Still don't know why.
>>>
>>
>>
>> Found the problem in this commit:
>>
>> commit 9461d92301e72807eba4776a16a05207e3a16477
>> Author: Dylan Baker <dylan at pnwbakers.com>
>> Date:   Mon Mar 26 15:23:17 2018 -0700
>>
>>      framework/profile: Add a __len__ method to TestProfile
>>      
>>      This exposes a standard interface for getting the number of tests in a
>>      profile, which is itself nice. It will also allow us to encapsulate the
>>      differences between the various profiles added in this series.
>>      
>>      Tested-by: Rafael Antognolli <rafael.antognolli at intel.com>
>>
>>
> 
> I'm really having trouble reproducing this; the Vulkan CTS and crucible both run
> fine for me, no matter how many times I stop and start them. I even tried with
> python2 and couldn't reproduce it. Can you give me some more information about
> your system?

I think I've hit this same issue on our CI.

The symptoms match: sometimes we run the whole 25k piglit gbm test set, 
and sometimes the run stops around test 400-600. The behaviour can 
change between subsequent runs without rebooting the machine. The test 
where the run stops is usually the same, and it changes if the filters 
change.

I can reproduce this with -d / --dry-run, so the tests themselves are not 
the issue. Filtering with a large -x / --exclude-tests list might play a 
part. The command line is at most 25kB, so it shouldn't get cut off in 
the middle of a regex, which would leave a partial regex that matches 
too much.
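
To illustrate the partial-regex concern, here is a minimal Python sketch. 
It assumes each -x pattern is applied as a regex search against the test 
name; the test names and the apply_excludes helper are made up for 
illustration and are not piglit code.

```python
import re


def apply_excludes(test_names, exclude_patterns):
    """Return the names that match none of the exclude patterns.

    Hypothetical helper: assumes each pattern is used with re.search().
    """
    compiled = [re.compile(p) for p in exclude_patterns]
    return [n for n in test_names if not any(r.search(n) for r in compiled)]


# Made-up test names, for illustration only.
tests = [
    "spec/arb_timer_query/timestamp-get",
    "shaders/glsl-routing",
    "spec/glsl-1.10/execution/fs-loop",
    "spec/arb_timer_query/query-gl_timestamp",
]

# Complete patterns remove only the two intended tests.
full = apply_excludes(tests, ["timestamp-get", "glsl-routing"])

# A pattern cut off after "glsl" also removes every other name containing "glsl".
truncated = apply_excludes(tests, ["timestamp-get", "glsl"])

print(len(full), len(truncated))
```

With the complete patterns two tests remain; with the truncated one only 
a single test is left, which is the kind of shrinkage a cut-off regex 
would cause.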

I'm just starting to investigate where the test list size drops so 
dramatically, probably by inserting test list size debug prints in 
various places to see where that takes me.
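
A minimal sketch of what such a size debug could look like, assuming the 
test list can be treated as a plain Python iterable at each stage. The 
log_size helper and the stage names are hypothetical. Note that counting 
a generator consumes it, so the helper materializes the list first and 
hands that list back.

```python
import sys


def log_size(stage, tests, stream=sys.stderr):
    """Print how many tests are present at the named stage.

    Hypothetical debug helper. The input is copied into a list before
    counting, so a generator is not left exhausted for the caller.
    """
    tests = list(tests)
    print("[debug] {}: {} tests".format(stage, len(tests)), file=stream)
    return tests


# Usage with made-up data: the count is printed after each stage.
names = log_size("loaded", iter(["a", "b", "c"]))
names = log_size("after excludes", (n for n in names if n != "b"))
```

Comparing the printed counts between a good and a bad run should narrow 
down the stage at which the list shrinks.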

Environment: Ubuntu 18.04 LTS with default mesa
Kernel: DRM-Tip HEAD or Ubuntu default.

The command line is built with a bash array from a blacklist. It looks 
correct, and sometimes works correctly. E.g.

./piglit run tests/gpu ~/results -d -o -l verbose "${OPTIONS[@]}"

where $OPTIONS is an array of
'-x', 'timestamp-get',
'-x', 'glsl-routing', ...
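
For comparison, the same argument list can be built and measured in 
Python. This is only a sketch: the build_exclude_args and argv_bytes 
helpers are hypothetical, the blacklist holds just the two patterns 
shown above, and the size estimate counts each argument plus one 
terminating NUL byte.

```python
import os


def build_exclude_args(blacklist):
    """Turn blacklist entries into ['-x', pattern, ...], skipping blank entries."""
    args = []
    for pattern in blacklist:
        pattern = pattern.strip()
        if pattern:
            args.extend(["-x", pattern])
    return args


def argv_bytes(argv):
    """Approximate size of argv: UTF-8 length of each argument plus a NUL."""
    return sum(len(a.encode("utf-8")) + 1 for a in argv)


options = build_exclude_args(["timestamp-get", "glsl-routing"])
cmd = ["./piglit", "run", "tests/gpu", os.path.expanduser("~/results"),
       "-d", "-o", "-l", "verbose"] + options

print(len(options) // 2, "exclude patterns,", argv_bytes(cmd), "bytes")
```

Printing the pattern count and byte size before each run would show 
whether the command line itself differs between a full and a short run.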

Successful CI runlog:
http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4148/pig-glk-j5005/run0.log

Unsuccessful CI runlog:
http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4149/pig-glk-j5005/run0.log

Between those two runs, only the kernel has changed.

The issue is easiest to reproduce with GLK. HSW seems to be somewhat 
affected too, so the host speed might play a part.

Tomi
-- 
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo