[Piglit] [PATCH 00/35] Serialize profiles into XML at build time
Tomi Sarvela
tomi.p.sarvela at intel.com
Mon May 7 08:20:46 UTC 2018
On 05/07/2018 10:17 AM, Tomi Sarvela wrote:
> On 05/04/2018 07:57 PM, Dylan Baker wrote:
>> Quoting Juan A. Suarez Romero (2018-05-04 04:50:27)
>>> On Fri, 2018-05-04 at 12:03 +0200, Juan A. Suarez Romero wrote:
>>>> On Wed, 2018-05-02 at 13:57 -0700, Dylan Baker wrote:
>>>>> Quoting Juan A. Suarez Romero (2018-05-02 09:49:08)
>>>>>> Hi, Dylan.
>>>>>>
>>>>>> I see you've pushed this series.
>>>>>>
>>>>>> Now, when I'm trying to run some profiles (mainly, tests/crucible and
>>>>>> tests/khr_gl* ), seems they are broken:
>>>>>>
>>>>>> [0000/7776]
>>>>>> Traceback (most recent call last):
>>>>>>   File "./piglit", line 178, in <module>
>>>>>>     main()
>>>>>>   File "./piglit", line 174, in main
>>>>>>     sys.exit(runner(args))
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/exceptions.py", line 51, in _inner
>>>>>>     func(*args, **kwargs)
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/programs/run.py", line 370, in run
>>>>>>     backend.finalize({'time_elapsed': time_elapsed.to_json()})
>>>>>>   File "/home/igalia/jasuarez/piglit/framework/backends/json.py", line 163, in finalize
>>>>>>     assert data['tests']
>>>>>> AssertionError
>>>>>>
>>>>>> J.A.
>>>>>>
>>>>>
>>>>> Dang.
>>>>>
>>>>> I can't reproduce any failures with crucible, though I did make it
>>>>> thread safe and fix the use of a config file :)
>>>>>
>>>>> I can't get the glcts binary to run; no matter what target I build
>>>>> for, I run into either EGL or GL errors.
>>>>>
>>>>
>>>> More info on this issue.
>>>>
>>>> It seems it happens with the profiles that require an external runner
>>>> (crucible, vk-gl-cts, deqp, ...).
>>>>
>>>>
>>>> When executing, it says it will run all the tests, but sometimes it
>>>> executes just one test, other times two, and other times none. The
>>>> error above is shown in the last case.
>>>>
>>>> Still don't know why.
>>>>
>>>
>>>
>>> Found the problem in this commit:
>>>
>>> commit 9461d92301e72807eba4776a16a05207e3a16477
>>> Author: Dylan Baker <dylan at pnwbakers.com>
>>> Date:   Mon Mar 26 15:23:17 2018 -0700
>>>
>>>     framework/profile: Add a __len__ method to TestProfile
>>>
>>>     This exposes a standard interface for getting the number of tests
>>>     in a profile, which is itself nice. It will also allow us to
>>>     encapsulate the differences between the various profiles added in
>>>     this series.
>>>
>>>     Tested-by: Rafael Antognolli <rafael.antognolli at intel.com>
>>>
>>>
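One way a __len__ addition could bite profiles whose test list is a
one-shot generator (this is only an assumption about the cause, not
something confirmed in the thread): counting the tests also consumes
them, so the run reports the full count but has almost nothing left to
schedule. A purely hypothetical sketch, not piglit's actual TestProfile
code:

class ExternalProfile:
    """Hypothetical profile whose test list is a one-shot generator."""

    def __init__(self, names):
        # An external-runner profile might build its case list lazily,
        # e.g. from a subprocess's case listing (assumption).
        self.test_list = (n for n in names)

    def __len__(self):
        # Counting walks the generator, so every name yielded here is
        # gone by the time the tests are actually scheduled.
        return sum(1 for _ in self.test_list)


profile = ExternalProfile(['case-a', 'case-b', 'case-c'])
print(len(profile))              # 3: the reported "will run" count
print(list(profile.test_list))   # []: nothing left to actually run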
>>
>> I'm really having trouble reproducing this; the Vulkan CTS and crucible
>> both run fine for me, no matter how many times I stop and start them. I
>> even tried with python2 and couldn't reproduce. Can you give me some
>> more information about your system?
>
> I think I've hit this same issue on our CI.
>
> The symptoms match: sometimes we run the whole 25k piglit gbm test set,
> sometimes we stop around test 400-600. This behaviour can change between
> subsequent runs without rebooting the machine. The test at which the run
> stops is usually the same, and it changes if the filters change.
>
> I can reproduce this with -d / --dry-run, so the tests themselves are
> not the issue. Filtering with a large -x / --exclude-tests list might
> play a part. The command line is at most 25 kB, so there shouldn't be a
> cutoff leaving a partial regex, which would then match too much.
>
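As a generic illustration of that concern (piglit's actual filter code
may differ): exclusion options like -x are typically compiled into
regexes and matched against test names, so a pattern cut off mid-way
would exclude far more tests than intended.

import re

# Hypothetical test names; real piglit names are longer but similar.
test_names = [
    'spec@arb_timestamp_query@timestamp-get',
    'shaders@glsl-routing',
    'spec@!opengl 1.1@texwrap-2d',
]

full = re.compile('timestamp-get')   # the intended exclusion
truncated = re.compile('g')          # a pattern cut off mid-way

print([n for n in test_names if not full.search(n)])
# -> only the timestamp test is excluded

print([n for n in test_names if not truncated.search(n)])
# -> [] : a truncated pattern excludes everything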
> I'm just starting to investigate where the test list size drops so
> dramatically, probably by sprinkling test-list size debug prints around
> to see where it takes me.
>
> Environment: Ubuntu 18.04 LTS with default mesa
> Kernel: DRM-Tip HEAD or Ubuntu default.
>
> The command line is built with a bash array from the blacklist. It looks
> correct, and sometimes it works correctly. E.g.
>
> ./piglit run tests/gpu ~/results -d -o -l verbose "${OPTIONS[@]}"
>
> where $OPTIONS is an array of
> '-x', 'timestamp-get',
> '-x', 'glsl-routing', ...
>
> Successful CI runlog:
> http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4148/pig-glk-j5005/run0.log
>
> Unsuccessful CI runlog:
> http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4149/pig-glk-j5005/run0.log
>
> Between those two runs, only the kernel has changed.
>
> The issue is easiest to reproduce with GLK. HSW seems to be somewhat
> affected too, so the host speed might play a part.
The patch below makes the issue disappear on my GLK test rig.

With multiprocessing.pool.imap I'm getting roughly 50% correct behaviour
and 50% early exits on dry runs. With multiprocessing.pool.map I'm not
getting early exits at all. Sample size is ~50 runs for both setups.

With a test set of 26179 tests on a GLK dry run, the runtime difference
is negligible: pool.map 49s vs pool.imap 50s.
piglit/framework$ diff -c profile.py.orig profile.py
*** profile.py.orig 2018-05-07 19:11:37.649994643 +0300
--- profile.py 2018-05-07 19:11:46.880994608 +0300
***************
*** 584,591 ****
# more code, and adding side-effects
test_list = (x for x in test_list if filterby(x))
! pool.imap(lambda pair: test(pair[0], pair[1], profile, pool),
! test_list, chunksize)
def run_profile(profile, test_list):
"""Run an individual profile."""
--- 584,591 ----
# more code, and adding side-effects
test_list = (x for x in test_list if filterby(x))
! pool.map(lambda pair: test(pair[0], pair[1], profile, pool),
! test_list, chunksize)
def run_profile(profile, test_list):
"""Run an individual profile."""
Tomi
--
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo