[Piglit] [PATCH 00/35] Serialize profiles into XML at build time

Dylan Baker dylan at pnwbakers.com
Mon May 7 16:44:58 UTC 2018


Quoting Tomi Sarvela (2018-05-07 01:20:46)
> On 05/07/2018 10:17 AM, Tomi Sarvela wrote:
> > On 05/04/2018 07:57 PM, Dylan Baker wrote:
> >> Quoting Juan A. Suarez Romero (2018-05-04 04:50:27)
> >>> On Fri, 2018-05-04 at 12:03 +0200, Juan A. Suarez Romero wrote:
> >>>> On Wed, 2018-05-02 at 13:57 -0700, Dylan Baker wrote:
> >>>>> Quoting Juan A. Suarez Romero (2018-05-02 09:49:08)
> >>>>>> Hi, Dylan.
> >>>>>>
> >>>>>> I see you've pushed this series.
> >>>>>>
> >>>>>> Now, when I'm trying to run some profiles (mainly, tests/crucible and
> >>>>>> tests/khr_gl* ), seems they are broken:
> >>>>>>
> >>>>>> [0000/7776]
> >>>>>> Traceback (most recent call last):
> >>>>>>    File "./piglit", line 178, in <module>
> >>>>>>      main()
> >>>>>>    File "./piglit", line 174, in main
> >>>>>>      sys.exit(runner(args))
> >>>>>>    File "/home/igalia/jasuarez/piglit/framework/exceptions.py", line 51, in _inner
> >>>>>>      func(*args, **kwargs)
> >>>>>>    File "/home/igalia/jasuarez/piglit/framework/programs/run.py", line 370, in run
> >>>>>>      backend.finalize({'time_elapsed': time_elapsed.to_json()})
> >>>>>>    File "/home/igalia/jasuarez/piglit/framework/backends/json.py", line 163, in finalize
> >>>>>>      assert data['tests']
> >>>>>> AssertionError
> >>>>>>
> >>>>>>          J.A.
> >>>>>>
> >>>>>
> >>>>> Dang.
> >>>>>
> >>>>> I can't reproduce any failures with crucible, though I did make it 
> >>>>> thread safe
> >>>>> and fix the using a config file :)
> >>>>>
> >>>>> I can't get the glcts binary to run, no matter what target I build 
> >>>>> for I run
> >>>>> into either EGL errors or GL errors.
> >>>>>
> >>>>
> >>>> More info on this issue.
> >>>>
> >>>> It seems to happen with the profiles that require an external runner
> >>>> (crucible, vk-gl-cts, deqp, ...).
> >>>>
> >>>>
> >>>> When executing, it reports that it will run all the tests, but
> >>>> sometimes it executes just one test, other times two, and other times
> >>>> none. The error above is shown in that last case.
> >>>>
> >>>> Still don't know why.
> >>>>
> >>>
> >>>
> >>> Found the problem in this commit:
> >>>
> >>> commit 9461d92301e72807eba4776a16a05207e3a16477
> >>> Author: Dylan Baker <dylan at pnwbakers.com>
> >>> Date:   Mon Mar 26 15:23:17 2018 -0700
> >>>
> >>>      framework/profile: Add a __len__ method to TestProfile
> >>>      This exposes a standard interface for getting the number of 
> >>> tests in a
> >>>      profile, which is itself nice. It will also allow us to 
> >>> encapsulate the
> >>>      differences between the various profiles added in this series.
> >>>      Tested-by: Rafael Antognolli <rafael.antognolli at intel.com>
> >>>
> >>>
> >>
> >> I'm really having trouble reproducing this, the vulkan cts and 
> >> crucible both run
> >> fine for me, no matter how many times I stop and start them. I even 
> >> tried with
> >> python2 and couldn't reproduce. Can you give me some more information 
> >> about your
> >> system?
> > 
> > I think I've hit this same issue on our CI.
> > 
> > Symptoms match so that we sometimes run the whole 25k piglit gbm 
> > testset, sometimes we stop around the test 400-600. This behaviour can 
> > change with subsequent runs without rebooting the machine. Test where 
> > run is stopped is usually the same, and changes if filters change.
> > 
> > I can reproduce this with -d / --dry-run so the tests themselves are not 
> > an issue. Filtering with large -x / --exclude-tests might play a part. 
> > The command line is max 25kB, so there shouldn't be cutoff point with 
> > partial regex, which then would match too much.
> > 
> > I'm just starting to investigate where does the test list size drop so 
> > dramatically, probably by inserting testlist size debugs around to see 
> > where it takes me.
> > 
> > Environment: Ubuntu 18.04 LTS with default mesa
> > Kernel: DRM-Tip HEAD or Ubuntu default.
> > 
> > Commandline is built with bash array from blacklist. This looks correct, 
> > and sometimes works correctly. Eg
> > 
> > ./piglit run tests/gpu ~/results -d -o -l verbose "${OPTIONS[@]}"
> > 
> > where $OPTIONS is an array of
> > '-x', 'timestamp-get',
> > '-x', 'glsl-routing', ...
> > 
> > Successful CI runlog:
> > http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4148/pig-glk-j5005/run0.log
> > 
> > Unsuccessful CI runlog:
> > http://gfx-ci.fi.intel.com/tree/drm-tip/CI_DRM_4149/pig-glk-j5005/run0.log
> > 
> > Between those two runs, only kernel has changed.
> > 
> > The issue is easiest to reproduce with GLK. HSW seems to be somewhat 
> > affected too, so the host speed might play a part.
> 
> Patch below makes the issue disappear for my GLK testrig.
> 
> With multiprocessing.pool.imap I'm getting roughly 50% correct behaviour 
> and 50% early exits on dry-runs.
> 
> With multiprocessing.pool.map I'm not getting early exits at all.
> 
> Sample size is ~50 runs for both setups.
> 
> With the testset of 26179 on GLK dry-run, the runtime difference is 
> negligible: pool.map 49s vs pool.imap 50s
> 
> 
> 
> piglit/framework$ diff -c profile.py.orig profile.py
> *** profile.py.orig     2018-05-07 19:11:37.649994643 +0300
> --- profile.py  2018-05-07 19:11:46.880994608 +0300
> ***************
> *** 584,591 ****
>                # more code, and adding side-effects
>                test_list = (x for x in test_list if filterby(x))
> 
> !         pool.imap(lambda pair: test(pair[0], pair[1], profile, pool),
> !                   test_list, chunksize)
> 
>        def run_profile(profile, test_list):
>            """Run an individual profile."""
> --- 584,591 ----
>                # more code, and adding side-effects
>                test_list = (x for x in test_list if filterby(x))
> 
> !         pool.map(lambda pair: test(pair[0], pair[1], profile, pool),
> !                  test_list, chunksize)
> 
>        def run_profile(profile, test_list):
>            """Run an individual profile."""
> 
> 
> Tomi

Juan, can you test this patch and see if it resolves your issue as well? I'm not
sure why this is fixing things, but if it does I'm happy to merge it and deal
with any performance problems it introduces later.
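
For reference, here is a minimal sketch of the one difference between the two
calls that I am sure of: Pool.map() blocks until every item has been processed
and returns a list, while Pool.imap() returns a lazy iterator right away and
leaves it to the caller to consume it. This is not a diagnosis of the early
exits, only an illustration. The names run_all and work are hypothetical, and
the thread-backed pool from multiprocessing.dummy is an assumption on my part
(the patch passes a lambda, which a process pool could not pickle).

```python
from multiprocessing.dummy import Pool  # thread-backed pool (assumed, see above)


def run_all(items, use_map, chunksize=2):
    """Run a stand-in "test" over items and report which ones actually ran.

    Hypothetical helper for illustration only; it is not piglit code.
    """
    done = []
    work = lambda item: done.append(item)  # stand-in for running one test

    pool = Pool(4)
    if use_map:
        # map() does not return until every item has been processed.
        pool.map(work, items, chunksize)
    else:
        # imap() returns an iterator immediately; draining it is what makes
        # this branch wait for every item before moving on.
        for _ in pool.imap(work, items, chunksize):
            pass
    pool.close()
    pool.join()
    return sorted(done)


# A generator input, like the filtered test_list in the patch.
print(len(run_all((x for x in range(100)), use_map=True)))
print(len(run_all((x for x in range(100)), use_map=False)))
```

With the iterator drained explicitly, both variants process all items in this
sketch. Whether the missing drain is what causes the early exits in piglit is
something I have not verified.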

Dylan