[igt-dev] [PATCH i-g-t v2] intel-ci: add a pre-merge blacklist to reduce the testing queue
Chris Wilson
chris at chris-wilson.co.uk
Fri Feb 21 10:43:33 UTC 2020
Quoting Martin Peres (2020-02-21 09:00:47)
> When arriving at the office on Monday morning, the reported queue
> size was ~100 hours. This defeats the point of pre-merge testing and
> vastly exceeds our target of ~6 hours.
>
> We have a lot of work needed to reduce testing time, but this patch
> reduces the reported run time by 15-30% depending on the platform:
>
> - shard-skl: 23.9 -> 18.2 minutes (18.5%)
> - shard-kbl: 21.2 -> 16.2 minutes (20%)
> - shard-apl: 25.9 -> 18.5 minutes (24.3%)
> - shard-glk: 24.7 -> 17.6 minutes (24.8%)
> - shard-icl: 25.1 -> 16.7 minutes (28.7%)
> - shard-tgl: 28.2 -> 19.6 minutes (26.4%)
>
> The reported runtime is much lower than the actual time
> because of:
>
> - Unaccounted time spent outside of the IGT subtests (exec(), fixtures)
> - Unaccounted time spent in suspend (monotonic clock, 20s / suspend)
> - Boot time / extra reboots between shards to work around kernel failures
> - Intel GFX CI shard scheduling overhead
> - More?
>
> Tomi and Petri are working on reducing these overheads by detecting the
> bad conditions and rebooting the machine only at this point rather than
> between every single shard, and increasing the size of the shard test
> lists to reduce the per-shard CI overhead.
>
> Because of this, the actual savings are way smaller in percentage
> but still compound over the tens of executions we do per week:
>
> - shard-skl: ~58 -> ~52 minutes
> - shard-kbl: ~50 -> ~45 minutes
> - shard-apl: ~53 -> ~46 minutes
> - shard-glk: ~38 -> ~31 minutes
> - shard-icl: ~47 -> ~39 minutes
> - shard-tgl: ~60 -> ~51 minutes
>
> More work needed, but we'll get there :)
>
> v2:
> - Avoid using | in the regular expressions (Petri Latvala)
> - Update the description for igt@gem_pwrite@big-.* (Chris Wilson)
> - Drop igt@sw_sync@sync_expired_merge (fixed by Chris Wilson)
> - Drop igt@gem_eio@kms (fixed by Chris Wilson)
> - Drop igt@perf@gen12-mi-rpc as it is a serious kernel bug (Chris Wilson)
> - Add links to issues tracking this for all blacklisted items
>
> NOTICE: The above numbers have not been edited for the v2 since
> blacklisting or improving the runtime dramatically yields the
> same results, and only igt@perf@gen12-mi-rpc is back to being
> slow.
>
> Signed-off-by: Martin Peres <martin.peres at linux.intel.com>
Acked-by: Chris Wilson <chris at chris-wilson.co.uk>
I dream of a day where the test lists are autogenerated based on
historical information on how effective each one is at rejecting
patches, tuned for a particular test runtime. And with feedback from
bugs reported after the fact (along with the new testcases we need to
capture new code and user reported bugs). [Oh and fuzzing to generate
new tests.]
Imagine if we can do 95% patch^W bug rejection within 10 min and 99.9%
rejection within 1 hour. Then we might have enough free time for the
extended tests on CI_DRM.
-Chris
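
A minimal sketch of the autogenerated test list idea above, not an
existing CI feature. It assumes per-test history is available as an
average runtime plus a count of runs in which the test rejected a patch;
the TestStats record, the pick_tests() helper, and the test names are all
hypothetical. Selection is a simple greedy pass on rejections per second
of runtime, which is a heuristic rather than an optimal packing.

```python
from dataclasses import dataclass


@dataclass
class TestStats:
    """Hypothetical per-test history record; every field is an assumption."""
    name: str          # illustrative test name, not a real IGT test
    runtime_s: float   # average runtime in seconds
    runs: int          # number of pre-merge runs observed
    rejections: int    # runs in which this test caught a bad patch


def rejection_density(t):
    """Rejection rate per second of runtime; 0.0 when there is no usable data."""
    if t.runs <= 0 or t.runtime_s <= 0:
        return 0.0
    return (t.rejections / t.runs) / t.runtime_s


def pick_tests(history, budget_s):
    """Greedily pick the most effective tests that fit within budget_s seconds.

    Returns (list of chosen test names, total runtime used in seconds).
    """
    chosen = []
    used = 0.0
    for t in sorted(history, key=rejection_density, reverse=True):
        if rejection_density(t) == 0.0:
            continue  # never rejected anything: not worth pre-merge time
        if used + t.runtime_s <= budget_s:
            chosen.append(t.name)
            used += t.runtime_s
    return chosen, used


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the function.
    history = [
        TestStats("fast_smoke", 30.0, 1000, 50),
        TestStats("slow_stress", 600.0, 1000, 60),
        TestStats("medium_kms", 120.0, 1000, 30),
        TestStats("never_fails", 10.0, 1000, 0),
    ]
    print(pick_tests(history, budget_s=200.0))
```

Feedback from bugs reported after the fact could be folded in by raising
the rejection count of the tests that would have caught them, and new
testcases would need a default weight until they have some history.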