[Mesa-dev] Mesa CI is too slow
Daniel Stone
daniel at fooishbar.org
Mon Feb 18 17:31:41 UTC 2019
Hi all,
A few people have noted that Mesa's GitLab CI is just too slow, and
not usable in day-to-day development, which is a massive shame.
I looked into it a bit this morning and also discussed it with Emil,
though nothing here speaks for him.
Taking one of the last runs as representative (nothing in it looks
like an outlier to me, and 7min to build RadeonSI seems entirely
reasonable):
https://gitlab.freedesktop.org/mesa/mesa/pipelines/19692/builds
This run executed 24 jobs, which is beyond the limit of our CI
parallelism. As documented on
https://www.freedesktop.org/wiki/Infrastructure/ we have 14 concurrent
job slots (each with roughly 4 vCPUs). Those 24 jobs cumulatively took
177 minutes of execution time, with the end-to-end pipeline taking
120 minutes.
177 minutes of runtime is too long for the runners we have now: even
if a single pipeline perfectly occupied all 14 of our runners, it
would take over 12 minutes (177 / 14 is roughly 12.6), which means
that even if no-one else were using the runners, they could execute
only about 5 Mesa pipelines per hour at full occupancy. Unfortunately,
VirGL, Wayland/Weston, libinput, X.Org, IGT, GStreamer,
NetworkManager/ModemManager, Bolt, Poppler, etc, would all probably
have something to say about that.
When the runners aren't occupied and there's less contention for jobs,
it looks quite good:
https://gitlab.freedesktop.org/anholt/mesa/pipelines/19621/builds
This run 'only' took 20.5 minutes to execute, but then again, 3
pipelines per hour isn't that great either.
Two hours of end-to-end pipeline time is also obviously far too long.
Amongst other things, it practically precludes pre-merge CI: by the
time your build has finished, someone will have pushed to the tree, so
you need to start again. Even if we serialised it through a bot, that
would limit us to pushing 12 changesets per day, which seems too low.
I'm currently talking to two different hosts to try to get more
sponsored time for CI runners. Those are both on hold this week due to
travel / personal circumstances, but I'll hopefully find out more next
week. Eric E filed an issue
(https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/120) to
enable ccache caching, but I don't see myself having the time to do it
before next month.
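For reference, the ccache wiring should mostly just be a matter of
pointing CCACHE_DIR somewhere inside the project tree and letting
GitLab's cache: mechanism persist it. A rough, untested sketch, with
the job name made up and assuming ccache is already installed in the
container images:

  variables:
    CCACHE_DIR: "${CI_PROJECT_DIR}/.ccache"
    CCACHE_BASEDIR: "${CI_PROJECT_DIR}"

  cache:
    key: "${CI_JOB_NAME}"
    paths:
      - .ccache

  meson-build-example:
    script:
      # put the ccache compiler wrappers first in PATH (Debian layout)
      - export PATH=/usr/lib/ccache:$PATH
      - meson builddir/
      - ninja -C builddir/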
In the meantime, it would be great to see how we could reduce the
number of jobs Mesa runs for each pipeline. Given we're already
exceeding the limits of parallelism, having so many independent jobs
isn't reducing the end-to-end pipeline time, but instead just
duplicating the effort required to fetch and check out sources,
restore caches (in the future), start the container, run Meson or
./configure, and build
any common files.
I'm taking it as a given that at least three separate builds are
required: autotools, Meson, and SCons. Fair enough.
It's been suggested to me that SWR should remain separate, as it takes
longer to build than the other drivers, and getting fast feedback is
important, which is fair enough.
Suggestion #1: merge scons-swr into scons-llvm. scons-nollvm will
already provide fast feedback on whether we've broken the SCons build,
and the rest is pretty uninteresting, so folding scons-swr into
scons-llvm might help cut down on duplication.
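Very roughly, and assuming the SWR build can just be switched on in
the existing LLVM SCons job, that merge might look something like this
(the scons flags are from memory, not copied from our YAML):

  scons-llvm:
    script:
      # build SWR alongside the other LLVM-enabled SCons targets
      - scons llvm=1 swr=1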
Suggestion #2: merge the misc Gallium jobs together. The
gallium-radeonsi and gallium-st-other builds are both relatively quick. We
could merge these into gallium-drivers-other for a very small increase
in overall runtime for that job, and save ourselves probably about 10%
of the overall build time here.
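Assuming those are Meson jobs, the merge would mostly be a matter of
folding radeonsi and the extra state trackers into the combined job's
option list, something like this (driver and option lists are
illustrative, not exhaustive):

  gallium-drivers-other:
    script:
      - >
        meson builddir/
        -Dgallium-drivers=r300,r600,radeonsi,nouveau,swrast,virgl
        -Dgallium-xa=true -Dgallium-nine=true
      - ninja -C builddir/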
Suggestion #3: don't build so much LLVM in autotools. The Meson
clover-llvm builds take half the time the autotools builds do. Perhaps
we should only build one LLVM variant within autotools (to test the
autotools LLVM selection still works), and then build all the rest
only in Meson. That would be good for another 15-20% reduction in
overall pipeline run time.
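As a sketch, the autotools side could then keep a single LLVM-enabled
job purely to prove that configure's LLVM handling still works (the
job name and parallelism here are made up):

  make-llvm:
    script:
      # one autotools build with LLVM enabled, just to exercise the
      # configure-time LLVM selection
      - ./autogen.sh --enable-llvm
      - make -j4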
Suggestion #4 (if necessary): build SWR less frequently. Can we
perhaps demote SWR to an 'only:' job which rebuilds only when SWR
itself or Gallium has changed? This would save a good chunk of
runtime - again close to 10%.
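GitLab's 'changes:' support should make that fairly easy; something
along these lines, where the path list is my guess at what counts as
'SWR or Gallium' and the scons flags are again from memory:

  scons-swr:
    script:
      - scons llvm=1 swr=1
    only:
      changes:
        - src/gallium/drivers/swr/**/*
        - src/gallium/auxiliary/**/*
        - src/gallium/include/**/*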
Doing the above would reduce the run time fairly substantially, with,
as far as I can tell, no loss in functional coverage, and bring the
parallelism down to a mere 1.5x oversubscription of the whole
organisation's available job slots, from the current 2x.
Any thoughts?
Cheers,
Daniel