[Mesa-dev] Gitlab migration

Daniel Stone daniel at fooishbar.org
Tue May 29 11:54:54 UTC 2018

Hi Mark,

On 26 May 2018 at 00:47, Mark Janes <mark.a.janes at intel.com> wrote:
> Daniel Stone <daniel at fooishbar.org> writes:
>> We had a go at using Jenkins for some of this: Intel's been really
>> quite successful at doing it internally, but our community efforts
>> have been a miserable failure. After a few years I've concluded that
>> it's not going to change - even with Jenkins 2.0. [...]
> I agree with some of your Jenkins critiques.  I have implemented CI on
> *many* different frameworks over the past 15 years, and I think that
> every implementation has its fans and haters.
> It is wise to create automation which is mostly independent of the CI
> framework.  Mesa i965 CI could immediately switch from Jenkins to
> BuildBot or GitLab, if there was a reason to do so.  It may be that
> GitLab is superior to Jenkins by now, but the selection of the CI
> framework is a minor detail anyways.

I don't think there'd be any benefit, to be honest. You have an
experienced and capable team who can and have been dealing with
Jenkins successfully for years; the system works. For the above
reasons, though, I think it's totally inappropriate for fd.o to offer
Jenkins as a general service to all our projects; this is very
different to Intel offering a very specific and targeted service
receiving full-time paid attention.

> CI frameworks are often based on build/test pipelines, which I think is
> exactly the wrong concept for the domain.  Flexible CI is best thought
> of as a multiplatform `make` system.  Setting up a "pipeline" is similar
> to building your project with a shell script instead of a makefile.

Unless I've totally misunderstood you, I agree.

>> GitLab CI fixes all of these things. Pipelines are strongly and
>> directly correlated with commits in repositories, though you can also
>> trigger them manually or on a schedule. Permissions are that of the
>> repository, and just like Travis, people can fork and work on CI
>> improvements in their own sandbox without impacting anything else. The
>> job configuration is in relatively clean YAML, and it strongly
>> suggests idiomatic form rather than a forest of thousands of
>> unmaintained plugins.
>> Jobs get run in clean containers, rather than special unicorn workers
>> pre-configured just so, meaning that the builds are totally
>> reproducible locally and you can use whatever build dependencies you
>> want without having to bug the admins to install LLVM in some
>> particular chroot. Those containers can be stored in a registry
>> attached to the project, with their own lifetime/ownership/etc
>> tracking. Jenkins can use Docker if you have an external registry, but
>> again this requires setting up external authentication and
>> permissions, not to mention that there's no lifetime/ownership/expiry
>> tracking, so you have to write more special admin cronjob scripts to
>> clean up old images in the registry.
> GitLab may be perfectly suitable for CI, but please do not select Mesa
> dev infrastructure based on CI features.
> Any Mesa CI needs to trigger from multiple projects: drm, dEQP, Piglit,
> VulkanCTS, SPIRV-Tools, crucible, glslang.  They are not all going to be
> in GitLab.
> The cart (CI) follows the horse (upstream development process).  CI
> automation is cheap and flexible, and can easily adapt to changes in the
> driver implementation / dev process.

Sure, though it depends of course on your definition. If you're taking
'CI' to mean 'exactly the thing Intel does today' (i.e. building
dozens of different modules with completely disconnected development
streams and testing driver behaviour on a huge farm of hardware
maintained specifically for that purpose), then yes, it's totally
unsuitable, and I wouldn't advise anyone to try. Under that
definition, 'cheap and flexible' also doesn't fit at all: the actual
maintenance of the hardware (especially with a system which doesn't
account for hardware failure), plus accounting for the combinatorial
explosion of components, adds up to several full-time jobs, as you
know.

I'm suggesting that using GitLab CI as a visible and obvious part of
the community development process would be useful for a far more
limited suite of automated tasks run to provide feedback to developers
on the suitability of their code for merging.
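
To make that concrete, a minimal `.gitlab-ci.yml` along those lines
might look like the sketch below. The image, job names, and build
commands are all assumptions for illustration, not a proposed Mesa
configuration:

```yaml
# Illustrative sketch only: image, job names and build commands are
# assumptions, not a proposed Mesa configuration.
stages:
  - build

build-gcc:
  stage: build
  image: debian:stretch
  script:
    - apt-get update && apt-get -y install meson ninja-build gcc pkg-config
    - meson build/
    - ninja -C build/

build-clang:
  stage: build
  image: debian:stretch
  script:
    - apt-get update && apt-get -y install meson ninja-build clang pkg-config
    - CC=clang CXX=clang++ meson build/
    - ninja -C build/
```

Because each job runs in a throwaway container, a contributor can
reproduce the same build locally with the same image and script,
without needing anything pre-installed on a special worker.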

>> It _is_ possible to bend Jenkins to your will - Mark's excellent and
>> super-helpful work with Intel's CI is testament to that - and in some
>> environments it's fine, but after a few years of trying, I just don't
>> think it's suitable to run on fd.o, and I also don't think it's a good
>> fit for what Mesa wants to be doing with CI as a community. (That was
>> much longer than expected, sorry: the wound is still raw, I guess.)
> CI may not be suitable for running on fd.o at all.  Similar systems have
> large staff and generally provide far poorer results than i965 CI.  Even
> when done well, it takes a lot of work and hardware.
> If an entity like the Linux Foundation were to underwrite an effort
> where dedicated staff could provide CI and triage regressions for all
> types of hardware, then the Linux graphics community would certainly
> benefit greatly.
> Barring this type of significant investment, a community CI would have
> to be pretty modest.  It makes sense to build any smaller effort
> incrementally, and evaluate its cost-effectiveness as you add test
> targets.

Yes, as above. The KernelCI experience mirrors this exactly: the labs
are maintained by KernelCI developers, full-time platform maintainers,
Linaro employees, or various consulting companies; e.g. Collabora's
two labs consume a surprising amount of time from the people who are
responsible for them as part of their jobs.

But the fact that KernelCI is a hugely complex and involved project,
and difficult to approach, doesn't mean that the 0day bots which just
perform compile/analysis/etc tests don't or shouldn't exist. There are
a number of projects which can all happily coexist.

>> One benefit you get from using MRs is that you can use CI (as above)
>> to do pre-commit tests. Those tests are what we make of them - it's
>> trivial to set up various build tests, though doing actual run tests
>> is much more difficult - but having it run automatically is nice. The
>> Intel kernel team have snowpatch and Jenkins set up to do this, which
>> is impressive, but again I don't think it's something we can really
>> run generally on fd.o. OTOH, GitLab CI will run the full battery of
>> tests on MRs, show you the logs, let you download any generated
>> artifacts, etc. It's pretty slick, and in fact not even limited to
>> MRs: it will just run it on whatever branch you push. So you can
>> replicate what currently happens with Intel CI by pushing a branch
>> before you send out patches and checking the CI pipeline status for
>> that branch: in fact slightly easier since you can actually directly
>> access the instance rather than only getting what's mailed to you.
> I am eager to make Intel CI results more visible.  Can I push our
> artifacts to an fd.o directory, and have GitLab display them to users as
> if the build ran on fd.o?  A typical build produces ~500MB of junit test
> output with ~4M results.

Not currently, though it is on their radar.

Even if it were, though, I don't think parsing a 500MB XML file
server-side is a particularly good idea; the same goes for Piglit's
JSON, which is nicely human-readable but far too verbose to be usable
at scale. Think about scaling it out to the hundreds of projects (with
however many repos) hosted by fd.o as a whole: all the forks thereof,
the platforms and configurations they care about, and the number of
developers and submissions. Even if we write off the 1.5 petabytes
required to store results from this year's Mesa commits (ignoring
pre-commit submissions), the server load to parse and display those
results is just not workable for a large public service.
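
One cheap alternative, sketched here in Python purely for illustration
(the function name and sample document are hypothetical, not anything
fd.o runs): stream the JUnit XML once on the client side and ship only
aggregate counts to the server, instead of the full ~4M results.

```python
# Hypothetical sketch: stream a large JUnit XML and reduce it to
# pass/fail/skip counts, never holding the whole tree in memory.
import io
import xml.etree.ElementTree as ET

def summarize_junit(fileobj):
    """Walk <testcase> elements as they are parsed, keeping only counts."""
    counts = {"passed": 0, "failed": 0, "skipped": 0}
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "testcase":
            if elem.find("failure") is not None or elem.find("error") is not None:
                counts["failed"] += 1
            elif elem.find("skipped") is not None:
                counts["skipped"] += 1
            else:
                counts["passed"] += 1
            elem.clear()  # discard the element so memory stays flat
    return counts

sample = io.StringIO(
    "<testsuite>"
    "<testcase name='a'/>"
    "<testcase name='b'><failure/></testcase>"
    "<testcase name='c'><skipped/></testcase>"
    "</testsuite>"
)
print(summarize_junit(sample))  # {'passed': 1, 'failed': 1, 'skipped': 1}
```

A summary like this is a few bytes per run, which a public service can
store and display indefinitely.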

There are some other things we could do though, such as:
  - share common build configurations, build those for MRs and
commits, providing full feedback to developers on build errors in
various configurations, and expose the builds (for a limited time) as
downloadable artifacts
  - have Intel's Jenkins reuse those builds in order to run its
various test suites on real hardware
  - accept and display summary results (e.g. 'all tests successful',
or a note of which tests regressed, etc.)
  - provide a link to download a reasonably-compressed result set
stored as an artifact (e.g. the last 10MB results.tar.xz I got mailed
back from the Intel CI is fine)
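
In GitLab CI terms, the 'limited time' and 'summary results' ideas
above could be expressed roughly as follows; the paths, scripts, and
expiry are placeholders, not a worked-out proposal:

```yaml
# Placeholder sketch: paths, scripts and expiry are illustrative only.
build:
  stage: build
  script:
    - meson build/ && ninja -C build/
  artifacts:
    paths:
      - build/              # shared build, reusable by external test runs
    expire_in: 1 week       # GitLab expires the artifact automatically

report:
  stage: test
  script:
    - ./run-tests.sh > summary.txt   # e.g. "all tests successful"
  artifacts:
    paths:
      - summary.txt
      - results.tar.xz               # compact compressed result set
    expire_in: 1 week
```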

Does that sound like something reasonable to aim for?


More information about the mesa-dev mailing list