[Mesa-dev] Gitlab migration

Mark Janes mark.a.janes at intel.com
Fri May 25 23:47:26 UTC 2018


Daniel Stone <daniel at fooishbar.org> writes:
> We had a go at using Jenkins for some of this: Intel's been really
> quite successful at doing it internally, but our community efforts
> have been a miserable failure. After a few years I've concluded that
> it's not going to change - even with Jenkins 2.0.
>
> Firstly, Jenkins configuration is an absolute dumpster fire. Working
> out how to configure it and create the right kind of jobs (and debug
> it!) is surprisingly difficult, and involves a lot of clicking through
> the web UI, or using external tools like jenkins-job-builder which
> seem to be in varying levels of disrepair. If you have dedicated 'QA
> people' whose job is driving Jenkins for you, then great! Jenkins will
> probably work well for you. This doesn't scale to a community model,
> though, especially when people have different use cases and need to
> install different plugins.
>
> Jenkins security is also a tyre fire. Plugins are again in varying
> levels of disrepair, and seem remarkably prone to CVEs. There's no
> real good model for updating plugins (and doing so is super fragile).
> Worse still, Jenkins 2.0 really pushes you to be writing scripts in
> Groovy, which can affect Jenkins in totally arbitrary ways, and
> subvert the security model entirely. The way upstream deals with this
> is to enforce a 'sandbox' model preventing most scripts from doing
> anything useful unless manually audited and approved by an admin.
> Again, this is fine for companies or small teams where you trust
> people to not screw up, but doesn't scale to something like fd.o.
>
> Adding to these is the permission model, which again requires painful
> configuration and a lot of admin clicking. It doesn't integrate well
> with external services, and granularity is mostly at an instance
> rather than a project level: again not suitable for something like
> fd.o.
>
> From the UI and workflow perspective, something I've never liked is
> that the first-order view is of very specific pipelines, e.g. 'Mesa
> master build', 'daily Piglit run', etc etc. If all you care about is
> master, then this is fine. You _can_ make those pipelines run against
> arbitrary branches and commits you pick up from MRs or similar, but
> you really are trying to jam it sideways into the UI it wants to
> present. Again this is so deeply baked into how Jenkins works that I
> don't see it as really being fixable.
>
> I have a pile of other gripes, like how difficult their remote API is
> to use, and the horrible race conditions it has. For instance, when
> you schedule a run of a particular job, it doesn't report the run ID
> back to you: you have to poll the last job number before you submit,
> then poll again for a few seconds to find the next run ID. Good luck
> to you if two runs of the same job (e.g. 'build specific Mesa commit')
> get scheduled at the same time.

I agree with some of your Jenkins critiques.  I have implemented CI on
*many* different frameworks over the past 15 years, and I think that
every implementation has its fans and haters.

It is wise to create automation which is mostly independent of the CI
framework.  Mesa i965 CI could immediately switch from Jenkins to
BuildBot or GitLab if there were a reason to do so.  It may be that
GitLab is superior to Jenkins by now, but the selection of the CI
framework is a minor detail anyway.
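
To make that concrete (this is a made-up sketch, not our actual
tooling): the framework only needs to know about a single project-owned
entry point, so switching frameworks changes one line of job
configuration.

    #!/usr/bin/env python3
    # Hypothetical framework-agnostic entry point: Jenkins, BuildBot or
    # GitLab CI would each just run "ci-entry.py <step> --commit <sha>",
    # so swapping frameworks only touches the job definition, not the
    # automation itself.
    import argparse
    import subprocess

    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("step", choices=["build", "test"])
        parser.add_argument("--commit", required=True)
        args = parser.parse_args()

        # all real logic stays in project-owned scripts (names made up)
        script = "./ci/build.sh" if args.step == "build" else "./ci/run-tests.sh"
        subprocess.check_call([script, args.commit])

    if __name__ == "__main__":
        main()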

CI frameworks are often based on build/test pipelines, which I think is
exactly the wrong concept for the domain.  Flexible CI is best thought
of as a multiplatform `make` system.  Setting up a "pipeline" is similar
to building your project with a shell script instead of a makefile.
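
As a toy illustration of the difference (not our actual
implementation): a make-style model describes components and their
dependencies once, and any target can then be built for any platform,
whereas a pipeline hard-codes one ordering.

    # Toy example of CI-as-make: targets plus dependencies, resolved on
    # demand, instead of a fixed build/test pipeline.
    DEPS = {
        "drm": [],
        "mesa": ["drm"],
        "piglit": [],
        "deqp": [],
        "test-skl": ["mesa", "piglit", "deqp"],  # per-platform test target
        "test-bdw": ["mesa", "piglit", "deqp"],
    }

    def build(target, done=None):
        """Build a target after its dependencies, reusing completed work."""
        done = set() if done is None else done
        if target in done:
            return
        for dep in DEPS[target]:
            build(dep, done)
        print("building", target)  # a real system dispatches to builders here
        done.add(target)

    build("test-skl")  # rebuilds only what this target actually needs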

I disagree with your critique of the Jenkins remote API.  It is more
flexible than any other API that I have seen for CI.  We implement our
multiplatform-make system on top of it.  It would be nice to have an ID
returned when triggering a job, but you can work around this by including
a GUID as a build parameter and then polling for that GUID.
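
Roughly, assuming the standard Jenkins JSON remote API and the Python
requests library (the host, job and parameter names below are made up):

    # Sketch of the GUID workaround: tag the triggered build with a
    # unique parameter, then poll recent builds until one carries it.
    import time
    import uuid
    import requests

    JENKINS = "https://jenkins.example.com"  # made-up host
    JOB = "mesa-build"                       # made-up parameterized job
    AUTH = ("user", "api-token")

    def trigger_and_find_build(commit):
        tag = uuid.uuid4().hex
        # CI_TAG must be declared as a string parameter on the job
        requests.post(f"{JENKINS}/job/{JOB}/buildWithParameters",
                      params={"MESA_COMMIT": commit, "CI_TAG": tag},
                      auth=AUTH).raise_for_status()
        while True:
            builds = requests.get(
                f"{JENKINS}/job/{JOB}/api/json",
                params={"tree":
                        "builds[number,actions[parameters[name,value]]]"},
                auth=AUTH).json()["builds"]
            for b in builds:
                for action in b.get("actions", []):
                    for p in action.get("parameters", []):
                        if p.get("name") == "CI_TAG" and p.get("value") == tag:
                            return b["number"]
            time.sleep(2)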

The reasons I chose Jenkins over what was available at the time:

  - job/system configuration is saved as XML for backup/diff/restore
    (see the sketch after this list)
  - huge number of users -> fewer quality issues
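
On the first point, every job's configuration can be pulled over the
remote API as config.xml and kept in git for diff and restore; the host
and credentials below are placeholders:

    # Back up every job's XML configuration so it can be diffed and
    # restored later.
    import pathlib
    import requests

    JENKINS = "https://jenkins.example.com"  # placeholder host
    AUTH = ("user", "api-token")             # placeholder credentials

    def backup_jobs(dest="jenkins-config-backup"):
        pathlib.Path(dest).mkdir(exist_ok=True)
        jobs = requests.get(f"{JENKINS}/api/json",
                            params={"tree": "jobs[name]"},
                            auth=AUTH).json()["jobs"]
        for job in jobs:
            xml = requests.get(f"{JENKINS}/job/{job['name']}/config.xml",
                               auth=AUTH).text
            pathlib.Path(dest, f"{job['name']}.xml").write_text(xml)
        # commit `dest` to git for cheap history, diff and restore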

> GitLab CI fixes all of these things. Pipelines are strongly and
> directly correlated with commits in repositories, though you can also
> trigger them manually or on a schedule. Permissions are that of the
> repository, and just like Travis, people can fork and work on CI
> improvements in their own sandbox without impacting anything else. The
> job configuration is in relatively clean YAML, and it strongly
> suggests idiomatic form rather than a forest of thousands of
> unmaintained plugins.
>
> Jobs get run in clean containers, rather than special unicorn workers
> pre-configured just so, meaning that the builds are totally
> reproducible locally and you can use whatever build dependencies you
> want without having to bug the admins to install LLVM in some
> particular chroot. Those containers can be stored in a registry
> attached to the project, with their own lifetime/ownership/etc
> tracking. Jenkins can use Docker if you have an external registry, but
> again this requires setting up external authentication and
> permissions, not to mention that there's no lifetime/ownership/expiry
> tracking, so you have to write more special admin cronjob scripts to
> clean up old images in the registry.

GitLab may be perfectly suitable for CI, but please do not select Mesa
dev infrastructure based on CI features.

Any Mesa CI needs to trigger from multiple projects: drm, dEQP, Piglit,
VulkanCTS, SPIRV-Tools, crucible, glslang.  They are not all going to be
in GitLab.
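
Triggering across forges can be done in a framework-neutral way; even
something as crude as polling each upstream's HEAD and kicking the CI
when it moves is enough (the URLs and the trigger hook below are
illustrative only):

    # Poll several upstream projects, hosted anywhere, and trigger a CI
    # run whenever any of their HEADs change.
    import subprocess
    import time

    UPSTREAMS = {  # illustrative URLs, not a real configuration
        "drm": "https://example.org/mesa/drm.git",
        "piglit": "https://example.org/mesa/piglit.git",
        "VK-GL-CTS": "https://github.com/KhronosGroup/VK-GL-CTS.git",
        "SPIRV-Tools": "https://github.com/KhronosGroup/SPIRV-Tools.git",
    }

    def head(url):
        out = subprocess.check_output(["git", "ls-remote", url, "HEAD"])
        return out.decode().split()[0]

    def poll(interval=300):
        last = {name: head(url) for name, url in UPSTREAMS.items()}
        while True:
            time.sleep(interval)
            for name, url in UPSTREAMS.items():
                sha = head(url)
                if sha != last[name]:
                    print(name, "moved to", sha, "- triggering a CI run")
                    # here: hit whatever framework is in use (Jenkins,
                    # BuildBot, GitLab) with the new revision
                    last[name] = sha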

The cart (CI) follows the horse (upstream development process).  CI
automation is cheap and flexible, and can easily adapt to changes in the
driver implementation / dev process.

> It _is_ possible to bend Jenkins to your will - Mark's excellent and
> super-helpful work with Intel's CI is testament to that - and in some
> environments it's fine, but after a few years of trying, I just don't
> think it's suitable to run on fd.o, and I also don't think it's a good
> fit for what Mesa wants to be doing with CI as a community. (That was
> much longer than expected, sorry: the wound is still raw, I guess.)

CI may not be suitable for running on fd.o at all.  Comparable systems
are run by large dedicated staffs and generally provide far poorer
results than the i965 CI.  Even when done well, CI takes a lot of work
and hardware.

If an entity like the Linux Foundation were to underwrite an effort
where dedicated staff could provide CI and triage regressions for all
types of hardware, then the Linux graphics community would certainly
benefit greatly.

Barring this type of significant investment, a community CI would have
to be pretty modest.  It makes sense to build any smaller effort
incrementally, and evaluate its cost-effectiveness as you add test
targets.

> One benefit you get from using MRs is that you can use CI (as above)
> to do pre-commit tests. Those tests are what we make of them - it's
> trivial to set up various build tests, though doing actual run tests
> is much more difficult - but having it run automatically is nice. The
> Intel kernel team have snowpatch and Jenkins set up to do this, which
> is impressive, but again I don't think it's something we can really
> run generally on fd.o. OTOH, GitLab CI will run the full battery of
> tests on MRs, show you the logs, let you download any generated
> artifacts, etc. It's pretty slick, and in fact not even limited to
> MRs: it will just run it on whatever branch you push. So you can
> replicate what currently happens with Intel CI by pushing a branch
> before you send out patches and checking the CI pipeline status for
> that branch: in fact slightly easier since you can actually directly
> access the instance rather than only getting what's mailed to you.

I am eager to make Intel CI results more visible.  Can I push our
artifacts to an fd.o directory, and have GitLab display them to users as
if the build ran on fd.o?  A typical build produces ~500MB of JUnit test
output with ~4M results.
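
If pushing the artifacts themselves is not practical, one fallback I
can imagine (a sketch only, not a verified workflow) is to report
per-commit pass/fail from our external CI through GitLab's
commit-status API and link out to the full results; the project id,
token and check name below are placeholders:

    # Report an external CI result against a commit in a GitLab
    # project, with a link back to the full, externally hosted results.
    import requests

    GITLAB = "https://gitlab.freedesktop.org/api/v4"
    PROJECT_ID = 0                              # placeholder project id
    HEADERS = {"PRIVATE-TOKEN": "<bot-token>"}  # placeholder token

    def report(sha, passed, results_url):
        requests.post(
            f"{GITLAB}/projects/{PROJECT_ID}/statuses/{sha}",
            headers=HEADERS,
            data={
                "state": "success" if passed else "failed",
                "name": "intel-i965-ci",        # placeholder check name
                "target_url": results_url,      # full JUnit results live here
                "description": "external Intel CI results",
            }).raise_for_status()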


