[Intel-gfx] gitlab.fd.o financial situation and impact on services

Fri Feb 28 08:17:30 UTC 2020

On Fri, 28 Feb 2020 at 03:38, Dave Airlie <airlied at gmail.com> wrote:
> b) we probably need to take a large step back here.
>
> Look at this from a sponsor POV, why would I give X.org/fd.o
> sponsorship money that they are just giving straight to google to pay
> for hosting credits? Google are profiting in some minor way from these
> hosting credits being bought by us, and I assume we aren't getting any
> sort of discounts here. Having google sponsor the credits costs google
> substantially less than having any other company give us money to do
> it.

The last I looked, Google GCP / Amazon AWS / Azure were all pretty
comparable in terms of what you get and what you pay for them.
Obviously providers like Packet and Digital Ocean who offer bare-metal
services are cheaper, but then you need to find someone who is going
to properly administer the various machines, install decent
monitoring, make sure that more storage is provisioned when we need
more storage (which is basically all the time), make sure that the
hardware is maintained in decent shape (pretty sure one of the fd.o
machines has had a drive in imminent-failure state for the last few
months), etc.

Given the size of our service, that's a much better plan (IMO) than
relying on someone who a) isn't an admin by trade, b) has a million
other things to do, and c) hasn't wanted to do it for the past several
years. But as long as that's the resources we have, then we're paying
the cloud tradeoff, where we pay more money in exchange for fewer
problems.

> If our current CI architecture is going to burn this amount of money a
> year and we hadn't worked this out in advance of deploying it then I
> suggest the system should be taken offline until we work out what a
> sustainable system would look like within the budget we have, whether
> that be never transferring containers and build artifacts from the
> google network, just having local runner/build combos etc.

Yes, we could federate everything back out so everyone runs their own
builds and executes those. Tinderbox did something really similar to
that IIRC; not sure if Buildbot does as well. Probably rules out
pre-merge testing, mind.

The reason we hadn't worked everything out in advance of deploying is
because Mesa has had 3993 MRs in the not long over a year since
moving, and a similar number in GStreamer, just taking the two biggest
users. At the start it was 'maybe let's use MRs if you want to but
make sure everything still goes through the list', and now it's
something different. Similarly the CI architecture hasn't been
'designed', so much as that people want to run dEQP and Piglit on
their hardware pre-merge in an open fashion that's actually accessible
to people, and have just done it.

Again, if you want everything to be centrally
designed/approved/monitored/controlled, that's a fine enough idea, and
I'd be happy to support whoever it was who was doing that for all of
fd.o.

Cheers,
Daniel