[Mesa-dev] [Intel-gfx] gitlab.fd.o financial situation and impact on services

Sat Feb 29 01:51:58 UTC 2020

On 2/28/20 4:22 PM, Daniel Vetter wrote:
> On Fri, Feb 28, 2020 at 9:31 PM Dave Airlie <airlied at gmail.com> wrote:
>> On Sat, 29 Feb 2020 at 05:34, Eric Anholt <eric at anholt.net> wrote:
>>> On Fri, Feb 28, 2020 at 12:48 AM Dave Airlie <airlied at gmail.com> wrote:
>>>> On Fri, 28 Feb 2020 at 18:18, Daniel Stone <daniel at fooishbar.org> wrote:
>>>>> On Fri, 28 Feb 2020 at 03:38, Dave Airlie <airlied at gmail.com> wrote:
>>>>>> b) we probably need to take a large step back here.
>>>>>>
>>>>>> Look at this from a sponsor POV, why would I give X.org/fd.o
>>>>>> sponsorship money that they are just giving straight to google to pay
>>>>>> for hosting credits? Google are profiting in some minor way from these
>>>>>> hosting credits being bought by us, and I assume we aren't getting any
>>>>>> sort of discounts here. Having google sponsor the credits costs google
>>>>>> substantially less than having any other company give us money to do
>>>>>> it.
>>>>> The last I looked, Google GCP / Amazon AWS / Azure were all pretty
>>>>> comparable in terms of what you get and what you pay for them.
>>>>> Obviously providers like Packet and Digital Ocean who offer bare-metal
>>>>> services are cheaper, but then you need to find someone who is going
>>>>> to properly administer the various machines, install decent
>>>>> monitoring, make sure that more storage is provisioned when we need
>>>>> more storage (which is basically all the time), make sure that the
>>>>> hardware is maintained in decent shape (pretty sure one of the fd.o
>>>>> machines has had a drive in imminent-failure state for the last few
>>>>> months), etc.
>>>>>
>>>>> Given the size of our service, that's a much better plan (IMO) than
>>>>> relying on someone who a) isn't an admin by trade, b) has a million
>>>>> other things to do, and c) hasn't wanted to do it for the past several
>>>>> years. But as long as that's the resources we have, then we're paying
>>>>> the cloud tradeoff, where we pay more money in exchange for fewer
>>>>> problems.
>>>> Admin for gitlab and CI is a full time role anyways. The system is
>>>> definitely not self sustaining without time being put in by you and
>>>> anholt still. If we have $75k to burn on credits, and it was diverted
>>>> to just pay an admin to admin the real hw + gitlab/CI would that not
>>>> be a better use of the money? I didn't know if we can afford $75k for
>>>> an admin, but suddenly we can afford it for gitlab credits?
>>> As I think about the time that I've spent at google in less than a
>>> year on trying to keep the lights on for CI and optimize our
>>> infrastructure in the current cloud environment, that's more than the
>>> entire yearly budget you're talking about here.  Saying "let's just
>>> pay for people to do more work instead of paying for full-service
>>> cloud" is not a cost optimization.
>>>
>>>
>>>>> Yes, we could federate everything back out so everyone runs their own
>>>>> builds and executes those. Tinderbox did something really similar to
>>>>> that IIRC; not sure if Buildbot does as well. Probably rules out
>>>>> pre-merge testing, mind.
>>>> Why? does gitlab not support the model? having builds done in parallel
>>>> on runners closer to the test runners seems like it should be a thing.
>>>> I guess artifact transfer would cost less then as a result.
>>> Let's do some napkin math.  The biggest artifacts cost we have in Mesa
>>> is probably meson-arm64/meson-arm (60MB zipped from meson-arm64,
>>> downloaded by 4 freedreno and 6ish lava, about 100 pipelines/day,
>>> makes ~1.8TB/month ($180 or so).  We could build a local storage next
>>> to the lava dispatcher so that the artifacts didn't have to contain
>>> the rootfs that came from the container (~2/3 of the insides of the
>>> zip file), but that's another service to build and maintain.  Building
>>> the drivers once locally and storing it would save downloading the
>>> other ~1/3 of the inside of the zip file, but that requires a big
>>> enough system to do builds in time.
>>>
>>> I'm planning on doing a local filestore for google's lava lab, since I
>>> need to be able to move our xml files off of the lava DUTs to get the
>>> xml results we've become accustomed to, but this would not bubble up
>>> to being a priority for my time if I wasn't doing it anyway.  If it
>>> takes me a single day to set all this up (I estimate a couple of
>>> weeks), that costs my employer a lot more than sponsoring the costs of
>>> the inefficiencies of the system that has accumulated.
>> I'm not trying to knock the engineering works the CI contributors have
>> done at all, but I've never seen a real discussion about costs until
>> now. Engineers aren't accountants.
>>
>> The thing we seem to be missing here is fiscal responsibility. I know
>> this email is us being fiscally responsible, but it's kinda after the
>> fact.
>>
>> I cannot commit my employer to spending a large amount of money (> 0
>> actually) without a long and lengthy process with checks and bounds.
>> Can you?
>>
>> The X.org board has budgets and procedures as well. I as a developer
>> of Mesa should not be able to commit the X.org foundation to spending
>> large amounts of money without checks and bounds.
>>
>> The CI infrastructure lacks any checks and bounds. There is no link
>> between editing .gitlab-ci/* and cashflow. There is no link to me
>> adding support for a new feature to llvmpipe that blows out test times
>> (granted it won't affect CI budget but just an example).
> We're working to get the logging in place to know which projects
> exactly burn down the money so that we can take specific actions. If
> needed. So pretty soon you wont be able to just burn down endless
> amounts of cash with a few gitlab-ci commits. Or at least not for long
> until we catch you and you either fix things up or CI is gone for your
> project.
>
>> The fact that clouds run on credit means that it's not possible to say
>> budget 30K and say when that runs out it runs out, you end up getting
>> bills for ever increasing amounts that you have to cover, with nobody
>> "responsible" for ever reducing those bills. Higher Faster Further
>> baby comes to mind.
> We're working on this, since it's the boards responsibility to be on
> top of stuff. It's simply that we didn't expect a massive growth of
> this scale and this quickly, so we're a bit behind on the controlling
> aspect.
>
> Also I guess it wasnt clear, but the board decision yesterday was the
> stop loss order where we cut the cord (for CI at least). So yeah the
> short term budget is firmly in place now.
>
>> Has X.org actually allocated the remaining cash in it's bank account
>> to this task previously? Was there plans for this money that can't be
>> executed now because we have to pay the cloud fees? If we continue to
>> May and the X.org bank account hits 0, can XDC happen?
> There's numbers elsewhere in this thread, but if you'd read the
> original announcement it states that the stop loss would still
> guarantee that we can pay for everything for at least one year. We're
> not going to get even close to 0 in the bank account.
>
> So yeah XDC happens, and it'll also still happen next year. Also fd.o
> servers will keep running. The only thing we might need to switch off
> is the CI support.
>
>> Budgeting and cloud is hard, the feedback loops are messy. In the old
>> system the feedback loop was simple, we don't have admin time or money
>> for servers we don't get the features, cloud allows us to get the
>> features and enjoy them and at some point in the future the bill gets
>> paid by someone else. Credit cards lifestyles all the way.
> Uh ... where exactly do you get the credit card approach from? SPI is
> legally not allowed to extend us a credit (we're not a legal org
> anymore), so if we hit 0 it's out real quick. No credit for us. If SPI
> isnt on top of that it's their loss (but they're getting pretty good
> at tracking stuff with the contractor they now have and all that).
>
> Which is not going to happen btw, if you've read the announcement mail
> and all that.
>
> Cheers, Daniel
Sorry to enter mid conversation. You may want to see how the
GCC test Farm does it or the Yocto Project. I do get their different
projects but they seem to be managing fine. I'm not sure of
their funding but I do recall that a lot of the machines I use
for work on the farm are donated from IBM or some company.

Not sure if you can get a company(ies) to donate some machines
and store them in a data center like GCC or the Yocto Project,

Nick
>> Like maybe we can grow up here and find sponsors to cover all of this,
>> but it still feels a bit backwards from a fiscal pov.
>>
>> Again I'm not knocking the work people have done at all, CI is very
>> valuable to the projects involved, but that doesn't absolve us from
>> costs.
>>
>> Dave.
>
>