git and Marge troubles this week

Marek Olšák maraeo at
Sat Jan 8 17:49:15 UTC 2022

The ac_surface_meta_address_test timeout occurs rarely; it happens because
the test is computationally demanding. It's also possible the machine has
gotten slower for some reason.


On Fri, Jan 7, 2022 at 12:32 PM Emma Anholt <emma at> wrote:

> On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott <cwabbott0 at> wrote:
> >
> > Unfortunately batch mode has only made it *worse* - I'm sure it's not
> > intentional, but it seems that Marge is still running each MR's CI
> > pipeline individually after the batch pipeline passes, instead of
> > merging them right away, which completely defeats the point. See, for
> > example, !14213, which has gone through 8 cycles of being batched with
> > earlier MRs, 5 of them passing, only to have an earlier job in the
> > batch spuriously fail when actually merging, with Marge seemingly
> > giving up on merging it (???). As I type this, it has been "lucky"
> > enough to be the first MR in a batch that passed; it is currently
> > running its own pipeline and is blocked on iris-whl-traces-performance
> > (I have !14453 to disable that broken job, but who knows with the
> > Marge chaos when it's going to get merged...).
> >
> > Stepping back, I think it was a bad idea to push an "I think this might
> > help" type change like this without carefully monitoring things
> > afterwards. An hour or so of babysitting Marge would've caught that
> > this wasn't working, and would've prevented many hours of backlog and
> > the perception of general CI instability.
> I spent the day watching marge, like I do every day.  Looking at the
> logs, we got 0 MRs in during my work hours (PST), out of about 14
> marge assignments that day.  My assessment was that leaving marge
> broken for the night would have been indistinguishable from the
> status quo.  There was definitely some extra spam about trying
> batches, more than there were actual batches attempted.  My guess
> would be gitlab connection reliability issues, but I'm not sure.
> Of the 5 batches marge attempted before the change was reverted, three
> fell to, one to
> the git fetch fails, and one to a new timeout I don't think I've seen
> before:
> Of all the sub-MRs involved in those batches, I think two might have
> gotten through by dodging the LAVA lab fail.  Marge's batch backoff
> did work, and !14436 and maybe !14433 landed during that time.
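
For context on the batching discussion above: the complaint is that a
passing batch pipeline should be sufficient to merge every MR in the
batch, yet each MR's pipeline was being re-run individually afterwards.
Below is a minimal sketch of the intended batch-merge-with-backoff flow,
in Python. The names (stage_on_branch, run_pipeline, merge) and the
split-in-half backoff are illustrative assumptions, not Marge-bot's
actual code:

    # Hypothetical sketch of a batch-merge strategy. Names and behavior
    # are illustrative assumptions, not Marge-bot's real implementation.

    def stage_on_branch(mrs):
        # Pretend to rebase all MRs onto a single integration branch.
        return "batch/" + "+".join(mr["id"] for mr in mrs)

    def merge_batch(mrs, run_pipeline, merge):
        """Try to merge a list of MRs as one batch.

        Intended flow: run ONE pipeline for the whole batch and, if it
        passes, merge every MR immediately. Re-running each MR's own
        pipeline at that point (the behavior described above) defeats
        the purpose of batching.
        """
        branch = stage_on_branch(mrs)
        if run_pipeline(branch):
            for mr in mrs:
                merge(mr)  # merged directly, no per-MR pipeline re-run
            return True
        # Backoff on failure: split the batch so one spurious failure
        # cannot starve every other MR in the batch.
        if len(mrs) > 1:
            mid = len(mrs) // 2
            left = merge_batch(mrs[:mid], run_pipeline, merge)
            right = merge_batch(mrs[mid:], run_pipeline, merge)
            return left or right
        return False

    # Toy run using MR numbers from this thread: the full batch fails
    # because !14213's job is flaky here, but the backoff still lets
    # !14436 and !14433 land in a smaller batch.
    if __name__ == "__main__":
        flaky = {"14213"}
        mrs = [{"id": i} for i in ("14213", "14436", "14433")]
        merge_batch(
            mrs,
            run_pipeline=lambda br: not any(f in br for f in flaky),
            merge=lambda mr: print("merged !%s" % mr["id"]),
        )

The key point of the sketch is that merge() is called directly once the
batch pipeline passes; anything that re-runs per-MR pipelines after that
step removes the throughput benefit batching is meant to provide.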