<div dir="ltr"><div>The <code class="gmail-job-log gmail-d-block"><span class="gmail-gl-white-space-pre-wrap">ac_surface_meta_address_test</span></code> timeout occurs rarely, and it happens because the test is computationally demanding. It's also possible that the machine got slower for some reason.<br></div><div><br></div><div>Marek<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 7, 2022 at 12:32 PM Emma Anholt &lt;<a href="mailto:emma@anholt.net">emma@anholt.net</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Jan 7, 2022 at 6:18 AM Connor Abbott &lt;<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>&gt; wrote:<br> ><br> > Unfortunately batch mode has only made it *worse* - I'm sure it's not<br> > intentional, but it seems that it's still running the CI pipelines<br> > individually after the batch pipeline passes and not merging them<br> > right away, which completely defeats the point. See, for example,<br> > !14213 which has gone through 8 cycles being batched with earlier MRs,<br> > 5 of those passing only to have an earlier job in the batch spuriously<br> > fail when actually merging and Marge seemingly giving up on merging it<br> > (???). As I type it was "lucky" enough to be the first job in a batch<br> > which passed and is currently running its pipeline and is blocked on<br> > iris-whl-traces-performance (I have !14453 to disable that broken job,<br> > but who knows with the Marge chaos when it's going to get merged...).<br> ><br> > Stepping back, I think it was a bad idea to push a "I think this might<br> > help" type change like this without first carefully monitoring things<br> > afterwards. 
An hour or so of babysitting Marge would've caught that<br> > this wasn't working, and would've prevented many hours of backlog and<br> > perception of general CI instability.<br> <br> I spent the day watching marge, like I do every day. Looking at the<br> logs, we got 0 MRs in during my work hours PST, out of about 14 or so<br> marge assignments that day. Leaving marge broken for the night would<br> have been indistinguishable from the status quo, was my assessment.<br> <br> There was definitely some extra spam about trying batches, more than<br> there were actual batches attempted. My guess would be gitlab<br> connection reliability stuff, but I'm not sure.<br> <br> Of the 5 batches marge attempted before the change was reverted, three<br> fell to <a href="https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/mesa/mesa/-/issues/5837</a>, one to<br> the git fetch fails, and one to a new timeout I don't think I've seen<br> before: <a href="https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/mesa/mesa/-/jobs/17357425#L1731</a>.<br> Of all the sub-MRs involved in those batches, I think two of those<br> might have gotten through by dodging the LAVA lab fail. Marge's batch<br> backoff did work, and !14436 and maybe !14433 landed during that time.<br> </blockquote></div>