[PATCH v11] drm: Add initial ci/ subdirectory

Tue Sep 12 13:16:41 UTC 2023

Hi Maxime,
Hopefully less mangled formatting this time: turns out Thunderbird +
plain text is utterly unreadable, so that's one less MUA that is
actually usable to send email to kernel lists without getting shouted
at.

On Mon, 11 Sept 2023 at 15:46, Maxime Ripard <mripard at kernel.org> wrote:
> On Mon, Sep 11, 2023 at 03:30:55PM +0200, Michel Dänzer wrote:
> > > There's in 6.6-rc1 around 240 reported flaky tests. None of them have
> > > any context. That new series hads a few dozens too, without any context
> > > either. And there's no mention about that being a plan, or a patch
> > > adding a new policy for all tests going forward.
> >
> > That does sound bad, would need to be raised in review.
> >
> > > Any concern I raised were met with a giant "it worked on Mesa" handwave
> >
> > Lessons learned from years of experience with big real-world CI
> > systems like this are hardly "handwaving".
>
> Your (and others) experience certainly isn't. It is valuable, welcome,
> and very much appreciated.
>
> However, my questions and concerns being ignored time and time again
> about things like what is the process is going to be like, what is going
> to be tested, who is going to be maintaining that test list, how that
> interacts with stable, how we can possibly audit the flaky tests list,
> etc. have felt like they were being handwaived away.

Sorry it ended up coming across like that. It wasn't the intent.

> I'm not saying that because I disagree, I still do on some, but that's
> fine to some extent. However, most of these issues are not so much an
> infrastructure issue, but a community issue. And I don't even expect a
> perfect solution right now, unlike what you seem to think. But I do
> expect some kind of plan instead of just ignoring that problem.
>
> Like, I had to ask the MT8173 question 3 times in order to get an
> answer, and I'm still not sure what is going to be done to address that
> particular issue.
>
> So, I'm sorry, but I certainly feel like it here.

I don't quite see the same picture from your side though. For example,
my reading of what you've said is that flaky tests are utterly
unacceptable, as are partial runs, and we shouldn't pretend otherwise.
With your concrete example (which is really helpful, so thanks), what
happens to the MT8173 hdmi-inject test? Do we skip all MT8173 testing
until it's perfect, or does MT8173 testing always fail because that
test does?

Both have their downsides. Not doing any testing has the obvious
downside, and means that the driver can get worse until it gets
perfect. Always marking the test as failed means that the test results
are useless: if failure is expected, then red is good. I mean, say
you're contributing a patch to fix some documentation or add a helper
to common code which only v3d uses. The test results come back, and
your branch is failing tests on MT8173, specifically the
hdmi-inject at 4k test. What then? Either as a senior contributor you
'know' that's the case, or as a casual contributor you get told 'oh
yeah, don't worry about the test results, they always fail'. Both lead
to the same outcome, which is that no-one pays any attention to the
results, and they get worse.

What we do agree on is that yes, those tests should absolutely be
fixed, and not just swept under the rug. Part of this is having
maintainers actually meaningfully own their test results. For example,
I'm looking at the expectation lists for the Intel gen in my laptop,
and I'm seeing a lot of breakage in blending tests, as well as
dual-display fails which include the resolution of my external
display. I'd expect the Intel driver maintainers to look at them, get
them fixed, and gradually prune those xfails/flakes down towards zero.

If the maintainers don't own it though, then it's not going to get
fixed. And we are exactly where we are today: broken plane blending
and 1440p on KBL, broken EDID injection on MT8173, and broken atomic
commits on stoney. Without stronger action from the maintainers (e.g.
throwing i915 out of the tree until it has 100% pass 100% of the
time), adding testing isn't making the situation better or worse in
and of itself. What it _is_ doing though, is giving really clear
documentation of the status of each driver, and backing that up by
verifying it.

Only maintainers can actually fix the drivers (or the tests tbf). But
doing the testing does let us be really clear to everyone what the
actual state is, and that way people can make informed decisions too.
And the only way we're going to drive the test rate down is by the
subsystem maintainers enforcing it.

Does that make sense on where I'm (and I think a lot of others are) coming from?

To answer the other question about 'where are the logs?': some of them
have the failure data in them, others don't. They all should going
forward at least though.

Cheers,
Daniel