[PATCH v11] drm: Add initial ci/ subdirectory

Mon Sep 11 09:34:13 UTC 2023

Hi

(Removing most of the context that got scrambled)

On Thu, Sep 07, 2023 at 01:40:02PM +0200, Daniel Stone wrote:
> Yeah, this is what our experience with Mesa (in particular) has taught us.
> 
> Having 100% of the tests pass 100% of the time on 100% of the platforms is a
> great goal that everyone should aim for. But it will also never happen.
> 
> Firstly, we're just not there yet today. Every single GPU-side DRM driver
> has userspace-triggerable faults which cause occasional errors in GL/Vulkan
> tests. Every single one. We deal with these in Mesa by retrying; if we
> didn't retry, across the breadth of hardware we test, I'd expect 99% of
> should-succeed merges to fail because of these intermittent bugs in the DRM
> drivers.

So the plan is only to ever test rendering devices? It should have been
made clearer then.

> We don't have the same figure for KMS - because we don't test it - but
> I'd be willing to bet no driver is 100% if you run tests often enough.

And I would still consider that a bug that we ought to fix, and
certainly not something we should sweep under the rug. If half the tests
are not running on a driver, then fine, they aren't. I'm not really
against having failing tests; I'm against not flagging tests that are
unreliable on a given piece of hardware as failing.

> Secondly, we will never be there. If we could pause for five years and sit
> down making all the current usecases for all the current hardware on the
> current kernel run perfectly, we'd probably get there. But we can't: there's
> new hardware, new userspace, and hundreds of new kernel trees.

Not with that attitude :)

I'm not sure it's actually an argument, really. 10 years ago, we would
never have been at "every GPU on the market has an open-source driver"
here. 5 years ago, we would never have been at this-series-here. That
didn't stop anyone making progress, everyone involved in that thread
included.

> Even without the first two, what happens when the Arm SMMU maintainers
> (choosing a random target to pick on, sorry Robin) introduce subtle
> breakage which makes a lot of tests fail some of the time? Do we
> refuse to backmerge Linus into DRM until it's fixed, or do we disable
> all testing on Arm until it's fixed? When we've done that, what
> happens when we re-enable testing, and discover that a bunch of tests
> get broken because we haven't been testing?

I guess that's another thing that needs to be made clearer then. Do you
want to test Mesa, or the kernel?

For Mesa, I'd very much expect to rely on a stable kernel, and for the
kernel on a stable Mesa.

And if we're testing the kernel, then let's turn it the other way
around. How are we even supposed to detect those failures in the first
place if tests are flagged as unreliable?

No matter what we do here, what you describe will always happen. Like,
if we do flag those tests as unreliable, what exactly prevents another
issue from coming on top undetected, and what will happen when we
re-enable testing?

On top of that, you kind of hinted at that yourself, but what set of
tests will pass is a property linked to a single commit. Having that
list within the kernel already alters that: you'll need to merge a new
branch, add a bunch of fixes and then change the test list state. You
won't have the same tree you originally tested (and defined the test
state list for).

It might or might not be an issue for Linus' release, but I can
definitely see the trouble already for stable releases where fixes will
be backported, but the test state list certainly won't be updated.

> Thirdly, hardware is capricious. 'This board doesn't make it to u-boot' is a
> clear infrastructure error, but if you test at sufficient scale, cold solder
> or failing caps surface way more often than you might think. And you can't
> really pick those out by any other means than running at scale, dealing with
> non-binary results, and looking at the trends over time. (Again this is
> something we do in Mesa - we graph test failures per DUT, look for outliers,
> and pull DUTs out of the rotation when they're clearly defective. But that
> only works if you actually run enough tests on them in the first place to
> discover trends - if you stop at the first failed test, it's impossible to
> tell the difference between 'infuriatingly infrequent kernel/test bug?' and
> 'cracked main board maybe?'.)
> 
> What we do know is that we _can_ classify tests four ways in expectations.
> Always-passing tests should always pass. Always-failing tests should always
> fail (and update the expectations if you make them pass). Flaking tests work
> often enough that they'll always pass if you run them a couple/few times,
> but fail often enough that you can't rely on them. Then you just skip tests
> which exhibit catastrophic failure i.e. local DoS which affects the whole
> test suite.
> 
> By keeping those sets of expectations, we've been able to keep Mesa pretty
> clear of regressions, whilst having a very clear set of things that should
> be fixed to point to. It would be great if those set of things were zero,
> but it just isn't. Having that is far better than the two alternatives:
> either not testing at all (obviously bad), or having the test always be red
> so it's always ignored (might as well just not test).
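
The four-way classification quoted above can be sketched roughly as
follows. This is only an illustration in Python: the names (Expectation,
evaluate) are hypothetical and are not taken from the Mesa or drm-ci
scripts, and the retry policy is reduced to "a list of attempt results".

```python
from enum import Enum


class Expectation(Enum):
    """Hypothetical labels for the four classes described in the quote."""
    PASS = "pass"    # should always pass
    FAIL = "fail"    # should always fail
    FLAKE = "flake"  # passes if retried a few times
    SKIP = "skip"    # never run (catastrophic failure)


def evaluate(expectation, results):
    """Return True if the attempts match the expectation.

    results is a list of booleans, one per attempt (True = passed).
    """
    if expectation is Expectation.SKIP:
        # A skipped test must not have been run at all.
        return not results
    if not results:
        # Every other class needs at least one attempt.
        return False
    if expectation is Expectation.PASS:
        return all(results)
    if expectation is Expectation.FAIL:
        # An unexpected pass is a mismatch: the expectation needs updating.
        return not any(results)
    # FLAKE: at least one attempt within the retry budget must pass.
    return any(results)


# Usage: a flaky test that fails once and then passes is accepted,
# while an expected failure that starts passing is reported.
flake_ok = evaluate(Expectation.FLAKE, [False, True])
fail_ok = evaluate(Expectation.FAIL, [True])
```

Note that in this sketch a test labelled as flaky is accepted as soon as
one attempt passes, whatever the number of failed attempts before it.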

Isn't that what happens with flaky tests anyway? Even more so since we
have 0 context when updating that list.

I've asked a couple of times, I'll ask again. In that other series, on
the MT8173, kms_hdmi_inject@inject-4k is set up as flaky (which is a KMS
test btw).

I'm a maintainer for that part of the kernel, and I'd like to look into
it, because it's seriously something that shouldn't fail, ever: the
hardware isn't involved.

How can I figure out now (or worse, let's say in a year) how to
reproduce it? What kernel version was affected? With what board? After
how many occurrences?

Basically, how can I see that the bug is indeed there (or got fixed
since), and how to start fixing it?

And then repeat for any other test listed in there.

I got no other reply before because I very well know the answer: nobody
knows. And that's a serious issue to me, because that effectively means
that the flaky test list will only ever increase (since we can't even
check that it's fixed, and the CI infrastructure won't check that it got
fixed either), and we won't be able to address any of the bugs listed
there.
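
To make the questions above concrete (what kernel version, what board,
after how many occurrences), here is a minimal sketch in Python of the
context a flaky-list entry would need to carry. The FlakeRecord type and
its field names are hypothetical, they are not part of the series, and
the empty values below only stand for "unknown".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakeRecord:
    """Hypothetical per-entry context for a test listed as flaky."""
    test: str            # test name as listed
    board: str           # board the flake was observed on
    kernel_version: str  # kernel version that was affected
    failures: int        # number of failed runs observed
    runs: int            # total number of runs observed

    def missing_context(self):
        """Return the names of the fields needed to reproduce the flake
        that are empty or unset."""
        missing = [name for name in ("test", "board", "kernel_version")
                   if not getattr(self, name).strip()]
        if self.runs <= 0:
            missing.append("runs")
        return missing


# The entry discussed above, with only what the list itself tells us:
# the test name and the board. Everything else is unknown.
entry = FlakeRecord(test="kms_hdmi_inject@inject-4k", board="MT8173",
                    kernel_version="", failures=0, runs=0)
unknown = entry.missing_context()
```

With only the test name and the board, the kernel version and the run
counts come back as missing, which is the information needed to
reproduce the failure or to check that it got fixed.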

Maxime