[Intel-gfx] [Draft] Testing Requirements for drm/i915 Patches

Wed Oct 30 12:38:39 CET 2013

On Wed, Oct 30, 2013 at 12:38 AM, Jesse Barnes <jbarnes at virtuousgeek.org> wrote:
> Since a number of people internally are also involved in i915
> development but not on the mailing list, I think we'll need to have an
> internal meeting or two to cover this stuff and get buy in.

Yeah I'll do that. I simply didn't get around to spaming internal
mailing lists with this yet yesterday evening.

> Overall, developing tests along with code is a good goal.  A few
> comments below.
>
> On Tue, 29 Oct 2013 20:00:49 +0100
> Daniel Vetter <daniel at ffwll.ch> wrote:
>
>> - Tests must fully cover userspace interfaces. By this I mean exercising all the
> [snip]
>> - Tests need to provide a reasonable baseline coverage of the internal driver
>>   state. The idea here isn't to aim for full coverage, that's an impossible and
> [snip]
>
> What you've described here is basically full validation.  Something
> that most groups at Intel have large teams to dedicate all their time
> to.  I'm not sure how far we can go down this path with just the
> development resources we have today (though maybe we'll get some help
> from the validation teams in the product groups)

I guess you can call it validation, I'd go with test driven
development instead. Also my main focus is on userspace interfaces
(due to the security relevance and that we need to keep them working
forever) and bugs that review/testing/users caught. The general
coverage is just to avoid people getting royally pissed off since
starting to write tests once everything is essentially ready for
merging (like we've done with ppgtt) is just too late.

>> Finally the short lists of excuses that don't count as proper test coverage for
>> a feature.
>>
>> - Manual testing. We are ridiculously limited on our QA manpower. Every time we
>>   drop something onto the "manual testing" plate something else _will_ drop off.
>>   Which means in the end that we don't really have any test coverage. So
>>   if patches don't come with automated tests, in-kernel cross-checking or
>>   some other form of validation attached they need to have really good
>>   reasons for doing so.
>
> Some things are only testable manually at this point, since we don't
> have a sophisticated webcam structure set up for everything (and in
> fact, the webcam tests we do have are fairly manual at this point, in
> that they have to be set up specially each time).

I agree that not everything is testable in an automated way. And I
also think that anything which requires special setup like webcams
isn't really useful, since it's a pain to replicate and for developers
it's easer to just look at the screen. But if you look at the past 2
years of i-g-t progress we've come up with some neat tricks:
- fake gpu hang injection. Before that our reset code was essentially
untested, after that we've spent 1+ years thus far fixing bugs.
Nowadays gpu hang handling actually works, which is great for
developers and also improves the user experience a lot.
- exercising gem slowpath: We've added tricks like using gtt mmaps to
precisely provoke pagefaults and then added prefault disabling and
Chris' drop_caches to debugfs to exercise more cornercases. Nowadays
i-g-t exercise the full slowpath-in-slowpath madness in execbuf.
- CRC checksums: It's still very early, but Ville has already pushed a
testcase for all the cursor bugs he's fixed in the past (and
discovered yet another one on top).
- fifo underrun reporting: Unfortunately not yet enabled by default
(since our watermark code is still buggy), but it seems like Ville and
Paulo have found this to be a really powerful tool for catching
watermark bugs.
- modeset state checker in the kernel: I don't think we could have
pushed through all the big reworks we've done in the past year in our
modeset code without this.

There's more, and I already have crazy ideas for more infrastructure
to make more bugs and features testable (like all the ugly 3 pipe
resource sharing issues we have on ivb/hsw).

Essentially I just want people to spend a bit of their brain power on
coming up with new ideas for stuff that's currently untestable. That
doesn't mean that it's a hard requirement since very often all the
ideas just suck (or are way too much effort to be able to inject the
required state and then check it). But occasionally something really
good will come out of this.

>> - Testing by product teams. The entire point of Intel OTC's "upstream first"
>>   strategy is to have a common codebase for everyone. If we break product trees
>>   every time we feed an update into them because we can't properly regression
>>   test a given feature then the value of upstreaming features is greatly
>>   diminished in my opinion and could potentially doom collaborations with
>>   product teams. We just can't have that.
>>
>>   This means that when products teams submit patches upstream they also need
>>   to submit the relevant testcases to i-g-t.
>
> So what I'm hearing here is that even if someone submits a tested
> patch, with tests available (and passing) somewhere other than i-g-t,
> you'll reject them until they port/write a new test for i-g-t.  Is that
> what you meant?  I think a more reasonable criteria would be that tests
> from non-i-g-t test suites are available and run by our QA, or run
> against upstream kernels by groups other than our QA.  That should keep
> a lid on regressions just as well.

Essentially yes, I'd like to reject stuff which we can't test in
upstream. If we have other testsuites and they're integrated into both
QA and available to developers (e.g. how the mesa teams have a piglit
wrapper for oglc tests), then that's also ok.

But writing good tests is pretty hard, so I think actually
collaborating is much better than every product group having their own
set of testcases for bare-metal kernel testing (integration testing is
a different matter ofc). We might need to massage i-g-t a bit to make
that possible though. Also for developers it's nice if all the tests
and tools are in the same repo, to avoid the need to set things up
first when our qa reports a regression.

Note that we already have such an exception of sorts: The functional
tests for hw contexts are explicitly done through mesa/piglit since it
would be simply insane to do that on bare metal. So I'm not
fundamentally opposed to external tests.

> One thing you didn't mention here is that our test suite is starting to
> see as much churn (and more breakage) than upstream.  If you look at
> recent results from QA, you'd think SNB was totally broken based on
> i-g-t results.  But despite that, desktops come up fine and things
> generally work.  So we need to take care that our tests are simple and
> that our test library code doesn't see massive churn causing false
> positive breakage all the time.  In other words, tests are just as
> likely to be broken (reporting false breakage or false passing) as the
> code they're testing.  The best way to avoid that is to keep the tests
> very small, simple, and targeted.  Converting and refactoring code in
> i-g-t to allow that will be a big chunk of work.

Well long term I think we need to do the same thing the mesa guys are
doing with piglits and require review on every i-g-t patch. For now
the commit&fix later approach seems to still be workable (and I admit
to have abused QA a bit for the big infrastructure rework I've done in
the past few months).

So right now I don't see any real need for refactoring i-g-t. Also
I've done a quick bug scrub (some old reports I've failed to close)
and run a full i-g-t on my snb and stuff works. So I also don't see
your massive pile of failures.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch