[Intel-gfx] [RFC/Draft] Testing requirements for upstream drm/i915 patches

Daniel Vetter daniel at ffwll.ch
Wed Oct 30 20:05:13 CET 2013

On Wed, Oct 30, 2013 at 7:11 PM, Ian Romanick <idr at freedesktop.org> wrote:
>>   test coverage of the existing code _before_ starting a feature instead
>>   of when the patches are ready for merging should help a lot, before
>>   everyone is invested into patches already and mounting rebase pain looms
>>   large.
> This is actually an opportunity in disguise.  Once you have identified
> some of the places that are really lacking coverage, you give new people
> the task to create tests for some small areas.  This gives the new
> person a way to learn the code and make a contribution without hurting
> themselves.

Yeah, that would be a great way to bring up new people. The problem is
that on the kernel side we have a few disadvantages compared to Mesa:
we don't have a nice spec, and we also don't have a fairly decent
reference implementation to compare against (like the nvidia blob). So
IME writing kernel tests is much more open-ended than reading a GL
extension spec and just knocking off all the new enums and API entry
points.

The other issue is that some of the bugs (especially in GEM) are
ridiculously hard to reproduce. E.g. a late -rc regression in 3.11
took me a full day of banging my head against the wall until I had a
testcase for it. And that was with the bugfix already at hand and the
issue seemingly understood. A great learning experience in carefully
ordering operations and laying out objects in the GTT, but probably
not something to get started with.

>> - Tests need to provide a reasonable baseline coverage of the internal
>>   driver state. The idea here isn't to aim for full coverage, that's an
>>   impossible and pointless endeavor. The goal is to have a good starting
>>   point of tests so that when a tricky corner case pops up in review or
>>   validation it's not a terribly big effort to add a specific testcase
>>   for it.
> In Mesa we've started adding tests (run with 'make check') in
> addition to the piglit tests.  These allow us to poke at internal
> interfaces in ways that are difficult or impossible from a test that
> lives on the other side of the API.  The hope is that this will help
> prevent "revealed" bugs: bugs where a change in an unrelated piece of
> code exposes a bug that had previously existed.
> Is something like that sensible (or even possible) in the kernel?

Actual unit tests are hard to impossible, since the driver (and the
kernel in general) is a spaghetti monster with a complete lack of mock
objects. And often bugs happen at the interface between software and
hardware, so we need some way to inject (fake) hw events and state.
Runtime checks, on the other hand, are really great, and especially
for slow operations like a modeset the overhead doesn't matter at all.

Wherever treating the kernel as a blackbox isn't good enough, we've
resorted to exposing internals and special knobs through sysfs. That
includes stuff like fake hang injection, cache thrashing or the newly
added pipe CRC support. But it's all for runtime tests.

>> - Issues discovered in review and final validation need automated test
>>   coverage. The reasoning is that anything which slipped the developer's
>>   attention is tricky enough to warrant an explicit testcase, since in a
>>   later refactoring there's a good chance that it'll be missed again.
>>   This has a bit of a risk to delay patches, but if the basic test
>>   coverage is good enough as per the previous point it really shouldn't
>>   be an issue.
> These are the bugs most likely to need the previously mentioned
> "internal" tests.  "Function X will crash in situation Y.  The code that
> calls X won't cause Y, but..."

IME the really tricky stuff is concurrency issues. They're hard to
understand and often really hard to reproduce. We're trying, though.
The other ones are bugs depending in complex ways upon the aggregate
driver state (especially for modesetting issues). Neither is easily
done as a unit test executed with make check.

There's also the issue of making kernel code compile as a userspace
executable. The usual way to solve that is to create a loadable kernel
module with all the unit tests, and there's definitely a bit of that
around already in the kernel (e.g. lockdep has a good set of
self-tests that isolate the internal lockdep state). That approach
might be useful for some parts of the i915 driver, like testing the
EDID parser, but those tend not to be high-priority items for testing.
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
