[Intel-gfx] [Draft] Testing Requirements for drm/i915 Patches

Tue Oct 29 20:00:49 CET 2013

Hi all,

So in the past half year we've had tons of sometimes rather heated discussions
about getting patches merged. Often these discussions have been in the context
of specific patch series, which meant that people are already invested. Which
contributed to the boiling emotions. I'd like to avoid that here by making this
a free-standing discussion.

There's a bunch of smaller process tuning going on, but the big thing I'd like
to instate henceforth is that automated test coverage is a primary consideration
for anything going upstream. In this write up I'll detail my reasons,
considerations and expectations. My plan is to solicit feedback over the next
few days and then publish an edited and polished version to my blog.

After that I'll put down my foot on this process so that we can go back to
coding and stop blowing through so much time and energy on waging flamewars.

Feedback and critique highly welcome.

Cheers, Daniel

Testing Requirements for Upstreaming (Draft)
============================================

I want to make automated test coverage an integral part of our feature and bufix
development process. For features this means that starting with the design phase
testability needs to be considered an integral part of any feature. This needs
to go up through the entire development process until when the implementation is
submitted together with the proposed tests. For bugfixes that means the fix is
only complete once the automated testcase for it is also done, if we need a new
one.

This specifically excludes testing with humans somewhere in the loop. We are
extremely limited in our validation resources, every time we put something new
onto the "manual testing" plate something else _will_ fall off.

Why?
----

- More predictability. Right now test coverage often only comes up as a topic
  when I drop my maintainer review onto a patch series. Which is too late, since
  it'll delay the otherwise working patches and so massively frustrates people.
  I hope by making test requirements clear and up-front we can make the
  upstreaming process more predictable. Also, if we have good tests from the get-go
  there should be much less need for me to drop patches from my trees
  after having them merged.

- Less bikeshedding. In my opinion test cases are an excellent means to settle
  bikesheds - we've had in the past seen cases of endless back&forths
  where writing a simple testcase would have shown that _all_ proposed
  color flavours are actually broken.

  The even more important thing is that fully automated tests allow us to
  legitimately postpone cleanups. If the only testing we have is manual testing
  then we have only one shot at a feature tested, namely when the developer
  tests it. So it better be perfect. But with automated tests we can postpone
  cleanups with too high risks of regressions until a really clear need is
  established. And since that need often never materializes we'll save work.

- Better review. For me it's often helps a lot to review tests than the actual
  code in-depth. This is especially true for reviewing userspace interface
  additions.

- Actionable regression reports. Only if we have a fully automated testcase do
  we have a good chance that QA reports a regression within just a few days.
  Everything else can easily take weeks (for platforms and features which are
  explicitly tested) to months (for stuff only users from the community notice).
  And especially now that much more shipping products depend upon a working
  i915.ko driver we just can't do this any more.

- Better tests. A lot of our code is really hard to test in an automated
  fashion, and pushing the frontier of what is testable often requires a lot of
  work. I hope that by making tests an integral part of any feature work and so
  forcing more people to work on them and think about testing we'll
  advance the state of the art at a brisker pace.

Risks and Buts
--------------

- Bikeshedding on tests. This plan is obviously not too useful if we just
  replace massive bikeshedding on patches with massive bikeshedding on
  testcases. But right now we do almost no review on i-g-t patches so the risk
  is small. Long-term the review requirements for testcases will certainly
  increase, but as with everything else we simply need to strive for a good
  balance to strike for just the right amount of review.

  Also if we really start discussing tests _before_ having written massive patch
  series we'll do the bikeshedding while there's no real rebase pain. So even if
  the bikeshedding just shifts we'll benefit I think, especially for
  really big features.

- Technical debt in test coverage. We have a lot of old code which still
  completely lacks testcases. Which means that even small feature work might be
  on the hook for a big pile of debt restructuring. I think this is inevitable
  occasionally. But I think that doing an assement of the current state of
  test coverage of the existing code _before_ starting a feature instead
  of when the patches are ready for merging should help a lot, before
  everyone is invested into patches already and mounting rebase pain looms
  large.

  Again we need to strive for a good balance between "too many tests to write
  up-front for old code" and "needs for tests that only the final review
  uncovers creating process bubbles".

- Upstreaming of product stuff. Product guys are notoriuosly busy and writing
  tests is actual work. Otoh the upstream codebase feeds back into _all_ product
  trees (and the upstream kernel), so requirements are simply a bit higher. And
  I also don't think that we can push the testing of some features fully to
  product teams, since they'll be pissed really quickly if every update they get
  from us breaks their stuff. So if these additional test requirements (compared
  to the past) means that some product patches won't get merged, then I think
  that's the right choice.

- But ... all the other kernel drivers don't do this. We're also one of the
  biggest driver's in the kernel, with a code churn rate roughly 5x worse than
  anything else and a pretty big (and growing) team. Also, we're often the
  critical path in enabling new platforms in the fast-paced mobile space.
  Different standards apply.

Expectations
------------

Since the point here is to make the actual test requirements known up-front we
need to settle on clear expectations. Since this is the part that actually
matters in practice I'll really welcome close scrutiny and comments here.

- Tests must fully cover userspace interfaces. By this I mean exercising all the
  possible options, especially the usual tricky corner cases (e.g. off-by-one
  array sizes, overflows). It also needs to include tests for all the
  userspace input validation (i.e. correctly rejecting invalid input,
  including checks for the error codes). For userspace interface additions
  technical debt really must be addressed. This means that when adding a
  new flag and we currently don't have any tests for those flags, then
  I'll ask for a testcase which fully exercises all the flag values we
  currently support on top of the new interface addition.

- Tests need to provide a reasonable baseline coverage of the internal driver
  state. The idea here isn't to aim for full coverage, that's an impossible and
  pointless endeavor. The goal is to have a good starting point of tests so that
  when a tricky corner case pops up in review or validation that it's not a
  terribly big effort to add a specific testcase for it.

- Issues discovered in review and final validation need automated test coverage.
  The reasoning is that anything which slipped the developer's attention is
  tricky enough to warrant an explicit testcase, since in a later refactoring
  there's a good chance that it'll be missed again. This has a bit a risk
  to delay patches, but if the basic test coverage is good enough as per
  the previous point it really shouln't be an issue.

- Finally we need to push the testable frontier with new ideas like pipe CRCs,
  modeset state cross checking or arbitrary monitor configuration injection
  (with fixed EDIDs and connector state forcing). The point here is to forster
  new crazy ideas, and the expectation is very much _not_ that developers then
  need to write testcases for all the old bugfixes that suddenly became
  testable. That workload needs to be spread out over a bunch of features
  touching the relevant area. This only really applies to features and
  code paths which are currently in the "not testable" bucket anyway.

This should specify the "what" decently enough, but we also need to look at how
tests should work.

Specific testcases in i-g-t are obviously the preferred form, but for some
features that's just not possible. In such cases in-kernel self-checks like
the modeset state checker of fifo underrun reporting are really good
approaches. Two ceaveats apply:

- The test infrastructure really should be orthogonal to the code being tested.
  In-line asserts that check for preconditions are really nice and useful, but
  since they're closely tied to the code itself have a good chance to be broken
  in the same ways.

- The debug feature needs to be enabled by default, and it needs to be loud.
  Otherwise no one will notice that something is amiss. So currently the fifo
  underrun reporting doesn't really count since it only causes debug level
  output when something goes wrong. Of course it's still a really good tool for
  developers, just not yet for catching regressions.

Finally the short lists of excuses that don't count as proper test coverage for
a feature.

- Manual testing. We are ridiculously limited on our QA manpower. Every time we
  drop something onto the "manual testing" plate something else _will_ drop off.
  Which means in the end that we don't really have any test coverage. So
  if patches don't come with automated tests, in-kernel cross-checking or
  some other form of validation attached they need to have really good
  reasons for doing so.

- Testing by product teams. The entire point of Intel OTC's "upstream first"
  strategy is to have a common codebase for everyone. If we break product trees
  every time we feed an update into them because we can't properly regression
  test a given feature then the value of upstreaming features is greatly
  diminished in my opinion and could potentially doom collaborations with
  product teams. We just can't have that.

  This means that when products teams submit patches upstream they also need
  to submit the relevant testcases to i-g-t.
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch