[Intel-gfx] The whole round of i-g-t testing cost too long running time

Wed Apr 16 10:24:51 CEST 2014

On Wed, Apr 16, 2014 at 7:47 AM, Yang, Guang A <guang.a.yang at intel.com> wrote:
> Ok there are a few cases where we can indeed make tests faster, but it will
> be work for us. And that won't really speed up much since we're adding piles
> more testcases at a pretty quick rate. And many of these new testcases are
> CRC based, so inherently take some time to run.
>
> [He, Shuang] OK, so in the usual case it takes at least n/60 to detect a
> result, plus additional execution time, depending on how many rounds of
> testing are run. We will be absolutely happy to see more useful tests coming.
>
> [Guang YANG] Besides these CRC cases, some stress cases may also take quite
> a bit of time, especially on some old platforms. Maybe we can reduce the
> loop count in that kind of stress case?

I think stopping the tests after 10 minutes is ok, but in general the
point of stress tests is to beat on the kernel for corner cases. E.g.
even with today's extensive set of stress tests some spurious OOM bugs
can only be reproduced in 1 out of 5 runs. Reducing the test time
could severely impact the testing power of a test, so I'm wary of
doing that.
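
To put a rough number on that: a minimal sketch, assuming each run
independently hits the bug with a fixed probability (0.2 for the "1 out
of 5" case above). The function name and the idea that cutting the
runtime in half also halves the per-run hit rate are illustrative
assumptions, not measurements.

```python
def detection_probability(p_per_run, runs):
    """Chance that at least one of `runs` independent runs hits the bug.

    Assumes every run hits the bug with the same probability p_per_run
    and that runs do not influence each other (a simplification).
    """
    return 1.0 - (1.0 - p_per_run) ** runs


if __name__ == "__main__":
    # 0.2 is the "1 out of 5" rate; 0.1 is a hypothetical rate for a
    # test whose runtime has been cut in half.
    for p in (0.2, 0.1):
        print(p, round(detection_probability(p, 5), 3))
```

Under these assumptions five nightly runs catch the bug about 67% of
the time at the full rate, but only about 41% of the time at the halved
rate.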

But there are tricks to speed up some tests which shouldn't affect the
power of the testcase to find bugs, and we should definitely look into
those.

> So I think longer-term we simply need to throw more machines at the problem
> and run testcases in parallel on identical machines.
>
> [He, Shuang] This would be the perfect way to go if all tests really need
> to take a long time to run. If we get more identical test machines, then
> the problem is solved.
>
> [Guang YANG] Shuang’s PRTS can cover some of the i-g-t testing work and
> catch some regressions. Most of the i-g-t bugs are from HSW+, so I hope to
> keep the focus on these new platforms. But right now we don’t have enough
> free machines (such as BYT, BDW) to dedicate one to running only i-g-t nightly.

Does this mean that due to PRTS we now have fewer machines running
tests on drm-intel-nightly? I thought the idea was to share machines
on an as-needed basis, with -nightly testing getting priority?

> Wrt analyzing issues I think the right approach for moving forward is:
> a) switch to piglit to run tests, not just enumerate them. This will allow
> QA and developers to share testcase analysis.
>
> [He, Shuang] Yes, though this would not actually speed up the tests. We
> could directly wrap piglit to run the tests (with another control process
> to monitor and collect test results).
>
> [Guang YANG] Yeah, what Shuang said is what we did. Piglit has been improved
> and is more powerful, but our infrastructure has better remote control and
> result collecting. If developers are comfortable seeing the case results
> from running piglit, we can discuss how to match these two frameworks
> together.

Yeah keeping your overall test-runner infrastructure makes sense. The
idea behind my proposal to use piglit to execute the individual tests
is to share analysis scripts. That won't make the tests run any
faster, but it should (in the long term at least) speed up the
triaging a lot. And the high amount of time required for bug triaging
also seems to be an issue for you guys.

> b) add automated analysis for time-consuming and error prone cases like
> dmesg warnings and backtraces. Thomas & I have just discussed a few ideas in
> this area in our 1:1 today.
>
> Reducing the set of igt tests we run is imo pointless: The goal of igt is to
> hit corner-cases, arbitrarily selecting which kinds of corner-cases we test
> just means that we have a nice illusion about our test coverage.
>
> [He, Shuang] I don’t think selecting a subset of test cases to run is
> pointless. It’s a trade-off between speed and correctness. For our nightly
> testing it’s not so useful to run only a small set of tests. But for fast
> sanity testing, which is supposed to catch regressions in major/critical
> functionality, it should be easier (so other developers and QA can continue
> their work).

I agree that for a quick sanity test a reduced test set makes sense.
Which is why we have a testcase naming convention which can be used
together with the piglit -x and -t flags. I do that a lot when
developing things.
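
As a rough sketch of how such include/exclude filtering picks a reduced
set: the code below approximates the behaviour of the -t (include) and
-x (exclude) regex flags, it is not piglit's actual implementation. The
helper name and the test names are made up for illustration.

```python
import re


def select_tests(names, include=(), exclude=()):
    """Return the names that match any include regex and no exclude regex.

    An empty include list means "everything is included", which mirrors
    running without -t. Matching uses re.search, so patterns may match
    anywhere in the name (an assumption of this sketch).
    """
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    selected = []
    for name in names:
        if inc and not any(r.search(name) for r in inc):
            continue  # no include pattern matched
        if any(r.search(name) for r in exc):
            continue  # explicitly excluded
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical test names, only to show the filtering.
    tests = ["igt/gem_basic", "igt/gem_stress_large", "igt/kms_flip_basic"]
    print(select_tests(tests, include=["basic"], exclude=["kms"]))
```

With the made-up names above, including "basic" and excluding "kms"
leaves only igt/gem_basic.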

But for regression testing imo only the full test suite makes sense,
otherwise we just have a false sense of security. I.e. if the full set
means we can only run it every 2 days then I prefer that over running
only a subset. Also very often there are other issues delaying the
time between when a buggy patch was committed and when the full bug
report is available, so imo the 10h runtime isn't too bad from my pov
really.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch