[Intel-gfx] Making IGT runnable by CI and developers

Martin Peres martin.peres at linux.intel.com
Thu Jul 20 16:23:15 UTC 2017


Hi everyone,

As some of you may already know, we have made great strides in making 
our CI system usable, especially in the last 6 months when everything 
started clicking together.

The CI team is no longer overwhelmed with fires and bug reports, so we 
have started working on extending the coverage from just fast-feedback 
to a bigger set of IGT tests.

As some of you may know, running IGT has been a challenge that few 
manage to overcome. Not only is the execution time counted in machine 
months, but it can also lead to disk corruption, which does not 
encourage developers to run it either. One test takes 21 days on its 
own, and it is a subset of another test, which we have never run for 
obvious reasons.

I would thus like to get the CI team and developers to work together to 
sharply decrease the execution time of IGT, and get these tests run 
multiple times per day!

There are three usages that the CI team envisions (up for debate):
  - Basic acceptance testing: Meant for developers and CI to check 
quickly if a patch series is not completely breaking the world (< 10 
minutes, timeout per test of 30s)
  - Full run: Meant to be run overnight by developers and users (< 6 hours)
  - Stress tests: They can be in the test suite as a way to catch rare 
issues, but they cannot be part of the default run mode. They likely 
should be run on a case-by-case basis, at a developer's request. Each 
test could be allowed to take up to 1h.
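
To make the three usages concrete, here is a minimal Python sketch of 
them as data, plus a check that a set of measured runtimes fits a tier. 
The limits come from the list above; the names (Tier, TIERS, fits) and 
the idea of summing per-test runtimes are assumptions for illustration, 
not anything that exists in IGT or in the CI system today.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tier:
    """One proposed usage of the test suite (hypothetical structure)."""
    name: str
    total_budget_s: Optional[int]      # limit for the whole run, None if unspecified
    per_test_timeout_s: Optional[int]  # limit for a single test, None if unspecified
    on_demand_only: bool               # True if excluded from the default run mode


# Values taken from the proposal; everything is "up for debate".
TIERS = {
    "bat": Tier("bat", 10 * 60, 30, False),       # < 10 minutes, 30s per test
    "full": Tier("full", 6 * 3600, None, False),  # < 6 hours
    "stress": Tier("stress", None, 3600, True),   # up to 1h per test, on demand
}


def fits(tier: Tier, runtimes_s: Iterable[float]) -> bool:
    """Return True if the measured runtimes respect the tier's limits."""
    runtimes = list(runtimes_s)
    if tier.per_test_timeout_s is not None:
        if any(t > tier.per_test_timeout_s for t in runtimes):
            return False
    if tier.total_budget_s is not None:
        # The proposal states a strict "<" for the whole-run budgets.
        if sum(runtimes) >= tier.total_budget_s:
            return False
    return True
```

For example, fits(TIERS["bat"], [5, 10, 29]) holds, while a single 31s 
test is enough to disqualify a list from BAT.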

There are multiple ways of getting to this situation (up for debate):

  1) All the tests exposed by default are fast and meant to be run:
   - Fast-feedback is provided by a testlist, for BAT
   - Stress tests are run using a special command, kept for on-demand testing

  2) Tests are all tagged with information about their exec time:
   - igt@basic@.*: Meant for BAT
   - igt@complete@.*: Meant for FULL
   - igt@stress@.*: The stress tests

  3) Testlists all the way:
   - fast-feedback: for BAT
   - all: the tests that people are expected to run (CI will run them)
   - Stress tests will not be part of any testlist.
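
As an illustration of option 2, the tags could be consumed with plain 
regular expressions. This is only a sketch: it assumes the tag patterns 
read igt@basic@.*, igt@complete@.* and igt@stress@.*, the test names in 
the usage note are made up, and select() is a hypothetical helper. 
Whether FULL should also include the basic tests is left open here, so 
each pattern maps to exactly one tier.

```python
import re
from typing import Iterable, List

# One pattern per tier, as listed in option 2.
TIER_PATTERNS = {
    "BAT": re.compile(r"igt@basic@.*"),
    "FULL": re.compile(r"igt@complete@.*"),
    "STRESS": re.compile(r"igt@stress@.*"),
}


def select(tests: Iterable[str], tier: str) -> List[str]:
    """Return the tests whose full name matches the tier's pattern."""
    pattern = TIER_PATTERNS[tier]
    return [name for name in tests if pattern.fullmatch(name)]
```

With the made-up names igt@basic@example_a, igt@complete@example_b and 
igt@stress@example_c, select(names, "BAT") keeps only the first one. 
Option 3 would replace the patterns with explicit lists of names, at the 
cost of keeping those lists up to date.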

Whichever option is accepted, the CI team is mandating global 
timeouts for both BAT and FULL testing, in order to guarantee 
throughput. This will require the team as a whole to agree on time 
quotas per sub-system, and to enforce them.
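
One possible way to enforce such quotas, sketched in Python. The 
sub-system names and the numbers in the usage note are placeholders, and 
both helpers are hypothetical; how the quotas get split is exactly what 
needs to be agreed on.

```python
from typing import Dict


def quotas_fit_budget(global_budget_s: float, quotas_s: Dict[str, float]) -> bool:
    """Return True if the per-sub-system quotas add up to at most the global timeout."""
    return sum(quotas_s.values()) <= global_budget_s


def over_quota(runtimes_s: Dict[str, float], quotas_s: Dict[str, float]) -> Dict[str, float]:
    """Return {sub-system: seconds over quota} for every sub-system exceeding its quota.

    A sub-system without an agreed quota is treated as having a quota of 0.
    """
    excess = {}
    for name, used in runtimes_s.items():
        quota = quotas_s.get(name, 0)
        if used > quota:
            excess[name] = used - quota
    return excess
```

For instance, with a 6 hour budget and placeholder quotas of 
{"subsystem_a": 14400, "subsystem_b": 7200}, the quotas fit, and 
measured runtimes of {"subsystem_a": 15000, "subsystem_b": 3600} report 
subsystem_a as 600 seconds over.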

Can we try to get some healthy debate and reach a consensus on this? Our 
CI efforts are being limited by this issue right now, and we will be 
doing whatever we can until the test suite becomes saner and runnable, 
but this may be unfair to some developers.

Looking forward to some constructive feedback and intelligent discussions!
Martin

