[Intel-gfx] [RFC i-g-t] igt: Test tagging support

Thu Jun 29 09:34:46 UTC 2017

On Thu, Jun 29, 2017 at 10:23:03AM +0100, Tvrtko Ursulin wrote:
> 
> On 29/06/2017 09:49, Daniel Vetter wrote:
> > On Tue, Jun 27, 2017 at 03:03:08PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 27/06/2017 14:18, Daniel Vetter wrote:
> > > > On Tue, Jun 27, 2017 at 12:59:33PM +0100, Tvrtko Ursulin wrote:
> > > > > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > > > > 
> > > > > Proposal to add test tags as a replacement for separate test
> > > > > list which can be difficult to maintain and get out of date.
> > > > > 
> > > > > Putting this maintanenace inline with tests makes it easier
> > > > > to remember to update the, now implicit, lists, and also enables
> > > > > richer test selection possibilities for the test runner.
> > > > > 
> > > > > Test tags are string tokens separated by spaces meaning of which
> > > > > we decide by creating a convetion and helped by the suitable
> > > > > helper macros.
> > > > > 
> > > > > For example tags can be things like: gem, kms, guc, flip,
> > > > > semaphore, bz12345678, gt4e, etc..
> > > > > 
> > > > > At runtime this would look something like this:
> > > > > 
> > > > >     root at e31:~/intel-gpu-tools# tests/gem_sync --list-subtests-with-tags
> > > > >     default TAGS="gem "
> > > > >     store-default TAGS="gem "
> > > > >     many-default TAGS="gem "
> > > > >     forked-default TAGS="gem "
> > > > >     forked-store-default TAGS="gem "
> > > > >     render TAGS="gem "
> > > > >     store-render TAGS="gem "
> > > > >     many-render TAGS="gem "
> > > > >     forked-render TAGS="gem "
> > > > >     forked-store-render TAGS="gem "
> > > > >     bsd TAGS="gem "
> > > > >     store-bsd TAGS="gem "
> > > > >     many-bsd TAGS="gem "
> > > > >     forked-bsd TAGS="gem "
> > > > >     forked-store-bsd TAGS="gem "
> > > > >     bsd1 TAGS="gem "
> > > > >     store-bsd1 TAGS="gem "
> > > > >     many-bsd1 TAGS="gem "
> > > > >     forked-bsd1 TAGS="gem "
> > > > >     forked-store-bsd1 TAGS="gem "
> > > > >     bsd2 TAGS="gem "
> > > > >     store-bsd2 TAGS="gem "
> > > > >     many-bsd2 TAGS="gem "
> > > > >     forked-bsd2 TAGS="gem "
> > > > >     forked-store-bsd2 TAGS="gem "
> > > > >     blt TAGS="gem "
> > > > >     store-blt TAGS="gem "
> > > > >     many-blt TAGS="gem "
> > > > >     forked-blt TAGS="gem "
> > > > >     forked-store-blt TAGS="gem "
> > > > >     vebox TAGS="gem "
> > > > >     store-vebox TAGS="gem "
> > > > >     many-vebox TAGS="gem "
> > > > >     forked-vebox TAGS="gem "
> > > > >     forked-store-vebox TAGS="gem "
> > > > >     each TAGS="gem basic"
> > > > >     store-each TAGS="gem basic"
> > > > >     many-each TAGS="gem basic"
> > > > >     forked-each TAGS="gem basic"
> > > > >     forked-store-each TAGS="gem "
> > > > >     all TAGS="gem basic"
> > > > >     store-all TAGS="gem basic"
> > > > >     forked-all TAGS="gem extended"
> > > > >     forked-store-all TAGS="gem extended"
> > > > > 
> > > > > Test runner can then enable usages like:
> > > > > 
> > > > >     ./run-tests --include gem --exclude kms,stress
> > > > > 
> > > > > Which I think could really be useful and more flexible than name
> > > > > based selection.
> > > > 
> > > > How is this different from name-based pattern matching? We could just
> > > > throw these tags into the names, which is pretty much what we do already.
> > > 
> > > Tags in names are not as flexible and clear. When do tags begin and names
> > > start, how ugly it will look with names getting long etc.
> > 
> > It might be ugly, but if we end up with 2 incomplete/ugly ways to tag
> > tests, then we're worse of. You're RFC looks like that.
> 
> Damn it, the RFC only looks like that because it only converted a couple of
> tests cases and not all of them. Not because I am proposing to have two
> parallel systems.

It's not just the current tests, you also get to re-train an entire team
of 20 people ...

> > > > > This RFC implementation automatically adds "gem" and "kms" tags
> > > > > to test binaries prefixed with those strings respectively.
> > > > 
> > > > Why do you want to filter for gem or kms only? If you want to locally test
> > > > for a feature, I'd expect you want a more focused test selection, using
> > > > naming patterns/exclusions. Or maybe just one test binary that you run.
> > > 
> > > Yep, and so I said early in the commit message. Here I forgot to explain
> > > that this gem/kms tag adding I've put in the RFC is just for tests who do
> > > not specify their own tags.
> > > 
> > > I've mentioned tags like gem, kms, guc, flip, semaphore, bz12345678, gt4e,
> > > etc..
> > > 
> > > > For CI this doesn't help us, since it doesn't give us a way to both
> > > > a) make sure new tests are by default in the extended/full/CI/whatever you
> > > > want to call it run
> > > > b) exclude the massive pile of stress tests we currently have and can't
> > > > run for logistical reasons.
> > > 
> > > I think it gives us both. You simply tag a new test with either
> > > basic/extended/stress and it is automatically included in the correct run.
> > > 
> > > > Adding some tags for stress tests, or nasty debug-hunting mode is what I
> > > > think we need, so that we can exclude them from default runs. Currently we
> > > > have two examples of those:
> > > > 
> > > > - kms_frontbuffer_tracking --show-hidden
> > > > - gem_concurrent_blt vs _all
> > > > 
> > > > Your patch doesn't address this, and since we have 2 independent
> > > 
> > > It does my introducing a stress keyword and it even marks gem_concurrent_
> > > with it.
> > 
> > What about kms_frontbuffer_tracking? We already have 2 ways to mark stress
> > tests, now you add a third. And yours has the problem that testcases are
> 
> Is the problem that I did not convert kms_frontbuffer_tracking in the sketch
> of a tagging system or something else?
> 
> > still run, even if tagged as "stress".
> 
> ?? "run-tests --include basic,extended --exclude stress" is an example of
> the proposal for the new system. So I don't get this criticism. Tagging
> tests with "stress" exactly enables you not to run them.

So if I run piglit igt.py, the stress tests won't run? That's the thing
I'm looking for, everything else I'm decidedly meh on (except I want ot
make sure we do actually have the people to do the massive amount of work,
and I'm not sold on the benefits of tags in general for everything).

> > > > inventions of some solution for this, it does seem to be a real problem. I
> > > > think tags would be a good way to fix this, but tags just to have more
> > > > structured names seems like lots of work for little gain (and we're not
> > > > really short on work to do).
> > > 
> > > Yes it is some work, but imho not so much.
> > > 
> > > Start with simply converting the current CI set which is not a long list.
> > > Then have an intern convert the extended list to tags. :) At this point we
> > > all work together to extend the extended list and tune the tests at the same
> > > time. This work is needed anyway if we want to increase coverage and fix
> > > tests, which we do want. So while we are doing that we can just tag them
> > > appropriately.
> > 
> > Not sure we need an intern, how exactly is an intern going to make good
> > judgement calls when not even most developers know what all the tests do?
> > I certainly don't ...
> 
> Intern was simply a response to the criticism that converting the existing
> testslists to tags is a big job. It is a mindless job for a first phase
> which brings feature parity to where we are today. All tests which are
> mentioned in the extended test list simply needed to be converted to
> igt_gem|kms_subtest("extended", "test-name").
> 
> Adding additional fine-tuned tags is an additional job on top which is
> something we don't have today anyway. So that one is a job for developers.

Who's going to do all that? You're proposing a massive rework here of how
igt test enumeration works ...

> > I think managing BAT as an explicit list is doable, and I don't think we
> > need to add/remove tests all that often (BAT is supposed to check whether
> > we can run validation runs, not validate itself).
> > 
> > The current accute problem we have is that we can filter out stress tests
> > and make them not run by default, without ending up with an unmanageable
> > whitelist. I think just introducing the stress tag (as a #define, not as a
> > random string, and with documentation) would be an actual real
> > improvement. At least if it's used to unify the other two ways we have to
> > auto-delist stress tests (without that it just makes things worse).
> > 
> > For general tagging we atm have the naming scheme, plus the feature
> > profile json (it's starting to get used I hope soonish). So if you want to
> > switch that over to tags we'd also need to figure out how to
> > integrate/replace the feature json stuff.
> > 
> > Just adding tags because tags sounds like a cool idea on top of all the
> > other things we do already still doesn't sound appealing to me.
> 
> It is not supposed to be on top, but to unify and replace.
> 
> But OK, I've tried to sell it and no one is liking is so fine. The only
> thing that bothers me that to me it sounds like criticism is missing the
> point, but I don't know how else to try and explain the system.

You've proposed to convert the add-hoc tagging + add-how hiding of stress
tests into tags. That's a massive project, you need the entire team on
board pretty much.

I don't think tags are bad, but much smaller scope will make it much more
likely they're actually going to be an improvement. And then we can go
from there ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch