[Piglit] [PATCH] summary: fix support for old results file with duplicated subtests

Thu May 29 06:23:17 PDT 2014

On Thu, May 29, 2014 at 8:37 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, May 29, 2014 at 12:59 AM, Kenneth Graunke <kenneth at whitecape.org> wrote:
>> On 05/28/2014 07:17 PM, Ilia Mirkin wrote:
>>> Old files have duplicated entries for each subtest, in addition to a
>>> filled subtest dictionary. Detect that the current test name is also a
>>> subtest and treat it as though it were a complete test. This may have
>>> false-negatives, but they're unlikely given test/subtest naming
>>> convention.
>>>
>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>> ---
>>>
>>> Dylan, I'm sure you hate this, but it does seem to work for me. Not sure where
>>> you are with your fix, but this is a tool that lots of people use, so it does
>>> need to be addressed. And keep in mind that by now there are both the "old"
>>> and "new" formats running around, so just slapping a version number in there
>>> won't be enough.
>>
>> Seriously?  We fixed a bug.  Subtests were *broken* - it stored insane
>> amounts of duplicate data in the files, to work around a bug in the
>> tools that processed those data files.  This caused huge amounts of
>> wasted space and confusion.
>>
>> I don't understand the whole "let's not re-run Piglit to get a proper
>> baseline unless something breaks" thinking.  It only takes 10-15 minutes
>> to do a full Piglit run here.  Taking a proper baseline allows you to
>> have confidence that any changes you see were caused by your patches,
>> and not by other people's changes to Mesa or Piglit.  It just seems like
>> good practice.
>>
>> Have things gotten to the point where we can't even fix a bug without
>> people requesting reverts or workarounds?  It's bad enough that people
>> keep insisting that we have to make this software work on 4 year old
>> Python versions.
>>
>> Dylan's patches were on the list waiting for over a month, and bumped
>> after two weeks, and AFAICS fix a long-standing bug.  All people have to
>> do is re-run Piglit to get data files that aren't *broken*.  If the
>> Piglit community won't even let us commit bug fixes, I don't know why I
>> should continue contributing to this project.
>>
>> (Ilia - this isn't complaining about you specifically - it's just the
>> attitude of the community in general I've observed over the last few
>> months that frustrates me.  It seems like any time we commit anything,
>> there are very vocal objections and people calling for reverts.  And
>> that really frustrates me.)
>
> Hi Ken,
>
> First of all, I'd like to point out that at no point in time did I
> complain about something being checked in or call for a revert. I was merely
> pointing out that certain use-cases should be supported, and had been,
> but were recently broken. Bugs happen, but I'm surprised that not
> everyone here agrees that this _is_ a bug. I don't have the
> bandwidth/time/desire to review and test every piglit change, and this
> seemed like a particularly nasty one, so I skipped it. I'm very happy
> that the fix was done, I had noticed the subtests insanity myself and
> it also annoyed me (although not enough for me to actually try to fix
> it... xz is really good at compression).
>
> At Intel, there are 2 relevant chips that anyone cares about (gen7 and
> gen7.5 from the looks of it), and maybe 3 more that are borderline
> (gen6, gen5, gen4), but there are a lot more NVIDIA chips out there.
> You all have easy access to all of these chips (perhaps not at your
> desk, but if you really wanted to find a gen4 chip, I suspect you
> could without too big of a hassle). I personally have access to a very
> limited selection and have to ask others to run the tests, or swap in
> cards, or whatever. There can even be kernel interactions, which adds
> a whole dimension to the testing matrix. The vast, vast, *vast*
> majority of piglit tests don't change names/etc, so outside of a few
> oddities, piglit runs are comparable across different piglit
> checkouts.
>
> Each piglit run takes upwards of 40-60 minutes and has the potential
> to crash the machine. This is only counting the tests/gpu.py tests
> (since tests/quick.py includes tons of tests I don't touch the code
> for, like compiler/etc). It is this slow in large part because they're
> run single-threaded and capture dmesg, but even if I didn't care about
> dmesg, nouveau definitely can't handle multithreaded runs. You could say
> "fix your driver!" but it's not quite that easy.
>
> Anyways, if I'm the only one who cares about being able to compare
> across piglit runs from different times, I'll drop the issue and stop
> trying to track failures on nouveau. I'm relatively certain that it
> would reverse a recent trend of improving piglit results on nouveau
> though.
>
>   -ilia

A few additional thoughts:

- I'm waiting for a Tested-by from someone before checking this in.
That'll indicate I'm not the only crazy person who wants this.
- Perhaps an additional difference is one of approach. Nouveau fails a
lot of tests. Some tests fail on only some chips, which can make it
easier to identify what is wrong and why. Having historical results
really makes this a lot easier. For example Fermi vs Kepler, or with
the various revisions within the Tesla family.
- Comparing historical results makes it easier to track bugs in piglit
as well (although TBH I can't remember any such specific instance).
- Adding versioning to the piglit output would be great. Both a
version number and a piglit checkout revision. This would allow us to
have saner logic when we make changes in the future.
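
To make the versioning idea concrete, here is a rough sketch of what a
loader could do. The field names "results_version" and "piglit_revision"
and the constant CURRENT_VERSION are made up for illustration; nothing
like them exists in the results format today. A file without a version
field is reported as version 0, which covers both the "old" and "new"
unversioned layouts, so those would still need a content-based check.

```python
import json

# Hypothetical: the newest results layout this loader understands.
CURRENT_VERSION = 1


def read_header(text):
    """Return (version, revision) for a results file given as JSON text.

    "results_version" and "piglit_revision" are assumed field names.
    Files written before versioning existed have neither, so they are
    reported as version 0 with an unknown (None) revision.
    """
    data = json.loads(text)
    version = data.get("results_version", 0)
    if version > CURRENT_VERSION:
        raise ValueError("results file is newer than this tool: %d" % version)
    return version, data.get("piglit_revision")
```

With something like this in place, summary code could branch on the
version instead of guessing the layout from the test names.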

  -ilia
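
For readers without the patch at hand, the detection described in the
commit message at the top of this thread can be sketched roughly as
follows. This is not the patch itself. It assumes each entry is a dict
with a "result" status and an optional "subtest" dict mapping subtest
names to statuses, and the helper names are invented for the example.

```python
import posixpath


def is_duplicated_subtest(name, result):
    """True if the last component of `name` is one of the entry's own
    subtests, i.e. it looks like an old-style duplicated entry."""
    subtests = result.get("subtest") or {}
    return posixpath.basename(name) in subtests


def flatten(tests):
    """Map every test or subtest name to a single status."""
    flat = {}
    for name, result in tests.items():
        subtests = result.get("subtest") or {}
        if subtests and not is_duplicated_subtest(name, result):
            # New-style entry: expand each subtest under the parent name.
            for sub, status in subtests.items():
                flat[posixpath.join(name, sub)] = status
        else:
            # Old-style duplicate, or a test without subtests: treat it
            # as a complete test and keep its own status.
            flat[name] = result["result"]
    return flat
```

A test whose own name happens to match one of its subtests (say
"spec/a" with a subtest "a") would be misdetected, which is the
naming-convention caveat mentioned in the commit message.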