[Mesa-dev] [PATCH] i965: Don't check for draw-time errors that cannot occur in core profile

Ilia Mirkin imirkin at alum.mit.edu
Tue Sep 1 12:52:11 PDT 2015


On Tue, Sep 1, 2015 at 3:45 PM, Eirik Byrkjeflot Anonsen
<eirik at eirikba.org> wrote:
> Ilia Mirkin <imirkin at alum.mit.edu> writes:
>
>> On Tue, Sep 1, 2015 at 12:15 PM, Ian Romanick <idr at freedesktop.org> wrote:
>>> For a bunch of the small changes, I don't care too much what the
>>> difference is.  I just want to know whether after is better than before.
>>
>> And that gets back to my comment that you can't *measure* the impact
>> of a change. Not with something where the outcome is a random
>> variable. It can't be done.
>>
>> All you can do is answer the question "is X's mean more than N higher
>> than Y's mean". And you change the number of trials in an experiment
>> depending on N. (There are also more advanced concepts like 'power'
>> and whatnot; I've done just fine without fully understanding them,
>> and I suspect you can too.)
>
> Power is (IIRC) the counterpart of the p-value: the significance
> level (the p-value threshold) gives you the probability of a false
> positive, while power gives you the probability of detecting a real
> effect, i.e. 1 minus the false-negative rate. The complication is
> that you usually choose the significance level (typically 0.05) while
> the power has to be calculated.
>
>> As an aside, increasing the number of trials until you get a
>> significant result is a great way to arrive at incorrect decisions,
>> due to the multi-look problem (95% CI means 1/20 gives you bad
>> results). The proper way is to decide beforehand "I care about
>> changes >0.1%, which means I need to run 5000 trial runs"
>
> The trick could be to run a sequence of tests until you find how many
> trials are needed for significance. Then you can check if you get
> repeatable results with that many trials, in which case you are safe.

Well, it's all very deterministic, just simple math, which is annoying
to do, especially in terms of percentage change. So the empirical
approach is fine. But if you don't decide on a desired CI width up
front, you're in for a spankin' from your friendly local statistician.

>
> The key word is of course "repeatable". If you have a correctly executed
> test that gives repeatable "significant difference", it usually doesn't
> matter too much how you figured out which parameters were needed.
> ("usually", because you could run into a choice of parameters that
> invalidates the whole test. But just increasing the number of trials
> shouldn't do that.)
>
> Which brings us to the clear value of multiple people running similar
> tests and getting similar results. That strengthens the conclusion
> significantly.
>
>> (based on the
>> assumption that 50 runs gets you 1%). After doing the 5k runs, your CI
>> width should be ~0.1% and you should then be able to see if the delta
>> in means is higher or lower than that. If it's higher, then you've
>> detected a significant change. If it's not, that btw doesn't mean "no
>> change", just not statistically significant. There's also a procedure
>> for the null hypothesis (i.e. is a change's impact <1%) which is
>> basically the same thing but involves doing a few more runs (like 50%
>> more? I forget the details).
>
> Hmm, you could just formulate your null hypothesis as "the change is
> greater than 1%" and then test that normally.

Yeah, but there's a difference between "the change is not
statistically greater than 1%" and "the change is statistically
smaller than 1%".

>
>> Anyways, I'm sure I've bored everyone to death with these pedantic
>> explanations, but IME statistics is one of the most misunderstood
>> areas of math, especially among us engineers.
>>
>>   -ilia
>
> What, statistics boring? No way! :)
>
> eirik

