[Mesa-dev] [PATCH] i965: Don't check for draw-time errors that cannot occur in core profile

Ian Romanick idr at freedesktop.org
Mon Aug 31 17:13:41 PDT 2015


On 08/31/2015 04:21 PM, Ilia Mirkin wrote:
> On Mon, Aug 31, 2015 at 7:06 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> ping. :)
>>
>> On 08/10/2015 11:48 AM, Matt Turner wrote:
>>> On Mon, Aug 10, 2015 at 10:12 AM, Ian Romanick <idr at freedesktop.org> wrote:
>>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>>
>>>> On many CPU-limited applications, this is *the* hot path.  The idea is
>>>> to generate per-API versions of brw_draw_prims that elide some checks.
>>>> This patch removes render-mode and "is everything in VBOs" checks from
>>>> core-profile contexts.
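
To make the intent concrete for anyone skimming the thread: the shape of
the change is roughly the sketch below.  It's a standalone toy, not the
actual brw_draw.c code, and every name in it is illustrative.  The point
is only that the draw entry point is picked once, per API, so the
per-draw hot path never re-tests conditions that cannot occur in core
profile.

   #include <stdbool.h>
   #include <stdio.h>

   /* Illustrative stand-ins for driver state; none of these are the
    * real Mesa structures or function names. */
   struct ctx {
      bool in_select_or_feedback;   /* glRenderMode state, compat only */
      bool all_varyings_in_vbos;    /* client-side arrays, compat only */
      void (*draw_prims)(struct ctx *ctx);
   };

   static void
   draw_prims_common(struct ctx *ctx)
   {
      (void) ctx;
      printf("emit hardware draw\n");
   }

   /* Compatibility profile keeps the draw-time checks. */
   static void
   draw_prims_compat(struct ctx *ctx)
   {
      if (ctx->in_select_or_feedback || !ctx->all_varyings_in_vbos) {
         printf("take the fallback path\n");
         return;
      }
      draw_prims_common(ctx);
   }

   /* Core profile: the render mode is always GL_RENDER and vertex data
    * must come from buffer objects, so both checks are elided. */
   static void
   draw_prims_core(struct ctx *ctx)
   {
      draw_prims_common(ctx);
   }

   int
   main(void)
   {
      struct ctx ctx = {
         .in_select_or_feedback = false,
         .all_varyings_in_vbos = true,
         .draw_prims = draw_prims_core,   /* chosen once at context creation */
      };

      ctx.draw_prims(&ctx);   /* hot path: no per-draw profile checks */
      return 0;
   }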
>>>>
>>>> On my IVB laptop (which may have experienced thermal throttling):
>>>>
>>>> Gl32Batch7:     3.70955% +/- 1.11344%
>>>
>>> I'm getting 3.18414% +/- 0.587956% (n=113) on my IVB, which probably
>>> matches your numbers depending on your value of n.
>>>
>>>> OglBatch7:      1.04398% +/- 0.772788%
>>>
>>> I'm getting 1.15377% +/- 1.05898% (n=34) on my IVB, which probably
>>> matches your numbers depending on your value of n.
>>
>> This is another thing that makes me feel a little uncomfortable with the
>> way we've done performance measurements in the past.  If I run my test
>> before and after this patch for 121 iterations, which I have done, I can
>> cut the data at any point and oscillate between "no difference" and X%
>> +/- some-large-fraction-of-X%.  Since the before and after code for the
>> compatibility profile path should be identical, "no difference" is the
>> only believable result.
>>
>> Using a higher confidence threshold (e.g., -c 98) results in "no
>> difference" throughout, as expected.  I feel like 90% isn't a tight
>> enough confidence interval for a lot of what we do, but I'm unsure how
>> to determine what confidence level we should use.  We could
>> experimentally determine it by running a test some number of times and
>> finding the interval that detects no change in some random partitioning
>> of the test results.  Ugh.
> 
> (sorry, statistics rant below, can't help myself)
> 
> AFAIK the standard in statistics is to use a 95% confidence interval.

I had misremembered the default CI in ministat.  It does use 95%.  So,
s/90%/95%/g in my previous message. :)
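
For reference, these comparisons are just ministat runs over the per-run
scores, one number per line, along the lines of (file names are only
examples):

   $ ministat -c 98 before.txt after.txt

Without -c, ministat defaults to the 95% interval.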

> Unless you have 'statistician' in your job title [or someone with that
> job title has indicated otherwise], that's what you should probably
> use. Using anything lower than that is a way of saying "This
> scientific study isn't turning out the way I wanted, I'm going to have
> to take matters into my own hands".
> 
> Of course note that if you do run the same experiment 20 times, you
> should expect one of those 20 times to yield a confidence interval
> that does not include the true mean. And in general I'd be very
> suspicious of results where the change is near the confidence interval
> boundary.
> 
> And lastly, all this statistics stuff assumes that you're evaluating
> the same normal distribution repeatedly. This isn't exactly true, but
> it's true enough. However you can try to get more accurate by
> experimentally determining a fudge factor on the CI width. You could
> run (literal) no-op experiments lots of times and fudge the output of
> the CI width calculation until it matches up with empirical results,
> e.g. if you run the same experiment 100x and use a 95% CI, fudge the
> outcome until you end up with 5 significant results and 95
> insignificant ones. Ideally such a fudge factor should not be too
> different from 1, or else you have a very non-normal distribution, and
> fudge factors ain't gonna help you. Note that any computed fudge
> factors could not be shared among machine groups that weren't used to
> determine them, so I'm not seriously recommending this approach, but
> thought I'd mention it.
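
That no-op calibration is easy enough to sanity-check offline.  Here's a
rough standalone sketch (toy code, nothing to do with Mesa; the
distribution parameters are made up) that draws "before" and "after"
samples from the same normal distribution, applies an approximate
two-sided 95% t test, and counts false positives.  The fudge factor
below is the knob you'd scale until the count settles near 5%:

   #include <math.h>
   #include <stdio.h>
   #include <stdlib.h>

   enum { RUNS = 1000, N = 30 };

   /* Standard normal deviate via Box-Muller. */
   static double
   randn(void)
   {
      double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
      double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
      return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);
   }

   static double
   mean(const double *x, int n)
   {
      double s = 0.0;
      for (int i = 0; i < n; i++)
         s += x[i];
      return s / n;
   }

   static double
   var(const double *x, int n, double m)
   {
      double s = 0.0;
      for (int i = 0; i < n; i++)
         s += (x[i] - m) * (x[i] - m);
      return s / (n - 1);
   }

   int
   main(void)
   {
      const double t_crit = 2.0;  /* ~95% two-sided critical value, df ~= 58 */
      const double fudge = 1.0;   /* scale until false positives land near 5% */
      int significant = 0;

      for (int r = 0; r < RUNS; r++) {
         double a[N], b[N];

         for (int i = 0; i < N; i++) {
            a[i] = 100.0 + 2.0 * randn();   /* same (made up) distribution... */
            b[i] = 100.0 + 2.0 * randn();   /* ...for "before" and "after" */
         }

         double ma = mean(a, N), mb = mean(b, N);
         double se = sqrt(var(a, N, ma) / N + var(b, N, mb) / N);

         if (fabs(ma - mb) > fudge * t_crit * se)
            significant++;
      }

      printf("false positives: %d / %d (expect ~%d at 95%%)\n",
             significant, RUNS, RUNS / 20);
      return 0;
   }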
> 
>   -ilia
> 


