[Mesa-dev] [PATCH] i965: Don't check for draw-time errors that cannot occur in core profile

Eirik Byrkjeflot Anonsen eirik at eirikba.org
Tue Sep 1 00:05:11 PDT 2015


Ilia Mirkin <imirkin at alum.mit.edu> writes:

> On Tue, Sep 1, 2015 at 1:48 AM, Eirik Byrkjeflot Anonsen
> <eirik at eirikba.org> wrote:
>> Ian Romanick <idr at freedesktop.org> writes:
>>
>>> ping. :)
>>>
>>> On 08/10/2015 11:48 AM, Matt Turner wrote:
>>>> On Mon, Aug 10, 2015 at 10:12 AM, Ian Romanick <idr at freedesktop.org> wrote:
>>>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>>>
>>>>> For many CPU-limited applications, this is *the* hot path.  The idea is
>>>>> to generate per-API versions of brw_draw_prims that elide some checks.
>>>>> This patch removes render-mode and "is everything in VBOs" checks from
>>>>> core-profile contexts.
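
(As an inline aside: to make the shape of that change concrete, here is a
standalone sketch of what "per-API versions of brw_draw_prims" can look
like. Everything below is made up for illustration -- it is not the actual
patch, and the struct and function names are not Mesa's -- but it shows the
pattern: choose the draw entry point once, at context creation, so the
core-profile path never branches on conditions that cannot occur there.)

/* Standalone illustration (made-up names, not Mesa code): pick the draw
 * entry point once, based on the context's API, so the core-profile
 * path skips checks whose conditions cannot occur in core profile. */
#include <stdbool.h>
#include <stdio.h>

enum api { API_COMPAT, API_CORE };

struct context {
   enum api api;
   bool render_mode_is_render;   /* always true in core profile */
   bool all_arrays_in_vbos;      /* client arrays are illegal in core */
   void (*draw_prims)(struct context *ctx);
};

static void draw_prims_common(struct context *ctx)
{
   (void)ctx;
   printf("emit the draw\n");
}

static void draw_prims_compat(struct context *ctx)
{
   /* Draw-time checks that only the compatibility profile needs. */
   if (!ctx->render_mode_is_render)
      return;  /* would fall back to a software path here */
   if (!ctx->all_arrays_in_vbos) {
      /* would copy client arrays into buffer objects here */
   }
   draw_prims_common(ctx);
}

static void draw_prims_core(struct context *ctx)
{
   /* Both conditions above are invariants in core profile, so the
    * checks are elided entirely. */
   draw_prims_common(ctx);
}

static void context_init(struct context *ctx, enum api api)
{
   ctx->api = api;
   ctx->render_mode_is_render = true;
   ctx->all_arrays_in_vbos = true;
   ctx->draw_prims = (api == API_CORE) ? draw_prims_core
                                       : draw_prims_compat;
}

int main(void)
{
   struct context ctx;
   context_init(&ctx, API_CORE);
   ctx.draw_prims(&ctx);   /* no per-draw branching on the core path */
   return 0;
}

The point of the split is that the compat-only work disappears from the
per-draw cost of the core path instead of being branched over on every draw.
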
>>>>>
>>>>> On my IVB laptop (which may have experienced thermal throttling):
>>>>>
>>>>> Gl32Batch7:     3.70955% +/- 1.11344%
>>>>
>>>> I'm getting 3.18414% +/- 0.587956% (n=113) on my IVB, which probably
>>>> matches your numbers depending on your value of n.
>>>>
>>>>> OglBatch7:      1.04398% +/- 0.772788%
>>>>
>>>> I'm getting 1.15377% +/- 1.05898% (n=34) on my IVB, which probably
>>>> matches your numbers depending on your value of n.
>>>
>>> This is another thing that makes me feel a little uncomfortable with the
>>> way we've done performance measurements in the past.  If I run my test
>>> before and after this patch for 121 iterations, which I have done, I can
>>> cut the data at any point and oscillate between "no difference" and X%
>>> +/- some-large-fraction-of-X%.  Since the before and after code for the
>>> compatibility profile path should be identical, "no difference" is the
>>> only believable result.
>>
>> That's pretty much expected, I believe. In essence, you are running 121
>> tests, each with a 95% confidence interval and so should expect
>> somewhere around 5 "significant difference" results. That's not entirely
>> true of course, since these are not 121 *independent* tests, but the
>> basic problem remains.
>
> (more stats rants follow)

:)

> While my job title has never been 'statistician', I've been around a
> bunch of them. Just want to correct this... let's forget about these
> tests, but instead think about coin flips (of a potentially unfair
> coin). What you're doing is flipping the coin 100 times, and then
> looking at the number of times it came up heads and tails. From that
> you're inferring the mean of the distribution.
[...]

(I have a background in mathematics with a small amount of both
probability theory and statistics, but I haven't really worked with
either, so your background may make you more of a "statistician" than
me :) )

I think what Ian was saying was that he was flipping the coin 100 times
and then after every flip checking whether the result so far suggested a
50/50 result (fair coin) or not. And he found that sometimes during the
run he would get a "fair coin" result, which he thought was in conflict
with the final "loaded coin" result. Thus he was questioning whether the
final "loaded coin" result was correct.

I was simplifying heavily to point out the problem with this particular
way of looking at the result.
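
For what it's worth, the effect is easy to see in a quick simulation:
take a genuinely fair coin (i.e. "no difference" is the truth), flip it
121 times, and after every flip check a 95% interval around the observed
proportion of heads. The final check comes out "loaded" only about 5% of
the time, but "at least one intermediate check said loaded" happens far
more often. A rough standalone sketch, using the normal approximation;
the 10000 runs, 121 flips and 10-flip warm-up are made-up parameters:

/* Rough sketch: fair coin, peeked at after every flip with a 95%
 * normal-approximation interval.  Compares "the final check flagged a
 * difference" against "some intermediate check flagged a difference". */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   const int runs = 10000, flips = 121, min_n = 10;
   int any_flagged = 0, final_flagged = 0;

   srand(12345);
   for (int r = 0; r < runs; r++) {
      int heads = 0, flagged = 0, final_flag = 0;

      for (int n = 1; n <= flips; n++) {
         heads += rand() & 1;            /* fair coin: true p = 0.5 */
         if (n < min_n)
            continue;                    /* too few flips for the approximation */

         double p = (double)heads / n;
         double se = sqrt(p * (1.0 - p) / n);
         int significant = fabs(p - 0.5) > 1.96 * se;

         if (significant)
            flagged = 1;                 /* an intermediate look said "loaded" */
         if (n == flips)
            final_flag = significant;    /* what the full run says */
      }
      any_flagged += flagged;
      final_flagged += final_flag;
   }

   printf("final check says \"loaded\":             %5.1f%%\n",
          100.0 * final_flagged / runs);
   printf("some intermediate check says \"loaded\": %5.1f%%\n",
          100.0 * any_flagged / runs);
   return 0;
}

The exact percentages depend on the approximation and the cutoffs, but the
gap between the two numbers is the point: the more often you peek, the more
chances you give pure noise to cross the "significant" line.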

Now, if I understood Ian's comment correctly, his main reason for
doubting the "loaded coin" result was that he thought there were no
code differences to explain it. That is a very good reason to suspect
a problem somewhere. I'm just saying that looking at partial results
of a full test run doesn't invalidate the result of the full test run.
More importantly, looking at partial results of a test run does not
provide any more information about the "truth" than the result of the
full test run (again a simplification; getting extra information out
of partial results is possible, but only if you really know what
you're doing).
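
(A rough justification for that last claim, under the usual simplifying
assumption that the per-iteration scores are independent and identically
distributed: the standard error of the mean after n iterations is roughly

  s / sqrt(n)

where s is the sample standard deviation of the scores. It only shrinks
as n grows, so the estimate from the full 121 iterations is at least as
precise as the estimate from any prefix of them; a partial look is the
same data with a larger error bar, not an extra source of information.)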

eirik

