[Mesa-dev] [PATCH 6/7] nir: Teach nir_opt_algebraic about adding and subtracting the same thing

Jason Ekstrand jason at jlekstrand.net
Thu Dec 17 18:06:31 PST 2015


On Dec 17, 2015 1:21 AM, "Eero Tamminen" <eero.t.tamminen at intel.com> wrote:
>
> Hi,
>
>
> On 12/17/2015 01:52 AM, Matt Turner wrote:
>>
>> On Tue, Dec 15, 2015 at 1:16 AM, Eduardo Lima Mitev <elima at igalia.com> wrote:
>>>
>>> On 12/15/2015 09:28 AM, Kristian Høgsberg Kristensen wrote:
>>>>
>>>> This optimizes a + b - b to just a. Modest shader-db results (BDW):
>>>>
>>>>    total instructions in shared programs: 7842452 -> 7841862 (-0.01%)
>>>>    instructions in affected programs:     61938 -> 61348 (-0.95%)
>>>>    total loops in shared programs:        2131 -> 2131 (0.00%)
>>>>    helped:                                263
>>>>    HURT:                                  0
>>>>    GAINED:                                0
>>>>    LOST:                                  0
>>>
>>>
>>> In HSW, I get these shader-db results:
>>>
>>> total instructions in shared programs: 6257265 -> 6256788 (-0.01%)
>>> instructions in affected programs: 46601 -> 46124 (-1.02%)
>>> helped: 218
>>> HURT: 0
>>>
>>> total cycles in shared programs: 56010026 -> 56007760 (-0.00%)
>>> cycles in affected programs: 1048392 -> 1046126 (-0.22%)
>>> helped: 199
>>> HURT: 154
>>>
>>> total loops in shared programs: 1979 -> 1979 (0.00%)
>>> loops in affected programs: 0 -> 0
>>> helped: 0
>>> HURT: 0
>>>
>>> LOST:   0
>>> GAINED: 0
>>>
>>> I wonder where those cycle HURTs come from. In any case, the net result
>>> is positive.
>>
>>
>> I haven't confirmed, but I've seen cases that seem like the cycle
>> counts are wrong.
>
>
> I have doubts about the correctness of latency values set in
> brw_schedule_instructions.cpp.
>
> They were added mostly by Eric in 2012 & 2013.  You added mad & lrp data
> in 2013 and Curro untyped atomics & surface reads in 2013.  Both of them
> have an is_haswell check, but don't say anything about newer generations.
>
> It seems that some of the values are from the spec and some from tests.
> However, for the test data, the code doesn't say on what exact HW and
> stepping the tests were run.  Nor does it say where the sources for those
> tests are, so that one could try to reproduce the results, verify (with
> perf counters) that they actually are bound by what the test says, and
> update the data gotten from them for newer generations (i.e. GEN8+).

It would be pretty fantastic to get updated or more reliable data.
However, I think what we have is probably good enough.  The important part
is that 3src instructions are slightly more expensive and sends are hugely
expensive, and I think the current numbers capture that well enough to be
moderately useful.  As far as register banks go, we've had trouble getting
real data or even documentation on that.
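
As a rough illustration of that point, here is a minimal Python sketch of a
static cycle estimate built from a per-instruction latency table.  The
instruction classes and every number in it are made up for the example; they
are not the values in brw_schedule_instructions.cpp or from any hardware
documentation.  The only property it tries to show is the relative ordering:
3src a little above plain ALU, sends far above both.

```python
# Hypothetical per-instruction latencies, in cycles. These numbers are
# invented for illustration and are NOT taken from
# brw_schedule_instructions.cpp or from any hardware documentation.
LATENCY = {
    "alu": 14,    # ordinary one- and two-source ALU ops
    "3src": 16,   # mad/lrp-style three-source ops, slightly more expensive
    "send": 200,  # message sends (memory, sampler), hugely expensive
}


def estimate_cycles(instructions):
    """Sum the assumed latencies, ignoring overlap, threading, and caches."""
    return sum(LATENCY[kind] for kind in instructions)


before = ["alu", "alu", "3src", "send"]
after = ["alu", "3src", "send"]  # same program with one ALU op removed

print(estimate_cycles(before), "->", estimate_cycles(after))
```

With a table like this, the absolute totals mean little, but removing an
instruction still moves the estimate in the expected direction.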

> In addition to this, Mesa is lacking at least stall cycles for 3src
> register bank conflicts.
>
>
>         - Eero
>
> PS. cycle values are going to be off anyway: the code doesn't know memory
> latencies, as those depend on locality & cache utilization, and it doesn't
> take threading into account.  But it only tries to schedule things so that
> the HW is better able to compensate for latency, so it doesn't need to know
> how many cycles things take, just have a good enough estimate. :-)

For that matter, it can depend on how many vertices you have in the pipe or
how big your polygon is.  It'll never be perfect; there's only so much you
can know from only the shader code. :-)  However, a program with a
smaller theoretical cycle count will probably execute faster than one with
a higher cycle count, so it is useful.
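
For readers joining the thread here, the rewrite the patch is about (a + b - b
becomes a) can be sketched in a few lines of Python.  This is a toy rewriter
over nested tuples, with hypothetical "add" and "sub" opcode names; it is not
the nir_opt_algebraic rule syntax and not the code from the patch.  Note that
the identity is exact for integers, while for floating point it changes
rounding, so it is only valid where inexact transformations are allowed.

```python
def simplify(expr):
    """Rewrite (a + b) - b to a, and (a + b) - a to b, bottom-up.

    Expressions are nested tuples such as ("sub", ("add", "x", "y"), "y").
    The opcode names are hypothetical and chosen only for this sketch.
    """
    if not isinstance(expr, tuple):
        return expr  # a leaf: a variable name or a constant
    op, *args = expr
    args = [simplify(arg) for arg in args]  # simplify operands first
    if (op == "sub" and len(args) == 2
            and isinstance(args[0], tuple)
            and len(args[0]) == 3 and args[0][0] == "add"):
        _, a, b = args[0]
        if b == args[1]:
            return a  # (a + b) - b
        if a == args[1]:
            return b  # (a + b) - a
    return (op, *args)


print(simplify(("sub", ("add", "x", "y"), "y")))
```

Each match removes both the add and the sub, which is where the instruction
count reductions in the shader-db numbers quoted above come from.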