[Mesa-dev] [PATCH] nir/algebraic: Replace a-fract(a) with floor(a)

Mon Feb 25 19:06:01 UTC 2019

On 2/23/19 4:11 PM, Timothy Arceri wrote:
> 
> 
> On 23/2/19 4:09 pm, Ian Romanick wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> I noticed this while looking at a shader that was affected by Tim's
>> "more loop unrolling" series.
>>
>> All Gen6+ platforms had similar results. (Skylake shown)
>> total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
>> instructions in affected programs: 213651 -> 211909 (-0.82%)
>> helped: 988
>> HURT: 0
>> helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
>> helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
>> 95% mean confidence interval for instructions value: -1.89 -1.63
>> 95% mean confidence interval for instructions %-change: -1.23% -1.05%
>> Instructions are helped.
>>
>> total cycles in shared programs: 383007378 -> 382997063 (<.01%)
>> cycles in affected programs: 1650825 -> 1640510 (-0.62%)
>> helped: 679
>> HURT: 302
> 
> Why the hurt on Gen6+? Is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those cases, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to schedule a SEND that occurs
somewhere early in the shader differently.  Once that happens,
everything after it is different. :(

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and an ADD get scheduled
differently, and that makes it slightly worse.
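
For reference, the identity the new rule relies on can be checked with a
minimal Python sketch.  It assumes ffract(a) behaves as a - floor(a),
which is what the existing lower_ffract rule in the patch context
expresses; the helper names (ffract, before, after) and the sample
values are illustrative only and are not part of Mesa.

```python
import math

def ffract(a: float) -> float:
    # Assumption: ffract(a) == a - floor(a), as in the lower_ffract rule.
    return a - math.floor(a)

def before(a: float) -> float:
    # The matched pattern: fadd(a, fneg(ffract(a)))
    return a + (-ffract(a))

def after(a: float) -> float:
    # The replacement: ffloor(a)
    return float(math.floor(a))

# Illustrative inputs, including negatives and exact integers.
samples = [0.0, 0.25, 1.5, -0.25, -1.5, 3.0, -3.0, 1234.5678]
mismatches = [a for a in samples if before(a) != after(a)]
print(mismatches)
```

When both floor and fract are live, as in the Goat Simulator shader, the
replacement still removes the add because the floor no longer has to be
derived from the fract result.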

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycle regressions.  It also did a lot more harm
than good on SKL:

total cycles in shared programs: 382997063 -> 382998953 (<.01%)
cycles in affected programs: 549527 -> 551417 (0.34%)
helped: 82
HURT: 241
helped stats (abs) min: 1 max: 26 x̄: 6.88 x̃: 6
helped stats (rel) min: 0.06% max: 2.04% x̄: 0.56% x̃: 0.44%
HURT stats (abs)   min: 1 max: 120 x̄: 10.18 x̃: 14
HURT stats (rel)   min: 0.04% max: 3.86% x̄: 0.63% x̃: 0.52%
95% mean confidence interval for cycles value: 4.44 7.26
95% mean confidence interval for cycles %-change: 0.24% 0.42%
Cycles are HURT.
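
The exact placement of the "is_used_once" mark-up is not given above.  A
sketch of what such a variant might look like, assuming the mark-up goes
on the ffract so the rule only fires when the fract has no other users,
and assuming the usual 'op(is_used_once)' spelling used elsewhere in
nir_opt_algebraic.py:

```python
# Pattern variables are plain strings, as in nir_opt_algebraic.py.
a = 'a'

# The rule as posted in the patch.
posted_rule = (('fadd', a, ('fneg', ('ffract', a))),
               ('ffloor', a),
               '!options->lower_ffloor')

# Hypothetical guarded variant (placement is an assumption, not taken
# from the email): only match when the ffract result is used once.
guarded_rule = (('fadd', a, ('fneg', ('ffract(is_used_once)', a))),
                ('ffloor', a),
                '!options->lower_ffloor')
```

With that guard, a shader that uses both floor and fract would keep the
original add, which is why the variant touches fewer programs.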

>> helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
>> helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
>> HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
>> HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
>> 95% mean confidence interval for cycles value: -13.05 -7.98
>> 95% mean confidence interval for cycles %-change: -0.86% -0.50%
>> Cycles are helped.
>>
>> Iron Lake and GM45 had similar results. (GM45 shown)
>> total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
>> instructions in affected programs: 119691 -> 119085 (-0.51%)
>> helped: 432
>> HURT: 0
>> helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
>> helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
>> 95% mean confidence interval for instructions value: -1.58 -1.23
>> 95% mean confidence interval for instructions %-change: -0.72% -0.59%
>> Instructions are helped.
>>
>> total cycles in shared programs: 128139812 -> 128135762 (<.01%)
>> cycles in affected programs: 3829724 -> 3825674 (-0.11%)
>> helped: 602
>> HURT: 0
>> helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
>> helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
>> 95% mean confidence interval for cycles value: -8.40 -5.05
>> 95% mean confidence interval for cycles %-change: -0.22% -0.16%
>> Cycles are helped.
>> ---
>>   src/compiler/nir/nir_opt_algebraic.py | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index ba27d702b5d..c8fc938cc8f 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -127,6 +127,7 @@ optimizations = [
>>      (('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a),
>> 'options->lower_flrp32'),
>>      (('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a),
>> 'options->lower_flrp64'),
>>      (('ffloor', a), ('fsub', a, ('ffract', a)),
>> 'options->lower_ffloor'),
>> +   (('fadd', a, ('fneg', ('ffract', a))), ('ffloor', a),
>> '!options->lower_ffloor'),
>>      (('ffract', a), ('fsub', a, ('ffloor', a)),
>> 'options->lower_ffract'),
>>      (('fceil', a), ('fneg', ('ffloor', ('fneg', a))),
>> 'options->lower_fceil'),
>>      (('~fadd', ('fmul', a, ('fadd', 1.0, ('fneg', ('b2f', 'c@1')))),
>> ('fmul', b, ('b2f', c))), ('bcsel', c, b, a), 'options->lower_flrp32'),
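
To illustrate what the added rule does to an expression tree, here is a
toy rewriter over nested tuples.  It is only a sketch: it is not how
nir_search is implemented, the function name rewrite is hypothetical,
and it matches only the operand order written in the rule.

```python
def rewrite(expr):
    """Bottom-up rewrite of fadd(x, fneg(ffract(x))) into ffloor(x)."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name or constant
    op, *args = expr
    args = [rewrite(arg) for arg in args]  # rewrite operands first
    if op == 'fadd' and len(args) == 2:
        x, y = args
        # Match y == ('fneg', ('ffract', x)) with the same x on both sides.
        if (isinstance(y, tuple) and len(y) == 2 and y[0] == 'fneg'
                and isinstance(y[1], tuple) and len(y[1]) == 2
                and y[1][0] == 'ffract' and y[1][1] == x):
            return ('ffloor', x)
    return (op, *args)

# The pattern nested inside a larger expression is replaced.
matched = rewrite(('fmul', ('fadd', 'v', ('fneg', ('ffract', 'v'))), 'w'))
# Different operands on each side do not match and are left unchanged.
unmatched = rewrite(('fadd', 'v', ('fneg', ('ffract', 'u'))))
print(matched)
print(unmatched)
```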