[Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

Roland Scheidegger sroland at vmware.com
Mon Feb 26 22:51:33 UTC 2018


Am 26.02.2018 um 23:21 schrieb Ian Romanick:
> On 02/23/2018 05:14 PM, Jason Ekstrand wrote:
>> On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick <idr at freedesktop.org
>> <mailto:idr at freedesktop.org>> wrote:
>>
>>     From: Ian Romanick <ian.d.romanick at intel.com
>>     <mailto:ian.d.romanick at intel.com>>
>>
>>     shader-db results:
>>
>>     Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
>>     total instructions in shared programs: 14514817 -> 14514808 (<.01%)
>>     instructions in affected programs: 229 -> 220 (-3.93%)
>>     helped: 3
>>     HURT: 0
>>     helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
>>     helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
>>
>>     total cycles in shared programs: 533145211 -> 533144939 (<.01%)
>>     cycles in affected programs: 37268 -> 36996 (-0.73%)
>>     helped: 8
>>     HURT: 0
>>     helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
>>     helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
>>
>>     Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
>>     total cycles in shared programs: 257618409 -> 257618403 (<.01%)
>>     cycles in affected programs: 12582 -> 12576 (-0.05%)
>>     helped: 3
>>     HURT: 0
>>     helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
>>     helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
>>
>>     No changes on Iron Lake or GM45.
>>
>>     Signed-off-by: Ian Romanick <ian.d.romanick at intel.com
>>     <mailto:ian.d.romanick at intel.com>>
>>     ---
>>      src/compiler/nir/nir_opt_algebraic.py | 2 ++
>>      1 file changed, 2 insertions(+)
>>
>>     diff --git a/src/compiler/nir/nir_opt_algebraic.py
>>     b/src/compiler/nir/nir_opt_algebraic.py
>>     index d40d59b..f5f9e94 100644
>>     --- a/src/compiler/nir/nir_opt_algebraic.py
>>     +++ b/src/compiler/nir/nir_opt_algebraic.py
>>     @@ -170,6 +170,8 @@ optimizations = [
>>         (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
>>         (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
>>         (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
>>     +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
>>     +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
>>
>>
>> Please flag as inexact.  As per the stupid GLSL definition, these are
>> not the same as fmin/fmax when you throw in a NaN.
> 
> I'm having some trouble rectifying this with the existing
> transformations and the Intel hardware implementation.
> 
> GLSL spec says min(x, y) "Returns y if y < x; otherwise it returns x."
> From that I infer min(x, NaN) == x, and min(NaN, y) == NaN.  The
> expression ('bcsel', ('flt', b, a), b, a) has the same behavior.
> 
> I think if I rewrite the fmin transform as (swapping the argument order)
> 
>     (('bcsel', ('fge', a, b), b, a), ('fmin', a, b)),
> 
> it should be at least as valid for as the existing transforms.  A
> similar modification should work for fmax.
> 
> The Intel SEL instruction which says that with the .L or .GE modifier,
> if one argument is NaN, the other value is always returned.  This means
> that min(NaN, y) will be y.
> 
> This is valid for min and max because section 4.7.1 (Range and
> Precision) says:
> 
>     Operations and built-in functions that operate on a NaN are not
>     required to return a NaN as the result.
> 
> I don't think returning non-NaN for ('bcsel', ('flt', b, NaN), b, NaN)
> is valid, so I think the existing transformations should also be marked
> inexact for platforms that implement the "never NaN" behavior for fmin
> or fmax.
Everybody will return non-nan for fmin/fmax (if you don't, some apps
will very likely break), at least for pc graphic chips (with the
exception of pre-dx10 chips, where all bets are off). This is the
required behavior by dx10 (hence what everyone will do, since gl doesn't
have a required behavior as far as NaNs are concerned). So yes,
transforming bcsel + flt/fge into fmin/fmax looks inexact to me too.
(glsl NaN behavior really sucks - in theory nearly everything goes, but
in practice apps will break if you don't follow more or less dx10 rules...)

Roland


> 
>>         (('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
>>         (('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
>>         (('bcsel', a, True, 'b at bool'), ('ior', a, b)),
>>     --
>>     2.9.5
>>
>>     _______________________________________________
>>     mesa-dev mailing list
>>     mesa-dev at lists.freedesktop.org <mailto:mesa-dev at lists.freedesktop.org>
>>     https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=svCuExX5hvVdvO6FOv-EAbkLHTwvBj8vZ2p7r-AXx28&s=40JZMjENRthrMifGyr5Y9JqxTeCxba6QBTkrgaFTkos&e=
>>     <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=svCuExX5hvVdvO6FOv-EAbkLHTwvBj8vZ2p7r-AXx28&s=40JZMjENRthrMifGyr5Y9JqxTeCxba6QBTkrgaFTkos&e=>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=svCuExX5hvVdvO6FOv-EAbkLHTwvBj8vZ2p7r-AXx28&s=40JZMjENRthrMifGyr5Y9JqxTeCxba6QBTkrgaFTkos&e=
> 



More information about the mesa-dev mailing list