[Mesa-dev] [v2 PATCH 09/16] glsl: Optimize clamp(x, 0, 1) as saturate(x)

Mon Jul 7 23:47:01 PDT 2014

On Tue, Jul 8, 2014 at 8:22 AM, Matt Turner <mattst88 at gmail.com> wrote:
> On Mon, Jul 7, 2014 at 11:00 PM, Erik Faye-Lund <kusmabite at gmail.com> wrote:
>> On Mon, Jul 7, 2014 at 7:18 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>> This comment tripped me up for a second. This really means that you've
>>> found either
>>>
>>>   - min(max(x, 0.0), 1.0); or
>>>   - max(min(x, 1.0), 0.0)
>>
>> Hmm, but are optimizing both of these to saturate OK? Shouldn't
>> min(max(NaN, 0.0), 1.0) give 0.0, whereas max(min(NaN, 1.0), 0.0) give
>> 1.0?
>
> Under standard IEEE rules, wouldn't the NaN propagate through all of
> these expressions?
>
> The GLSL 4.40 spec says
>
> "Operations and built-in functions that operate on a NaN are not required to
>  return a NaN as the result."
>
> So it seems like we have a lot of flexibility here. Is there some text
> I'm missing?

I think the flexibility argument is a bit weak, or possibly a
misreading: surely that text applies to the built-ins as written in the
shader, not to whatever we generate after optimizing. If we relied on it
to justify the saturate rewrite, we'd need to prevent min and max from
propagating NaN even when issued stand-alone, which might hurt
performance on some ISAs.

As to why I reacted: I remembered reading somewhere (one of Humus'
low-level shader optimization papers, IIRC) that the HLSL compiler
refused to do some similar optimizations because of NaN.

Checking the spec a bit closer, though:
- min(x, y) is defined as "Returns y if y < x, otherwise it returns x"
- max(x, y) is defined as "Returns y if x < y, otherwise it returns x".

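All "<" comparisons with NaN should AFAIK fail, so every call takes the
"otherwise it returns x" branch. With x = NaN in the first argument,
both min(max(x, 0.0), 1.0) and max(min(x, 1.0), 0.0) would then return
NaN, if NaN were to be "properly" supported. So my initial analysis was
wrong. However, the following permutations of the same logic would
still be inconsistent: for x = NaN, some of them give 0.0 and others
give 1.0.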

min(max(0.0, x), 1.0)
max(min(1.0, x), 0.0)
min(1.0, max(0.0, x))
max(0.0, min(1.0, x))
min(1.0, max(x, 0.0))
max(0.0, min(x, 1.0))
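
To make the inconsistency concrete, here is a minimal C sketch. It
assumes the two spec definitions quoted above are implemented literally
and that float comparisons follow IEEE rules (so no fast-math). The
names glsl_min, glsl_max, form_a, form_b and perm_1..perm_6 are made up
for illustration; this says nothing about what the patch or any real
hardware does.

```c
#include <math.h>   /* NAN, isnan */

/* Literal transcriptions of the spec wording (assumed, see above):
 * min(x, y): "Returns y if y < x, otherwise it returns x"
 * max(x, y): "Returns y if x < y, otherwise it returns x" */
static inline float glsl_min(float x, float y) { return (y < x) ? y : x; }
static inline float glsl_max(float x, float y) { return (x < y) ? y : x; }

/* The two forms from the start of the thread: both give NaN for x = NaN. */
static inline float form_a(float x) { return glsl_min(glsl_max(x, 0.0f), 1.0f); }
static inline float form_b(float x) { return glsl_max(glsl_min(x, 1.0f), 0.0f); }

/* The six permutations, in the order listed above.
 * The trailing comment is the result for x = NaN. */
static inline float perm_1(float x) { return glsl_min(glsl_max(0.0f, x), 1.0f); } /* 0.0 */
static inline float perm_2(float x) { return glsl_max(glsl_min(1.0f, x), 0.0f); } /* 1.0 */
static inline float perm_3(float x) { return glsl_min(1.0f, glsl_max(0.0f, x)); } /* 0.0 */
static inline float perm_4(float x) { return glsl_max(0.0f, glsl_min(1.0f, x)); } /* 1.0 */
static inline float perm_5(float x) { return glsl_min(1.0f, glsl_max(x, 0.0f)); } /* 1.0 */
static inline float perm_6(float x) { return glsl_max(0.0f, glsl_min(x, 1.0f)); } /* 0.0 */
```

Calling each function with NAN from <math.h> shows the split: NaN for
the two original forms, and a mix of 0.0 and 1.0 for the permutations,
depending only on argument order. For ordinary inputs all eight agree
with saturate(x).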

I don't understand the code well enough to figure out if the patch
optimizes these, though.