[Mesa-dev] [PATCH 2/8] glsl: Fix constant evaluation of the rcp op.

Thu Jan 26 22:33:39 UTC 2017

Ian Romanick <idr at freedesktop.org> writes:

> On 01/26/2017 12:20 PM, Francisco Jerez wrote:
>> Ian Romanick <idr at freedesktop.org> writes:
>> 
>>> On 01/25/2017 10:53 AM, Francisco Jerez wrote:
>>>> Hi Ian, and thank you for your comments,
>>>>
>>>> Ian Romanick <idr at freedesktop.org> writes:
>>>>
>>>>> On 01/24/2017 03:26 PM, Francisco Jerez wrote:
>>>>>> Will avoid a regression in a future commit that introduces some
>>>>>> additional rcp operations.
>>>>>
>>>>> When I converted GLSL IR to ir_expression_operation.py, I was careful to
>>>>> keep all the expressions the same.  rcp and div had these weird guards.
>>>>> GLSL doesn't require that NaN be generated, and quite a few old GPUs
>>>>> don't.  If the atan2 implementation depends on NaN being generated by
>>>>> rcp, it may have problems on i915, r300, and similar GPUs.  I don't know
>>>>> what they generate, but it's not NaN and it's probably not 0.0.
>>>>
>>>> The atan2 implementation from patch 5 doesn't rely on NaNs being
>>>> generated, but it does rely on the reciprocal operation handling zero
>>>> and infinity correctly as specified by GLSL for the division operation.
>>>
>>> Okay.  That is the problem on older GPUs that I was referring.
>>> Specifically, all versions of the GLSL prior to 4.40 (!) say:
>>>
>>>     Similarly, treatment of conditions such as divide by 0 may lead to
>>>     an unspecified result, but in no case should such a condition lead
>>>     to the interruption or termination of processing.
>> 
>> I cannot find this paragraph in any of the GLSL 4.1+ specs.
>
> Somehow I botched my search.  GLSL 4.10 was the first version to drop
> this in favor of, "Dividing by 0 results in the appropriately signed
> IEEE Inf."
>
>>> I believe that DX11 requires the GLSL 4.40+ behavior, so all even
>>> somewhat modern devices should just work.  It's all the pre-DX11
>>> hardware that's might be a problem.
>> 
>> Actually I don't think PATCH 5 necessarily cares about infinities not
>> getting generated -- The only requirement is for rcp(0) to return a
>> fairly large value in absolute value (the larger the more accurate the
>> result will be), even the sign of the result is pretty much irrelevant.
>
> I expect that's what non-Inf GPUs do / did... generate float MAXVAL.
> Now that I'm understanding (and remembering) all the issues better, I'm
> a bit less worried.  As I mentioned in the previous message, I'd like to
> see someone test this on r300... I think I still have one somewhere, but
> I won't be able to test it for at least two weeks.
>
>> That said there's a relatively straightforward change we could apply to
>> PATCH 5 and 6 in order to make the calculation more robust against
>> division by zero (it will probably make this patch unnecessary to avoid
>> piglit regressions but I think we want it anyway because of GLSL 4.1+):
>
> Yes... I believe we do need this patch for GLSL 4.10 correctness.
>
>> We could leverage the coordinate rotation to turn (y, 0) into (0, y),
>> avoiding division by zero along the whole vertical line -- The only
>> remaining case where we could potentially divide by zero is along the
>> left y=0 half-line, but the function jumps from -π to π along that line
>> so returning an undefined value within [-π, π] for y=0 and x < 0 is very
>> unlikely to hurt, because AFAICT all GLSL versions lacking well-defined
>> divide by zero didn't require the implementation to represent the sign
>> of zero consistently either, so the shader is unlikely to be able to
>> generate a signed zero value accurately enough to notice the problem...
>> 
>>> Now... talking to Jason just now, he reminded me that the spec also says
>>> the following about built-in functions:
>>>
>>>     Function parameters specified as angle are assumed to be in units
>>>     of radians. In no case will any of these functions result in a
>>>     divide by zero error. If the divisor of a ratio is 0, then results
>>>     will be undefined.
>>>
>> Heh... That could be interpreted as if atan2(y, 0) is undefined which
>> would make this discussion moot -- I'll send a v2 of PATCH 5 and 6
>> anyway since it should be easy enough to get right.
>
> I think it allows atan2(y, 0) to have an undefined result, but atan2(0,
> -abs(x)) should still produce 0.

Nope, atan2(±0, -abs(x)) is right along the discontinuity, and the
expected value would jump from -π to π depending on the sign of zero.

>  Based on my reading of patch 5, those inputs would lead to rcp(0).
>

Yeah, but then again hardware unable to give IEEE-compliant results for
division by zero is unlikely to be able to generate signed zero
accurately enough, so the result is going to have a maximum absolute
error of 2π anyway.

>>> We may be fine even on old, clunky hardware.  Looking at the code in
>>> patch 5, atan(0, -abs(x)) would still be a problem if rcp(0) produces
>>> undefined results.  It looks like
>>> tests/shaders/glsl-fs-atan-2.shader_test should hit that case.  Anyone
>>> have r300 or r400 hardware to test that?
>>>
>>> This patch doesn't affect that, and, even with the "unspecified result"
>>> rule, it's clearly correct.  This patch is
>>>
>>> Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
>>>
>>>>> That said, this matches NIR, and it's probably fine.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170126/695108c8/attachment.sig>