[Mesa-dev] [PATCH 2/8] glsl: Fix constant evaluation of the rcp op.

Thu Jan 26 22:11:26 UTC 2017

On 01/26/2017 12:20 PM, Francisco Jerez wrote:
> Ian Romanick <idr at freedesktop.org> writes:
> 
>> On 01/25/2017 10:53 AM, Francisco Jerez wrote:
>>> Hi Ian, and thank you for your comments,
>>>
>>> Ian Romanick <idr at freedesktop.org> writes:
>>>
>>>> On 01/24/2017 03:26 PM, Francisco Jerez wrote:
>>>>> Will avoid a regression in a future commit that introduces some
>>>>> additional rcp operations.
>>>>
>>>> When I converted GLSL IR to ir_expression_operation.py, I was careful to
>>>> keep all the expressions the same.  rcp and div had these weird guards.
>>>> GLSL doesn't require that NaN be generated, and quite a few old GPUs
>>>> don't.  If the atan2 implementation depends on NaN being generated by
>>>> rcp, it may have problems on i915, r300, and similar GPUs.  I don't know
>>>> what they generate, but it's not NaN and it's probably not 0.0.
>>>
>>> The atan2 implementation from patch 5 doesn't rely on NaNs being
>>> generated, but it does rely on the reciprocal operation handling zero
>>> and infinity correctly as specified by GLSL for the division operation.
>>
>> Okay.  That is the problem on older GPUs that I was referring.
>> Specifically, all versions of the GLSL prior to 4.40 (!) say:
>>
>>     Similarly, treatment of conditions such as divide by 0 may lead to
>>     an unspecified result, but in no case should such a condition lead
>>     to the interruption or termination of processing.
> 
> I cannot find this paragraph in any of the GLSL 4.1+ specs.

Somehow I botched my search.  GLSL 4.10 was the first version to drop
this in favor of, "Dividing by 0 results in the appropriately signed
IEEE Inf."

>> I believe that DX11 requires the GLSL 4.40+ behavior, so all even
>> somewhat modern devices should just work.  It's all the pre-DX11
>> hardware that's might be a problem.
> 
> Actually I don't think PATCH 5 necessarily cares about infinities not
> getting generated -- The only requirement is for rcp(0) to return a
> fairly large value in absolute value (the larger the more accurate the
> result will be), even the sign of the result is pretty much irrelevant.

I expect that's what non-Inf GPUs do / did... generate float MAXVAL.
Now that I'm understanding (and remembering) all the issues better, I'm
a bit less worried.  As I mentioned in the previous message, I'd like to
see someone test this on r300... I think I still have one somewhere, but
I won't be able to test it for at least two weeks.

> That said there's a relatively straightforward change we could apply to
> PATCH 5 and 6 in order to make the calculation more robust against
> division by zero (it will probably make this patch unnecessary to avoid
> piglit regressions but I think we want it anyway because of GLSL 4.1+):

Yes... I believe we do need this patch for GLSL 4.10 correctness.

> We could leverage the coordinate rotation to turn (y, 0) into (0, y),
> avoiding division by zero along the whole vertical line -- The only
> remaining case where we could potentially divide by zero is along the
> left y=0 half-line, but the function jumps from -π to π along that line
> so returning an undefined value within [-π, π] for y=0 and x < 0 is very
> unlikely to hurt, because AFAICT all GLSL versions lacking well-defined
> divide by zero didn't require the implementation to represent the sign
> of zero consistently either, so the shader is unlikely to be able to
> generate a signed zero value accurately enough to notice the problem...
> 
>> Now... talking to Jason just now, he reminded me that the spec also says
>> the following about built-in functions:
>>
>>     Function parameters specified as angle are assumed to be in units
>>     of radians. In no case will any of these functions result in a
>>     divide by zero error. If the divisor of a ratio is 0, then results
>>     will be undefined.
>>
> Heh... That could be interpreted as if atan2(y, 0) is undefined which
> would make this discussion moot -- I'll send a v2 of PATCH 5 and 6
> anyway since it should be easy enough to get right.

I think it allows atan2(y, 0) to have an undefined result, but atan2(0,
-abs(x)) should still produce 0.  Based on my reading of patch 5, those
inputs would lead to rcp(0).

>> We may be fine even on old, clunky hardware.  Looking at the code in
>> patch 5, atan(0, -abs(x)) would still be a problem if rcp(0) produces
>> undefined results.  It looks like
>> tests/shaders/glsl-fs-atan-2.shader_test should hit that case.  Anyone
>> have r300 or r400 hardware to test that?
>>
>> This patch doesn't affect that, and, even with the "unspecified result"
>> rule, it's clearly correct.  This patch is
>>
>> Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
>>
>>>> That said, this matches NIR, and it's probably fine.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170126/14584ffa/attachment-0001.sig>