[Mesa-dev] [PATCH 2/8] glsl: Fix constant evaluation of the rcp op.
Ian Romanick
idr at freedesktop.org
Thu Jan 26 22:11:26 UTC 2017
On 01/26/2017 12:20 PM, Francisco Jerez wrote:
> Ian Romanick <idr at freedesktop.org> writes:
>
>> On 01/25/2017 10:53 AM, Francisco Jerez wrote:
>>> Hi Ian, and thank you for your comments,
>>>
>>> Ian Romanick <idr at freedesktop.org> writes:
>>>
>>>> On 01/24/2017 03:26 PM, Francisco Jerez wrote:
>>>>> Will avoid a regression in a future commit that introduces some
>>>>> additional rcp operations.
>>>>
>>>> When I converted GLSL IR to ir_expression_operation.py, I was careful to
>>>> keep all the expressions the same. rcp and div had these weird guards.
>>>> GLSL doesn't require that NaN be generated, and quite a few old GPUs
>>>> don't. If the atan2 implementation depends on NaN being generated by
>>>> rcp, it may have problems on i915, r300, and similar GPUs. I don't know
>>>> what they generate, but it's not NaN and it's probably not 0.0.
>>>
>>> The atan2 implementation from patch 5 doesn't rely on NaNs being
>>> generated, but it does rely on the reciprocal operation handling zero
>>> and infinity correctly as specified by GLSL for the division operation.
>>
>> Okay. That is the problem on older GPUs that I was referring.
>> Specifically, all versions of the GLSL prior to 4.40 (!) say:
>>
>> Similarly, treatment of conditions such as divide by 0 may lead to
>> an unspecified result, but in no case should such a condition lead
>> to the interruption or termination of processing.
>
> I cannot find this paragraph in any of the GLSL 4.1+ specs.
Somehow I botched my search. GLSL 4.10 was the first version to drop
this in favor of, "Dividing by 0 results in the appropriately signed
IEEE Inf."
>> I believe that DX11 requires the GLSL 4.40+ behavior, so all even
>> somewhat modern devices should just work. It's all the pre-DX11
>> hardware that's might be a problem.
>
> Actually I don't think PATCH 5 necessarily cares about infinities not
> getting generated -- The only requirement is for rcp(0) to return a
> fairly large value in absolute value (the larger the more accurate the
> result will be), even the sign of the result is pretty much irrelevant.
I expect that's what non-Inf GPUs do / did... generate float MAXVAL.
Now that I'm understanding (and remembering) all the issues better, I'm
a bit less worried. As I mentioned in the previous message, I'd like to
see someone test this on r300... I think I still have one somewhere, but
I won't be able to test it for at least two weeks.
> That said there's a relatively straightforward change we could apply to
> PATCH 5 and 6 in order to make the calculation more robust against
> division by zero (it will probably make this patch unnecessary to avoid
> piglit regressions but I think we want it anyway because of GLSL 4.1+):
Yes... I believe we do need this patch for GLSL 4.10 correctness.
> We could leverage the coordinate rotation to turn (y, 0) into (0, y),
> avoiding division by zero along the whole vertical line -- The only
> remaining case where we could potentially divide by zero is along the
> left y=0 half-line, but the function jumps from -π to π along that line
> so returning an undefined value within [-π, π] for y=0 and x < 0 is very
> unlikely to hurt, because AFAICT all GLSL versions lacking well-defined
> divide by zero didn't require the implementation to represent the sign
> of zero consistently either, so the shader is unlikely to be able to
> generate a signed zero value accurately enough to notice the problem...
>
>> Now... talking to Jason just now, he reminded me that the spec also says
>> the following about built-in functions:
>>
>> Function parameters specified as angle are assumed to be in units
>> of radians. In no case will any of these functions result in a
>> divide by zero error. If the divisor of a ratio is 0, then results
>> will be undefined.
>>
> Heh... That could be interpreted as if atan2(y, 0) is undefined which
> would make this discussion moot -- I'll send a v2 of PATCH 5 and 6
> anyway since it should be easy enough to get right.
I think it allows atan2(y, 0) to have an undefined result, but atan2(0,
-abs(x)) should still produce 0. Based on my reading of patch 5, those
inputs would lead to rcp(0).
>> We may be fine even on old, clunky hardware. Looking at the code in
>> patch 5, atan(0, -abs(x)) would still be a problem if rcp(0) produces
>> undefined results. It looks like
>> tests/shaders/glsl-fs-atan-2.shader_test should hit that case. Anyone
>> have r300 or r400 hardware to test that?
>>
>> This patch doesn't affect that, and, even with the "unspecified result"
>> rule, it's clearly correct. This patch is
>>
>> Reviewed-by: Ian Romanick <ian.d.romanick at intel.com>
>>
>>>> That said, this matches NIR, and it's probably fine.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170126/14584ffa/attachment-0001.sig>
More information about the mesa-dev
mailing list