[Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code
Ilia Mirkin
imirkin at alum.mit.edu
Sun Feb 7 23:33:36 UTC 2016
On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy <axel.davy at ens.fr> wrote:
> On 08/02/2016 00:21, Ilia Mirkin wrote:
>>
>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy <axel.davy at ens.fr> wrote:
>>>
>>> SQRT is not supported everywhere, so replace
>>> it by RSQ + RCP
>>>
>>> Signed-off-by: Axel Davy <axel.davy at ens.fr>
>>> ---
>>> src/gallium/state_trackers/nine/nine_ff.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/gallium/state_trackers/nine/nine_ff.c
>>> b/src/gallium/state_trackers/nine/nine_ff.c
>>> index a5466a7..894fc63 100644
>>> --- a/src/gallium/state_trackers/nine/nine_ff.c
>>> +++ b/src/gallium/state_trackers/nine/nine_ff.c
>>> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct
>>> vs_build_ctx *vs)
>>> struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27);
>>>
>>> ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1]));
>>> - ureg_SQRT(ureg, tmp_y, _X(tmp));
>>> + ureg_RSQ(ureg, tmp_y, _X(tmp));
>>> + ureg_RCP(ureg, tmp_y, _Y(tmp));
>>
>> I'd recommend doing
>>
>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp))
>>
>> instead. That should be (a) more numerically stable (rcp doesn't have
>> great precision), and (b) not blow up for 0.
>
> Ok for the precision, but I'm not sure for 0
>
> With the mul version, with 0, it ends up computing inf * 0 = NaN,
> whereas with the rcp version, it does 1/inf == 0 (as far as I know),
> which is the expected result.
Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN.
What you really want is a "flush nan to 0" option on the mul like nvc0
has, but there's no way to express that in TGSI.
Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and
otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does:
emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]);
emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]);
/* For incoming channels <= 0, set the result to 0. */
op[0].negate = ~op[0].negate;
emit_asm(ir, TGSI_OPCODE_CMP, result_dst,
op[0], result_src, st_src_reg_for_float(0.0));
More information about the mesa-dev
mailing list