[Mesa-dev] [PATCH 0/3] tgsi, radeonsi: add CANON opcode for float canonicalization

Mon Sep 18 17:11:44 UTC 2017

Am 18.09.2017 um 17:36 schrieb Nicolai Hähnle:
> On 18.09.2017 17:02, Roland Scheidegger wrote:
>> This looks like a horrendous solution which will break the world - well
>> for us :-). Because integers simply will cease to work, always flushed
>> to zero (bye bye loop counter...).
>> The reason is that when you translate from something with a untyped
>> register file to something typed, the obvious solution is to store
>> everything as floats, and cast to int/uint as needed (if you'd translate
>> from tgsi back to glsl, you'd probably do it that way as well).
>> Hence, you must not flush denorms on float to int/uint casts - which,
>> btw, is also illegal by glsl as far as I can tell ("Returns a signed or
>> unsigned integer value representing the encoding of a floating-point
>> value. The floating-point value's bit-level representation is
>> preserved.")
> 
> How could you possibly know that the value was a denorm to begin with?
> "Any denormalized value input into a shader or potentially generated by
> any operation in a shader can be flushed to 0.", which presumably
> includes intBitsToFloat :-)
I guess the answer is we sort of know the gpus and their drivers won't
do anything crazy :-). In the end, they'll all use untyped registers,
and they do need to work with dx10 in any case, so we just rely on this
(in particular the bitcasts being no-ops)... Of course, there might be
issues with this since it's not quite guaranteed by glsl (for instance,
overzealous optimization using value range tracking could detect some
integer has to be a negative small number, and upon using i2f cast which
would yield a NaN just make everything depending on this result
undefined, even if that value is only used again by a f2i cast).
GPUs generally don't do any float "data conversion" when just moving
values around, whether it might be allowed by gl or not.

> 
> 
>> (As a side note, for the same reasons we rely on i2f/u2f "doing the
>> right thing", not messing with bits, albeit this one isn't guaranteed by
>> glsl. But I'm quite sure we're not the only ones relying on this, and
>> this is quite common.)
> 
> Yeah, fair enough. You've convinced me not to take this approach.
> 
> We could just flush to zero after min/max. Although this should lead to
> a different result for min() vs. open-coding the comparison and select.
> That feels pretty dirty as well...
> 
> 
>> I don't know what a proper solution would look like though. FWIW d3d10
>> permits this min/max behavior (the comparison must use denorm flush, but
>> the result may be denorm flushed or not, so it's completely ok if you
>> get the "wrong" result. And given the way glsl is specced wrt denorms
>> (read: mostly undefined) are you sure it's actually illegal there?
> 
> It's one of those borderline cases for GLSL. The GLSL language is
> basically what I quoted above, which makes our behavior slightly icky
> because we flush the input to zero in one place but not another.
> 
> The GLSL ES 3.10 spec has this bit though:
> 
> "Should subnormal numbers (also known as 'denorms') be supported?
> 
> RESOLUTION: No, subnormal numbers maybe flushed to zero at any time."
> 
> ... which gives much more leeway. Together with the D3D10 behavior, this
> is a good argument for changing the test. I'm going to look into that.
One reasoning could also be that it doesn't really matter which number
you pick in nearly all cases for min/max. Because if you do some
arithmetic with the result, it will get denorm-flushed then (as long as
you consistently denorm-flush everything). Exceptions to this are of
course if you do non-float arithmetic on them using bitcasts but I can't
imagine anyone really expecting the "correct" result there - especially
since even if you think picking either denorm isn't correct by glsl
rules, the result of that can still legally differ to be either a 0 or a
denorm. Another exception would be shader export, if you don't
denorm-flush there (or other means to make that value visible outside
the shader, e.g. image store).
I guess that if you leave bitcasts alone and only do it on stores, that
wouldn't really cause problems. Albeit possibly it could cause similar
issues when you'd try to convert d3d11 assembly (because it least some
store instructions are really untyped and will need memcpy-like semantics).

Roland


> 
> Cheers,
> Nicolai
> 
>>
>> Roland
>>
>>
>> Am 18.09.2017 um 11:44 schrieb Nicolai Hähnle:
>>> Hi all,
>>>
>>> This series fixes an issue we have on GCN due to the arguably
>>> inconsistent
>>> handling of denormal floating point values. We disable denormals for
>>> 32-bit
>>> floating point numbers, because enabling them would cost a lot of
>>> performance. By and large, GCN floating point instructions flush
>>> denormals
>>> to zero. An annoying exception are v_min_f32 and v_max_f32, whose
>>> behavior
>>> can be described as:
>>>
>>>    cc = compare(flushed(src0), flushed(src1))
>>>    dst = cc ? src0 : src1;
>>>
>>> When both sources are denormals (or 0.0), the comparison will
>>> consider the
>>> numbers equal. The instruction will return one of them *without*
>>> flushing
>>> to zero, which means that we might end up with the wrong one
>>> (depending on
>>> how the operands are ordered).
>>>
>>> The way this series fixes the issue is to explicitly cause flush-to-zero
>>> behavior whenever the application may be able to observe a
>>> difference, which
>>> is when floats are bit-cast to integers or stored in memory (SSBOs and
>>> 32f images, since those images can be bitcast r32i/ui).
>>>
>>> Since the backend cannot reliably detect bitcasts, we add a new CANON
>>> opcode
>>> which is modeled after LLVM's llvm.canonicalize.* intrinsic. If the
>>> driver
>>> requests it, CANON is emitted before bitcasts and stores, which means
>>> most
>>> shaders are unaffected (the alternative would be to emit it for bitcasts
>>> *from* integers and for loads, which has a much wider impact).
>>>
>>> For GCN, LLVM turns the canonicalize into a multiplication by 1.0. We
>>> may
>>> want to teach LLVM about cases where the additional instruction is
>>> unnecessary.
>>>
>>> Please review!
>>> Thanks,
>>> Nicolai
>>> -- 
>>>   src/gallium/auxiliary/gallivm/lp_bld_limits.h     |  1 +
>>>   src/gallium/auxiliary/tgsi/tgsi_exec.h            |  1 +
>>>   src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h    |  2 +-
>>>   src/gallium/docs/source/screen.rst                |  2 ++
>>>   src/gallium/docs/source/tgsi.rst                  | 18
>>> ++++++++++++++++++
>>>   src/gallium/drivers/etnaviv/etnaviv_screen.c      |  1 +
>>>   src/gallium/drivers/freedreno/freedreno_screen.c  |  1 +
>>>   src/gallium/drivers/nouveau/nv30/nv30_screen.c    |  2 ++
>>>   src/gallium/drivers/nouveau/nv50/nv50_screen.c    |  1 +
>>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c    |  1 +
>>>   src/gallium/drivers/r300/r300_screen.c            |  2 ++
>>>   src/gallium/drivers/r600/r600_pipe.c              |  1 +
>>>   src/gallium/drivers/radeonsi/si_pipe.c            |  1 +
>>>   src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c |  2 ++
>>>   src/gallium/drivers/svga/svga_screen.c            |  3 +++
>>>   src/gallium/drivers/vc4/vc4_screen.c              |  1 +
>>>   src/gallium/drivers/virgl/virgl_screen.c          |  1 +
>>>   src/gallium/include/pipe/p_defines.h              |  1 +
>>>   src/gallium/include/pipe/p_shader_tokens.h        |  2 +-
>>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp        | 31
>>> +++++++++++++++++++++++++++++--
>>>   20 files changed, 71 insertions(+), 4 deletions(-)
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev at lists.freedesktop.org
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=GL1ia61sD0zG05zPlj1qW9ra1gTZ2KBkTNoSBG1cakM&s=o3M_-RsUz-5prZLmL33iu5AaeEChWHU6kf-FtWMliKU&e=
>>>
>>>
>>
> 
>