[Mesa-dev] [PATCH 0/3] tgsi, radeonsi: add CANON opcode for float canonicalization

Mon Sep 18 15:36:18 UTC 2017

On 18.09.2017 17:02, Roland Scheidegger wrote:
> This looks like a horrendous solution which will break the world - well
> for us :-). Because integers simply will cease to work, always flushed
> to zero (bye bye loop counter...).
> The reason is that when you translate from something with a untyped
> register file to something typed, the obvious solution is to store
> everything as floats, and cast to int/uint as needed (if you'd translate
> from tgsi back to glsl, you'd probably do it that way as well).
> Hence, you must not flush denorms on float to int/uint casts - which,
> btw, is also illegal by glsl as far as I can tell ("Returns a signed or
> unsigned integer value representing the encoding of a floating-point
> value. The floating-point value's bit-level representation is preserved.")

How could you possibly know that the value was a denorm to begin with? 
"Any denormalized value input into a shader or potentially generated by 
any operation in a shader can be flushed to 0.", which presumably 
includes intBitsToFloat :-)

> (As a side note, for the same reasons we rely on i2f/u2f "doing the
> right thing", not messing with bits, albeit this one isn't guaranteed by
> glsl. But I'm quite sure we're not the only ones relying on this, and
> this is quite common.)

Yeah, fair enough. You've convinced me not to take this approach.

We could just flush to zero after min/max. Although this should lead to 
a different result for min() vs. open-coding the comparison and select. 
That feels pretty dirty as well...

> I don't know what a proper solution would look like though. FWIW d3d10
> permits this min/max behavior (the comparison must use denorm flush, but
> the result may be denorm flushed or not, so it's completely ok if you
> get the "wrong" result. And given the way glsl is specced wrt denorms
> (read: mostly undefined) are you sure it's actually illegal there?

It's one of those borderline cases for GLSL. The GLSL language is 
basically what I quoted above, which makes our behavior slightly icky 
because we flush the input to zero in one place but not another.

The GLSL ES 3.10 spec has this bit though:

"Should subnormal numbers (also known as 'denorms') be supported?

RESOLUTION: No, subnormal numbers maybe flushed to zero at any time."

... which gives much more leeway. Together with the D3D10 behavior, this 
is a good argument for changing the test. I'm going to look into that.

Cheers,
Nicolai

> 
> Roland
> 
> 
> Am 18.09.2017 um 11:44 schrieb Nicolai Hähnle:
>> Hi all,
>>
>> This series fixes an issue we have on GCN due to the arguably inconsistent
>> handling of denormal floating point values. We disable denormals for 32-bit
>> floating point numbers, because enabling them would cost a lot of
>> performance. By and large, GCN floating point instructions flush denormals
>> to zero. An annoying exception are v_min_f32 and v_max_f32, whose behavior
>> can be described as:
>>
>>    cc = compare(flushed(src0), flushed(src1))
>>    dst = cc ? src0 : src1;
>>
>> When both sources are denormals (or 0.0), the comparison will consider the
>> numbers equal. The instruction will return one of them *without* flushing
>> to zero, which means that we might end up with the wrong one (depending on
>> how the operands are ordered).
>>
>> The way this series fixes the issue is to explicitly cause flush-to-zero
>> behavior whenever the application may be able to observe a difference, which
>> is when floats are bit-cast to integers or stored in memory (SSBOs and
>> 32f images, since those images can be bitcast r32i/ui).
>>
>> Since the backend cannot reliably detect bitcasts, we add a new CANON opcode
>> which is modeled after LLVM's llvm.canonicalize.* intrinsic. If the driver
>> requests it, CANON is emitted before bitcasts and stores, which means most
>> shaders are unaffected (the alternative would be to emit it for bitcasts
>> *from* integers and for loads, which has a much wider impact).
>>
>> For GCN, LLVM turns the canonicalize into a multiplication by 1.0. We may
>> want to teach LLVM about cases where the additional instruction is
>> unnecessary.
>>
>> Please review!
>> Thanks,
>> Nicolai
>> --
>>   src/gallium/auxiliary/gallivm/lp_bld_limits.h     |  1 +
>>   src/gallium/auxiliary/tgsi/tgsi_exec.h            |  1 +
>>   src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h    |  2 +-
>>   src/gallium/docs/source/screen.rst                |  2 ++
>>   src/gallium/docs/source/tgsi.rst                  | 18 ++++++++++++++++++
>>   src/gallium/drivers/etnaviv/etnaviv_screen.c      |  1 +
>>   src/gallium/drivers/freedreno/freedreno_screen.c  |  1 +
>>   src/gallium/drivers/nouveau/nv30/nv30_screen.c    |  2 ++
>>   src/gallium/drivers/nouveau/nv50/nv50_screen.c    |  1 +
>>   src/gallium/drivers/nouveau/nvc0/nvc0_screen.c    |  1 +
>>   src/gallium/drivers/r300/r300_screen.c            |  2 ++
>>   src/gallium/drivers/r600/r600_pipe.c              |  1 +
>>   src/gallium/drivers/radeonsi/si_pipe.c            |  1 +
>>   src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c |  2 ++
>>   src/gallium/drivers/svga/svga_screen.c            |  3 +++
>>   src/gallium/drivers/vc4/vc4_screen.c              |  1 +
>>   src/gallium/drivers/virgl/virgl_screen.c          |  1 +
>>   src/gallium/include/pipe/p_defines.h              |  1 +
>>   src/gallium/include/pipe/p_shader_tokens.h        |  2 +-
>>   src/mesa/state_tracker/st_glsl_to_tgsi.cpp        | 31 +++++++++++++++++++++++++++++--
>>   20 files changed, 71 insertions(+), 4 deletions(-)
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=GL1ia61sD0zG05zPlj1qW9ra1gTZ2KBkTNoSBG1cakM&s=o3M_-RsUz-5prZLmL33iu5AaeEChWHU6kf-FtWMliKU&e=
>>
> 

-- 
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.