[Mesa-dev] [PATCH 0/3] tgsi, radeonsi: add CANON opcode for float canonicalization
Nicolai Hähnle
nhaehnle at gmail.com
Mon Sep 18 15:36:18 UTC 2017
On 18.09.2017 17:02, Roland Scheidegger wrote:
> This looks like a horrendous solution which will break the world - well
> for us :-). Because integers simply will cease to work, always flushed
> to zero (bye bye loop counter...).
> The reason is that when you translate from something with a untyped
> register file to something typed, the obvious solution is to store
> everything as floats, and cast to int/uint as needed (if you'd translate
> from tgsi back to glsl, you'd probably do it that way as well).
> Hence, you must not flush denorms on float to int/uint casts - which,
> btw, is also illegal by glsl as far as I can tell ("Returns a signed or
> unsigned integer value representing the encoding of a floating-point
> value. The floating-point value's bit-level representation is preserved.")
How could you possibly know that the value was a denorm to begin with?
"Any denormalized value input into a shader or potentially generated by
any operation in a shader can be flushed to 0.", which presumably
includes intBitsToFloat :-)
> (As a side note, for the same reasons we rely on i2f/u2f "doing the
> right thing", not messing with bits, albeit this one isn't guaranteed by
> glsl. But I'm quite sure we're not the only ones relying on this, and
> this is quite common.)
Yeah, fair enough. You've convinced me not to take this approach.
We could just flush to zero after min/max. Although this should lead to
a different result for min() vs. open-coding the comparison and select.
That feels pretty dirty as well...
> I don't know what a proper solution would look like though. FWIW d3d10
> permits this min/max behavior (the comparison must use denorm flush, but
> the result may be denorm flushed or not, so it's completely ok if you
> get the "wrong" result. And given the way glsl is specced wrt denorms
> (read: mostly undefined) are you sure it's actually illegal there?
It's one of those borderline cases for GLSL. The GLSL language is
basically what I quoted above, which makes our behavior slightly icky
because we flush the input to zero in one place but not another.
The GLSL ES 3.10 spec has this bit though:
"Should subnormal numbers (also known as 'denorms') be supported?
RESOLUTION: No, subnormal numbers maybe flushed to zero at any time."
... which gives much more leeway. Together with the D3D10 behavior, this
is a good argument for changing the test. I'm going to look into that.
Cheers,
Nicolai
>
> Roland
>
>
> Am 18.09.2017 um 11:44 schrieb Nicolai Hähnle:
>> Hi all,
>>
>> This series fixes an issue we have on GCN due to the arguably inconsistent
>> handling of denormal floating point values. We disable denormals for 32-bit
>> floating point numbers, because enabling them would cost a lot of
>> performance. By and large, GCN floating point instructions flush denormals
>> to zero. An annoying exception are v_min_f32 and v_max_f32, whose behavior
>> can be described as:
>>
>> cc = compare(flushed(src0), flushed(src1))
>> dst = cc ? src0 : src1;
>>
>> When both sources are denormals (or 0.0), the comparison will consider the
>> numbers equal. The instruction will return one of them *without* flushing
>> to zero, which means that we might end up with the wrong one (depending on
>> how the operands are ordered).
>>
>> The way this series fixes the issue is to explicitly cause flush-to-zero
>> behavior whenever the application may be able to observe a difference, which
>> is when floats are bit-cast to integers or stored in memory (SSBOs and
>> 32f images, since those images can be bitcast r32i/ui).
>>
>> Since the backend cannot reliably detect bitcasts, we add a new CANON opcode
>> which is modeled after LLVM's llvm.canonicalize.* intrinsic. If the driver
>> requests it, CANON is emitted before bitcasts and stores, which means most
>> shaders are unaffected (the alternative would be to emit it for bitcasts
>> *from* integers and for loads, which has a much wider impact).
>>
>> For GCN, LLVM turns the canonicalize into a multiplication by 1.0. We may
>> want to teach LLVM about cases where the additional instruction is
>> unnecessary.
>>
>> Please review!
>> Thanks,
>> Nicolai
>> --
>> src/gallium/auxiliary/gallivm/lp_bld_limits.h | 1 +
>> src/gallium/auxiliary/tgsi/tgsi_exec.h | 1 +
>> src/gallium/auxiliary/tgsi/tgsi_info_opcodes.h | 2 +-
>> src/gallium/docs/source/screen.rst | 2 ++
>> src/gallium/docs/source/tgsi.rst | 18 ++++++++++++++++++
>> src/gallium/drivers/etnaviv/etnaviv_screen.c | 1 +
>> src/gallium/drivers/freedreno/freedreno_screen.c | 1 +
>> src/gallium/drivers/nouveau/nv30/nv30_screen.c | 2 ++
>> src/gallium/drivers/nouveau/nv50/nv50_screen.c | 1 +
>> src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 1 +
>> src/gallium/drivers/r300/r300_screen.c | 2 ++
>> src/gallium/drivers/r600/r600_pipe.c | 1 +
>> src/gallium/drivers/radeonsi/si_pipe.c | 1 +
>> src/gallium/drivers/radeonsi/si_shader_tgsi_alu.c | 2 ++
>> src/gallium/drivers/svga/svga_screen.c | 3 +++
>> src/gallium/drivers/vc4/vc4_screen.c | 1 +
>> src/gallium/drivers/virgl/virgl_screen.c | 1 +
>> src/gallium/include/pipe/p_defines.h | 1 +
>> src/gallium/include/pipe/p_shader_tokens.h | 2 +-
>> src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 31 +++++++++++++++++++++++++++++--
>> 20 files changed, 71 insertions(+), 4 deletions(-)
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=GL1ia61sD0zG05zPlj1qW9ra1gTZ2KBkTNoSBG1cakM&s=o3M_-RsUz-5prZLmL33iu5AaeEChWHU6kf-FtWMliKU&e=
>>
>
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
More information about the mesa-dev
mailing list