[Mesa-dev] [PATCH 1/2] gallium: add TGSI_OPCODE_FMA

Roland Scheidegger sroland at vmware.com
Mon Mar 2 08:48:59 PST 2015


On 02.03.2015 at 17:12, Marek Olšák wrote:
> On Mon, Mar 2, 2015 at 4:55 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 02.03.2015 at 12:52, Marek Olšák wrote:
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> Needed by ARB_gpu_shader5.
>>> ---
>>>  src/gallium/auxiliary/gallivm/lp_bld_limits.h    |  1 +
>>>  src/gallium/auxiliary/tgsi/tgsi_exec.h           |  1 +
>>>  src/gallium/auxiliary/tgsi/tgsi_info.c           |  2 +-
>>>  src/gallium/auxiliary/tgsi/tgsi_util.c           |  1 +
>>>  src/gallium/docs/source/screen.rst               |  1 +
>>>  src/gallium/docs/source/tgsi.rst                 | 23 +++++++++++++++++++++++
>>>  src/gallium/drivers/freedreno/freedreno_screen.c |  1 +
>>>  src/gallium/drivers/i915/i915_screen.c           |  1 +
>>>  src/gallium/drivers/nouveau/nv30/nv30_screen.c   |  2 ++
>>>  src/gallium/drivers/nouveau/nv50/nv50_screen.c   |  1 +
>>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c   |  1 +
>>>  src/gallium/drivers/r300/r300_screen.c           |  2 ++
>>>  src/gallium/drivers/r600/r600_pipe.c             |  1 +
>>>  src/gallium/drivers/r600/r600_shader.c           |  6 +++---
>>>  src/gallium/drivers/radeonsi/si_pipe.c           |  1 +
>>>  src/gallium/drivers/svga/svga_screen.c           |  2 ++
>>>  src/gallium/drivers/vc4/vc4_screen.c             |  1 +
>>>  src/gallium/include/pipe/p_defines.h             |  1 +
>>>  src/gallium/include/pipe/p_shader_tokens.h       |  2 +-
>>>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp       | 12 ++++++++----
>>>  20 files changed, 54 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_limits.h b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>> index 2962360..c5c51c1 100644
>>> --- a/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_limits.h
>>> @@ -129,6 +129,7 @@ gallivm_get_shader_param(enum pipe_shader_cap param)
>>>     case PIPE_SHADER_CAP_DOUBLES:
>>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>>        return 0;
>>>     }
>>>     /* if we get here, we missed a shader cap above (and should have seen
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.h b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>> index 609c81b..0e59b88 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.h
>>> @@ -459,6 +459,7 @@ tgsi_exec_get_shader_param(enum pipe_shader_cap param)
>>>     case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
>>>        return 1;
>>>     case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
>>> +   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
>>>        return 0;
>>>     }
>>>     /* if we get here, we missed a shader cap above (and should have seen
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>> index 4d838fd..e6e0a60 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
>>> @@ -56,7 +56,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] =
>>>     { 1, 3, 0, 0, 0, 0, COMP, "MAD", TGSI_OPCODE_MAD },
>>>     { 1, 2, 0, 0, 0, 0, COMP, "SUB", TGSI_OPCODE_SUB },
>>>     { 1, 3, 0, 0, 0, 0, COMP, "LRP", TGSI_OPCODE_LRP },
>>> -   { 0, 0, 0, 0, 0, 0, NONE, "", 19 },      /* removed */
>>> +   { 1, 3, 0, 0, 0, 0, COMP, "FMA", TGSI_OPCODE_FMA },
>>>     { 1, 1, 0, 0, 0, 0, REPL, "SQRT", TGSI_OPCODE_SQRT },
>>>     { 1, 3, 0, 0, 0, 0, REPL, "DP2A", TGSI_OPCODE_DP2A },
>>>     { 0, 0, 0, 0, 0, 0, NONE, "", 22 },      /* removed */
>>> diff --git a/src/gallium/auxiliary/tgsi/tgsi_util.c b/src/gallium/auxiliary/tgsi/tgsi_util.c
>>> index d572ff0..e5b8427 100644
>>> --- a/src/gallium/auxiliary/tgsi/tgsi_util.c
>>> +++ b/src/gallium/auxiliary/tgsi/tgsi_util.c
>>> @@ -193,6 +193,7 @@ tgsi_util_get_inst_usage_mask(const struct tgsi_full_instruction *inst,
>>>     case TGSI_OPCODE_MAD:
>>>     case TGSI_OPCODE_SUB:
>>>     case TGSI_OPCODE_LRP:
>>> +   case TGSI_OPCODE_FMA:
>>>     case TGSI_OPCODE_FRC:
>>>     case TGSI_OPCODE_CEIL:
>>>     case TGSI_OPCODE_CLAMP:
>>> diff --git a/src/gallium/docs/source/screen.rst b/src/gallium/docs/source/screen.rst
>>> index e0fd1a2..dd7a012 100644
>>> --- a/src/gallium/docs/source/screen.rst
>>> +++ b/src/gallium/docs/source/screen.rst
>>> @@ -336,6 +336,7 @@ to be 0.
>>>    is supported. If it is, DTRUNC/DCEIL/DFLR/DROUND opcodes may be used.
>>>  * ``PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED``: Whether DFRACEXP and
>>>    DLDEXP are supported.
>>> +* ``PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED``: Whether TGSI_OPCODE_FMA is supported.
>>>
>>>
>>>  .. _pipe_compute_cap:
>>> diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst
>>> index b0a975a..6871676 100644
>>> --- a/src/gallium/docs/source/tgsi.rst
>>> +++ b/src/gallium/docs/source/tgsi.rst
>>> @@ -272,6 +272,29 @@ This instruction replicates its result.
>>>    dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
>>>
>>>
>>> +.. opcode:: FMA - Fused Multiply-Add
>>> +
>>> +The results may not be identical to evaluating the expression (a*b)+c,
>>> +because the computation may be performed in a single operation with
>>> +intermediate precision different from that used to compute a non-FMA
>>> +expression.
>>> +
>>> +The results of FMA are guaranteed to be invariant given fixed inputs
>>> +<src0>, <src1>, and <src2>. That means the implementation is not allowed
>>> +to expand the opcode to MUL+ADD and apply algebraic optimizations affecting
>>> +the floating-point results.
>> I think these paragraphs are slightly confusing, especially "because
>> the computation may be performed in a single operation with intermediate
>> precision different from that used to compute a non-FMA expression".
>> It would be clearer to say something along the lines that (in contrast
>> to MAD) no intermediate rounding happens. Otherwise this sounds
>> as if some sort of intermediate rounding were allowed, as
>> long as the intermediate precision is larger than what you'd get from a
>> separate mul+add, which I don't think is what you wanted.
> 
> Well, it's partially copied from the extension spec and it just states
> that the intermediate precision is different. I guess the main point
> is that the result is invariant with regard to inputs.
Hmm, frankly I find the wording confusing, spec or not. It makes me think,
though, that it was worded like that on purpose: maybe not quite all chips
can actually guarantee "correct" fma results. By "correct" I mean as in the
OpenCL fma specification, which is a lot better imho: "Returns the correctly
rounded floating-point representation of the sum of c with the infinitely
precise product of a and b. Rounding of intermediate products shall not
occur. Edge case behavior is per the IEEE 754-2008 standard."
GLSL also has quite different wording, but there the meaning is
somewhat different: https://www.opengl.org/sdk/docs/man/html/fma.xhtml.
In other words, if you don't have the precise qualifier, it's just the same
as a MAD. With precise, though, I think it implies that there's no
intermediate rounding, just as OpenCL expects, because there fma is
considered a single operation, not "may be performed in a single
operation" as in ARB_gpu_shader5.

Roland



> 
>> (FWIW I don't think we really clarified MAD wrt intermediate rounding. I
>> particularly like the OpenCL convention that FMA = no rounding, MUL + ADD =
>> rounding, MAD = do whatever is fastest (because optimizing backends can
>> fuse MUL+ADD back into a MAD themselves if the hw can do that with
>> intermediate rounding), but traditionally of course MAD always did
>> intermediate rounding.)
> 
> Also MAD doesn't support denormals (on radeon), while FMA does. IIRC,
> FMA is the slower one of the two.
> 

Interesting. I thought most GPUs wouldn't handle denorms at all for
single-precision floats, for any operation, so there wouldn't be much
point in supporting them for fma alone. Or can you enable denorm support
explicitly for most operations, just not for MAD?

Roland



