[Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

Fri Oct 6 15:43:33 UTC 2017

On 06.10.2017 at 11:29, Alex Smith wrote:
> On 6 October 2017 at 03:39, Dave Airlie <airlied at gmail.com> wrote:
> 
>     On 6 October 2017 at 12:31, Marek Olšák <maraeo at gmail.com> wrote:
>     > On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>     >> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák <maraeo at gmail.com> wrote:
>     >>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>     >>>> Why? While it might technically be legal, always generating an unfused
>     >>>> mul+add when the user explicitly requested fma() seems harsh...
>     >>>
>     >>> It's slow on some chips. It doesn't need any other reason.
>     >>>
>     >>> Marek
>     >>
>     >> Presumably, if the developer asked for fma, then they don't care how
>     >> fast or slow it is...
>     >
>     > Feral asked for fma. They care. This debate is pointless. We just
>     > won't use fma by default. Period.
> 
>     They didn't ask for it with precise precision. I'm assuming if
>     someone wants fma with precise precision we should give it to them.
>     Like at least the fma manpage states.
> 
>     https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml
> 
> 
> Some of our older games (e.g. Tomb Raider) do actually request precise
> (based on what the original D3D shader asks for), so changing the
> behaviour on GL to use the proper fma would likely regress performance
> on those.
> 
> D3D's mad (which we've been using fma to implement) is similarly vague
> as GLSL about what the actual precision requirements are with
> precise: https://msdn.microsoft.com/en-us/library/windows/desktop/ff471418(v=vs.85).aspx

Of course, but d3d mad is a "traditional" multiply/add which predates
even fully programmable shader pipelines, and back in the day gpus
actually used fixed function alus where talking about "fused" didn't
even make sense.
I think the problem here is just that glsl never had such a mad -
because it is based on a textual representation, mul and add use
operators, and a mad function would just look ugly (and generally, with
glsl's lax requirements, no one would ever care whether you actually
fuse muls and adds).
But now with precise, you cannot fuse such separate muls and adds
freely, because the compiler can't guarantee that it will always fuse
them (and it would be shady in any case). Thus having to use separate
muls and adds would penalize gpus which can only do a fused mul+add in
a single step (nvidia IIRC, also x86 avx with fma).
Hence "fma" being added.

I would, however, say that calling this "fma" is a very serious (but
now unfixable) spec bug. No one ever talks about a "fused multiply add"
when it actually may as well be unfused. This is just confusing as hell.
Call it mad, fmuladd (as llvm does), mfma ("maybe fused"...) or
whatever, but not fma. (fwiw d3d is sane there - single precision may be
fused or unfused, and it's called mad; with doubles it is guaranteed to
always be fused, and it's called dfma accordingly.)
And fwiw I got confused by this too earlier, thinking it has to be fused
- certainly opencl etc. really want a fused one if they use fma.
This also means I was wrong earlier when there were some problems with
fma / mad on nouveau drivers - since fma can apparently be unfused,
there's no point in the mesa state tracker ever using the tgsi fma
opcode, and it should always use MAD instead as far as I can tell (but
of course setting the precise bit accordingly).

Roland