[Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

Alex Smith asmith at feralinteractive.com
Fri Oct 6 09:29:11 UTC 2017


On 6 October 2017 at 03:39, Dave Airlie <airlied at gmail.com> wrote:

> On 6 October 2017 at 12:31, Marek Olšák <maraeo at gmail.com> wrote:
> > On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> >> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák <maraeo at gmail.com> wrote:
> >>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
> >>>> Why? While it might technically be legal, always generating an unfused
> >>>> mul+add when the user explicitly requested fma() seems harsh...
> >>>
> >>> It's slow on some chips. It doesn't need any other reason.
> >>>
> >>> Marek
> >>
> >> Presumably, if the developer asked for fma, then they don't care how
> >> fast or slow it is...
> >
> > Feral asked for fma. They care. This debate is pointless. We just
> > won't use fma by default. Period.
>
> They didn't ask for it with precise precision. I'm assuming if someone
> wants fma with precise precision we should give it to them. Like at least
> the fma manpage states.
>
> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml


Some of our older games (e.g. Tomb Raider) do actually request precise
(based on what the original D3D shader asks for), so changing the behaviour
on GL to use the proper fma would likely regress performance on those.

D3D's mad (which we've been using fma to implement) is as vague as GLSL
about what the actual precision requirements are with precise:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff471418(v=vs.85).aspx

"If shader authors use the mad intrinsic to calculate a result that the
shader marked as precise, they indicate to the hardware to use any valid
implementation of the mad instruction (fused or not) as long as the
implementation is consistent for all uses of that mad intrinsic in any
shader on that hardware"
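
To make the "fused or not" distinction concrete, here is a minimal sketch of
how the two forms can differ numerically. It uses Python doubles rather than
the 32-bit floats the shader instructions operate on, and the helper names
unfused_mul_add and fused_mul_add are made up for illustration; they are not
driver or API functions. The fused result is emulated by computing exactly
with fractions and rounding once.

```python
from fractions import Fraction


def unfused_mul_add(a, b, c):
    """Multiply, round, then add and round again (two roundings)."""
    product = a * b          # rounded to the nearest double here
    return product + c       # rounded a second time here


def fused_mul_add(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, rounded only once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)      # single rounding to the nearest double


# Inputs chosen so that the low bits of a*b are lost by the first rounding.
a = b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(unfused_mul_add(a, b, c))  # the 2**-54 term was rounded away
print(fused_mul_add(a, b, c))    # the 2**-54 term survives
```

With these inputs the unfused form yields exactly zero while the fused form
keeps the small residual, which is the kind of difference the wording above
allows as long as the choice is consistent.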

From some quick testing I just did, it looks like the AMD D3D driver always
implements mad as v_mac_f32 regardless of whether precise is requested.

So it seems (at least from our perspective!) that not actually getting a
fused op isn't really an issue, and it clearly hasn't been one, since
radeonsi never gives you a fused op right now.

Alex


>
>
> Dave.