[Mesa-dev] [PATCH] ac/nir: use llvm fma intrinsic if nir instruction is exact.

Fri Oct 6 16:37:52 UTC 2017

On Fri, Oct 6, 2017 at 4:39 AM, Dave Airlie <airlied at gmail.com> wrote:
> On 6 October 2017 at 12:31, Marek Olšák <maraeo at gmail.com> wrote:
>> On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott <cwabbott0 at gmail.com> wrote:
>>>>> Why? While it might technically be legal, always generating an unfused
>>>>> mul+add when the user explicitly requested fma() seems harsh...
>>>>
>>>> It's slow on some chips. It doesn't need any other reason.
>>>>
>>>> Marek
>>>
>>> Presumably, if the developer asked for fma, then they don't care how
>>> fast or slow it is...
>>
>> Feral asked for fma. They care. This debate is pointless. We just
>> won't use fma by default. Period.
>
> They didn't ask for it with precise precision. I'm assuming if someone wants
> fma with precise precision we should give it to them. Like at least
> the fma manpage states.
>
> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml

Oh please Dave, the page says the exact opposite of what you are
saying. The only thing the manpage says is: If fma and mul+add have
different precision, fma can't be split and mul+add can't be combined.
It doesn't say anything about precision of the result of fma itself.
Search for the word "can". It's not the same as "must".

That said, RADV can use as many slow opcodes as you want if you
insist. I'm only saying that the opcode selection of radeonsi is
non-negotiable on my side, and nir_to_llvm might get radeonsi-specific
opcode selection.

Marek