[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

Tue Jun 13 15:06:20 UTC 2017

On Tue, Jun 13, 2017 at 2:33 AM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 13.06.2017 um 02:05 schrieb Ilia Mirkin:
>> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> FWIW surely on nv50 you could keep a single mad instruction for umad
>>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>>> unfused float multiply+add as a single instruction but I know next to
>>> nothing about nvidia hw...)
>>
>> The compiler should reassociate a mul + add into a mad where possible.
>> In practice, IMAD is super-slow... allegedly slower than
>> IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
>> faster but we haven't figured out how to operate it yet. I'm not aware
>> of a muladd version of fma on fermi and newer (GL 4.0). The tesla
>> series does have a floating point mul+add (but no fma).
>>
>
> Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts
> apparently only have a 32-bit fma on parts that support double precision.
> The same restriction is stated for gcn parts in the isa docs, which
> obviously doesn't make sense, but I have no idea if the fma is full speed...

fma is full-rate on Tahiti and Hawaii and quarter-rate on other GCN chips. FP64
opcodes are always 2x or 4x slower than fma_f32.
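
For readers following the fused vs. unfused distinction in this thread, here is a minimal C sketch of why the two can give different results. It is an illustration only, not code from Mesa or any driver. The function names are hypothetical, and the fused path is modelled by keeping the product in double (exact for float inputs) instead of calling C99 fmaf(), purely to keep the sketch free of libm linkage. The model is not a bit-exact fma for all inputs, since the final conversion to float can round a second time.

```c
/* Unfused mul+add: the product is rounded to float before the add.
 * The volatile store is there to keep the compiler from contracting
 * the expression into a single fused operation. */
float mad_unfused(float a, float b, float c)
{
    volatile float prod = a * b;   /* rounding step 1 */
    return prod + c;               /* rounding step 2 */
}

/* Model of a fused multiply-add (assumption: illustration only).
 * A float*float product needs at most 48 significand bits, so it is
 * exact in double; the product is therefore never rounded to float
 * before the add. Real code would normally use fmaf() from <math.h>. */
float mad_fused_model(float a, float b, float c)
{
    double exact_prod = (double)a * (double)b;
    return (float)(exact_prod + (double)c);
}
```

With a = b = 0x1.001p0f (1 + 2^-12) and c = -0x1.002p0f (-(1 + 2^-11)), the exact product is 1 + 2^-11 + 2^-24. The unfused version rounds that product to 1 + 2^-11 and returns 0, while the fused model keeps the 2^-24 term. This is the kind of difference that precise/invariant semantics have to pin down.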

Marek