[Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

Tue Mar 10 19:19:45 PDT 2015

On Tue, Mar 10, 2015 at 6:55 PM, Song, Ruiling <ruiling.song at intel.com> wrote:
>> I'm not sure that it matters for this patch, but do we know if Gen's MAD
>> instruction is a fused-multiply-add? That is, does it not do an intermediate
>> rounding step after the multiply?
> I also have such kind of concern, so I did a simple test:
> on cpu side, I use "reference = (double)x1*(double)x2 + (double)x3;"

Some recent CPUs have FMA instructions. You should make sure you know
whether your code is compiled using FMA or not.
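
A minimal build-time check could look like the sketch below, assuming GCC or Clang on x86. `__FMA__` is predefined by those compilers when FMA instructions are enabled (for example by -march=native on Haswell), and C99's `FP_FAST_FMAF` from <math.h> may be defined when fmaf() is about as fast as a separate multiply and add. The helper names are made up for illustration. Neither macro says whether a particular x*y+z was actually contracted into an FMA; that depends on the compiler's contraction setting, so looking for vfmadd instructions in the generated assembly is still the more reliable check.

```c
#include <math.h>

/* Hypothetical helper: returns 1 if the compiler was allowed to emit
 * x86 FMA instructions for this translation unit, 0 otherwise.
 * Assumes GCC/Clang, which predefine __FMA__ when -mfma is in effect. */
int target_has_fma(void)
{
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if <math.h> advertises a fast fmaf()
 * through the optional C99 macro FP_FAST_FMAF, 0 otherwise. */
int libm_reports_fast_fmaf(void)
{
#ifdef FP_FAST_FMAF
    return 1;
#else
    return 0;
#endif
}
```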

> And on gpu side, I use "result = mad(x1, x2, x3);"
> Then compare the result and reference, the bits are exactly the same, so I think gen's MAD does not do intermediate rounding after multiply.

For many pairs of multiplicands, the intermediate rounding step does
not change the final result. You need to make sure you're testing
inputs whose result is actually affected by that rounding step.
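
As an illustration, here is a hand-picked triple for which the intermediate rounding is visible. The values and function names are chosen for this sketch and are not taken from the thread. With x = y = 1 + 2^-12, the exact product is 1 + 2^-11 + 2^-24, which lies exactly halfway between two adjacent floats and rounds (ties-to-even) to 1 + 2^-11. Adding z = -(1 + 2^-11) then gives 0 when the product is rounded first, and 2^-24 when it is not. The unrounded reference uses the double-precision approach quoted above; it is exact for this triple, although in general the final double-to-float conversion can add a second rounding that a true FMA would not have.

```c
/* Multiply, round the product to float, then add: two roundings. */
float mul_then_add(float x, float y, float z)
{
    /* volatile forces the product to be stored as a float, so the
     * compiler cannot fuse the multiply and the add. */
    volatile float p = x * y;
    return p + z;
}

/* Reference in the style quoted above: the product of two floats is
 * exact in double, so the product itself is never rounded. */
float mul_add_via_double(float x, float y, float z)
{
    return (float)((double)x * (double)y + (double)z);
}

/* Example inputs (chosen for this sketch). */
static const float X = 0x1.001p+0f;   /* 1 + 2^-12    */
static const float Y = 0x1.001p+0f;   /* 1 + 2^-12    */
static const float Z = -0x1.002p+0f;  /* -(1 + 2^-11) */
```

A MAD that rounds after the multiply should return 0 for these inputs, and a fused one should return 0x1p-24.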

I wrote a small program to find cases where fmaf(x, y, z) != x*y+z
(attached). Compile with -std=c99 -O2 -march=native -lm. I'm testing
on Haswell which has FMA.

It shows that

  fmaf(1, 0.333333, 0.666667) is 1 (0x1.000002p+0), but
  1 * 0.333333 + 0.666667 is 1 (0x1p+0)

Please test that Gen's MAD instruction produces what fmaf() produces
for 1.0 * 0.333333 + 0.666667.

Assuming glibc's fmaf() is correct... I'm again surprised by
floating-point numbers. :)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fma.c
Type: text/x-csrc
Size: 493 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/beignet/attachments/20150310/593e9593/attachment.c>