[Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

Tue Mar 10 20:38:14 PDT 2015

> -----Original Message-----
> From: Matt Turner [mailto:mattst88 at gmail.com]
> Sent: Wednesday, March 11, 2015 10:20 AM
> To: Song, Ruiling
> Cc: Luo, Xionghu; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.
> 
> On Tue, Mar 10, 2015 at 6:55 PM, Song, Ruiling <ruiling.song at intel.com>
> wrote:
> >> I'm not sure that it matters for this patch, but do we know if Gen's
> >> MAD instruction is a fused-multiply-add? That is, does it not do an
> >> intermediate rounding step after the multiply?
> > I also have such kind of concern, so I did a simple test:
> > on cpu side, I use "reference = (double)x1*(double)x2 + (double)x3;"
> 
> Some recent CPUs have FMA instructions. You should make sure you know
> whether your code is compiled using FMA or not.
> 
> > And on gpu side, I use "result = mad(x1, x2, x3);"
> > Then compare the result and reference, the bits are exactly the same, so I
> > think gen's MAD does not do intermediate rounding after multiply.
> 
> The intermediate rounding step will not affect many pairs of numbers that
> are multiplied together. You need to make sure you're testing a pair of
> numbers that are affected by the intermediate rounding step.
> 
> I wrote a small program to find cases where fmaf(x, y, z) != x*y+z (attached).
> Compile with -std=c99 -O2 -march=native -lm. I'm testing on Haswell which
> has FMA.
> 
> It shows that
> 
> fmaf(1, 0.333333, 0.666667) is 1 (0x1.000002p+0), but 1 * 0.333333 +
> 0.666667 is 1 (0x1p+0)
> 
> Please test that Gen's MAD instruction produces what fmaf() produces for
> 1.0 * 0.333333 + 0.666667.
> 
> Assuming glibc's fmaf() is correct... I'm again surprised by floating-point
> numbers. :)

My gcc doesn't provide definitions of nextafterf and fmaf, so I used g++ to build on my Ivy Bridge (ivb) machine:
g++  -O2 -march=native -lm -o fma fma.c
Its output (I changed the printf to use "%a"):
fmaf(0x1.000002p+0, 0x1.555556p-2, 0x1.555556p-1) is 1 (0x1.000002p+0), but 0x1.000002p+0 * 0x1.555556p-2 + 0x1.555556p-1 is 1 (0x1p+0)
I also tried Gen's MAD, and its result is the same as fmaf's. You can try it on your Haswell machine; I think the result will be the same.