[Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

Tue Mar 10 20:04:48 PDT 2015


> -----Original Message-----
> From: Matt Turner [mailto:mattst88 at gmail.com]
> Sent: Wednesday, March 11, 2015 10:20 AM
> To: Song, Ruiling
> Cc: Luo, Xionghu; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.
> 
> On Tue, Mar 10, 2015 at 6:55 PM, Song, Ruiling <ruiling.song at intel.com>
> wrote:
> >> I'm not sure that it matters for this patch, but do we know if Gen's
> >> MAD instruction is a fused-multiply-add? That is, does it not do an
> >> intermediate rounding step after the multiply?
> > I also have such kind of concern, so I did a simple test:
> > on cpu side, I use "reference = (double)x1*(double)x2 + (double)x3;"
> 
> Some recent CPUs have FMA instructions. You should make sure you know
> whether your code is compiled using FMA or not.
> 
> > And on gpu side, I use "result = mad(x1, x2, x3);"
> > Then compare the result and reference, the bits are exactly the same, so I
> think gen's MAD does not do intermediate rounding after multiply.
> 
> The intermediate rounding step will not affect many pairs of numbers that
> are multiplied together. You need to make sure you're testing a pair of
> numbers that are affected by the intermediate rounding step.
> 
> I wrote a small program to find cases where fmaf(x, y, z) != x*y+z (attached).
> Compile with -std=c99 -O2 -march=native -lm. I'm testing on Haswell which
> has FMA.
> 
> It shows that
> 
> fmaf(1, 0.333333, 0.666667) is 1 (0x1.000002p+0), but 1 * 0.333333 +
> 0.666667 is 1 (0x1p+0)
> 
> Please test that Gen's MAD instruction produces what fmaf() produces for
> 1.0 * 0.333333 + 0.666667.
I tried these numbers. As a single-precision float, 0.333333 is 0x1.55553ep-2,
and 0.666667 is 0x1.55556p-1.
I summed them by hand. The exact sum has 24 one bits in the mantissa (not counting the hidden bit). As a single-precision float has only 23 mantissa bits,
the sum has to be rounded, and I am not sure which way it rounds here. If it rounds up, the result is 0x1p0. I need to check the IEEE 754 spec.
Either way, it cannot produce 0x1.000002p+0.
I think it is better not to print with %g, because %g does not show the exact binary representation. I always prefer %a.
> 
> Assuming glibc's fmaf() is correct... I'm again surprised by floating-point
> numbers. :)