[Beignet] [PATCH 6/7] replace mad with llvm intrinsic.

Song, Ruiling ruiling.song at intel.com
Tue Mar 10 20:07:37 PDT 2015



> -----Original Message-----
> From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf Of
> Song, Ruiling
> Sent: Wednesday, March 11, 2015 11:05 AM
> To: Matt Turner
> Cc: Luo, Xionghu; beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.
> 
> 
> 
> > -----Original Message-----
> > From: Matt Turner [mailto:mattst88 at gmail.com]
> > Sent: Wednesday, March 11, 2015 10:20 AM
> > To: Song, Ruiling
> > Cc: Luo, Xionghu; beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH 6/7] replace mad with llvm intrinsic.
> >
> > On Tue, Mar 10, 2015 at 6:55 PM, Song, Ruiling
> > <ruiling.song at intel.com> wrote:
> > >> I'm not sure that it matters for this patch, but do we know if
> > >> Gen's MAD instruction is a fused-multiply-add? That is, does it not
> > >> do an intermediate rounding step after the multiply?
> > > I have the same concern, so I did a simple test:
> > > on the CPU side, I use "reference = (double)x1*(double)x2 + (double)x3;"
> >
> > Some recent CPUs have FMA instructions. You should make sure you know
> > whether your code is compiled using FMA or not.
> >
> > > And on the GPU side, I use "result = mad(x1, x2, x3);"
> > > Then I compared the result with the reference: the bits are exactly
> > > the same, so I think Gen's MAD does not do intermediate rounding
> > > after the multiply.
> >
> > The intermediate rounding step will not affect many pairs of numbers
> > that are multiplied together. You need to make sure you're testing a
> > pair of numbers that are affected by the intermediate rounding step.
> >
> > I wrote a small program to find cases where fmaf(x, y, z) != x*y+z
> > (attached).
> > Compile with -std=c99 -O2 -march=native -lm. I'm testing on Haswell
> > which has FMA.
> >
> > It shows that
> >
> > fmaf(1, 0.333333, 0.666667) is 1 (0x1.000002p+0), but 1 * 0.333333 +
> > 0.666667 is 1 (0x1p+0)
> >
> > Please test that Gen's MAD instruction produces what fmaf() produces
> > for 1.0 * 0.333333 + 0.666667.
> I tried these numbers. The binary representation of 0.333333 is 0x1.55553ep-2.
> The binary representation of 0.666667 is 0x1.5555p-1. I summed them manually.

Sorry, typo here. It should be "The binary representation of 0.666667 is 0x1.55556p-1".

> The mantissa of the sum is 24 one bits (not counting the hidden one). As a
> float has only 23 mantissa bits, I don't know how it is rounded here; if it
> is rounded up, the result would be 0x1p0. I need to check the IEEE 754 spec.
> But it cannot generate 0x1.000002p+0.
> I think you'd better not print with %g, since %g does not show the exact
> binary representation. I always prefer the %a representation.
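
The manual sum above can be checked with a short sketch. The helper names
print_both(), sum_exact() and sum_rounded() are hypothetical, and the code
assumes IEEE 754 single and double precision with the default
round-to-nearest-even mode.

```c
#include <stdio.h>

/* Prints v with %g and with %a side by side. %g keeps only six
 * significant digits, while %a shows every significand bit. */
static void print_both(const char *label, double v)
{
    printf("%-10s %%g: %-10g %%a: %a\n", label, v, v);
}

/* Sum computed in double, where the result is exact for two floats
 * of similar magnitude such as 0.333333f and 0.666667f. */
static double sum_exact(float a, float b)
{
    return (double)a + (double)b;
}

/* Sum rounded to float. The volatile store forces the rounding. */
static float sum_rounded(float a, float b)
{
    volatile float s = a + b;
    return s;
}
```

With the inputs from this thread, sum_exact(0.333333f, 0.666667f) is
1 - 2^-25, which lies exactly halfway between the two nearest floats,
1 - 2^-24 and 1. Round-to-nearest-even picks 1, so sum_rounded() returns
0x1p0. Passing the exact sum to print_both() shows %g printing 1 even though
the value is not 1, which is why %a is the safer format here.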
> >
> > Assuming glibc's fmaf() is correct... I'm again surprised by
> > floating-point numbers. :)
> _______________________________________________
> Beignet mailing list
> Beignet at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/beignet

