[Mesa-dev] [PATCH 1/2] mesa: remove buggy assertions in unpack Z24

Mon Dec 12 08:12:05 PST 2011

----- Original Message -----
> On Mon, Dec 12, 2011 at 8:20 AM, Jose Fonseca <jfonseca at vmware.com>
> wrote:
> > ----- Original Message -----
> >> On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca
> >> <jfonseca at vmware.com>
> >> wrote:
> >> >
> >> >
> >> > ----- Original Message -----
> >> >> On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca
> >> >> <jfonseca at vmware.com>
> >> >> wrote:
> >> >> > I saw this email yesterday, but did not have time to reply.
> >> >> >
> >> >> > ----- Original Message -----
> >> >> >> On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt
> >> >> >> <eric at anholt.net>
> >> >> >> wrote:
> >> >> >> > On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák
> >> >> >> > <maraeo at gmail.com>
> >> >> >> > wrote:
> >> >> >> >> unpack_float_z_Z24_X8 fails with -O2 in:
> >> >> >> >> - fbo-blit-d24s8
> >> >> >> >> - fbo-depth-sample-compare
> >> >> >> >> - fbo-readpixels-depth-formats
> >> >> >> >> - glean/depthStencil
> >> >> >> >>
> >> >> >> >> With -O0, it works fine.
> >> >> >> >>
> >> >> >> >> I am removing all the assertions. There's not much point
> >> >> >> >> in
> >> >> >> >> having
> >> >> >> >> them,
> >> >> >> >> is there?
> >> >> >> >
> >> >> >> > Is -ffast-math involved at all?
> >> >> >>
> >> >> >> Yes, -fno-fast-math makes the problem go away.
> >> >> >>
> >> >> >> >
> >> >> >> > At a guess, replacing "* scale" with "/ (float)0xffffff"
> >> >> >> > makes
> >> >> >> > the
> >> >> >> > failure happen either way?
> >> >> >>
> >> >> >> Yes. Only using GLdouble fixes it.
> >> >> >>
> >> >> >> It makes sense. The mantissa without the sign has 23 bits,
> >> >> >> but
> >> >> >> 24
> >> >> >> bits
> >> >> >> are required to exactly represent 0xffffff if I am not
> >> >> >> mistaken.
> >> >> >
> >> >> > I'm afraid this is wrong: single precision floating point can
> >> >> > describe 24bits uints (and therefore unorms) without any
> >> >> > loss,
> >> >> > because although the mantissa has 23bits, the mantissa is
> >> >> > only
> >> >> > used to represent all but the most significant bit, ie.,
> >> >> > there's
> >> >> > an implicit 24th bit always set to one.
> >> >> >
> >> >> > The fact that -fno-fast-math makes the problem go away is yet
> >> >> > another clear proof that the issue is _not_ lack of precision
> >> >> > of
> >> >> > single-precision floating point -- as -fno-fast-math will
> >> >> > still
> >> >> > use single-precision floating point numbers.  Actually,
> >> >> > fno-fast-math, it will ensure that all intermediate steps are
> >> >> > done
> >> >> > in single precision instead of higher intel x87 80bit
> >> >> > floating
> >> >> > points, on the 32bit x86 architecture.  And I suspect the
> >> >> > problem
> >> >> > here is that the rounding w/ x80 yields wrong results.
> >> >> >
> >> >> >
> >> >> > I also disagree that using double precision is a good
> >> >> > solution.
> >> >> > Either fast-math serves our purposes, or it doesn't.  Using
> >> >> > double
> >> >> > precision to work-around issues with single-precision
> >> >> > fast-math
> >> >> > is
> >> >> > the *worst* thing we could do.
> >> >> >
> >> >> >
> >> >> > Does the assertion failure even happen on 64bits? I doubt, as
> >> >> > x64
> >> >> > ABI establishes the use of sse for all floating point, and
> >> >> > x87
> >> >> > 80bit floating point will never be used. So effectively this
> >> >> > is
> >> >> > making all archs slower.
> >> >> >
> >> >> >
> >> >> > Honestly, I think the patch should be recalled.  And we
> >> >> > should
> >> >> > consider disabling fast-math.  And maybe enabling
> >> >> > -mfpmath=sse
> >> >> > on
> >> >> > 32bits x86 (i.e., require sse).
> >> >>
> >> >> You're probably right. Feel free to revert & fix the issue in
> >> >> some
> >> >> other way.
> >> >
> >> > OK.
> >> >
> >> > Could you please confirm whether the assertions failures you saw
> >> > were on 32bits or 64bits mode?
> >>
> >> 32-bit.
> >
> > Thanks.
> >
> > In that case I propose dropping fast-math, at least on x86 32bits,
> > as it makes (in particular the unorm24 <-> f32 conversions happen
> > in a lot of places in Mesa's source tree) and some times worse
> > code [1], and simply use the subset of finer grained options  [2]:
> >
> >  -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only,
> >  -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.
> >
> > which really meets our needs.
> >
> > About -mfpmath=sse on 32bits, although I believe that depending on
> > sse would be alright, it looks like -mfpmath=sse  it's not a clear
> > winner on 32bits , because calls to libc still need to use x87
> > registers.
> >
> >
> > Jose
> >
> >
> > [1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/
> > [2]
> > http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html
> 
> 
> Thanks for digging into this, guys.
> 
> I'm happy to drop -ffast-math (or use -fno-fast-math) but it would be
> interesting to do at least a few before/after comparisons just to
> make
> sure there's no surprises in performance.

Yep, that was my plan.

> In any case, don't we still need to use double-precision for Z32
> packing/unpacking?  I ran into a failure in _mesa_pack_float_z_row()
> for Z32 on Saturday (debug build on x64).

Agreed. I don't know an accurate & efficient way converting unorm32 <=> f32 without doubles.

The trick I use in llvmpipe to convert f32 to unorm32, in src/gallium/auxiliary/gallivm/lp_bld_conv.c , only gets 31bits right (i.e, drops one bit from values near 0.0):

      /*
       * The destination exceeds what can be represented in the floating point.
       * So multiply by the largest power two we get away with, and when
       * subtract the most significant bit to rescale to normalized values.
       *
       * The largest power of two factor we can get away is
       * (1 << (src_type.width - 1)), because we need to use signed . In theory it
       * should be (1 << (src_type.width - 2)), but IEEE 754 rules states
       * INT_MIN should be returned in FPToSI, which is the correct result for
       * values near 1.0!
       *
       * This means we get (src_type.width - 1) correct bits for values near 0.0,
       * and (mantissa + 1) correct bits for values near 1.0. Equally or more
       * important, we also get exact results for 0.0 and 1.0.
       */

> I'd like to leave Marek's patch committed as-is for now until we
> settle on another solution.

Yep.

Jose