[Mesa-dev] [PATCH 0/8] ARB_shading_language_packing

Paul Berry stereotype441 at gmail.com
Fri Jan 25 13:36:58 PST 2013


On 25 January 2013 13:18, Matt Turner <mattst88 at gmail.com> wrote:

> On Fri, Jan 25, 2013 at 9:55 AM, Paul Berry <stereotype441 at gmail.com>
> wrote:
> > On 25 January 2013 07:49, Paul Berry <stereotype441 at gmail.com> wrote:
> >>
> >> On 24 January 2013 19:44, Matt Turner <mattst88 at gmail.com> wrote:
> >>>
> >>> Following this email are eight patches that add the 4x8 pack/unpack
> >>> operations that are the difference between what GLSL ES 3.0 and
> >>> ARB_shading_language_packing require.
> >>>
> >>> They require Chad's gles3-glsl-packing series and are available at
> >>>
> >>> http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing
> >>>
> >>> I've also added testing support on top of Chad's piglit patch. The
> >>> {vs,fs}-unpackUnorm4x8 tests currently fail, and I've been unable to
> >>> spot why.
> >>
> >>
> >> I had minor comments on patches 4/8 and 5/8.  The remainder is:
> >>
> >> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
> >>
> >> I didn't spot anything that would explain the failure in unpackUnorm4x8
> >> tests.  I'll go have a look at your piglit tests now, and if I don't find
> >> anything there either, I'll fire up the simulator and see if I can see
> >> what's going wrong.
> >
> >
> > I found the problem.  On i965, floating point divisions are implemented
> > as multiplication by a reciprocal, whereas on the CPU there's a floating
> > point division instruction.  Therefore, unpackUnorm4x8's computation of
> > "f / 255.0" doesn't yield consistent results when run on the CPU vs the
> > GPU--there is a tiny difference due to the accumulation of floating point
> > rounding errors.
> >
> > That's why the "fs" and "vs" variants of the tests failed, and the
> > "const" variant passed--because Mesa does constant folding using the
> > CPU's floating point division instruction, which matches the Python test
> > generator perfectly, whereas the "fs" and "vs" variants use the actual
> > GPU.
> >
> > It's only by dumb luck that this rounding error issue didn't bite us
> > until now, because in principle it could equally well have occurred in
> > the unpack2x16 functions.
> >
> > I believe we should relax the test to allow for these tiny rounding
> > errors (this is what the other test generators, such as
> > gen_builtin_uniform_tests.py, do).  As an experiment I modified
> > gen_builtin_packing_tests.py so that in vs_unpack_2x16_template and
> > fs_unpack_2x16_template, "actual == expect${j}" is replaced with
> > "distance(actual, expect${j}) < 0.00001".  With this change, the test
> > passes.
> >
> > However, that change isn't good enough to commit to piglit, for two
> > reasons:
> >
> > (1) It should only be applied when testing the functions whose
> > definition includes a division (unpackUnorm4x8, unpackSnorm4x8,
> > unpackUnorm2x16, and unpackSnorm2x16).  A properly functioning
> > implementation ought to be able to get exact answers with all the other
> > packing functions, and we should test that it does.
> >
> > (2) IMHO we should test that output values of 0.0 and +/- 1.0 are
> > produced without error, since a shader author might conceivably write
> > code that relies on these values being exact.  That is, we should check
> > that the following conversions are exact, with no rounding error:
> >
> > unpackUnorm4x8(0) == vec4(0.0)
> > unpackUnorm4x8(0xffffffff) == vec4(1.0)
> > unpackSnorm4x8(0) == vec4(0.0)
> > unpackSnorm4x8(0x7f7f7f7f) == vec4(1.0)
> > unpackSnorm4x8(0x80808080) == vec4(-1.0)
> > unpackSnorm4x8(0x81818181) == vec4(-1.0)
> > unpackUnorm2x16(0) == vec2(0.0)
> > unpackUnorm2x16(0xffffffff) == vec2(1.0)
> > unpackSnorm2x16(0) == vec2(0.0)
> > unpackSnorm2x16(0x7fff7fff) == vec2(1.0)
> > unpackSnorm2x16(0x80008000) == vec2(-1.0)
> > unpackSnorm2x16(0x80018001) == vec2(-1.0)
> >
> > My recommendation: address problem (1) by modifying the templates to
> > accept a new parameter that determines whether the test needs to be
> > precise or approximate (e.g. "func.precise").  Address problem (2) by
> > hand-coding a few shader_runner tests to check the cases above.  IMHO it
> > would be ok to leave the current patch as is (modulo my previous
> > comments) and do a pair of follow-on patches to address problems (1) and
> > (2).
>
> Interesting. Thanks a lot for finding that and writing it up.
>
> Since div() is used by both the Snorm and Unorm unpacking
> functions, any idea why it only adversely affects the results of
> Unorm? Multiplication by 1/255 yields lower precision than by 1/127?
>

After messing around with numpy for a while, it looks like 1/255 expressed
as a float32 happens to fall almost exactly between two representable
float32 values:

0.0039215683937072754 (representable float32)
0.0039215686274509803 (true value of 1/255)
0.0039215688593685627 (next representable float32)

So regardless of which way the rounding goes, the relative error is
approximately 5.9e-8.

By luck, 1/127, 1/32767, and 1/65535 are all much closer to representable
float32s, with relative errors of 3.7e-9, 9.3e-10, and 2.2e-10
respectively.

So yeah, the relative error introduced by multiplication by 1/255 just
happens to be particularly bad.
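
For anyone who wants to reproduce these figures, here's roughly what my
numpy session boiled down to (variable names are just for illustration):

```python
import numpy as np

# Relative error of each unpack scale factor's reciprocal when
# rounded to the nearest float32.
for denom in (255, 127, 32767, 65535):
    true = 1.0 / denom              # double precision, effectively exact here
    f32 = float(np.float32(true))   # nearest representable float32
    rel_err = abs(f32 - true) / true
    print(f"1/{denom}: float32 = {f32!r}, relative error = {rel_err:.2g}")
```

Running it shows the asymmetry above: roughly 5.9e-8 for 1/255 versus
3.7e-9, 9.3e-10, and 2.2e-10 for the other three.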


> In investigating the Unorm unpacking failure I did notice that some
> values worked (like 0.0, 1.0, and even 0.0078431377), so I don't
> expect any problems with precision on the values you suggest.
>

Thanks for double-checking--I checked that too, to make sure there wasn't
some deeper problem lurking.
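
My check was essentially to mimic the GPU lowering in numpy, comparing a
correctly rounded float32 division against multiplication by a rounded
reciprocal (this is only a model of the lowering, not the actual i965 code
path):

```python
import numpy as np

recip = np.float32(1.0) / np.float32(255.0)   # reciprocal, rounded to float32

mismatches = []
for i in range(256):
    div = np.float32(i) / np.float32(255.0)   # CPU-style: one rounded division
    mul = np.float32(i) * recip               # GPU-style: multiply by reciprocal
    if div != mul:
        mismatches.append(i)

print(f"{len(mismatches)} of 256 byte values differ")
```

Consistent with what you saw, i = 0 and i = 255 agree (255 * recip still
rounds to exactly 1.0), and i = 2 gives the 0.0078431377 value you
mentioned on both paths.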


>
> I agree with your recommended solution. I'll push these patches today
> for the 9.1 branch and do follow-on patches to piglit like you
> suggest.
>

Sounds good.  Thanks, Matt.


More information about the mesa-dev mailing list