[Mesa-dev] [PATCH 0/8] ARB_shading_language_packing

Fri Jan 25 13:18:06 PST 2013

On Fri, Jan 25, 2013 at 9:55 AM, Paul Berry <stereotype441 at gmail.com> wrote:
> On 25 January 2013 07:49, Paul Berry <stereotype441 at gmail.com> wrote:
>>
>> On 24 January 2013 19:44, Matt Turner <mattst88 at gmail.com> wrote:
>>>
>>> Following this email are eight patches that add the 4x8 pack/unpack
>>> operations that are the difference between what GLSL ES 3.0 and
>>> ARB_shading_language_packing require.
>>>
>>> They require Chad's gles3-glsl-packing series and are available at
>>>
>>> http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing
>>>
>>> I've also added testing support on top of Chad's piglit patch. The
>>> {vs,fs}-unpackUnorm4x8 tests currently fail, and I've been unable to
>>> spot why.
>>
>>
>> I had minor comments on patches 4/8 and 5/8.  The remainder is:
>>
>> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>>
>> I didn't spot anything that would explain the failure in unpackUnorm4x8
>> tests.  I'll go have a look at your piglit tests now, and if I don't find
>> anything there either, I'll fire up the simulator and see if I can see
>> what's going wrong.
>
>
> I found the problem.  On i965, floating point divisions are implemented as
> multiplication by a reciprocal, whereas on the CPU there's a floating point
> division instruction.  Therefore, unpackUnorm4x8's computation of "f /
> 255.0" doesn't yield consistent results when run on the CPU vs the
> GPU--there is a tiny difference due to the accumulation of floating point
> rounding errors.
>
> That's why the "fs" and "vs" variants of the tests failed, and the "const"
> variant passed--because Mesa does constant folding using the CPU's floating
> point division instruction, which matches the Python test generator
> perfectly, whereas the "fs" and "vs" variants use the actual GPU.
>
> It's only by dumb luck that this rounding error issue didn't bite us until
> now, because in principle it could equally well have occurred in the
> unpack2x16 functions.
>
> I believe we should relax the test to allow for these tiny rounding errors
> (this is what the other test generators, such as
> gen_builtin_uniform_tests.py, do).  As an experiment I modified
> gen_builtin_packing_tests.py so that in vs_unpack_2x16_template and
> fs_unpack_2x16_template, "actual == expect${j}" is replaced with
> "distance(actual, expect${j}) < 0.00001".  With this change, the test
> passes.
>
> However, that change isn't good enough to commit to piglit, for two reasons:
>
> (1) It should only be applied when testing the functions whose definition
> includes a division (unpackUnorm4x8, unpackSnorm4x8, unpackUnorm2x16, and
> unpackSnorm2x16).  A properly functioning implementation ought to be able to
> get exact answers with all the other packing functions, and we should test
> that it does.
>
> (2) IMHO we should test that ouput values of 0.0 and +/- 1.0 are produced
> without error, since a shader author might conceivably write code that
> relies on these values being exact.  That is, we should check that the
> following conversions are exact, with no rounding error:
>
> unpackUnorm4x8(0) == vec4(0.0)
> unpackUnorm4x8(0xffffffff) == vec4(1.0)
> unpackSnorm4x8(0) == vec4(0.0)
> unpackSnorm4x8(0x7f7f7f7f) == vec4(1.0)
> unpackSnorm4x8(0x80808080) == vec4(-1.0)
> unpackSnorm4x8(0x81818181) == vec4(-1.0)
> unpackUnorm2x16(0) == vec2(0.0)
> unpackUnorm2x16(0xffffffff) == vec4(1.0)
> unpackSnorm2x16(0) == vec4(0.0)
> unpackSnorm2x16(0x7fff7fff) == vec4(1.0)
> unpackSnorm2x16(0x80008000) == vec4(-1.0)
> unpackSnorm2x16(0x80018001) == vec4(-1.0)
>
> My recommendation: address problem (1) by modifying the templates to accept
> a new parameter that determines whether the test needs to be precise or
> approximate (e.g. "func.precise").  Address problem (2) by hand-coding a few
> shader_runner tests to check the cases above.  IMHO it would be ok to leave
> the current patch as is (modulo my previous comments) and do a pair of
> follow-on patches to address problems (1) and (2).

Interesting. Thanks a lot for finding that and writing it up.

Since div() is used in by both the Snorm and Unorm unpacking
functions, any idea why it only adversely affects the results of
Unorm? Multiplication by 1/255 yields lower precision than by 1/127?

In investigating the Unorm unpacking failure I did notice that some
values worked (like 0.0, 1.0, and even 0.0078431377), so I don't
expect any problems with precision on the values you suggest.

I agree with your recommended solution. I'll push these patches today
for the 9.1 branch and do follow-on patches to piglit like you
suggest.