[Mesa-dev] [PATCH 0/8] ARB_shading_language_packing

Chad Versace chad.versace at linux.intel.com
Fri Jan 25 12:23:30 PST 2013

On 01/25/2013 09:55 AM, Paul Berry wrote:
> On 25 January 2013 07:49, Paul Berry <stereotype441 at gmail.com> wrote:
>> On 24 January 2013 19:44, Matt Turner <mattst88 at gmail.com> wrote:
>>> Following this email are eight patches that add the 4x8 pack/unpack
>>> operations that are the difference between what GLSL ES 3.0 and
>>> ARB_shading_language_packing require.
>>> They require Chad's gles3-glsl-packing series and are available at
>>> http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing
>>> I've also added testing support on top of Chad's piglit patch. The
>>> {vs,fs}-unpackUnorm4x8 tests currently fail, and I've been unable to
>>> spot why.
>> I had minor comments on patches 4/8 and 5/8.  The remainder is:
>> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>> I didn't spot anything that would explain the failure in unpackUnorm4x8
>> tests.  I'll go have a look at your piglit tests now, and if I don't find
>> anything there either, I'll fire up the simulator and see if I can see
>> what's going wrong.
> I found the problem.  On i965, floating point divisions are implemented as
> multiplication by a reciprocal, whereas on the CPU there's a floating point
> division instruction.  Therefore, unpackUnorm4x8's computation of "f /
> 255.0" doesn't yield consistent results when run on the CPU vs the
> GPU--there is a tiny difference due to the accumulation of floating point
> rounding errors.
> That's why the "fs" and "vs" variants of the tests failed, and the "const"
> variant passed--because Mesa does constant folding using the CPU's floating
> point division instruction, which matches the Python test generator
> perfectly, whereas the "fs" and "vs" variants use the actual GPU.
> It's only by dumb luck that this rounding error issue didn't bite us until
> now, because in principle it could equally well have occurred in the
> unpack2x16 functions.
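[Editorial note: the divergence Paul describes can be reproduced on the CPU with NumPy's float32 standing in for the GPU's single-precision arithmetic. This is a sketch assuming the GPU lowers "f / 255.0" to a multiply by a once-rounded reciprocal, as described above; the hardware's actual reciprocal approximation may differ further.]

```python
import numpy as np

# Single-precision reciprocal of 255.0, rounded once -- roughly what a
# multiply-by-reciprocal lowering of "f / 255.0" works with on the GPU.
recip = np.float32(1.0) / np.float32(255.0)

f = np.float32(3.0)                       # byte value 3 from the packed word
via_division   = f / np.float32(255.0)    # CPU / constant-folded result
via_reciprocal = f * recip                # GPU-style result

# For f = 3 the two roundings disagree, but only by one ulp -- far
# smaller than the 0.00001 tolerance discussed below.
print(via_division == via_reciprocal)
print(abs(via_division - via_reciprocal))
```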
> I believe we should relax the test to allow for these tiny rounding errors
> (this is what the other test generators, such as
> gen_builtin_uniform_tests.py, do).  As an experiment I modified
> gen_builtin_packing_tests.py so that in vs_unpack_2x16_template and
> fs_unpack_2x16_template, "actual == expect${j}" is replaced with
> "distance(actual, expect${j}) < 0.00001".  With this change, the test
> passes.
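[Editorial note: the relaxed check is a Euclidean-distance comparison. In Python terms, `math.dist` is the analogue of GLSL's `distance()`; the helper name below is illustrative, not piglit code.]

```python
import math

def close_enough(actual, expect, tolerance=1e-5):
    """Mirror of the proposed GLSL check:
    distance(actual, expect) < 0.00001."""
    return math.dist(actual, expect) < tolerance

# Ulp-sized disagreements pass; genuinely wrong results still fail.
print(close_enough((0.01176471,), (0.01176470,)))  # True
print(close_enough((0.5,), (0.25,)))               # False
```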
> However, that change isn't good enough to commit to piglit, for two reasons:
> (1) It should only be applied when testing the functions whose definition
> includes a division (unpackUnorm4x8, unpackSnorm4x8, unpackUnorm2x16, and
> unpackSnorm2x16).  A properly functioning implementation ought to be able
> to get exact answers with all the other packing functions, and we should
> test that it does.
> (2) IMHO we should test that output values of 0.0 and +/- 1.0 are produced
> without error, since a shader author might conceivably write code that
> relies on these values being exact.  That is, we should check that the
> following conversions are exact, with no rounding error:
> unpackUnorm4x8(0) == vec4(0.0)
> unpackUnorm4x8(0xffffffff) == vec4(1.0)
> unpackSnorm4x8(0) == vec4(0.0)
> unpackSnorm4x8(0x7f7f7f7f) == vec4(1.0)
> unpackSnorm4x8(0x80808080) == vec4(-1.0)
> unpackSnorm4x8(0x81818181) == vec4(-1.0)
> unpackUnorm2x16(0) == vec2(0.0)
> unpackUnorm2x16(0xffffffff) == vec2(1.0)
> unpackSnorm2x16(0) == vec2(0.0)
> unpackSnorm2x16(0x7fff7fff) == vec2(1.0)
> unpackSnorm2x16(0x80008000) == vec2(-1.0)
> unpackSnorm2x16(0x80018001) == vec2(-1.0)
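[Editorial note: a sketch of why those endpoints can be exact. The GLSL definition of unpackSnorm4x8 is clamp(b / 127.0, -1.0, +1.0) per byte, and every boundary input above divides without a remainder or is handled by the clamp. The function name below is illustrative reference code, not piglit code.]

```python
def unpack_snorm_4x8(u):
    """Reference unpackSnorm4x8: clamp(b / 127.0, -1.0, 1.0) per byte."""
    out = []
    for i in range(4):
        b = (u >> (8 * i)) & 0xFF
        signed = b - 256 if b > 127 else b       # reinterpret as int8
        out.append(max(-1.0, min(1.0, signed / 127.0)))
    return out

assert unpack_snorm_4x8(0x00000000) == [0.0] * 4
assert unpack_snorm_4x8(0x7F7F7F7F) == [1.0] * 4    # 127/127 is exact
assert unpack_snorm_4x8(0x80808080) == [-1.0] * 4   # -128/127 clamps to -1.0
assert unpack_snorm_4x8(0x81818181) == [-1.0] * 4   # -127/127 is exact
```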
> My recommendation: address problem (1) by modifying the templates to accept
> a new parameter that determines whether the test needs to be precise or
> approximate (e.g. "func.precise").  Address problem (2) by hand-coding a
> few shader_runner tests to check the cases above.  IMHO it would be ok to
> leave the current patch as is (modulo my previous comments) and do a pair
> of follow-on patches to address problems (1) and (2).
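[Editorial note: a sketch of the template-parameter idea. All names here are hypothetical, not actual gen_builtin_packing_tests.py code; the generator would choose the comparison expression based on whether the function's definition involves a division.]

```python
# Hypothetical sketch of the "func.precise" idea for the test generator.
DIVIDING_FUNCS = {"unpackUnorm4x8", "unpackSnorm4x8",
                  "unpackUnorm2x16", "unpackSnorm2x16"}

def comparison_expr(func_name, j):
    """Return the GLSL comparison for expected value number `j`."""
    if func_name in DIVIDING_FUNCS:
        # CPU and GPU rounding may differ; allow a tiny error.
        return "distance(actual, expect%d) < 0.00001" % j
    # Everything else must round-trip bit-exactly.
    return "actual == expect%d" % j

print(comparison_expr("unpackUnorm4x8", 0))
print(comparison_expr("packSnorm2x16", 0))
```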
> Chad, do you have any thoughts on this subject, since you're the original
> author of this test generator?

I don't like the kludge of having a separate shader_test for exact values.
But, I've thought hard on what modifications to the python script would
be needed to solve the problem solely within the script and its generated
shaders, and I like that solution even less.

So, Paul, I think we should go forward with your proposed solution.
