<div dir="ltr">On 25 January 2013 13:18, Matt Turner <span dir="ltr"><<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class=""><div class="h5">On Fri, Jan 25, 2013 at 9:55 AM, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br>
> On 25 January 2013 07:49, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br>
>><br>
>> On 24 January 2013 19:44, Matt Turner <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> wrote:<br>
>>><br>
>>> Following this email are eight patches that add the 4x8 pack/unpack<br>
>>> operations that are the difference between what GLSL ES 3.0 and<br>
>>> ARB_shading_language_packing require.<br>
>>><br>
>>> They require Chad's gles3-glsl-packing series and are available at<br>
>>><br>
>>> <a href="http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing" target="_blank">http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing</a><br>
>>><br>
>>> I've also added testing support on top of Chad's piglit patch. The<br>
>>> {vs,fs}-unpackUnorm4x8 tests currently fail, and I've been unable to<br>
>>> spot why.<br>
>><br>
>><br>
>> I had minor comments on patches 4/8 and 5/8. The remainder is:<br>
>><br>
>> Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br>
>><br>
>> I didn't spot anything that would explain the failure in unpackUnorm4x8<br>
>> tests. I'll go have a look at your piglit tests now, and if I don't find<br>
>> anything there either, I'll fire up the simulator and see if I can see<br>
>> what's going wrong.<br>
><br>
><br>
> I found the problem. On i965, floating point divisions are implemented as<br>
> multiplication by a reciprocal, whereas on the CPU there's a floating point<br>
> division instruction. Therefore, unpackUnorm4x8's computation of "f /<br>
> 255.0" doesn't yield consistent results when run on the CPU vs the<br>
> GPU--there is a tiny difference due to the accumulation of floating point<br>
> rounding errors.<br>
><br>
> That's why the "fs" and "vs" variants of the tests failed, and the "const"<br>
> variant passed--because Mesa does constant folding using the CPU's floating<br>
> point division instruction, which matches the Python test generator<br>
> perfectly, whereas the "fs" and "vs" variants use the actual GPU.<br>
><br>
> It's only by dumb luck that this rounding error issue didn't bite us until<br>
> now, because in principle it could equally well have occurred in the<br>
> unpack2x16 functions.<br>
><br>
> I believe we should relax the test to allow for these tiny rounding errors<br>
> (this is what the other test generators, such as<br>
> gen_builtin_uniform_tests.py, do). As an experiment I modified<br>
> gen_builtin_packing_tests.py so that in vs_unpack_2x16_template and<br>
> fs_unpack_2x16_template, "actual == expect${j}" is replaced with<br>
> "distance(actual, expect${j}) < 0.00001". With this change, the test<br>
> passes.<br>
><br>
> However, that change isn't good enough to commit to piglit, for two reasons:<br>
><br>
> (1) It should only be applied when testing the functions whose definition<br>
> includes a division (unpackUnorm4x8, unpackSnorm4x8, unpackUnorm2x16, and<br>
> unpackSnorm2x16). A properly functioning implementation ought to be able to<br>
> get exact answers with all the other packing functions, and we should test<br>
> that it does.<br>
><br>
> (2) IMHO we should test that ouput values of 0.0 and +/- 1.0 are produced<br>
> without error, since a shader author might conceivably write code that<br>
> relies on these values being exact. That is, we should check that the<br>
> following conversions are exact, with no rounding error:<br>
><br>
> unpackUnorm4x8(0) == vec4(0.0)<br>
> unpackUnorm4x8(0xffffffff) == vec4(1.0)<br>
> unpackSnorm4x8(0) == vec4(0.0)<br>
> unpackSnorm4x8(0x7f7f7f7f) == vec4(1.0)<br>
> unpackSnorm4x8(0x80808080) == vec4(-1.0)<br>
> unpackSnorm4x8(0x81818181) == vec4(-1.0)<br>
> unpackUnorm2x16(0) == vec2(0.0)<br>
> unpackUnorm2x16(0xffffffff) == vec4(1.0)<br>
> unpackSnorm2x16(0) == vec4(0.0)<br>
> unpackSnorm2x16(0x7fff7fff) == vec4(1.0)<br>
> unpackSnorm2x16(0x80008000) == vec4(-1.0)<br>
> unpackSnorm2x16(0x80018001) == vec4(-1.0)<br>
><br>
> My recommendation: address problem (1) by modifying the templates to accept<br>
> a new parameter that determines whether the test needs to be precise or<br>
> approximate (e.g. "func.precise"). Address problem (2) by hand-coding a few<br>
> shader_runner tests to check the cases above. IMHO it would be ok to leave<br>
> the current patch as is (modulo my previous comments) and do a pair of<br>
> follow-on patches to address problems (1) and (2).<br>
<br>
</div></div>Interesting. Thanks a lot for finding that and writing it up.<br>
<br>
Since div() is used in by both the Snorm and Unorm unpacking<br>
functions, any idea why it only adversely affects the results of<br>
Unorm? Multiplication by 1/255 yields lower precision than by 1/127?<br></blockquote><div><br>After messing around with numpy for a while, it looks like 1/255 expressed as a float32 happens to fall almost exactly between two representable float32 values:<br>
</div><div><br></div><div>0.0039215683937072754 (representable float32)<br></div><div>0.0039215686274509803 (true value of 1/255)<br></div><div>0.0039215688593685627 (next representable float32) <br><br></div><div>So regardless of which way the rounding goes the relative error is approximately 5.9e-8.<br>
<br></div><div>By luck, 1/127, 1/32767, and 1/65535 are all much closer to representable float32's, with relative errors of 3.7e-9, 9.3e-10, and 2.2e-10 respectively.<br><br></div><div>So yeah, the relative error introduced by multiplication by 1/255 just happens to be particularly bad.<br>
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
In investigating the Unorm unpacking failure I did notice that some<br>
values worked (like 0.0, 1.0, and even 0.0078431377), so I don't<br>
expect any problems with precision on the values you suggest.<br></blockquote><div><br></div><div>Thanks for double-checking--I checked that too, to make sure there wasn't some deeper problem lurking.<br></div><div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I agree with your recommended solution. I'll push these patches today<br>
for the 9.1 branch and do follow-on patches to piglit like you<br>
suggest.<br>
</blockquote></div><br></div><div class="gmail_extra">Sounds good. Thanks, Matt.<br></div></div>