<div dir="ltr">On 25 January 2013 13:18, Matt Turner <span dir="ltr"><<a href="mailto:mattst88@gmail.com" target="_blank">mattst88@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div class=""><div class="h5">On Fri, Jan 25, 2013 at 9:55 AM, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br> > On 25 January 2013 07:49, Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>> wrote:<br> >><br> >> On 24 January 2013 19:44, Matt Turner <<a href="mailto:mattst88@gmail.com">mattst88@gmail.com</a>> wrote:<br> >>><br> >>> Following this email are eight patches that add the 4x8 pack/unpack<br> >>> operations that are the difference between what GLSL ES 3.0 and<br> >>> ARB_shading_language_packing require.<br> >>><br> >>> They require Chad's gles3-glsl-packing series and are available at<br> >>><br> >>> <a href="http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing" target="_blank">http://cgit.freedesktop.org/~mattst88/mesa/log/?h=ARB_shading_language_packing</a><br> >>><br> >>> I've also added testing support on top of Chad's piglit patch. The<br> >>> {vs,fs}-unpackUnorm4x8 tests currently fail, and I've been unable to<br> >>> spot why.<br> >><br> >><br> >> I had minor comments on patches 4/8 and 5/8. The remainder is:<br> >><br> >> Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com">stereotype441@gmail.com</a>><br> >><br> >> I didn't spot anything that would explain the failure in unpackUnorm4x8<br> >> tests. I'll go have a look at your piglit tests now, and if I don't find<br> >> anything there either, I'll fire up the simulator and see if I can see<br> >> what's going wrong.<br> ><br> ><br> > I found the problem. 
On i965, floating point divisions are implemented as<br> > multiplication by a reciprocal, whereas on the CPU there's a floating point<br> > division instruction. Therefore, unpackUnorm4x8's computation of "f /<br> > 255.0" doesn't yield consistent results when run on the CPU vs the<br> > GPU--there is a tiny difference due to the accumulation of floating point<br> > rounding errors.<br> ><br> > That's why the "fs" and "vs" variants of the tests failed, and the "const"<br> > variant passed--because Mesa does constant folding using the CPU's floating<br> > point division instruction, which matches the Python test generator<br> > perfectly, whereas the "fs" and "vs" variants use the actual GPU.<br> ><br> > It's only by dumb luck that this rounding error issue didn't bite us until<br> > now, because in principle it could equally well have occurred in the<br> > unpack2x16 functions.<br> ><br> > I believe we should relax the test to allow for these tiny rounding errors<br> > (this is what the other test generators, such as<br> > gen_builtin_uniform_tests.py, do). As an experiment I modified<br> > gen_builtin_packing_tests.py so that in vs_unpack_2x16_template and<br> > fs_unpack_2x16_template, "actual == expect${j}" is replaced with<br> > "distance(actual, expect${j}) < 0.00001". With this change, the test<br> > passes.<br> ><br> > However, that change isn't good enough to commit to piglit, for two reasons:<br> ><br> > (1) It should only be applied when testing the functions whose definition<br> > includes a division (unpackUnorm4x8, unpackSnorm4x8, unpackUnorm2x16, and<br> > unpackSnorm2x16). A properly functioning implementation ought to be able to<br> > get exact answers with all the other packing functions, and we should test<br> > that it does.<br> ><br> > (2) IMHO we should test that output values of 0.0 and +/- 1.0 are produced<br> > without error, since a shader author might conceivably write code that<br> > relies on these values being exact. 
That is, we should check that the<br> > following conversions are exact, with no rounding error:<br> ><br> > unpackUnorm4x8(0) == vec4(0.0)<br> > unpackUnorm4x8(0xffffffff) == vec4(1.0)<br> > unpackSnorm4x8(0) == vec4(0.0)<br> > unpackSnorm4x8(0x7f7f7f7f) == vec4(1.0)<br> > unpackSnorm4x8(0x80808080) == vec4(-1.0)<br> > unpackSnorm4x8(0x81818181) == vec4(-1.0)<br> > unpackUnorm2x16(0) == vec2(0.0)<br> > unpackUnorm2x16(0xffffffff) == vec2(1.0)<br> > unpackSnorm2x16(0) == vec2(0.0)<br> > unpackSnorm2x16(0x7fff7fff) == vec2(1.0)<br> > unpackSnorm2x16(0x80008000) == vec2(-1.0)<br> > unpackSnorm2x16(0x80018001) == vec2(-1.0)<br> ><br> > My recommendation: address problem (1) by modifying the templates to accept<br> > a new parameter that determines whether the test needs to be precise or<br> > approximate (e.g. "func.precise"). Address problem (2) by hand-coding a few<br> > shader_runner tests to check the cases above. IMHO it would be ok to leave<br> > the current patch as is (modulo my previous comments) and do a pair of<br> > follow-on patches to address problems (1) and (2).<br> <br> </div></div>Interesting. Thanks a lot for finding that and writing it up.<br> <br> Since div() is used by both the Snorm and Unorm unpacking<br> functions, any idea why it only adversely affects the results of<br> Unorm? 
Multiplication by 1/255 yields lower precision than by 1/127?<br></blockquote><div><br>After messing around with numpy for a while, it looks like 1/255 expressed as a float32 happens to fall almost exactly between two representable float32 values:<br> </div><div><br></div><div>0.0039215683937072754 (representable float32)<br></div><div>0.0039215686274509803 (true value of 1/255)<br></div><div>0.0039215688593685627 (next representable float32) <br><br></div><div>So regardless of which way the rounding goes, the relative error is approximately 5.9e-8.<br> <br></div><div>By luck, 1/127, 1/32767, and 1/65535 are all much closer to representable float32 values, with relative errors of 3.7e-9, 9.3e-10, and 2.3e-10 respectively.<br><br></div><div>So yeah, the relative error introduced by multiplication by 1/255 just happens to be particularly bad.<br> </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> In investigating the Unorm unpacking failure I did notice that some<br> values worked (like 0.0, 1.0, and even 0.0078431377), so I don't<br> expect any problems with precision on the values you suggest.<br></blockquote><div><br></div><div>Thanks for double-checking--I checked that too, to make sure there wasn't some deeper problem lurking.<br></div><div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <br> I agree with your recommended solution. I'll push these patches today<br> for the 9.1 branch and do follow-on patches to piglit like you<br> suggest.<br> </blockquote></div><br></div><div class="gmail_extra">Sounds good. Thanks, Matt.<br></div></div>
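<div>For anyone who wants to reproduce the reciprocal comparison, here is a minimal sketch. It assumes plain Python with only the standard library (struct for float32 rounding, fractions for exact arithmetic) rather than numpy, and the helper names to_float32 and reciprocal_relative_error are hypothetical, not part of piglit or Mesa. It only measures how far the float32 reciprocal is from the exact value; it does not model the i965 hardware itself.</div>

```python
import struct
from fractions import Fraction


def to_float32(x: float) -> float:
    """Round a Python float (a float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reciprocal_relative_error(n: int) -> float:
    """Relative error of float32(1/n) measured against the exact rational 1/n."""
    exact = Fraction(1, n)
    # Fraction(float) converts the stored binary value exactly, so the
    # subtraction below introduces no additional rounding.
    rounded = Fraction(to_float32(1.0 / n))
    return float(abs(rounded - exact) / exact)


# Divisors used by the Unorm4x8, Snorm4x8, Snorm2x16 and Unorm2x16 unpack functions.
for n in (255, 127, 32767, 65535):
    print(f"1/{n}: relative error {reciprocal_relative_error(n):.2e}")
```

<div>Running it should show the error for 1/255 standing out by more than an order of magnitude over the other three divisors, which is consistent with only the Unorm4x8 tests failing.</div>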