[Mesa-dev] Problems with accuracy of coeffs_init + attribs_update

Thu Oct 29 04:27:02 PDT 2015

Hi Roland, Jose

I wanted to bring a problem I found to your attention, and discuss
about it and ways to solve it.

I'm working on regressions of piglit gpu.py between x86-64 and
ppc64le, when running with llvmpipe.

One of the regressions manifests itself in 2 tests,
clip-distance-bulk-copy and clip-distance-itemized-copy

What happens in ppc64le is that *some* of the interpolated per-vertex
values (clip-distance values) that constitute the input into the
fragment shader, are not calculated according to the required accuracy
of the test (which is an error/distance of less than 1e-6)

It took a while, but I believe I found the reason. In general, when
running on ppc64le, the code path that is taken at
lp_build_interp_soa_init() is doing  bld->simple_interp = FALSE, which
means using coeffs_init() + attribs_update(). That's in contrast with
x86-64, where bld->simple_interp = TRUE, which means the code will use
coeffs_init_simple() + attribs_update_simple()

In specific, the problem lies in how the coeffs are calculated inside
coeffs_init. To simply put it, they are only *actually* calculated
correctly for the first quad. The coeffs are later *updated* for the
other quads, using dadx/dady and dadq.

------
side-note:
I believe you even documented it in coeffs_init():
      /*
       * a = a0 + (x * dadx + y * dady)
       * a0aos is the attrib value at top left corner of stamp
       */
------

The root cause for that, I believe, is because coeffs_init does *not*
take into account the different x/y offsets for the other quads. In
contrast, on x86-64, the code uses a function called calc_offset(),
which does take the different x/y offsets into account.

Please look at this definition (from lp_bld_interp.c) :

static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2,
3, 0, 1, 0, 1, 2, 3, 2, 3};
static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1,
1, 2, 2, 3, 3, 2, 2, 3, 3};

Now, look at how coeffs_init() uses them:

   for (i = 0; i < coeff_bld->type.length; i++) {
      LLVMValueRef nr = lp_build_const_int32(gallivm, i);
      LLVMValueRef pixxf = lp_build_const_float(gallivm, quad_offset_x[i]);
      LLVMValueRef pixyf = lp_build_const_float(gallivm, quad_offset_y[i]);
      pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, "");
      pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, "");
   }

So, in ppc64le, where coeff_bld->type.length == 4, we *always* take
the first four values from the above arrays. The code *never*
calculates the coeffs based on the other offsets.

Admittedly, that's *almost* always works and I doubt you will see a
difference in display. However, in the tests I mentioned above, it
creates, for some pixels, a bigger error than what is allowed.

The first thing that came into my mind was to change, in
lp_build_interp_soa_init():

"if (coeff_type.length > 4)"
to this:
"if (coeff_type.length >= 4)"

When I did that, I saw it fixes the above tests and didn't cause any
regressions on ppc64le and/or on x86-64.

However, I wanted to ask you guys if you remember why you allowed only
vectors with size above 4 to use the "simple" route ?
Was there any performance hit ?

I think that to fix coeffs_init()+attribs_update() is to basically
re-design them. Otherwise, I would have tried to fix them first.

In any case, I would love to hear your feedback and to discuss this further.

Thanks,

     Oded